How to Create a Text Summarizer with the Hugging Face Library: A Beginner’s Guide

Josiah Adesola
4 min read · Feb 25, 2023

Text summarization is an important part of writing. So, how can you create a program that summarizes a body of text quickly?

Natural Language Processing (commonly known as NLP) is a branch of Artificial Intelligence concerned with interpreting and performing tasks with human language. The emergence of ChatGPT has drawn wider attention to NLP applications such as chatbots, named entity recognition, question-answering systems, and many more. Among these, text summarization has been in particularly high demand.

What you’ll learn:

  • What is text summarization?
  • About the Hugging Face platform
  • How to use Google Colab
  • How to build a text summarizer

What is Text Summarization?

Text summarization is the process of reducing long articles, paragraphs, or other bodies of text into shorter statements that keep the key meaning. In NLP, this technique is commonly used to produce a condensed version of an article that is easier to understand.

Applying text summarization makes customer reviews, emails, social media messages, meeting minutes, and many other kinds of text quicker to read and comprehend.

Hugging Face Platform

The Hugging Face platform provides an online repository for thousands of NLP models and datasets from individuals and top tech companies like Google, OpenAI, Facebook, and Microsoft. By offering many pre-trained models, it saves users the stress of building an NLP project from scratch.

To train an NLP model properly, you need millions, if not billions, of data samples from various sources, which can be costly in terms of time, energy, and resources. The Hugging Face platform offers an alternative by allowing you to use pre-existing models and datasets, as well as upload your own models and create private datasets. Additionally, you can host demo ML apps using Python SDKs such as Gradio and Streamlit.

What are Transformers?

The Transformer, a neural network architecture used in NLP, has two major components called encoders and decoders. Encoders convert inputs, such as the text to be translated or summarized, into numerical vectors that the machine can process. Decoders then turn those vectors into final results that are easy for people to understand.

In 2017, Google researchers introduced the Transformer in the paper “Attention Is All You Need”, which revolutionized the way NLP models are trained. Before Transformers, NLP models used Recurrent Neural Networks (RNNs), which read text one word at a time and struggle to relate a word in one part of a document to a word in a distant part of the same document, a limitation similar to memory loss.

Transformers take in all the words in a document at once and compare each word with every other word (a mechanism called attention) to capture meaning, then convert them into vectors that the computer can process.
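To make the idea of comparing every word with every other word more concrete, here is a minimal sketch of self-attention in plain Python. It is a simplification, not how a production model is implemented: real Transformers use learned weights to build separate query, key, and value vectors, while this toy version reuses the raw vectors for all three. The function names and the tiny word vectors are made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    shift = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Toy self-attention: compare each word vector with all the others.

    Simplification: the raw vectors act as queries, keys, and values.
    """
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Scaled dot product = similarity between this word and every word
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # New representation = weighted average of all the word vectors
        mixed = [
            sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
    return outputs


# Three hypothetical 2-dimensional "word vectors"
words = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
contextual = self_attention(words)
```

Each output vector is a blend of all the input vectors, weighted by similarity, which is how a word ends up carrying information about the rest of the document.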

Building the Text Summarizer in 4 Steps

1. Launch your Google Colab: Google Colab is a hosted notebook environment for writing ML programs in the Python programming language. It runs in the browser, works well for collaboration, and comes with some pre-installed Python libraries.

  • Go to the Google Colab website.
  • Sign in with your Google account.
  • Click on the “New Notebook” text.
  • Edit the title, but do not edit the “.ipynb” text (this is the notebook extension).

After doing this successfully, you’re ready to start building your text summarizer program.

2. Install the transformers library: As explained in the previous part of this article, Transformers are key components in NLP. They encode and decode the text so that your computer can work with it.

  • The first line of code installs the transformers library from the Hugging Face platform.
  • The second line of code imports the pipeline function from the “transformers” library.
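In a Colab cell, the two lines described above would most likely be `!pip install transformers` followed by `from transformers import pipeline` (the leading `!` tells the notebook to run a shell command). The sketch below is a plain-Python variant of the same step that also runs outside a notebook: it checks whether the library is present before importing it. The `installed` variable is a name chosen here for illustration.

```python
# In a Colab cell, install the library first with the shell command:
#   !pip install transformers
import importlib.util

# find_spec returns None when a package is not installed
installed = importlib.util.find_spec("transformers") is not None

if installed:
    # The pipeline function is what the next steps build on
    from transformers import pipeline
else:
    print("transformers is missing; run: pip install transformers")

print("transformers installed:", installed)
```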

3. Build your model: “summarizer” is a variable that holds the result of calling the “pipeline” function. You may be thinking that this is just a single line of code, and it really is that simple, because the model used here is a pre-trained one published by Facebook. You can look it up on the Hugging Face platform once you run the code.

Because the model is pre-trained, you do not need to do data cleaning, tokenization, and the rest yourself.

All of these steps are already handled. You only need to call the pipeline function for the summarization task, pass in the model you want, and run the code. The first time you run it, you will notice the model files being downloaded.
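As a sketch of this step, the snippet below wraps the pipeline call in a small helper. The article only says the model is by Facebook, so the checkpoint name `facebook/bart-large-cnn` is an assumption (it is one widely used Facebook summarization model), and `build_summarizer` is a hypothetical helper name. It also assumes the installed transformers version provides the `summarization` pipeline task.

```python
# Assumption: the exact checkpoint is not named in the article;
# "facebook/bart-large-cnn" is one common Facebook summarization model.
MODEL_NAME = "facebook/bart-large-cnn"


def build_summarizer(model_name=MODEL_NAME):
    """Create a summarization pipeline backed by a pre-trained model."""
    # Imported here so this cell can be defined before the
    # transformers library has been installed.
    from transformers import pipeline

    # The first call downloads the model weights and tokenizer.
    return pipeline("summarization", model=model_name)
```

In the notebook this would be used as `summarizer = build_summarizer()`, which is equivalent to the one-line form `summarizer = pipeline("summarization", model="facebook/bart-large-cnn")`.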

4. Summarize your article: The final step is to summarize your article to the length you want.

After running this code, you will get a summary with a minimum length of 25 and a maximum length of 50. Note that the model measures length in tokens (word pieces) rather than whole words, so the word count of the output can differ slightly. The “max_len” and “min_len” variables can be altered to change the maximum and minimum length of the summary, respectively.

Text summarization has many applications: condensing text documents, customer reviews and feedback, and research papers, extracting the key details from product reviews, and a host of others.
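A minimal sketch of this step follows. The “max_len” and “min_len” names come from the text above; the sample `ARTICLE` text and the `summarize` helper are hypothetical, and the helper assumes it receives a Hugging Face summarization pipeline, which accepts `max_length` and `min_length` and returns a list of dictionaries with a `summary_text` key.

```python
# Hypothetical sample text; replace it with the article you want to summarize.
ARTICLE = (
    "Hugging Face hosts thousands of pre-trained models for natural language "
    "processing. With the transformers library, a developer can load one of "
    "these models in a single line and apply it to tasks such as translation, "
    "question answering, and text summarization without training anything."
)

max_len = 50  # upper bound on the summary length
min_len = 25  # lower bound on the summary length


def summarize(summarizer, text, max_length, min_length):
    """Run a summarization pipeline and return only the summary string."""
    result = summarizer(
        text,
        max_length=max_length,
        min_length=min_length,
        do_sample=False,  # deterministic decoding instead of random sampling
    )
    # The pipeline returns a list with one dict per input text.
    return result[0]["summary_text"]
```

With a pipeline stored in `summarizer`, calling `print(summarize(summarizer, ARTICLE, max_len, min_len))` would print the summary. Keeping the two limits in variables makes it easy to experiment with shorter or longer summaries.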

Demo

Watch the live demo of the project.

Josiah Adesola

Writes about machine learning, Data Science, Python. Creative. Thinker. Engineering. Twitter: @_JosiahAdesola