A Journey into the Fabulous Applications of Transformers — Part 1 – Towards AI


Author(s): Dr. Dharini R

Originally published on Towards AI, the World’s Leading AI and Technology News and Media Company.


Demo with Emphasis on NLP using Python and Hugging Face.

Photo by Arseny Togulev on Unsplash

The introduction of transformers has made a huge impact on Artificial Intelligence, especially in the Natural Language Processing domain.

Transformers paved the way for the long-awaited success of transfer learning in Natural Language Processing.

As a result, many large language models came into existence, and we are now able to build useful applications on top of these cutting-edge models.

In simple terms, a transformer is an encoder-decoder architecture that uses self-attention in both the encoder and the decoder. The encoder converts the input text into numerical representations (vectors), and the decoder uses those representations to generate the output text. Many popular models use only one of the two stacks: BERT-style models keep just the encoder, and GPT-style models keep just the decoder. To keep this article focused on the applications of transformers, we will not go deep into the architecture. Please go through the links in the References section to understand the architecture, evolution, and fundamentals of transformers.
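For readers curious about what "self-attention" computes, here is a minimal NumPy sketch of single-head scaled dot-product self-attention. It assumes random toy embeddings and random, untrained weight matrices (w_q, w_k, w_v are illustrative names), and it leaves out masking, multiple heads, and the rest of the transformer block, so it illustrates the idea rather than how the transformers library implements it.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over one sequence (no masking)."""
    q = x @ w_q  # queries: (seq_len, d_k)
    k = x @ w_k  # keys:    (seq_len, d_k)
    v = x @ w_v  # values:  (seq_len, d_k)
    # Similarity of every token with every other token, scaled by sqrt(d_k)
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (seq_len, seq_len)
    # Row-wise softmax: each token's attention weights sum to 1
    scores = scores - scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted mix of all value vectors
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))  # toy embeddings for 4 tokens
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))  # random, untrained
out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8)
```

Every token attends to every other token in the same sequence, which is what lets the model use context from the whole sentence when building each token's representation.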

Why Transformers?

Transformers helped establish “Transfer Learning” in NLP by making it possible to reuse the features learned by a pretrained model.

The idea is to take a model pretrained on a huge amount of text (such models are also called Large Language Models) and fine-tune it for our own purpose. Because we start from an already pretrained model, we save a lot of time and expense, and we need considerably less training data.

This, in turn, triggered widespread research into transformers and led to numerous pretrained NLP models. These models made it possible to implement many applications, and transfer learning in NLP flourished.

Hugging Face

Hugging Face is a platform and community, built by artificial intelligence enthusiasts, where a myriad of models are built and shared. It hosts pretrained models for domains such as Computer Vision, Natural Language Processing, Audio Processing, Tabular data, Reinforcement Learning, and Multimodal applications. All of these models can be accessed easily through its APIs (Application Programming Interfaces), including the transformers library used in this article.

This article focuses on the NLP domain and explores its possible applications with demo code and explanations.

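To use any of the Hugging Face models, the first step is to install the transformers library as follows. The pipelines also need a deep-learning backend such as PyTorch or TensorFlow; Google Colab comes with PyTorch preinstalled.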

pip install transformers

The next step is to use pipeline, which hides the complex steps behind a model (preprocessing the input, running the model, and post-processing its output) behind a simple, easy-to-use API. For each of the major applications below, we will see the code along with an explanation. Hugging Face offers many models for every task in every domain. Since this article aims to show what is possible, we use the default model that the library selects for each task. Details about each model can be found through the links given in each section.

Applications

In all of the applications discussed in the article, we are going to follow these steps.

1. Import pipeline

2. Invoke a model for the corresponding task by instantiating pipeline. The weights of the task's default model are downloaded in this step.

3. Utilize that model by giving the required input.

4. Review the results and try different inputs.

The code and output for all the applications are given in this GitHub repository, along with a Google Colab link.

The applications discussed in this article are:

1. Text Classification

2. Text Summarization

3. Question Answering

4. Text Generation

5. Named Entity Recognition

1. Text Classification

Classification is the task of assigning a given text to one of a set of predefined categories. Let us start by importing pipeline.

Next, we initialize a variable named text_classifier that holds the model for the text-classification task. As described in the steps above, the model is loaded by calling the pipeline function with the task name. Since no model name is given, the default model (distilbert-base-uncased-finetuned-sst-2-english) is downloaded, as the following output shows. The library also warns that relying on this default in production is not recommended; pipeline accepts a model name (and a revision) to pin a specific checkpoint.

from transformers import pipeline
text_classifier = pipeline("text-classification")

Output:
No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Downloading: 100%
629/629 [00:00<00:00, 24.4kB/s]
Downloading: 100%
268M/268M [00:05<00:00, 62.8MB/s]
Downloading: 100%
48.0/48.0 [00:00<00:00, 1.70kB/s]
Downloading: 100%
232k/232k [00:00<00:00, 6.48MB/s]

Next, we pass a sentence to the model through the variable text_classifier. This model was fine-tuned for sentiment analysis on the SST-2 dataset, so it labels text as POSITIVE or NEGATIVE and returns a score for the predicted label. For more information about the model, see the model page linked in the output above.

In the code below, we pass a sentence that expresses a negative sentiment and store the result in clf_result. Printing it shows that the model classifies the sentence as NEGATIVE, with a score of about 0.999 that reflects the model's confidence in that label.

clf_result = text_classifier("Oh God!!!! Its so horrible to hear about the news of aircraft")
print(clf_result)

Output:
[{'label': 'NEGATIVE', 'score': 0.9992474317550659}]

Now let us try another sentence. This time the statement expresses relief, and the model returns the POSITIVE label.

clf_result = text_classifier("Oh God!!!! So relieved to hear about the aircraft")
print(clf_result)

Output:
[{'label': 'POSITIVE', 'score': 0.9974811673164368}]
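In both results, score is the probability the model assigns to the predicted label. The sketch below shows roughly how such a score can be obtained from a classifier's raw outputs (logits) with a softmax. The logit values are made up for illustration, the helper name logits_to_prediction is hypothetical, and the label order {0: NEGATIVE, 1: POSITIVE} is an assumption about this checkpoint's configuration.

```python
import math

def logits_to_prediction(logits, id2label):
    """Turn raw classifier logits into a {'label', 'score'} dict via softmax."""
    # Subtract the max logit before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]  # probabilities that sum to 1
    # Report the most probable label together with its probability
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {"label": id2label[best], "score": probs[best]}

# Hypothetical logits for a clearly negative sentence
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
prediction = logits_to_prediction([3.1, -2.4], id2label)
print(prediction)
```

With two labels, a large gap between the logits pushes the score close to 1, which is why confident predictions such as the ones above show scores like 0.999.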

2. Text Summarization

Text summarization is the task of producing a shorter version of a text that keeps its main points. As before, we import pipeline and instantiate it with the task of our choice (summarization). The default model for text summarization is sshleifer/distilbart-cnn-12-6; for more details about the model, see the link in the output below. The model is stored in the variable text_summarizer.

from transformers import pipeline
text_summarizer = pipeline("summarization")

Output:
No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Downloading: 100%
1.80k/1.80k [00:00<00:00, 47.4kB/s]
Downloading: 100%
1.22G/1.22G [00:50<00:00, 47.4MB/s]
Downloading: 100%
26.0/26.0 [00:00<00:00, 460B/s]
Downloading: 100%
899k/899k [00:00<00:00, 1.60MB/s]
Downloading: 100%
456k/456k [00:00<00:00, 1.73MB/s]

The following snippet shows the input_text variable loaded with the text to be summarized.

input_text = """Education is a purposeful activity directed at achieving certain aims,
such as transmitting knowledge or fostering skills and character traits.
These aims may include the development of understanding, rationality, kindness, and honesty.
Various researchers emphasize the role of critical thinking in order to distinguish education
from indoctrination. Some theorists require that education results in an improvement of the student
while others prefer a value-neutral definition of the term. In a slightly different sense, education
may also refer, not to the process, but to the product of this process: the mental states and dispositions
possessed by educated people. Education originated as the transmission of cultural heritage from one generation
to the next. Today, educational goals increasingly encompass new ideas such as the liberation of learners, skills
needed for modern society, empathy, and complex vocational skills."""

We pass input_text to text_summarizer along with max_length, which caps the length of the summary in tokens (not words). The pipeline returns a list with one dictionary per input; printing its summary_text field gives the summary below.

summary = text_summarizer(input_text, max_length=100)
print(summary[0]['summary_text'])

Output:
Some theorists require that education results in an improvement of the student.
Others prefer a value-neutral definition of the term. Education originated
as the transmission of cultural heritage from one generation to the next.
Today, educational goals increasingly encompass new ideas such as the
liberation of learners, skills needed for modern society, empathy, and
complex vocational skills.
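Because max_length only caps the summary length, it can be useful to check how much the text was actually compressed. A minimal sketch, assuming a simple whitespace word count is a good-enough measure (the model itself counts tokens, not words); compression_ratio is a hypothetical helper applied to the input and summary from this section.

```python
def compression_ratio(source, summary):
    """Ratio of summary words to source words (lower means more compression)."""
    # str.split() with no arguments splits on any whitespace, including newlines
    return len(summary.split()) / len(source.split())

source = """Education is a purposeful activity directed at achieving certain aims,
such as transmitting knowledge or fostering skills and character traits.
These aims may include the development of understanding, rationality, kindness, and honesty.
Various researchers emphasize the role of critical thinking in order to distinguish education
from indoctrination. Some theorists require that education results in an improvement of the student
while others prefer a value-neutral definition of the term. In a slightly different sense, education
may also refer, not to the process, but to the product of this process: the mental states and dispositions
possessed by educated people. Education originated as the transmission of cultural heritage from one generation
to the next. Today, educational goals increasingly encompass new ideas such as the liberation of learners, skills
needed for modern society, empathy, and complex vocational skills."""

summary_text = """Some theorists require that education results in an improvement of the student.
Others prefer a value-neutral definition of the term. Education originated
as the transmission of cultural heritage from one generation to the next.
Today, educational goals increasingly encompass new ideas such as the
liberation of learners, skills needed for modern society, empathy, and
complex vocational skills."""

ratio = compression_ratio(source, summary_text)
print(f"The summary is {ratio:.0%} of the original length in words")
```

Comparing the two texts also shows that this summary reuses sentences from the input almost verbatim, even though the model is able to generate new wording.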

3. Question Answering

Another intriguing application of transformers is building a question-answering (QA) system. We give the model a passage of text as context, and the model answers questions based on it. More precisely, the default model is extractive: it does not write a new answer but selects the span of the context that best answers the question.

To use a QA model, we first instantiate pipeline with the question-answering task. The default model for this task is distilbert-base-cased-distilled-squad, so its weights are downloaded. The model is stored in the variable qna_model, as shown below.

from transformers import pipeline
qna_model = pipeline("question-answering")

Output:
No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Downloading: 100%
473/473 [00:00<00:00, 2.63kB/s]
Downloading: 100%
261M/261M [00:06<00:00, 51.3MB/s]
Downloading: 100%
29.0/29.0 [00:00<00:00, 508B/s]
Downloading: 100%
213k/213k [00:00<00:00, 1.49MB/s]
Downloading: 100%
436k/436k [00:00<00:00, 1.09MB/s]

Below, we define a passage about GitHub (input_text), which the model uses as its context.

input_text = """GitHub Inc is an Internet hosting service for software development
and version control using Git. It provides the distributed version control of
Git plus access control, bug tracking, software feature requests, task management,
continuous integration, and wikis for every project. Headquartered in California,
it has been a subsidiary of Microsoft since 2018.
It is commonly used to host open source software development projects.
As of June 2022, GitHub reported having over 83 million developers and more
than 200 million repositories, including at least 28 million public
repositories. It is the largest source code host as of November 2021.
"""

We write three questions based on input_text: question_1 asks for a definition of GitHub, question_2 asks for a number, and question_3 checks how the model handles a yes/no question.

question_1 = "What is GitHub?"
question_2 = "How many repositories does GitHub have?"
question_3 = "Can I use GitHub to host my project?"

Now let’s pass input_text as the context and question_1 as the question, and print the answer. Voila, we have a model that can read a passage and give quick answers!!

answer_1 = qna_model(question=question_1, context=input_text)
print(answer_1)

Output:
{'score': 0.17357327044010162,
'start': 14,
'end': 86,
'answer': 'an Internet hosting service for software development and version control'}

In the same way, let us try the other two questions and see the results.

answer_2 = qna_model(question=question_2, context=input_text)
print(answer_2)

Output:
{'score': 0.3084281086921692,
'start': 506,
'end': 527,
'answer': 'more than 200 million'}

answer_3 = qna_model(question=question_3, context=input_text)
print(answer_3)

Output:
{'score': 0.471432626247406,
'start': 364,
'end': 433,
'answer': 'It is commonly used to host open source software development projects'}

The first two questions get the intended answers. The third question expects a yes or no; since the model can only return a span from the context, it answers with the sentence that supports a yes: GitHub is commonly used to host software development projects.
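In each result, start and end are character offsets into the context, and score is the model's confidence in the selected span. As a small sketch (highlight_answer is a hypothetical helper, and the context below is the first sentence of input_text written on one line), the offsets can be used to mark the answer inside its passage:

```python
def highlight_answer(context, result):
    """Return the context with the QA answer span wrapped in [[ ]]."""
    start, end = result["start"], result["end"]
    # start/end are character offsets, so slicing the context recovers the answer
    if context[start:end] != result["answer"]:
        raise ValueError("offsets do not match this context")
    return context[:start] + "[[" + context[start:end] + "]]" + context[end:]

# First sentence of input_text on one line, with the result printed for question_1
context = ("GitHub Inc is an Internet hosting service for software development "
           "and version control using Git.")
result = {"score": 0.17357327044010162, "start": 14, "end": 86,
          "answer": "an Internet hosting service for software development and version control"}
marked = highlight_answer(context, result)
print(marked)
```

The check inside the helper guards against passing a result together with a different context than the one the model saw.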

4. Text Generation

The next interesting application is generating text with transformers. We call the text generation model with a prompt, max_length, and num_return_sequences. A prompt is the starting text (the context or idea) that the model continues with new words. max_length sets the maximum length of each generated sequence in tokens, including the prompt, and num_return_sequences sets how many different sequences are generated.

In the code below, we instantiate pipeline with the text-generation task, and the weights of the default model (gpt2; see the link in the output below for details) are loaded into the variable text_generator.

from transformers import pipeline
text_generator = pipeline("text-generation")

Output:
No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Downloading: 100%
665/665 [00:00<00:00, 14.2kB/s]
Downloading: 100%
548M/548M [00:11<00:00, 40.0MB/s]
Downloading: 100%
1.04M/1.04M [00:00<00:00, 3.07MB/s]
Downloading: 100%
456k/456k [00:00<00:00, 867kB/s]
Downloading: 100%
1.36M/1.36M [00:00<00:00, 3.09MB/s]

We are going to see three examples with different prompts. The first two prompts are about ‘AI based text generation’. The first one is shown below.

prompt = "AI based text generation is"
text_generator(prompt, max_length=30, num_return_sequences=5)

Output:
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
[{'generated_text': 'AI based text generation is a great idea as for you do not need to build and deploy your own font. The free font solution is the free Font'},
{'generated_text': 'AI based text generation is the only system that\'s able to take the power of the internet offline."\n\nIt will work with all services underwritten'},
{'generated_text': 'AI based text generation is available that can work with any text size on one device, including on computers running Windows or OSX. The Microsoft RDR'},
{'generated_text': 'AI based text generation is used in this research to create a novel, non-text vector for the translation of human words, which will help reduce the'},
{'generated_text': 'AI based text generation is essential to the efficiency of our business.\n\nIt may take a while but the key is using it to be productive for'}]

The only difference between the first and second prompts is the last word: ‘is’ in the first and ‘was’ in the second. The outputs below show that the model follows the change in tense: several continuations describe past events, such as ‘was launched in 2010’ and ‘was born in 1995’, although, like all of these samples, they are fluent rather than factual.

prompt = "AI based text generation was"
text_generator(prompt, max_length=30, num_return_sequences=5)

Output:
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
[{'generated_text': 'AI based text generation was done on a fully automated system (i.e., that uses the current technology, the first machine to run the command,'},
{'generated_text': 'AI based text generation was launched in 2010 in Turkey.\n\nThe first of this type of font will be available to the public from the next month'},
{'generated_text': 'AI based text generation was born in 1995 on the idea that we should build a set of models that would automatically detect and translate the text in real time'},
{'generated_text': 'AI based text generation was a major problem. Text is generated in both high performance and non-proprietary ways. As in every other field,'},
{'generated_text': 'AI based text generation was built. However, we never considered a lot of ways to address the many potential issues with what we created with Text.org'}]

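The third prompt is about ‘using GitHub,’ and the results are shown below. Again, the five sequences differ from one another, because the text-generation pipeline samples each continuation instead of always picking the single most likely next word. For the same reason, running the code yourself will produce different texts.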

prompt = "I am using Github to"
text_generator(prompt, max_length=30, num_return_sequences=5)

Output:
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
[{'generated_text': "I am using Github to get the job done so I'll have to follow through in this tutorial. I created all the tutorials on youtube, but I"},
{'generated_text': 'I am using Github to start a new project, and I am wondering if you can help me make some modifications and give the repository a try.\n'},
{'generated_text': 'I am using Github to start a blog, and this is how I create what I call a monthly blog.\n\nIf you run into anything,'},
{'generated_text': 'I am using Github to serve the rest of the system.\n\nCreate an executable:\n\nimport os import os.path as path path ='},
{'generated_text': "I am using Github to post updates to my work. I don't understand why people want to check in on my work and delete it while it is"}]
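In every output above, generated_text begins with the prompt itself. The text-generation pipeline accepts a return_full_text=False argument to return only the new text; the helper below is a library-free sketch of the same idea applied to saved outputs (continuations is a hypothetical name, and the two sample outputs are copied from the second prompt above).

```python
def continuations(prompt, outputs):
    """Strip the prompt from each generated_text and return only the new text."""
    results = []
    for item in outputs:
        text = item["generated_text"]
        # The pipeline's generated_text begins with the prompt itself
        if text.startswith(prompt):
            text = text[len(prompt):]
        results.append(text.strip())
    return results

prompt = "AI based text generation was"
outputs = [
    {"generated_text": "AI based text generation was launched in 2010 in Turkey.\n\n"
                       "The first of this type of font will be available to the public from the next month"},
    {"generated_text": "AI based text generation was born in 1995 on the idea that we should build "
                       "a set of models that would automatically detect and translate the text in real time"},
]
for text in continuations(prompt, outputs):
    print(repr(text))
```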

5. Named Entity Recognition

Another important application of transformers is named entity recognition (NER): identifying the entities in a given text, such as persons, organizations, and locations.

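First, instantiate pipeline with ner as the task. The default model is dbmdz/bert-large-cased-finetuned-conll03-english, and its weights are downloaded and stored in named_entity_recognition. The argument aggregation_strategy="simple" groups the tokens that belong to the same entity, so that a multi-word name such as GitHub Inc is returned as a single entity.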

from transformers import pipeline
named_entity_recognition = pipeline("ner", aggregation_strategy="simple")

Output:
No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Downloading: 100%
998/998 [00:00<00:00, 23.5kB/s]
Downloading: 100%
1.33G/1.33G [00:42<00:00, 41.7MB/s]
Downloading: 100%
60.0/60.0 [00:00<00:00, 1.07kB/s]
Downloading: 100%
213k/213k [00:00<00:00, 1.85MB/s]

Now, we load input_text with a set of sentences to extract the entities.

input_text = """GitHub Inc is an Internet hosting service for software
development and version control using Git. It provides the distributed
version control of Git plus access control, bug tracking, software feature
requests, task management, continuous integration, and wikis for every project.
Headquartered in California, it has been a subsidiary of Microsoft since 2018.
It is commonly used to host open source software development projects.
As of June 2022, GitHub reported having over 83 million developers and
more than 200 million repositories, including at least 28 million public
repositories. It is the largest source code host as of November 2021.
"""

We pass input_text to the model (named_entity_recognition) and print the result (tags). GitHub Inc, Microsoft, and GitHub are tagged as organizations (ORG), California as a location (LOC), and Internet and Git as miscellaneous entities (MISC). Each result also includes the character offsets where the entity starts and ends in the text, and the score of the classification.

tags = named_entity_recognition(input_text)
print(tags)

Output:
[{'entity_group': 'ORG',
'score': 0.99605906,
'word': 'GitHub Inc',
'start': 0,
'end': 10},
{'entity_group': 'MISC',
'score': 0.91944593,
'word': 'Internet',
'start': 17,
'end': 25},
{'entity_group': 'MISC',
'score': 0.9152951,
'word': 'Git',
'start': 93,
'end': 96},
{'entity_group': 'MISC',
'score': 0.93599296,
'word': 'Git',
'start': 145,
'end': 148},
{'entity_group': 'LOC',
'score': 0.9990779,
'word': 'California',
'start': 301,
'end': 311},
{'entity_group': 'ORG',
'score': 0.9995492,
'word': 'Microsoft',
'start': 341,
'end': 350},
{'entity_group': 'ORG',
'score': 0.9892499,
'word': 'GitHub',
'start': 452,
'end': 458}]

To see the result in a more readable form, we can use the pandas library (install it with pip install pandas if it is not already available) and create a DataFrame from the output (tags). Printing the DataFrame gives a table of tags, as shown below.

import pandas as pd

tags = named_entity_recognition(input_text)
df = pd.DataFrame(tags)  # one row per detected entity
print(df)

Output:
   entity_group     score        word  start  end
0           ORG  0.996059  GitHub Inc      0   10
1          MISC  0.919446    Internet     17   25
2          MISC  0.915295         Git     93   96
3          MISC  0.935993         Git    145  148
4           LOC  0.999078  California    301  311
5           ORG  0.999549   Microsoft    341  350
6           ORG  0.989250      GitHub    452  458
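Because each entity comes with a score, the results can also be filtered and grouped by type without pandas. A minimal sketch, assuming the list of dictionaries printed above is available as tags; group_entities and its min_score threshold are hypothetical choices for illustration.

```python
from collections import defaultdict

def group_entities(tags, min_score=0.9):
    """Group entity words by entity_group, dropping low-confidence or repeated ones."""
    grouped = defaultdict(list)
    for tag in tags:
        if tag["score"] < min_score:
            continue  # skip entities the model is less sure about
        words = grouped[tag["entity_group"]]
        if tag["word"] not in words:  # keep each word once per group
            words.append(tag["word"])
    return dict(grouped)

# The aggregated NER output shown above, copied as plain Python data
tags = [
    {"entity_group": "ORG", "score": 0.99605906, "word": "GitHub Inc", "start": 0, "end": 10},
    {"entity_group": "MISC", "score": 0.91944593, "word": "Internet", "start": 17, "end": 25},
    {"entity_group": "MISC", "score": 0.9152951, "word": "Git", "start": 93, "end": 96},
    {"entity_group": "MISC", "score": 0.93599296, "word": "Git", "start": 145, "end": 148},
    {"entity_group": "LOC", "score": 0.9990779, "word": "California", "start": 301, "end": 311},
    {"entity_group": "ORG", "score": 0.9995492, "word": "Microsoft", "start": 341, "end": 350},
    {"entity_group": "ORG", "score": 0.9892499, "word": "GitHub", "start": 452, "end": 458},
]
print(group_entities(tags))
# {'ORG': ['GitHub Inc', 'Microsoft', 'GitHub'], 'MISC': ['Internet', 'Git'], 'LOC': ['California']}
print(group_entities(tags, min_score=0.95))
# {'ORG': ['GitHub Inc', 'Microsoft', 'GitHub'], 'LOC': ['California']}
```

Raising min_score drops the MISC entities, whose scores are lower than those of the organizations and the location in this example.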

Summary

In this article, we introduced transformers and their role in transfer learning. We also saw the simple steps for using a model from Hugging Face, and went through several applications: text classification (sentiment analysis), summarization, question answering, text generation, and named entity recognition. There are more applications to explore, which will be discussed in Part 2 of this article.

Now that you have seen these fascinating applications, select a task that interests you and try out different models and inputs for it.

Proceed and Succeed !!!!

References

Please find more articles related to NLP on this page.

Thank you!!

A Journey into the Fabulous Applications of Transformers — Part 1 was originally published in Towards AI on Medium.

Published via Towards AI

Author: Jonathan Kelly