Natural Language Processing
The emergence of deep learning has revolutionized many areas of Natural Language Processing (NLP), such as sentiment analysis, question answering, text classification, and named entity recognition. In sentiment analysis, a piece of text is given to the computer as input, and the computer has to tell whether the text expresses positive or negative sentiment.
In question answering, a pair consisting of a question and a text containing its answer is given as input, and the job of the computer is to extract the answer from the input text. In text classification, the computer assigns the input text to one of several predefined classes. The named entity recognition task extracts names of people, locations, companies, etc. from the given input text. So how is the text fed into the computer? As you may know, computers work on numbers and cannot process text directly, so there needs to be a mechanism to convert the input text into numbers. Typically, a vector of numbers is assigned to each word in the text.
For example, consider the following text:
Sentence A: I went to the bank to withdraw money.
Each word in sentence A is represented with a numeric vector before it is fed into the computer. For now, don’t worry about how the words are converted into numeric vectors.
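As a rough sketch of what this conversion looks like, consider the toy code below. The vocabulary and the 3-dimensional vectors in it are completely made up for illustration; real systems learn much longer vectors from data.

```python
# Hypothetical toy vocabulary: each word maps to a fixed 3-dimensional vector.
# The numbers are made up; real systems learn these vectors from large corpora.
word_vectors = {
    "i":        [0.2, 0.1, 0.7],
    "went":     [0.9, 0.3, 0.1],
    "to":       [0.4, 0.4, 0.4],
    "the":      [0.1, 0.8, 0.2],
    "bank":     [0.6, 0.5, 0.9],
    "withdraw": [0.3, 0.7, 0.6],
    "money":    [0.8, 0.2, 0.5],
}

def text_to_vectors(text):
    """Convert a sentence into a list of numeric vectors, one per word."""
    words = text.lower().replace(".", "").split()
    return [word_vectors[w] for w in words]

vectors = text_to_vectors("I went to the bank to withdraw money.")
print(len(vectors))  # 8 vectors, one for each word in sentence A
```

The computer then operates on these lists of numbers rather than on the raw text.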
Now consider another sentence B given below:
Sentence B: I went to the bank to withdraw money, but the cashier told me that my account is empty. I went to the river bank and cried there.
Now compare the word “bank” in sentence A and sentence B. What do you think it represents? Yes, you are right. In sentence A, the word “bank” means a financial institution. Sentence B, however, contains the word “bank” twice: the first time it refers to a financial institution, but the second time it refers to the edge of a river. So how is the word “bank” represented as a vector of numbers? The answer is that there are two ways to represent a word:
Non-contextual representations or embeddings: In this case, a fixed vector is used for the word “bank”, whether it means a financial institution or the edge of a river. So what is the problem with this representation? Yes, you are right: if we represent a word that has different meanings with the same vector, the output of our task, such as text classification or question answering, may be incorrect.
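A minimal sketch of this problem, again with a made-up vector, shows that both senses of “bank” end up indistinguishable:

```python
# Non-contextual embedding: one fixed (made-up) vector per word,
# regardless of the surrounding context.
fixed_embedding = {"bank": [0.6, 0.5, 0.9]}

sentence_a = "I went to the bank to withdraw money"      # financial institution
sentence_b = "I went to the river bank and cried there"  # edge of the river

vec_a = fixed_embedding["bank"]  # vector for "bank" in sentence A
vec_b = fixed_embedding["bank"]  # vector for "bank" in sentence B
print(vec_a == vec_b)  # True: both senses of "bank" get the same vector
```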
Contextual word representations or embeddings: In this case, each word is assigned a numeric vector based on the surrounding words instead of a fixed vector. So the word “bank” will have two different numeric vectors, one for the river bank and one for the bank as a financial institution. This option seems more logical.
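As a toy illustration of this idea (not a real model), a context-dependent vector can be built by averaging a word’s own vector with those of its immediate neighbors, so the same word gets different vectors in different sentences. All vectors below are made up.

```python
# Toy contextual embedding: the vector for a word is the average of its own
# (made-up) fixed vector and its immediate neighbors' vectors.
base = {
    "bank":  [0.6, 0.5, 0.9],
    "money": [0.8, 0.2, 0.5],
    "river": [0.1, 0.9, 0.3],
    "the":   [0.1, 0.8, 0.2],
}

def contextual_vector(words, i):
    """Average the vector of words[i] with its left and right neighbors."""
    window = [base[w] for w in words[max(0, i - 1):i + 2] if w in base]
    dim = len(base[words[i]])
    return [sum(v[d] for v in window) / len(window) for d in range(dim)]

a = contextual_vector(["the", "bank", "money"], 1)  # financial context
b = contextual_vector(["the", "river", "bank"], 2)  # river context
print(a == b)  # False: the same word "bank" now gets different vectors
```

Real contextual models such as BERT do something far more sophisticated, but the principle is the same: the surrounding words shape the vector.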
Please note that we will use the terms word representations and embeddings interchangeably, i.e. they mean the same thing. So far, we have shown the example of a single word, “bank”, but there are many other words that have different meanings in different contexts.
So how are these representations learned? Well, there can be many ways to learn them, but one way is called language modeling. A language model tries to predict the next word when given some previous words. These previous words are called context words. For example:
Sentence C: I want a glass of ———-.
So what should go in the blank of sentence C? Hmm, you are trying to predict this word, so in a way you are acting as a language model yourself. The sentence could be completed as:
I want a glass of milk.
I want a glass of water.
And so on. Once a language model can correctly predict the next word when given the previous or context words, we hope that it has somehow learned good representations of all the words. So far, this has all been theoretical. In future posts, we will discuss the technical details of language models, their limitations, and remedies.
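To make the idea concrete, here is a minimal count-based sketch: a bigram model trained on a tiny made-up corpus that predicts the next word from the single previous word. Real language models are far more sophisticated, but the prediction objective is the same.

```python
from collections import Counter, defaultdict

# Tiny made-up training corpus.
corpus = [
    "i want a glass of milk",
    "i want a glass of water",
    "i want a glass of milk",
]

# Count how often each word follows each context word (a bigram model).
next_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        next_counts[prev][nxt] += 1

def predict_next(word):
    """Return the word most frequently seen after `word` in the corpus."""
    return next_counts[word].most_common(1)[0][0]

print(predict_next("of"))  # "milk" (seen twice, versus once for "water")
```

A model like this only looks one word back; neural language models learn from much longer contexts, and that is where the useful word representations come from.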