Natural Language Processing


School of Engineering, Cochin University of Science and Technology, Kochi, Kerala 682022, India



Understanding and processing human language has always been a complicated task in computational theory. Natural language processing, a combination of artificial intelligence and computational linguistics, employs computational techniques to understand the structure of human language. The challenges in the field grow daily with the explosive growth of text content on the Internet, the varying text forms used in social media, and the conversational complexities associated with intelligent devices. Natural language processing ranges from analyzing natural language with tokenizing and parsing techniques to resolving ambiguities and co-references in language. This chapter explains the basics of natural language processing, including text processing techniques, parsing, semantic analysis, and the latest trend of deep learning models, which promise significant improvements in natural language processing tasks, with simple examples that aid comprehension of the concepts. The chapter also includes a comparative study of long short-term memory and gated recurrent units for sequence-to-sequence modeling, with step-by-step implementation details of opinion summarization.


Artificial intelligence (AI) and neural networks (NNs) are two essential technologies in the field of computer science. Both try to create intelligent systems. Human intelligence depends on brain structure, and the brain plays a vital role in thinking and formulating decisions. Machine learning, fuzzy logic, and similar techniques can be used to make machines artificially intelligent. However, the most effective technique for creating an intelligent system is the neural network, which helps machines think like humans. Therefore, the main difference between AI and NNs is that NNs are a stepping stone to AI. A neural network can solve real-world problems after being trained on similar situations. The main drawback of NNs is that they cannot handle a new scenario that is not present in the training set. Cognitive computing is a new technology that enables machines to think like humans, whereas AI tries to create intelligent machines. Cognitive computing can extract information from large amounts of data and can learn continuously and instantly.

For making a machine think like a human, it should have all the capabilities that a human has. Learning from experiences, sensory perceptions, deduction, processing information, and memory are the essential features needed to think like a human.

  • The system should be capable of learning new data.
  • A cognitive system must have sensory perception appropriate to the application, and interaction with it should resemble human-human interaction, for example, voice assistants.
  • The system should have the ability to handle a new situation efficiently.
  • The system should be able to identify the context based on the current environment.

Cognitive computing is a technology that helps in mimicking the human thought process. Most of the people believe that cognitive computing is a standalone technology. However, cognitive computing is a combination of multiple technologies. The key technologies that fuel cognitive computing are machine learning, NLP, machine reasoning, speech recognition, object recognition or computer vision, dialog systems, and human-computer interaction.

NLP plays a vital role in building cognitive systems, as natural language understanding and natural language generation (NLG) are essential human faculties. NLP, a combination of AI and computational linguistics, employs computational techniques for understanding the structure of human language. NLP ranges from analyzing natural language with tokenizing and parsing techniques to resolving ambiguities and coreferences in language. With recent advances in NLP, computers can understand natural language and respond. NLG plays a vital role in generating responses artificially; NLG from images, videos, and other nontextual data is still a vast research area. Current trends in NLP include sentiment analysis, dialog systems, machine translation, opinion summarization, and spam detection. Recent advances in deep learning models and methods promise a significant improvement in NLP tasks with a better understanding of cognitive science.
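Tokenization, the first step in most of these analysis pipelines, can be sketched in a few lines. The regex and the function name below are illustrative choices, not a standard API; production systems use trained tokenizers that handle clitics, abbreviations, and Unicode.

```python
import re

def tokenize(text):
    # Capture runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Computers can understand natural language, and respond."))
# → ['Computers', 'can', 'understand', 'natural', 'language', ',', 'and', 'respond', '.']
```

The token list produced here is the input that later stages, such as parsing and semantic analysis, operate on.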

Both NLP and cognitive computing rely on each other. NLP aids cognitive computing and cognitive computing aids NLP. NLP itself can be seen as a cognitive technology because it uses sensory perceptions such as audio and visual perceptions as the primary step for an NLP task. Figure 14.1 shows the cognitive approach to NLP.


FIGURE 14.1 Cognitive approach to NLP.


The journey of NLP began in the 1950s when signal processing scientists started processing speech signals. Machine translation work started during that period. Some of the landmark works in NLP are listed as follows.

  • 1954 Automatic translation from Russian to English using the IBM 701 mainframe computer, based on statistics and grammatical rules

  • 1956 McCarthy coined the term “Artificial Intelligence” in Dartmouth Conference.
  • 1957 Chomsky's language models made a remarkable change in the field of NLP
  • 1958 McCarthy introduced LISP programming language
  • 1963 Giuliano introduced Automatic language processing concept
  • 1964 ELIZA (NLP computer program) was developed at MIT by Joseph Weizenbaum
  • 1966 Research on machine translation was halted, as it was far more costly than human translation
  • 1970 SHRDLU (NL understanding computer) project for rearranging blocks by Terry Winograd was able to understand sentences like “Put the blue cube on the top of the red cube.”
  • 1975 Parsing program for automatic text to speech
  • 1979 N-gram concept
  • 1981 Knowledge-based machine translation
  • 1982 Concept of the chatbot was created, and the project Jabberwacky began
  • 1985-1990 Natural language processing using knowledge bases
  • 1990 Speech recognition using HMM
  • 1992 Neural net for knowledge extraction
  • 1995 Use of linguistic patterns for knowledge-based information extraction
  • 1998 Classification of text documents
  • 1999 New methods for syntax and semantic analysis

By the beginning of the 21st century, NLP research became more advanced with the evolution of modern computers, increased computation power, and memory. New technologies such as machine learning, NNs, probabilistic methods, and statistics were used to develop NLP applications with cognitive abilities. Apple's Siri and IBM Watson are examples of systems with cognitive skills. NLP research continues to find more efficient methods for NLP, natural language understanding, and NLG.


For understanding NLP, one needs to know the basics of natural language, that is, linguistics and the steps involved in processing it.


PHONOLOGY

Phonology is the study of speech sounds used in a particular language. Every letter of the alphabet has one or more sounds associated with it, and a word is pronounced by combining those sounds. Word pronunciation can be explained through phonetics.

For example, in the English language, “read” in the present tense (pronounced “reed”) and “read” in the past tense (pronounced “red”) have the same spelling but different pronunciations and meanings.

MORPHOLOGY

Morphology is the study of the structure and formation of words. A morpheme is the smallest individual unit of language that has a specific meaning. Morphemes can be words, prefixes, or even suffixes. For example, the word “unfairness” means a lack of justice or inequality. Its morphemes would be:

Un (not) - prefix

fair (treating people equally) - the root word

ness (being in a state) - suffix

SYNTAX

Syntax is the structure of language. Syntax analysis is the study of the structural relationships among words in a sentence, or how words are grouped to form sentences. Every language follows rules for creating meaningful sentences. Subject, verb, object, parts of speech (POS), etc., help in the formation of meaningful sentences.
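The role of POS in sentence structure can be illustrated with a minimal dictionary-based tagger. The lexicon and tag names below are toy assumptions chosen for illustration; real taggers are trained statistically on annotated corpora.

```python
# Toy lexicon mapping words to POS tags (illustrative, not a trained tagger).
LEXICON = {
    "the": "DET", "cat": "NOUN", "dog": "NOUN", "mat": "NOUN",
    "chased": "VERB", "sat": "VERB", "on": "ADP",
}

def pos_tag(tokens):
    # Look each token up in the lexicon; unknown words get the tag "UNK".
    return [(word, LEXICON.get(word.lower(), "UNK")) for word in tokens]

print(pos_tag(["The", "cat", "sat", "on", "the", "mat"]))
# → [('The', 'DET'), ('cat', 'NOUN'), ('sat', 'VERB'), ('on', 'ADP'), ('the', 'DET'), ('mat', 'NOUN')]
```

Once tokens carry POS labels, grammar rules such as subject-verb-object ordering can be checked over the tag sequence rather than the raw words.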

In English linguistics, subject-verb-object is a sentence structure in which the subject comes first, the verb second, and the object third. The POS categorize words according to their usage; for example, a noun represents the name of a place, person, or thing.

SEMANTICS

Semantics is the study of the meaning of words in a sentence and how these words are combined to form meaningful sentences. Lexical semantics analyzes relations between words, such as synonyms, hypernyms, and others. Semantic analysis tries to interpret the meaning of a sentence by combining and finding relationships between word meanings.
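The lexical relations mentioned above can be sketched with plain mappings. The word lists below are illustrative assumptions; lexical resources such as WordNet store these relations at scale.

```python
# Toy lexical-semantic relations stored as plain mappings.
SYNONYMS = {"big": {"large", "huge"}, "small": {"little", "tiny"}}
HYPERNYMS = {"dog": "animal", "rose": "flower", "animal": "organism"}

def is_synonym(a, b):
    # Symmetric lookup in the synonym table.
    return b in SYNONYMS.get(a, set()) or a in SYNONYMS.get(b, set())

def hypernym_chain(word):
    # Follow "is-a" links upward until no parent is recorded.
    chain = []
    while word in HYPERNYMS:
        word = HYPERNYMS[word]
        chain.append(word)
    return chain

print(is_synonym("large", "big"))  # → True
print(hypernym_chain("dog"))       # → ['animal', 'organism']
```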

For example, consider a sentence extracted from a paragraph: “That company is facing a huge financial crisis now, and in May it may vary.” In this case, “May” represents a month, while “may” is a modal verb. Words of this type can confuse a machine trying to find the proper meaning of a sentence.

PRAGMATICS

Pragmatics studies the situational use of language. It is slightly different from semantics: semantics tries to interpret the meaning of a sentence by combining the meanings of its words, whereas pragmatics tries to find the meaning of a sentence based on the situation.

Consider the proverb, “Don't judge a book by its cover.” Its semantic meaning is the literal meaning of the words in the sentence, but its pragmatic meaning is “Don't judge someone or something by appearance alone.”

DISCOURSE

Discourse is a group of sentences; discourse analysis finds the actual meaning of a context by connecting its component sentences.

“Radha took a book from the library. Then she went to the coffee shop, and she left the book there.”

The above context supports more than one inference, and it can answer different questions. Consider the question, “Where is the book now?” The answer is, “The book is in the coffee shop.”
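The inference above can be approximated with a naive state-tracking sketch: scan the sentences in order and remember the last location mentioned. This ignores real coreference resolution (“she”, “there”) and is only meant to illustrate the idea of connecting component sentences; the function name and place list are hypothetical.

```python
def last_location(sentences, places):
    # Remember the most recently mentioned place as a stand-in for
    # where the tracked object ended up.
    location = None
    for sentence in sentences:
        words = set(sentence.lower().split())
        for place in places:
            if place in words:
                location = place
    return location

story = [
    "Radha took a book from the library",
    "Then she went to the coffee shop and she left the book there",
]
print(last_location(story, {"library", "shop"}))  # → shop
```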


For a given context in natural language, there are various stages of analysis for that sentence/context. The study of natural language for the extraction of useful information is called NLP. NLP tasks are explained in two ways: one is a theoretical approach, and the other is an engineering approach. The theoretical approach describes the stages of NLP for a given problem conceptually, whereas the engineering approach describes how a computer accomplishes NLP.


FIGURE 14.2 Theoretical and engineering approach for NLP.
