International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 11 | Nov 2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1870
Survey on Grammar Checking and Correction using Deep Learning for
Indian Languages
Neethu S Kumar¹, Supriya L P²
¹MTech, Dept. of Computer Science & Engineering, Sree Buddha College of Engineering, Pathanamthitta, Kerala
²Assistant Professor, Dept. of Computer Science & Engineering, Sree Buddha College of Engineering,
Pathanamthitta, Kerala
-------------------------------------------------------------------------***------------------------------------------------------------------------
Abstract - A grammar checker is one of the basic Natural
Language Processing tools for any language. Grammar
checkers are widely used to detect and correct errors in
sentences during the writing process. There are different kinds
of grammar checkers. This paper presents a survey of
grammar checkers using deep learning for Indian languages.
Grammar checking is a fundamental task in the writing
process. Grammar consists of many rules, including those for
past, present, and future tense. Different grammar checkers
exist for different languages, each aiming to improve accuracy
and minimize errors. The survey concludes with the different
features of existing grammar checkers.
Key Words: Natural Language Processing, Grammar,
Grammar checker, Rule-based, Statistical, Hybrid
1. INTRODUCTION
Language is a means of communication between human beings.
Human natural language can be defined as a process of
interchange between human beings. Grammar is an element of
language and consists of a set of rules. Words are the basic
grammatical units, and these units combine to form
sentences according to the rules of grammar. Many
grammatical errors occur during the writing process.
One of the main objectives of communication is to
share information. This information can be expressed in
written or spoken form. The most important property of the
information content is the validity of its sentences in the
language. Morphemes, phonemes, words, phrases, clauses,
sentences, vocabulary and grammar are the building blocks of
language. All valid sentences of a language must follow the
rules of that language. A sentence is a combination of
different words. Language learners of different backgrounds
write sentences with various types of errors.
Sentences can be classified into three main types. First, simple
sentences, which consist of a collection of one or more arguments.
A simple sentence contains a clause and usually a verb root, and
does not contain question words or negation. Second, complex
sentences, which contain two clauses, with interdependence
between the main clause and the dependent (subordinate) clause.
Third, compound sentences, which contain multiple clauses.
Natural Language Processing is a subfield of artificial
intelligence concerned with the interaction between
computers and human languages. Most natural language
processing has been based on handwritten rules. Grammar
checking is one of the most common applications of natural
language processing. Many grammar checkers are in use for
different languages. A grammar checker is a program
that checks whether a sentence is grammatically correct.
Grammar checkers are based on different approaches:
rule-based checking, statistics-based checking and hybrid
checking. Most existing grammar checkers perform style
checking, that is, they check for uncommon words and
complicated sentence structure.
1.1 Statistical Grammar Checker
A statistical grammar checker uses an annotated corpus.
The annotated corpus is compiled from different
journals, magazines or documents. The checker verifies the
correctness of input sentences by checking them against the
corpus. There are two main ways to check an input sentence.
In the first, the input text is checked directly against the
corpus: a sentence that matches the corpus is treated as
correct, and one that does not is tagged as a grammatical
error. In the second, rules are generated from the corpus and
the input sentence is checked using these rules. When the
corpus is revised or new data is added, the rules are not
updated. A disadvantage of this approach is that it is
difficult to locate the error in a sentence and to recognize
the error within the system.
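The two checking strategies described above can be illustrated with a minimal Python sketch. It is not the method of any surveyed system: the toy English corpus, the function names, and the use of word bigrams as the "rules" derived from the corpus are all assumptions made for illustration, and a sentence found in the corpus is assumed to count as correct.

```python
from collections import Counter

# Hypothetical toy corpus standing in for a real annotated corpus.
CORPUS = [
    "she is reading a book",
    "he is reading a letter",
    "they are reading a book",
]

def bigrams(tokens):
    """Return adjacent word pairs, padded with sentence boundary markers."""
    padded = ["<s>"] + tokens + ["</s>"]
    return list(zip(padded, padded[1:]))

def build_bigram_counts(corpus):
    """Derive 'rules' from the corpus: here, simply the observed bigrams."""
    counts = Counter()
    for sentence in corpus:
        counts.update(bigrams(sentence.split()))
    return counts

def check_direct(sentence, corpus):
    """First strategy: accept a sentence only if it occurs in the corpus."""
    return sentence in set(corpus)

def check_with_bigrams(sentence, counts):
    """Second strategy: return every bigram never observed in the corpus."""
    return [bg for bg in bigrams(sentence.split()) if counts[bg] == 0]

counts = build_bigram_counts(CORPUS)
print(check_direct("she is reading a book", CORPUS))
print(check_with_bigrams("they are reading a letter", counts))
print(check_with_bigrams("she are reading a book", counts))
```

In this sketch the direct match rejects any sentence that is not in the corpus, while the bigram check accepts unseen sentences built from seen word pairs. The bigram counts are computed once, so they become stale when the corpus changes unless `build_bigram_counts` is run again, which mirrors the limitation noted above.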
1.2 Rule Based Grammar Checking
The most commonly used approach is rule-based grammar
checking. In rule-based grammar checking, the input
sentence is checked against a set of manually written rules,
whereas in the statistical approach the rules are generated
from the corpus. In the
rule-based approach, the rules are easy to configure and
to modify. A significant advantage of this approach is that
the rules can be maintained by someone without programming
knowledge, and that it provides detailed error messages.
The main characteristics of this approach are that it can
handle all features of a language, that sentences need to
be complete, and that it can easily handle the input
sentence.
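As a rough illustration of why rule-based checkers are easy to configure and can report detailed errors, the following Python sketch keeps the rules as plain data, separate from the checking code. The two English rules, their identifiers, and the `check` function are hypothetical examples, not rules taken from any surveyed checker. The "a before a vowel" pattern looks only at spelling, which is a deliberate simplification.

```python
import re

# Hypothetical rules held as plain data, so they can be edited
# without changing the checking code below.
RULES = [
    {
        "id": "A_BEFORE_VOWEL",
        "pattern": r"\ba\s+(?=[aeiou])",
        "message": "Use 'an' instead of 'a' before a word starting with a vowel.",
    },
    {
        "id": "REPEATED_WORD",
        "pattern": r"\b(\w+)\s+\1\b",
        "message": "The same word appears twice in a row.",
    },
]

def check(sentence, rules=RULES):
    """Apply every rule to the sentence and collect detailed error records."""
    errors = []
    for rule in rules:
        for match in re.finditer(rule["pattern"], sentence, flags=re.IGNORECASE):
            errors.append({
                "rule": rule["id"],
                "start": match.start(),
                "text": match.group(0),
                "message": rule["message"],
            })
    return errors

for error in check("She ate a apple"):
    print(error["rule"], error["start"], error["message"])
```

Adding or changing a rule only requires editing the `RULES` list, and each reported error carries the rule identifier, the position, and a message for the writer.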