ISSN: 2278 – 1323
International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Volume 1, Issue 8, October 2012
365
All Rights Reserved © 2012 IJARCET
Evaluating NIST Metric for English to Hindi Language Using ManTra Machine
Translation Engine
Neeraj Tomer 1, Deepa Sinha 2
1 AIM & ACT, Banasthali University Banasthali, Jaipur, India
2 Department of Mathematics, South Asian University, New Delhi, India
Abstract: Evaluation of MT is required for Indian languages because MT approaches that work for European languages do not work equally well for
Indian languages, owing to differences in language structure. There is therefore a great need to develop an appropriate evaluation metric for Indian
language MT. The present research work studies the NIST metric of Machine Translation Evaluation for English to Hindi in the tourism domain, using
the output of ManTra, a translation system. Machine Translation Evaluation has been widely recognized by the Machine Translation community. The main
objective of MT is to break the language barrier in a multilingual nation like India.
Keywords: MTE – Machine Translation Evaluation, MT – Machine Translation, EILMT – Evaluation of Indian Language Machine Translation,
ManTra – MAchiNe Assisted TRAnslation Technology, Tr – Tourism
INTRODUCTION
Indian languages are highly inflectional, with a rich
morphology, relatively free word order, and a default
Subject-Object-Verb sentence structure. In addition, there are
many stylistic differences. Evaluation of MT is therefore
required specifically for Indian languages, because MT
approaches that work for European languages do not work equally
well for Indian languages. The same evaluation tools cannot be
used directly because of the differences in language structure.
So there is a great need to develop an appropriate evaluation
metric for Indian language MT.
English is understood by less than 3% of the Indian
population. Hindi, which is the official language of the
country, is used by more than 400 million people. MT therefore
assumes great significance in breaking the language barrier
within the country’s sociological structure; the main objective
of MT is to break the language barrier in a multilingual nation
like India. English is a highly positional language with
rudimentary morphology and a default Subject-Verb-Object
sentence structure. The present research work aims at studying
the “Evaluation of Machine Translation Evaluation’s NIST Metric
for English to Hindi” for the tourism domain. It is a
statistical study of machine translation evaluation for English
to Hindi, and it aims to study the correlation between
automatic and human assessment of MT quality. The main goal of
our experiment is to determine how well a variety of automatic
evaluation metrics correlate with human judgment.
In the present work we propose to work with corpora in
the tourism domain and limit the study to the English – Hindi
language pair. Our test data consisted of a set of English
sentences that have been translated by expert and non-expert
translators. The English source sentences were randomly
selected from a tourism-domain corpus drawn from different
resources such as websites and pamphlets. Each output sentence
was scored by Hindi-speaking human evaluators who were also
familiar with English. It may be assumed that the inferences
drawn from the results will be largely applicable to
translation from English to other Indian languages, an
assumption which will have to be tested for validity. We
consider the following MT engine in our study:
ManTra: C-DAC Pune has developed a translation system
called ManTra. The work in ManTra has to be viewed in terms of
its potential for translating the bulk of texts produced in
daily official activities. The system is equipped with
pre-processing and post-processing tools, which enable the user
to overcome problems and errors with minimum effort. The
strategy used for translation is: NOT Word to Word; NOR Rule to
Rule; BUT Lexical Tree to Lexical Tree.
OBJECTIVE
The main goal of this work is to determine how well a variety
of automatic evaluation metrics correlate with human scores.
The other specific objectives of the present work are as
follows.
1. To design and develop the parallel corpora for deployment
in automatic evaluation of English to Hindi machine
translation systems.
2. To assess how good the existing automatic evaluation
metric NIST is as an MT evaluation strategy for Indian
language machine translation systems, by comparing its
results with human evaluators’ scores through a
correlation study.
3. To study the statistical significance of the evaluation
results as above, in particular the effect of:
   - size of corpus
   - sample size variations
   - increase in number of reference translations
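The correlation study named in objectives 2 and 3 can be sketched as follows. This is a minimal illustration only: it assumes Pearson's coefficient, which the text does not specify, and the function names pearson and correlation_by_sample_size as well as all scores are hypothetical, not data or code from this study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


def correlation_by_sample_size(auto_scores, human_scores, sizes):
    """Correlation computed on the first k sentence scores, for each k in sizes."""
    return {k: pearson(auto_scores[:k], human_scores[:k]) for k in sizes}


# Hypothetical per-sentence scores (assumed values, for illustration only).
nist_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
human_scores = [1.0, 2.0, 3.0, 4.0, 6.0, 5.0]

by_size = correlation_by_sample_size(nist_scores, human_scores, [4, 6])
```

Comparing the entries of by_size shows how the measured agreement between the automatic metric and the human scores can change as the sample grows, which is the kind of effect objective 3 sets out to examine.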
Creation of parallel corpora: Corpus quality plays a
significant role in automatic evaluation. Automatic metrics can
be expected to correlate very highly with human judgments
only if the reference texts used are of high quality, or rather,
can be expected to be judged high quality by the human
evaluators. The procedure for creation of parallel corpora is as
follows:
1. Collect an English corpus for the domain from various
resources.
2. Generate multiple references (we limit it to three) for
each sentence by having the source sentence
translated by different expert translators.