Copyright © 2018 Authors. This is an open access article distributed under the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
International Journal of Engineering & Technology, 7 (4.44) (2018) 156-160
Website: www.sciencepubco.com/index.php/IJET
Research paper
Open Problems in Indonesian Automatic Essay Scoring System
Faisal Rahutomo 1*, Trisna Ari Roshinta 2, Erfan Rohadi 3, Indrazno Siradjuddin 4, Rudy Ariyanto 5, Awan Setiawan 6, Supriatna Adhisuwignjo 7
1,2,3,4,5,6,7 State Polytechnic of Malang
*Corresponding author E-mail: faisal@polinema.ac.id
Abstract
This paper presents open problems in Indonesian automatic essay scoring. A previous study compared several similarity metrics for
automated essay scoring in Indonesian: Cosine Similarity, Euclidean Distance, and Jaccard. The data used in that research comprise
about 2,000 texts, obtained from 50 students who answered 40 questions on politics, sports, lifestyle, and technology. The study also
evaluated the effect of stemming on system performance. Across all methods, the difference between using stemming and not using it is
around 4-9%. The results show that Jaccard is the best metric both with and without stemming, and Jaccard with stemming has the lowest
percentage error of all configurations. The politics category has a higher average similarity score than lifestyle, sport, and
technology. With stemming, the percentage error is 52.31% for Jaccard, 59.49% for Cosine Similarity, and 332.90% for Euclidean
Distance. Without stemming, it is 56.05% for Jaccard, 57.99% for Cosine Similarity, and 339.41% for Euclidean Distance. However, these
percentage errors, all above 50%, are too high for a functional essay grading system. Therefore, this paper explores several open
problems in this area. The openly available dataset can be used to develop better approaches than the standard similarity metrics.
The approaches discussed range over feature extraction, similarity metrics, learning algorithms, environment implementation, and
performance evaluation.
Keywords: Indonesian, Natural language processing, Automatic essay scoring system, Open problems.
1. Introduction
Every learning process requires an evaluation to measure the level
of students’ understanding. There are many types of evaluation,
including multiple choice questions, short questions, and essay
questions. Some studies have revealed that essay questions are better
than the others when the student’s knowledge must be evaluated
thoroughly [1]. However, the rating process is time-consuming: the
teacher must read and evaluate each student answer sentence by
sentence. Nowadays, many information technologies are developed to
automate human activities. In education, one developing example is
essay grading. Researchers have studied automated essay scoring (AES)
since the 1960s [2]. Automated grading offers many advantages over
conventional grading. It is reported that teachers in Britain spend
about 30% of their time scoring students’ answers, at a cost of about
30 billion pounds per year [3]. The application of an automated essay
scoring system would therefore bring many benefits.
Automated essay scoring systems have been developed with many
different methods. However, no study indicates which method is better
for automated essay scoring, especially in Indonesian. The previous
research [4] reports the average errors of some methods commonly used
in automated essay scoring in Indonesian. The average error of each
method is calculated by comparing the scores from human raters with
the scores from the system. The methods are Cosine Similarity,
Euclidean Distance, and Jaccard. The results show that Jaccard is the
best approach, but its average error is still high, more than 50%.
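To make the three metrics concrete, the following is a minimal Python sketch of how each could be computed between a reference answer and a student answer. It is an illustration under stated assumptions, not the implementation used in [4]: the whitespace tokenizer, the raw term-frequency vectors, the absence of stemming, and all function names are assumptions introduced here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stopword removal.
    return text.lower().split()


def cosine_similarity(a, b):
    # Cosine of the angle between the two term-frequency vectors (1.0 = same direction).
    ta, tb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm_a = math.sqrt(sum(c * c for c in ta.values()))
    norm_b = math.sqrt(sum(c * c for c in tb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    # Straight-line distance between the term-frequency vectors.
    # This is a distance, not a similarity: 0.0 means identical term counts,
    # and the value is unbounded above.
    ta, tb = Counter(tokenize(a)), Counter(tokenize(b))
    vocab = ta.keys() | tb.keys()
    return math.sqrt(sum((ta[t] - tb[t]) ** 2 for t in vocab))


def jaccard_similarity(a, b):
    # Size of the shared vocabulary divided by the size of the combined vocabulary.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0  # convention chosen here: two empty texts are identical
    return len(sa & sb) / len(sa | sb)
```

For example, for the hypothetical pair "pemilu diadakan setiap lima tahun" and "pemilu diadakan lima tahun sekali", four of the six distinct words are shared, so the Jaccard similarity is 4/6, the cosine similarity is 0.8, and the Euclidean distance is the square root of 2. Because the Euclidean value is an unbounded distance rather than a similarity between 0 and 1, it has to be mapped to a score by some further rule, which this sketch does not attempt.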
Therefore, this paper presents several ideas that can be explored
further on this issue, taking advantage of the openly available
dataset at http://dx.doi.org/10.17632/6gp8m72s9p.1 [5]. Several
evaluations can be done by changing parameters such as feature
extraction, similarity metric, learning algorithm, environment
implementation, and performance evaluation.
This paper is divided into several sections. Section 1 is the
introduction. Section 2 summarizes the previous study in English,
because the report by Roshinta and Rahutomo [4] is written in
Indonesian. Section 3 explores further ideas and open problems on
this issue. Finally, Section 4 concludes this paper.
2. Indonesian essay scoring system
Roshinta and Rahutomo [4] propose a web-based automated essay
scoring system for Indonesian. The research also develops a dataset
for performance evaluation purposes [5]. The study consists of
several phases. First, the dataset is developed. It contains
question texts with corresponding answer texts. The questions are
classified into four categories: lifestyle, politics, sport, and
technology. Second, the web-based automated essay scoring system is
developed. Third, student respondents are asked to answer the
questions through the web-based application, and the system
calculates the score with 3 methods. Fourth, the students’ answers
are scored manually by 3 lecturer respondents. The final score,
defined as the average of the three respondents’ scores, serves as
the gold standard. Finally, the average percentage error between the
manual scores and the system scores is calculated for each method.
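The last two phases can be sketched in Python as follows. The exact error formula of [4] is not restated in this summary, so the sketch assumes the common definition of absolute percentage error, |system score - gold score| / gold score x 100, averaged over all answers. The function names and the choice to skip answers whose gold score is zero are likewise assumptions made for illustration.

```python
def gold_standard(rater_scores):
    # Gold score of one answer: the mean of the human raters' scores.
    return sum(rater_scores) / len(rater_scores)


def average_percentage_error(human_scores, system_scores):
    # Assumed formula: mean of |system - human| / human * 100 over all answers.
    # Answers with a gold score of 0 are skipped to avoid division by zero.
    errors = [
        abs(s - h) / h * 100.0
        for h, s in zip(human_scores, system_scores)
        if h != 0
    ]
    if not errors:
        raise ValueError("no answers with a non-zero gold score")
    return sum(errors) / len(errors)


# Hypothetical scores for two answers, each rated by three lecturers.
raters = [[80, 70, 90], [50, 60, 40]]
gold = [gold_standard(r) for r in raters]  # [80.0, 50.0]
system = [60, 75]
error = average_percentage_error(gold, system)  # (25% + 50%) / 2 = 37.5
```

Under this definition the error is not capped at 100%: a system score far above the gold score can produce an error of several hundred percent.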