http://ssrn.com/link/ICAESMT-2019.html=xyz
Information Systems & eBusiness Network (ISN)

Automatic Question Generation: A Systematic Review

Sonam Soni (a), Praveen Kumar (b), Amal Saha (c)
(a) Ph.D. Scholar, Amity University, Uttar Pradesh, India, sonamsoni.nmims@gmail.com
(b) Amity University, Uttar Pradesh, India, pkumar3@amity.edu
(c) SGT University, Uttar Pradesh, India, amal.k.saha@gmail.com

Keywords: Automatic Question Generation; Natural Language Processing; Natural Language Generator; Natural Language Understanding

ABSTRACT

Today's educational systems need an efficient tool for competently assessing students on the major concepts they have learnt from study material. Preparing a set of questions for assessment can be time-consuming for teachers, while questions taken from external sources such as assessment books or question banks might not be relevant to the content studied by students. Automatic Question Generation (AQG) is the technique of generating a suitable set of questions from content, which can be text. Automatic question generation (also abbreviated QG) is an important yet challenging problem in NLP. It is defined as the task of generating syntactically sound, semantically correct and relevant questions from several input formats such as text, a structured database or a knowledge base. Question generation can be naturally applied in many domains such as MOOCs, automated help systems, search engines, chatbot systems (e.g. for customer interaction), and healthcare for analyzing mental health. AQG has received immense attention from researchers in the field of computational linguistics. This review paper focuses on recent and ongoing NLP research on automatically generating questions from text through various methods.

Introduction

Natural language processing (NLP) is used to make machines understand language and what is said to them.
The process breaks the content into parts, comprehends the meaning of the language, determines appropriate actions, and then responds to the user in a language they understand. Natural Language Understanding (NLU) is a subset of NLP that interprets unstructured inputs and converts them into a structured form. It is narrower in scope than NLP as a whole, but equally important, because it deals with handling unstructured input. While humans are able to easily handle mispronunciations, changed words, contractions, colloquialisms, and other quirks, machines are less adept at handling such unpredictable inputs. Natural Language Generation (NLG), simply put, is what happens when computers write language. NLG processes turn structured data into text.

The main objective of an automatic question generation system (AQGS) is to take text as input, process it using natural language processing, and output various types of questions. The system aims to generate questions that draw on the information in the content (text, paragraphs, sentences) that the student has learned from reading the text.

The distributed representation of words with the Word2Vec model [1] was a milestone in the development of sophisticated algorithms in text analytics and paved the way for many solutions in the domain of natural language processing, e.g., information retrieval, topic modelling, web search, spam filtering, sentiment analysis and many other types of text classification. However, this model is based on a bag of words, with no reference to the order in which words occur or to the semantic relationships between them. It was therefore insufficient for analysing a paragraph or document as a whole, and hence the Paragraph Vector model was introduced by Quoc Le and Tomas Mikolov [2]. These representations of text, combined with deep neural networks, advanced the field of natural language processing considerably. These approaches in fact built the foundation for various computational linguistics and NLP use cases.
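The order-insensitivity of a bag-of-words representation mentioned above can be illustrated with a minimal sketch. It uses plain word counts rather than Word2Vec itself, and the helper name `bag_of_words` and the two example sentences are assumptions introduced here purely for illustration.

```python
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Count lowercase whitespace-separated tokens, discarding their order.

    Hypothetical helper for illustration; it is not part of Word2Vec.
    """
    return Counter(sentence.lower().split())


first = "the dog bit the man"
second = "the man bit the dog"

# Both sentences contain the same words the same number of times,
# so their bag-of-words representations are identical even though
# the sentences mean different things.
same_bag = bag_of_words(first) == bag_of_words(second)

# Comparing the token sequences directly does expose the difference.
same_order = first.split() == second.split()

print(same_bag)    # True
print(same_order)  # False
```

Two sentences with opposite meanings collapse to the same representation here, which is the kind of information loss that motivates representations sensitive to sequence and context.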
Previous approaches based on the hidden Markov model (HMM) did not achieve the level of success seen in the recent past with the approaches mentioned above.

One important domain of NLP is the question answering system. IBM Watson beating human champions in the “Jeopardy!” game was the most visible implementation of such a system, although the approaches used by the IBM team are not necessarily the ones highlighted above, and IBM achieved this feat before many of the NLP approaches mentioned above entered the public domain as research publications. One thing is clear: Watson brought question answering systems into the limelight, and there has been a spurt of research activity in this field. The Facebook Research team developed and open-sourced a question answering framework called DrQA [3], which can handle open-domain question answering, where the content for questions and answers comes from open-source material such as Wikipedia. Even with the evolution and enhancement of closed-source solutions like Watson and the emergence of open-source QA frameworks like DrQA, it is still a challenge to use these models on various content types such as text, audio, static images and video, and across natural languages, multilingual use cases, accents, etc. Dealing with them would require new NLP algorithms, data curation and training approaches.

Electronic copy available at: https://ssrn.com/abstract=3403926