International Journal of Advanced and Applied Sciences, 3(9) 2016, Pages: 59-66
Contents lists available at Science-Gate
Journal homepage: http://www.science-gate.com/IJAAS.html

Artificial intelligence and natural language processing: the Arabic corpora in online translation software

Mohammed Abdulmalik Ali *
Department of English, Prince Sattam Bin Abdulaziz University, AlKharj, Saudi Arabia

* Corresponding Author. Email Address: mh.ali@psau.edu.sa (M. A. Ali)
https://doi.org/10.21833/ijaas.2016.09.010

Article history: Received 25 May 2016; Received in revised form 28 August 2016; Accepted 20 September 2016

Keywords: Arabic corpora; Online content; Translation Software

Abstract
It is ironic that Arabic accounts for a mere 1% of worldwide Internet content, whereas 5% of the world population speaks Arabic. This disproportionately small online presence, compared with other languages, may have many causes, including a lack of experts in the field of the Arabic language. This study investigates the impact of the Machine Translation (MT) software and Translation Memory (TM) tools that are widely used by the Arab community for academic and business purposes. It aims to determine whether it is possible to bring about a paradigm shift from Arabic Localization to Arabic Globalization, thereby facilitating the use of Natural Language Processing (NLP) techniques in human-computer interaction. To this end, a few machine translation systems (e.g., SYSTRAN, IBM Watson) are examined for their content and applications, to determine whether they can be used without human intervention while retaining the meaning of the original text.

2313-626X/© 2016 The Authors. Published by IASE. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction
Researchers regard Natural Language Processing (NLP) as the branch of Artificial Intelligence (AI) that deals with analyzing the language a human being uses to interface with a computer. A great challenge in such an interface is to teach a computer a language that humans learn, understand, and interact in, which in the current context is Arabic. Arabic is the largest living Semitic language, the official language of 23 countries, and is spoken by over 360 million people worldwide (the population of the Arab world was estimated at 369.8 million in 2013, and the Arab region extends from Morocco in North Africa to Dubai in the Persian Gulf). Yet, ironically, less than 1% of worldwide Internet content is in Arabic, even though 5% of the world population speaks it. This points to a disproportionately small online presence for Arabic compared with other languages. The reason given by NLP experts (Ali and Khaled, 2009; Habash, 2010; Hijjawi and Elsheikh, 2015; Huang, 2015), in analyzing Arabic diglossia, is that Arabic has two forms that exist concurrently. The first is Modern Standard Arabic (MSA), which is widely used in formal situations such as formal speeches, government and official operations, product manuals, and news media, and is perceived as the “language of the mind”, in contrast with the second form, known as Dialectal Arabic (DA).
It is the informal, private language, found predominantly as spoken vernaculars with no written standards, and is perceived as the “language of the heart”, although Arab speakers perceive the use of dialects as a “deteriorated” form of Classical Arabic (Huang, 2015), a much-debated issue that lies outside the scope of this study. This paper cites a few studies that have principally addressed these challenges and also investigates the impact of the Machine Translation (MT) software widely used by the Arab community for academic and business purposes. It also cites examples of discrepancies found in this software and in search engines. The main objective of this study is to determine whether it is possible to bring about a paradigm shift from Arabic Localization to Arabic Globalization, in order to facilitate the use of NLP techniques, or even to formulate and modify the existing Arabic corpora for a better understanding of the language. The article also discusses frequent colloquialisms (e.g., the Arab chat alphabet known as Moaarab or Arabizi) as found on social media platforms such as Facebook and Twitter, as well as spelling errors,