INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH, VOLUME 5, ISSUE 12, DECEMBER 2016, ISSN 2277-8616
IJSTR©2016 www.ijstr.org

A Methodology In Processing Descriptive Analytics Using MMDA Traffic Update Tweets, Tokenization And Classification Tree In Discovering Knowledge

Tristan Jay P. Calaguas, Menchita F. Dumlao

Abstract: Traffic in the National Capital Region (NCR) of the Philippines is one of the many problems facing the local government and the Filipino citizens residing in Metro Manila. A Filipino citizen working in Metro Manila wastes 28,000 hours in traffic, which results in lost productivity. The long commutes caused by traffic take time away from exercise, which leads to fatigue and harms health. In relation to this lack of exercise, 170,000 Filipinos die from cardiovascular diseases each year, up from 85,000 more than 20 years ago, according to a 2009 study by the Department of Health (DOH). Population increase is one of the many causes of traffic in Metro Manila. As the population grows, more car riders and commuters are on the road, along with delivery trucks, pedicabs, jeeps, and provincial buses, which signifies a high employment rate in the country and causes traffic. To serve public needs, the Metropolitan Manila Development Authority (MMDA) is the government agency that provides Filipino citizens with updated public traffic information.
In past years, MMDA used telephone lines and television broadcasting to disseminate traffic information, which was very costly to maintain and led the agency to adopt Twitter for posting traffic updates and advisories to the public. Since this government agency disseminates information by posting tweets, there is a need for a methodology for analyzing these tweets so that citizens can gain insight for deciding how to avoid specific times of traffic in Metro Manila. Given this condition, the researchers adopt MMDA tweets as the primary data source and apply CRISP as the standard knowledge discovery process for building a methodology for descriptive analytics. In this experimental research, several processes were used to convert the semi-structured MMDA tweets into a structured data matrix. SQL was used for storage, retrieval, and pattern matching, while PHP string functions were used to tokenize each tweet and transform it into an array so that the tokens could be stored in the database using an iterative structure. After loading every token into its specific table, we obtained a data matrix comprising time, routed roads, traffic status, and day information, which was used in data mining to discover knowledge. Lastly, we used the J48 classification algorithm to classify the times at which traffic usually occurs on many routed roads in the NCR. As a result, we discovered that commuters experience traffic from 8:00 to 9:41 in the morning and from 1:00 in the afternoon to 8:00 in the evening on C5 northbound to southbound and EDSA northbound to southbound every Tuesday and Friday, with an accuracy of 75.72%.

Index Terms: tokenization, classification, tweets, traffic, methodology, knowledge discovery, update traffic

1 INTRODUCTION

Traffic in the National Capital Region (NCR) of the Philippines is one of the many problems facing the local government and the Filipino citizens residing in Metro Manila. A Filipino citizen working in Metro Manila wastes 28,000 hours in traffic, which results in lost productivity [1]. Long commutes caused by traffic [2] can take time away from exercise, which leads to fatigue [3]. In relation to this lack of exercise, 170,000 Filipinos die from cardiovascular diseases each year, up from 85,000 more than 20 years ago, according to a 2009 study by the Department of Health (DOH) [4]. Population increase is one of the many causes of traffic in Metro Manila. As the population grows, more car riders and commuters are on the road, along with delivery trucks, pedicabs, jeeps, and provincial buses, which signifies a high employment rate in the country and causes traffic [5]. To serve public needs, the Metropolitan Manila Development Authority (MMDA) is the government agency that provides Filipino citizens with updated public traffic information. In past years, MMDA used telephone lines and television broadcasting to disseminate traffic information, which was very costly to maintain and led the agency to adopt Twitter for posting traffic updates and advisories to the public [6]. Since this government agency disseminates information by posting tweets, there is a need for a methodology for analyzing these tweets so that citizens can gain insight [7] for deciding how to avoid specific times of traffic in Metro Manila.
Given this condition, the researchers adopt MMDA tweets as the primary data source and apply CRISP as the standard knowledge discovery process for building a methodology for descriptive analytics.

2 RELATED WORKS

An organization can gain insight from a huge amount of information by performing the four tasks of the knowledge discovery process: data collection, data cleaning, data analysis, and data application [8]. Traditional methods offer many ways to collect data for decision making, such as surveys, observation, and interviews, but the problem with this traditional approach

___________________________
Tristan Jay P. Calaguas is currently pursuing a doctorate degree program in Information Technology at AMA University, Philippines, and is currently an Information Technology faculty member at The Philippines Women’s University. E-mail: calaguas26@yahoo.com
Dr. Menchita F. Dumlao is currently the Information Technology Chairwoman and Research Director at The Philippines Women’s University, Philippines.