2014 PHILIPPINE COUNTRY REPORT

Rachel Edita Roxas¹, Nathaniel Oco¹, Leif Romeritch Syliongka²
¹National University  ²De La Salle University

1. COUNTRY REPORT

The Philippines is a linguistically rich country of 7,107 islands, with 185 individual languages listed across the archipelago. Of these, 41 are listed as institutional, 71 as developing, 46 as vigorous, 13 as in trouble, 10 as dying, and 4 as already extinct¹. Various organizations and institutions have carried out research on these languages, and tools are being developed to collect and document them in their various forms.

For text, the Linguist’s Assistant [1], [2] has been used. The Philippine component of the Network-based ASEAN Languages Translation Public Service project has produced an English-Tagalog parallel corpus of 6.2 million words [3], as well as a collection of 13 million tweets, 842,000 lines of game chat, and 1,790 news articles.

For audio, speech samples collected from different informants have been used to study Batangan Tagalog intonation [4]; Kapampangan sociophonetics [5]; Ilokano vowel space, rhotics, and geminates [6]; Hiligaynon and Kana prosody [7]; and the vowel features of Conyo [8]. The project “Inter-disciplinary Signal Processing for Pinoys (ISIP): ICT for Education”² has produced a total of 1,328 speech recordings in seven Philippine languages: Bikol, Kapampangan, Hiligaynon, Waray, Tausug, Pangasinense, and Ilokano.

In addition, the Summer Institute of Linguistics (SIL) Philippines³ has contributed to research on 93 Philippine languages through community immersion. Its archives include more than 3,500 published materials, 90,905 photos, 1,035 audio recordings, and 197 video recordings.

Finally, these resources are being used to develop language identification tools for both text [3], [9] and audio [10].

¹ Ethnologue language status profile for the Philippines: http://www.ethnologue.com/country/PH
² R. Cajote, “Inter-disciplinary Signal Processing for Pinoys (ISIP): ICT for Education,” Annual Technical Report, project funded by the Department of Science and Technology.
³ SIL Philippines: http://www-01.sil.org/asia/philippines/index.html

2. REFERENCES

[1] T. Allman, S. Beale, and R. Denton, “Toward an Optimal Multilingual Natural Language Generator: Deep Source Analysis and Shallow Target Analysis,” Proceedings of the 10th National Natural Language Processing Research Symposium, pp. 4-10, 2014.
[2] M.A. Castilo, M.P. Go, A.J. Lam, O.B. Syson, P. Xu, E. Ong, and S. Beale, “Building a Simple Linguist’s Assistant for Tagalog,” Proceedings of the 10th National Natural Language Processing Research Symposium, pp. 25-31, 2014.
[3] N. Oco, R. Sison-Buban, L.R. Syliongka, R.E. Roxas, and J. Ilao, “Ang Paggamit ng Trigram Ranking Bilang Panukat sa Pagkakahalintulad at Pagkakapangkat ng mga Wika [Trigram Ranking: Metric for Language Similarity and Clustering],” Malay, 26(2), pp. 53-68, 2014.
[4] L.C. Katigbak, “‘Ala eh!’ A Preliminary Analysis on Batangan Tagalog Intonation,” Proceedings of the 10th National Natural Language Processing Research Symposium, pp. 11-18, 2014.
[5] A.G.H. Peralta and L.J.M. Tee, “Some Observations on the Sociophonetics of Kapampangan,” Proceedings of the 10th National Natural Language Processing Research Symposium, pp. 100-107, 2014.
[6] A.M.D. Corbillon and K.B.E. Saure, “Some Remarks on Ilokano Vowel Space, Rhotics, and Geminates,” Proceedings of the 10th National Natural Language Processing Research Symposium, pp. 108-114, 2014.
[7] C.A. Chu-Santos and G.M. Salomon, “Vowel Space and Prosody in Hiligaynon and Kana,” Proceedings of the 10th National Natural Language Processing Research Symposium, pp. 120-125, 2014.
[8] C.A.C. Gaspi and M.C.N. Jambora, “What is Conyo Ba? A look on Vowel Features of Conyo,” Proceedings of the 10th National Natural Language Processing Research Symposium, pp. 126-132, 2014.
[9] N. Oco, J. Ilao, R.E. Roxas, and L.R. Syliongka, “Measuring Language Similarity using Trigrams: Limitations of Language Identification,” Proceedings of the 3rd International Conference on Recent Trends in Information Technology, 2013.
[10] A.F.B. Laguna and R.C.L. Guevara, “Experiments on Automatic Language Identification for Philippine Languages using Acoustic Gaussian Mixture Models,” Proceedings of IEEE TENSYMP, 2014.