Masader Plus: A New Interface for Exploring +500 Arabic NLP Datasets

Yousef Altaher 1, Ali Fadel 2,*, Mazen Alotaibi 3, Mazen Alyazidi 4, Mishari Al-Mutairi 5, Mutlaq Aldhbuiub 6, Abdulrahman Mosaibah 7, Abdelrahman Rezk 8, Abdulrazzaq Alhendi 9, Mazen Abo Shal 4, Emad A. Alghamdi 10, Maged S. Alshaibani 11, Jezia Zakraoui 12, Wafaa Mohammed 13, Kamel Gaanoun 14, Khalid N. Elmadani 15, Mustafa Ghaleb 16, Nouamane Tazi 17, Raed Alharbi 18, Maraim Masoud 3, Zaid Alyafeai 11,γ

1 King’s College London, United Kingdom
2 Amazon, Jordan
3 Independent Researcher
4 Independent Software Developer, Saudi Arabia
5 Independent Software Developer and Designer, Saudi Arabia
6 Independent Software Engineer, Saudi Arabia
7 University of Bahrain, Bahrain
8 IIT Madras, India
9 Dasman Diabetes Institute, Kuwait
10 King Abdulaziz University, AILLA Lab, Saudi Arabia
11 KFUPM, Saudi Arabia
12 Independent Researcher, Qatar
13 University of Tübingen, Germany
14 INSEA, Morocco
15 University of Cape Town, South Africa
16 KFUPM, IRC-ISS, Saudi Arabia
17 Hugging Face, Inc
18 Saudi Electronic University, Saudi Arabia
γ g201080740@kfupm.edu.sa

Abstract

Masader (Alyafeai et al., 2021) created a metadata structure to be used for cataloguing Arabic NLP datasets. However, developing an easy way to explore such a catalogue is a challenging task. In order to give the optimal experience for users and researchers exploring the catalogue, several design and user experience challenges must be resolved. Furthermore, user interactions with the website may provide an easy approach to improve the catalogue. In this paper, we introduce Masader Plus, a web interface for users to browse Masader. We demonstrate data exploration, filtration, and a simple API that allows users to examine datasets from the backend. Masader Plus can be explored using this link: https://arbml.github.io/masader. A video recording explaining the interface can be found here: https://www.youtube.com/watch?v=SEtdlSeqchk.
* This work is not related to Amazon.

1 Introduction

Recently, much research has targeted different aspects of processing Arabic and its dialects, such as morphological analysis, resource building, and machine translation. However, according to Guellil et al. (2021), most of this research has concentrated on building resources (lexicons, corpora, datasets). Arguably, the growth in NLP research also brings growth in datasets, which presents substantial challenges for potential users in terms of resource retrieval, access, and re-use. Moreover, research efforts that address metadata sourcing for Arabic are available only as a review (Zaghouani, 2017) or as a public catalogue (Alyafeai et al., 2021). An intuitive user interface (UI) design (Bernal-Cardenas et al., 2019) that captures all required functionalities, such as dynamic search and filtering, sorting functions, descriptive statistics, and data visualization, needs to be implemented to target different audiences such as researchers, social scientists, and regular users.

The primary goal of this work is to enhance the work of Alyafeai et al. (2021) on both the contextual and visual features of datasets. Masader Plus ensures up-to-date availability of dataset metadata. Furthermore, the interface provides researchers with a set of user controls for filtering, refining, and visualizing the catalogue based on metadata attributes, to aid user exploration of the metadata. Masader Plus is completely open source and available under the GPL-3.0 license at https://github.com/arbml/masader. We summarize our contributions as follows:

1. API endpoints that support search, filtration, indexing, and reporting, discussed in section 4.1.
2. A search page with advanced filtration, detailed in section 4.3.
3. Metadata visualization by cluster, task, domain, etc., detailed in section 4.4.

In the following section, we highlight the related works.
Then, in section 3, we describe our system, followed by a presentation of the system architecture and a demonstration of the system features in section 4. We then present the community contribution effort in section 5. Section 6 discusses ethics and gives a broad impact statement. Finally, we conclude the paper and outline future work in section 7.

arXiv:2208.00932v1 [cs.CL] 1 Aug 2022