Knowledge Base Smarter Articulations for the Open Directory Project in a Sustainable Digital Ecosystem

Shastri L Nimmagadda, School of Information Systems, Curtin University, Perth, Australia, +61 8 9266 9780, shastri.nimmagadda@curtin.edu.au
Dengya Zhu, School of Information Systems, Curtin University, Perth, Australia, +61 8 9266 7056, D.Zhu@curtin.edu.au
Amit Rudra, School of Information Systems, Curtin University, Perth, Australia, +61 8 9266 7055, A.Rudra@curtin.edu.au

ABSTRACT
We examine the volumes and varieties of data sources of the Open Directory Project (ODP), which can endure, regenerate and flourish with new knowledge. The ODP motivates us to build knowledge-based, smarter multidimensional data constructs and models. We articulate the models with new artefacts, addressing the heterogeneity and multidimensionality of the data. The conceptualization and contextualization of various entities and dimensions have emerged with innovation, which led us to develop a digital ecosystem-based inventory. The ODP-based domain ontologies support the warehouse repository, which accommodates multidimensional data relationships. The concept of a digital ecosystem in the ODP context is to bring the dimensions together and unite them with multidimensional schemas. We explore Big Data, incorporating its characteristics in the ODP constructs and models. The volumes and varieties of the ODP data are logically organized and integrated in the warehouse repositories. Multidimensional data modelling makes the ODP smarter and more flexible in an environment where a variety of business rules and constraints change rapidly. Visualization and interpretation are the other Big Data artefacts that help us use, reuse, and test the interoperability and effectiveness of the data models for a sustainable ODP digital ecosystem.
We compute polynomial regressions based on the data fluctuations of the ODP observed in the scatter plots, providing new data mining models for knowledge interpretation.

Categories and Subject Descriptors
E.1 [Data]: Data Structures; E.5 [Data]: Organization/Structure; H.2.2 [Database Management]: Physical Design; H.3.2 [Information Storage and Retrieval]: Information Storage, File Organization.

General Terms
Data Structures; Documentation; Design; Management; Performance and Standardization.

Keywords
The ODP; Digital Ecosystem; Multidimensional Data Modelling; Domain Ontologies.

© 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License. WWW 2017 Companion, April 3-7, 2017, Perth, Australia. ACM 978-1-4503-4914-7/17/04. http://dx.doi.org/10.1145/3041021.3054769

1. INTRODUCTION
We develop the concept of a digital ecosystem, simulating the ODP framework. Ontology-based data warehousing and mining motivate a mechanism for bringing comprehensive, consistent, flexible and smart metadata together in a single repository, encapsulated in a digital ecosystem. Managing the advancement of the human-edited ODP [19, 29], with 91,868 editors, 1,031,852 categories, 3,871,704 websites and 90 languages, along with the continuing effort of the web-based directory, is a huge task. We need a more holistic and smart integrated framework with new data modelling artefacts. The generalization and specialization hierarchies [8, 24, 25] play a role in data relationships through various ontology descriptions. The domain ontologies further enable the data integration process in formulating the integrated framework, in particular the knowledge-based conceptualization and contextualization attributes as interpreted in various digital ecosystems.
Additionally, keeping in view the current volumes and varieties of the ODP data, we exploit Big Data concepts [1] in building knowledge-based constructs and models. The models are likely to deliver efficient data mining and interpretation that can explore the connectivity between the categories, sub-categories and their levels. We apply statistical polynomial regression to establish models of the data relationships between the categories, sub-categories and levels (web layers).

2. PROBLEM STATEMENT
In spite of major breakthroughs and advances in internet technologies, the identification and precise description of systems and their connectivity remain unresolved. This is partly due to poorly integrated multiple data sources and domains, in which the phenomenon of an ecosystem has not been readily described. Heterogeneity and multidimensionality of data sources are the other major issues. Unstructured data complicate concept identification, data integration and interpretation in different knowledge domains. Highly specialized data semantics [12, 14] make it infeasible to incorporate ideas within a consistent repository. The meaning of data is usually hard to define precisely [14, 16] because it is neither explicitly stated nor implicitly included in the database designs. An ontology description of an entity or a dimension is not a single, consistent scientific domain; it is composed of several dozen smaller, focused research communities. It would not be a significant issue if researchers were able to access data from a single domain, but that is not usually the case. Typically, researchers require access to data from integrated ODP metadata [6, 17], after resolving the terms that have different meanings and vocabularies across diverse communities or domains. The observations further