Knowledge Base Smarter Articulations for the Open
Directory Project in a Sustainable Digital Ecosystem
Shastri L Nimmagadda
School of Information Systems
Curtin University, Perth, Australia
+61 8 9266 9780
shastri.nimmagadda@curtin.edu.au
Dengya Zhu
School of Information Systems
Curtin University, Perth, Australia
+61 8 9266 7056
D.Zhu@curtin.edu.au
Amit Rudra
School of Information Systems
Curtin University, Perth, Australia
+61 8 9266 7055
A.Rudra@curtin.edu.au
ABSTRACT
We examine the volumes and varieties of data sources of the Open
Directory Project (ODP), which can endure, regenerate and
flourish with new knowledge. The ODP motivates us to build
knowledge-based, smarter multidimensional data constructs and
models. We articulate the models with new artefacts, addressing
the heterogeneity and multidimensionality of the data. The
conceptualization and contextualization of various entities and
dimensions have emerged with innovation that led us to develop a
digital ecosystem-based inventory. The ODP-based domain
ontologies support the warehouse repository, which
accommodates multidimensional data relationships. The concept
of a digital ecosystem in the ODP context is to bring the
dimensions together and unite them with multidimensional schemas.
We explore Big Data, incorporating its characteristics in the
ODP constructs and models. The volumes and varieties of the
ODP data are logically organized and integrated in the warehouse
repositories. Multidimensional data modelling makes the
ODP smarter and more flexible in an environment where a variety
of business rules and constraints change rapidly. Visualization
and interpretation are the other Big Data artefacts, helping us
use, reuse and test the interoperability and effectiveness
of the data models for a sustainable ODP digital ecosystem. We
compute the polynomial regressions, based on the data
fluctuations of the ODP as observed in the scatter plots, providing
new data mining models for knowledge interpretation.
Categories and Subject Descriptors
E.1 [Data]: Data Structures; E.5 [Data]: Organization/Structure;
H.2.2 [Database Management]: Physical Design; H.3.2
[Information Storage and Retrieval]: Information Storage, File
Organization.
General Terms
Data Structures; Documentation; Design; Management;
Performance and Standardization.
Keywords
The ODP; Digital Ecosystem; Multidimensional Data Modelling;
Domain Ontologies.
© 2017 International World Wide Web Conference Committee (IW3C2),
published under Creative Commons CC BY 4.0 License.
WWW 2017 Companion, April 3-7, 2017, Perth, Australia.
ACM 978-1-4503-4914-7/17/04.
http://dx.doi.org/10.1145/3041021.3054769
1. INTRODUCTION
We develop the concept of a digital ecosystem, simulating the
ODP framework. Ontology-based data warehousing and
mining motivate a mechanism for bringing comprehensive,
consistent, flexible and smart metadata together in a single
repository, encapsulated in a digital ecosystem. Managing the
advancement of the human-edited ODP [19, 29], with 91,868
editors, 1,031,852 categories, 3,871,704 websites
and 90 languages, and the continuing effort of the web-based
directory, is a huge task. We need a more holistic and smart
integrated framework with new data modelling artefacts. The
generalization and specialization hierarchies [8, 24, 25] shape
data relationships through various ontology descriptions. The
domain ontologies further enable the data integration process
that formulates the integrated framework, in particular the
knowledge-based conceptualization and contextualization
attributes as interpreted in various digital ecosystems.
Additionally, keeping in view the current volumes and varieties of
the ODP data, we exploit the use of Big Data concepts [1] in
building knowledge-based constructs and models. The models are
likely to deliver efficient data mining and interpretation that
can explore the connectivity between the categories,
sub-categories and their levels. We apply statistical polynomial
regression to establish models of the data relationships
between the categories, sub-categories and levels (web layers).
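As a minimal sketch of the polynomial regression step mentioned above, the following Python fragment fits a least-squares polynomial to counts observed per level and reports the coefficient of determination. The function name fit_polynomial, the choice of degree 2 and the input values are assumptions made for illustration only; the values are synthetic placeholders and are not ODP measurements.

```python
import numpy as np


def fit_polynomial(levels, counts, degree=2):
    """Least-squares polynomial fit of counts against levels.

    Returns the coefficients (highest power first) and the R^2 value.
    Hypothetical helper for illustration, not the authors' implementation.
    """
    x = np.asarray(levels, dtype=float)
    y = np.asarray(counts, dtype=float)
    coeffs = np.polyfit(x, y, degree)          # least-squares fit
    predicted = np.polyval(coeffs, x)          # fitted values at each level
    ss_res = np.sum((y - predicted) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)       # total sum of squares
    r_squared = 1.0 - ss_res / ss_tot
    return coeffs, r_squared


# Synthetic placeholder data generated from y = 2x^2 + 1 (NOT ODP data).
levels = [1, 2, 3, 4, 5, 6]
counts = [3.0, 9.0, 19.0, 33.0, 51.0, 73.0]

coeffs, r2 = fit_polynomial(levels, counts, degree=2)
print("coefficients:", coeffs)
print("R^2:", r2)
```

With real data, the degree would normally be chosen by inspecting the scatter plot and comparing the R^2 values of competing fits, since a higher degree always reduces the residual but may overfit the fluctuations.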
2. PROBLEM STATEMENT
Despite major breakthroughs and advances in internet
technologies, the identification and precise description of systems and
their connectivity remain unresolved. This is partly due to poorly
integrated multiple data sources and domains, in which the
phenomenon of an ecosystem has not been readily described.
Heterogeneity and multidimensionality of data sources are the
other major issues. Unstructured data complicate concept
identification, data integration and interpretation in different
knowledge domains. Highly specialized data semantics [12, 14]
make it infeasible to incorporate ideas within a consistent
repository. The meaning of data is usually hard to define precisely
[14, 16] because it is neither explicitly stated nor implicitly
included in the database designs. An ontology description of an
entity or a dimension is not a single, consistent scientific domain;
it is composed of several dozen smaller, focused research
communities. It would not be a significant issue if researchers
were able to access data from a single domain, but that is not
usually the case. Typically, researchers require access
to the integrated metadata of the ODP [6, 17], after resolving
the terms that have different meanings and vocabularies across
diverse communities or domains. The observations further