A proposed model of knowledge representation and the coding of knowledge embedded in texts of Web published scientific articles

Carlos Henrique Marcondes a,1, Marília Alvarenga Rocha Mendonça a, Luciana Reis Malheiros b
a Department of Information Science
b Department of Physiology and Pharmacology
Federal Fluminense University, R. Lara Vilela, 126, 24210-590, Niterói, RJ, Brazil

This article reports results of a research project that investigates the possibility of publishing electronic journal articles both as text for human reading and in a machine-readable format that records the new knowledge contained in the article. This knowledge is identified with the elements of scientific methodology, such as problem, methodology, hypothesis, results, and conclusions. A model integrating all these elements is proposed; it makes explicit and records in XML the article's contribution, new knowledge, and scientific novelty. Representing this knowledge in XML enables its processing by intelligent software agents. Although electronic publishing is a common activity for scholars, electronic journals are still based on the print model and do not take full advantage of the facilities offered by the Web environment. The proposed model aims to take advantage of these facilities, enabling semantic retrieval and validation of the knowledge contained in articles. To validate and enhance the model, a set of electronic journal articles was analyzed.

Keywords: electronic publishing, scientific methodology, scientific communication, knowledge representation, ontologies.

1 INTRODUCTION

Nowadays, electronic Web publishing is a common activity for scholars and researchers. Despite this fact, electronic journals are still based on the print model and do not take full advantage of the facilities offered by the Web environment.
Since the Philosophical Transactions of the Royal Society in the 17th century, the scientific article has been the container of new scientific knowledge. Before the rise of the Web, collections of paper journals in libraries constituted humanity's scientific knowledge base. Today there are two main barriers to large-scale use of this knowledge: the amount of information available throughout the Web, and the fact that knowledge is embedded in the text of scientific articles in an unstructured way that is not suitable for processing by programs.

Scientific communication is a slow social process that depends largely on discourse: scholars produce texts and read, interpret, and question them until new knowledge is incorporated into the corpus of Science. The potential of new information technology has been applied to modern bibliographic information systems to improve scientific communication, providing fast notification and immediate access to full-text scientific documents. But IT is not yet used to directly process the knowledge embedded in the text of scientific articles. In the Semantic Web context [1], electronic publishing could be a cognitive tool whose potential is far from being fully explored [2]. A related project that points toward the same objective of enlarging this potential is the W3C Scientific Publishing Task Force Ontology for Experiment Self-Publishing (see footnote 1).

The objective of this research is to propose a Web-publishing model that enables the electronic publishing of scientific articles not only as texts for human reading, but also as a knowledge base in a machine-readable format in XML (see footnote 2). As handbooks of scientific methodology emphasize [3], [4], [5], [6], scientific knowledge has the form of relations between phenomena. In particular, the hypothesis is the element that contains such a relation.
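As an illustration of what recording an article's reasoning elements in XML might look like, the following minimal Python sketch builds one record and runs a simple retrieval over it. It is not the model proposed in this article: every element and attribute name (article, problem, hypothesis, relation, antecedent, consequent, type), both function names, and the placeholder values are assumptions introduced only for this example.

```python
import xml.etree.ElementTree as ET


def build_article_record(title, problem, hypothesis, methodology, results, conclusion):
    """Return an XML element holding the reasoning elements of one article.

    All element and attribute names are hypothetical placeholders, not a
    schema defined by the article.
    """
    article = ET.Element("article")
    ET.SubElement(article, "title").text = title
    ET.SubElement(article, "problem").text = problem

    # The hypothesis is recorded as a relation between two phenomena.
    antecedent, relation_type, consequent = hypothesis
    hyp = ET.SubElement(article, "hypothesis")
    relation = ET.SubElement(hyp, "relation", {"type": relation_type})
    ET.SubElement(relation, "antecedent").text = antecedent
    ET.SubElement(relation, "consequent").text = consequent

    ET.SubElement(article, "methodology").text = methodology
    ET.SubElement(article, "results").text = results
    ET.SubElement(article, "conclusion").text = conclusion
    return article


def find_articles_relating(records, phenomenon):
    """Return the records whose hypothesis mentions the given phenomenon."""
    matches = []
    for record in records:
        for relation in record.iter("relation"):
            ends = (relation.findtext("antecedent"), relation.findtext("consequent"))
            if phenomenon in ends:
                matches.append(record)
                break  # one matching relation is enough for this record
    return matches


if __name__ == "__main__":
    # Placeholder values only; no real article is described here.
    record = build_article_record(
        title="Example article",
        problem="Does phenomenon A influence phenomenon B?",
        hypothesis=("phenomenon A", "causes", "phenomenon B"),
        methodology="experimental",
        results="placeholder results",
        conclusion="placeholder conclusion",
    )
    print(ET.tostring(record, encoding="unicode"))
    print(len(find_articles_relating([record], "phenomenon B")))
```

Because the relation is stored as structured elements rather than free text, a program can select articles by the phenomena they relate, which is the kind of processing that full-text search alone does not support.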
We envisage an authoring/self-publishing environment in which the knowledge in the text of articles, that is, the elements of scientific reasoning, is identified and recorded in a machine-readable format. A framework for analyzing text is proposed by Kintsch and Van Dijk [7]. Gardin [8] proposes that scientific articles have the scientist's reasoning embedded in their texts. In the Brazilian Information Science literature, Smit [9] and Kobashi [10] applied the proposals of both Kintsch/Van Dijk and Gardin to the analysis of documents for indexing and abstracting. This proposal intends to go beyond indexing for access: it aims to enable the processing of the knowledge embedded in the text of articles.

New discoveries in Science are validated by comparing them with the accepted knowledge in a specific domain. Before the rise of the Web, what constituted this accepted knowledge was fuzzy, lacked formalization, and was scattered across journal collections in libraries. The main mechanism of validation in Science was, and still is, the reading, interpreting, questioning, criticizing and, in brief, citing of journal articles by scholars, until new knowledge is finally incorporated into the fuzzy corpus of Science. Through the scientific communication process, this knowledge is turned into what Ziman [11] calls public knowledge.

1 http://esw.w3.org/topic/HCLS/ScientificPublishingTaskForce
2 XML: Extensible Markup Language, a standard from W3C, http://www.w3c.org/xml