Publishing XBRL as Linked Open Data

Roberto García
Universitat de Lleida
Jaume II, 69
25001 Lleida, Spain
+34 973 702 742
rgarcia@diei.udl.cat

Rosa Gil
Universitat de Lleida
Jaume II, 69
25001 Lleida, Spain
+34 973 702 742
rgil@diei.udl.cat

ABSTRACT
The eXtensible Business Reporting Language (XBRL) is a standard for business and financial information reporting. It is based on XML, so XBRL instance documents, e.g. a quarterly report, are highly constrained by the document-oriented nature of XML. This makes it more difficult to perform queries that mix information from filings with different dates, companies, or accounting principles than it would be with a formalism based on a graph model instead of a tree model. Semantic Web technologies provide a graph model that facilitates mashing up different XBRL sources. We have put this approach into practice by mapping the XBRL filings available from the SEC’s EDGAR program to the Resource Description Framework (RDF), and the XML Schema taxonomies these filings are based on to the Web Ontology Language (OWL). The resulting semantic metadata, though closely tied to the XML structure it is mapped from, benefits from Semantic Web technologies and tools, which facilitate integration and cross-querying, even together with other parts of the Web of Linked Data.

Keywords
Business, reporting, Semantic Web, Linked Data, Web 3.0, accounting, finance, interoperability.

1. INTRODUCTION
XBRL (eXtensible Business Reporting Language) is an XML language intended for modeling, exchanging and automatically processing business and financial information. XBRL is starting to be deployed in many different scenarios. For instance, there is the EDGAR [1] program promoted by the U.S. Securities and Exchange Commission (SEC). It performs automated collection, validation, indexing, acceptance and forwarding of submissions by companies and others who are required by law to file forms with the SEC.
Filers may choose to voluntarily submit documents in XBRL format to accompany certain official filings. Three dozen companies, representing more than $1 trillion of market value, have joined the SEC's XBRL test group.

However, we have observed limited support for cross-analysis of financial information in XBRL tools and applications, as detailed in the Related Work section. The limitation affects not only data based on different accounting principles, which are represented in XBRL using taxonomies; it also appears when comparing filings from different companies based on the same taxonomies, or filings from the same company based on different versions of the taxonomies.

We argue that this limitation is inherited from the technologies underlying XBRL, especially XML. XML takes a document-oriented approach, where each document presents a tree structure. This makes it difficult for XML-based tools to provide functionality that blurs the separation into documents and overcomes the limitations of a tree structure when mashing up data from different sources. Moreover, XBRL does not provide formal semantics that might help to integrate different taxonomies by using logic reasoners.

In any case, integrating the data contained in XBRL into comparable information is a strong requirement for the analysis of business and financial information at the global level. This might increase the efficiency and effectiveness of the decision-making processes relying on this kind of information, for instance bankruptcy prediction and other tasks related to assessing the solvency of a firm, a business sector or a set of interrelated companies.

Many have already pointed to this issue and proposed Semantic Web technologies as a natural choice for XBRL data integration, cf. the Related Work section. However, we think that this is not enough.
Semantic Web provides the technologies for data integration, but some principles are also required to facilitate Web-wide deployment of highly interlinked XBRL data. Linked Data [2] provides these principles for publishing data on the World Wide Web in a way that helps make it easily discoverable through the links that connect it to other pieces of data.

Despite these benefits, financial and business data is currently being produced using XBRL, and it seems that more and more XBRL data is going to be available in the future. It is being promoted by regulators and government agencies like the SEC, and by other entities like the European Union or the Spanish securities commission [3]. Consequently, we think that the best approach to bringing financial and business data to the Semantic Web is not to propose an alternative language based on Semantic Web technologies, but to apply methods that map existing XBRL to semantic metadata. This approach, its results and its validation are presented in the following sections, after XBRL is introduced.

Copyright is held by the author/owner(s). LDOW2009, April 20, 2009, Madrid, Spain.

2. XBRL
XBRL is based on two kinds of documents: instance documents and taxonomies. Instance documents report business facts and point to a set of taxonomies, which define the meaning of these