H. Kretschmer & F. Havemann (Eds.): Proceedings of WIS 2008, Berlin
Fourth International Conference on Webometrics, Informetrics and Scientometrics & Ninth COLLNET Meeting
Humboldt-Universität zu Berlin, Institute for Library and Information Science (IBI)
This is an Open Access document licensed under the Creative Commons License BY http://creativecommons.org/licenses/by/2.0/

Characteristics of Open Access Web Citation Network: A Multidisciplinary Study

Kayvan Kousha¹
27 May 2008

¹ Department of Library and Information Science, University of Tehran, Iran, Email: kkoosha@ut.ac.ir

Abstract

More knowledge about Open Access (OA) scholarly publishing on the Web would be helpful for citation data mining and for the development of Web-based citation indexes. In the current study, five characteristics of 545 OA citing sources targeting OA research articles in four science and four social science disciplines were manually identified: file format, hyperlinking, Internet domain, language, and publication year. About 60% of the OA citing sources targeting research papers were in PDF format, 30% were from academic domains ending in .edu or .ac, and 70% of the citations were not hyperlinked. Moreover, 16% of the OA citing sources targeting the studied papers in the eight selected disciplines were in languages other than English. Additional analyses revealed significant disciplinary differences between the sciences and the social sciences. Overall, the OA Web citation network was dominated by PDF files and non-hyperlinked citations. Knowing these characteristics of the OA citation network gives a better understanding of the potential uses of OA Web citations.

1 Introduction

The Web is an important venue for Open Access (OA) publishing and for the dissemination of research results. Many have discussed the potential of OA publishing in the scholarly communication cycle (e.g., Harnad 1990; Harnad 1991; Harnad 1999).
Others have investigated the citation impact of OA publications in different subject areas (e.g., Antelman 2004; Harnad & Brody 2004; Lawrence 2001; Kurtz 2004; Shin 2003). Since over 90% of journals “have given their official green light to author self-archiving” (Harnad et al. 2004), and an increasing number of authors, journals and institutions are willing to publish their research results online (Swan & Brown 2004; Swan & Brown 2005), a large amount of citation information, especially from OA Web documents, has become available on the Web. Some researchers have proposed and tested mechanisms for extracting citation information from the Web, and have classified Web citations to validate the Web as an important source for scientometric analysis (e.g., Vaughan & Shaw 2003; Vaughan & Shaw 2005). However, it is not well understood which characteristics shape the Web citation network, or whether disciplinary differences between the sciences and the social sciences affect these characteristics (see below).

The current study focuses on five characteristics of OA scholarly publications (i.e., journal and conference papers, research reports, dissertations) that cite research articles in four science and four social science disciplines. In particular, it aims to identify characteristics of OA citing Web documents, including file format, hyperlinking, Internet domain, language and publication year, and to examine what these characteristics imply for Web citation extraction methods. The results may shed light on the design and development of scientific Web mining tools (e.g., Web-based