Discovering company revenue relations from news: A network approach

Zhongming Ma a,*, Olivia R.L. Sheng b, Gautam Pant b

a Computer Information Systems Department, California State Polytechnic University, Pomona, United States
b Department of Operations and Information Systems, The University of Utah, United States

Article info
Article history: Received 31 December 2007; Received in revised form 6 April 2009; Accepted 8 April 2009; Available online 16 April 2009
Keywords: Web mining; Revenue comparison; Social network analysis; Business news; Intercompany network

Abstract
Large volumes of online business news provide an opportunity to explore various aspects of companies. A news story pertaining to a company often cites other companies. Using such company citations we construct an intercompany network, employ social network analysis techniques to identify a set of attributes from the network structure, and feed the attributes to machine learning methods to predict the company revenue relation (CRR) that is based on two companies' relative quantitative financial data. Hence, we seek to understand the power of network structural attributes in predicting CRRs that are not described in the news or known at the time the news was published. The network attributes produce close to 80% precision, recall, and accuracy for all 87,340 company pairs in the network. This approach is scalable and can be extended to private and foreign companies for which financial data is unavailable or hard to procure.
© 2009 Elsevier B.V. All rights reserved.

1. Introduction

Business news contains rich and current information about companies. Investment and business analysts often need to spend significant amounts of time scanning business news to compare a pair of companies (possibly competitors or partners) or to identify business relationships on the basis of revenues, sales, debts, or other financial or operating metrics.
However, the huge volume of news stories makes discovering interesting information for a large number of companies nontrivial and nonscalable.

Content providers like Yahoo! Finance [35] typically organize online business news by company. A news story belonging to a company often mentions several other companies. The company and any of the mentioned companies may have a relation, such as a partnership, which is covered by the news. Alternatively, the companies may simply cooccur in the same piece of news and have no relation at all. In this paper we identify company citations from a large number of news stories, construct an intercompany network from the company citations, and examine whether such a network can be used to infer some meaningful relations.

To explore the suggested methodology, we experiment with a company revenue relation (CRR) between two companies. For a directed company pair (i.e., source to target), the CRR is positive if the target company's revenue measure is not lower than the source's and negative otherwise. Therefore, CRR is a binary value simply indicating which company in the pair is more powerful in terms of revenue. Because revenue-based comparisons of companies are common in investment and business analysis, we choose to study this paired revenue-based measurement of CRR as an example of business relationships to test our methodology.

Using news we build the intercompany network in which each node is a company and a link between two companies indicates that a news story pertaining to one company cites/mentions the other. The intercompany network is viewed as a social network [33,28] whose structure can be quantified through graph-theoretic attributes. We employ and extend a set of graph-based measurements from the social network analysis (SNA) literature, report their distributions, and measure how well the CRR between two companies can be predicted by those graph-based measurements.
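
The network construction and the CRR definition above can be illustrated with a minimal sketch. It is not the authors' implementation: the company identifiers, revenue figures, and function names (`build_network`, `degrees`, `crr`) are hypothetical, and we assume that a link points from the company a story pertains to (source) to each company the story cites (target). Indegree and outdegree are included only as simple examples of graph-based measurements.

```python
from collections import Counter

def build_network(stories):
    """Build directed citation links from (focal_company, cited_companies) pairs.

    Assumption: a link runs from the company a story pertains to (source)
    to each company it cites (target). The value stored for a link is the
    number of stories that contain that citation.
    """
    edges = Counter()
    for focal, cited in stories:
        for other in set(cited):      # count each citation once per story
            if other != focal:        # ignore self-mentions
                edges[(focal, other)] += 1
    return edges

def degrees(edges):
    """Return (indegree, outdegree) counters over distinct neighbors."""
    indeg, outdeg = Counter(), Counter()
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return indeg, outdeg

def crr(source_revenue, target_revenue):
    """CRR of a directed pair: positive (True) if the target's revenue
    is not lower than the source's, negative (False) otherwise."""
    return target_revenue >= source_revenue

# Hypothetical data, for illustration only.
stories = [
    ("AAA", ["BBB", "CCC"]),
    ("BBB", ["AAA"]),
    ("CCC", ["AAA", "AAA", "BBB"]),
]
revenue = {"AAA": 500.0, "BBB": 120.0, "CCC": 500.0}

edges = build_network(stories)
indeg, outdeg = degrees(edges)
labels = {(s, t): crr(revenue[s], revenue[t]) for (s, t) in edges}

for pair in sorted(labels):
    print(pair, "positive" if labels[pair] else "negative")
```

Under the definition, a tie in revenue yields a positive CRR in both directions of a pair, which is why the comparison uses `>=` rather than `>`.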
Decision Support Systems 47 (2009) 408-414
* Corresponding author. E-mail addresses: zma@csupomona.edu (Z. Ma), olivia.sheng@business.utah.edu (O.R.L. Sheng), gautam.pant@business.utah.edu (G. Pant).
0167-9236/$ - see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.dss.2009.04.007

Our approach is based on prior findings about graph-based attributes. Literature in different domains (e.g., sociology and computer science) finds that graph-based attributes reflect certain properties of nodes in the network. For example, outdegree is a simple measure of centrality [33] and indegree represents a prestige [33] or authority [20] measure. Hence an intuition is that when company A is mentioned many times in news stories pertaining to other companies, A is likely to be powerful (e.g., to have high revenue). Even though we expect a lot of noise in the company citations due to meaningless cooccurrence, we hope that by deriving data from a large number of news stories over a certain period and for thousands of companies, the effect of noise may be diminished. The novelty of this research thus lies in the use of structural attributes of networks derived from seemingly irrelevant data (company citations) to discover knowledge (i.e., CRR) given the fact