Discovering company revenue relations from news: A network approach
Zhongming Ma a,⁎, Olivia R.L. Sheng b, Gautam Pant b
a Computer Information Systems Department, California State Polytechnic University, Pomona, United States
b Department of Operations and Information Systems, The University of Utah, United States
Article info
Article history:
Received 31 December 2007
Received in revised form 6 April 2009
Accepted 8 April 2009
Available online 16 April 2009
Keywords:
Web mining
Revenue comparison
Social network analysis
Business news
Intercompany network
Abstract
Large volumes of online business news provide an opportunity to explore various aspects of companies. A
news story pertaining to a company often cites other companies. Using such company citations, we construct
an intercompany network, employ social network analysis techniques to identify a set of attributes from the
network structure, and feed the attributes to machine learning methods to predict the company revenue
relation (CRR), which is based on two companies' relative quantitative financial data. Hence, we seek to
understand the power of network structural attributes in predicting CRRs that are not described in the news
or known at the time the news was published. The network attributes produce close to 80% precision, recall,
and accuracy for all 87,340 company pairs in the network. This approach is scalable and can be extended to
private and foreign companies for which financial data is unavailable or hard to procure.
© 2009 Elsevier B.V. All rights reserved.
1. Introduction
Business news contains rich and current information about
companies. Investment and business analysts often need to spend
significant amounts of time scanning business news to compare a pair
of companies (possibly competitors or partners) or to identify
business relationships on the basis of revenues, sales, debts, or other
financial or operating metrics. However, the huge volume of news
stories makes discovering interesting information for a large number
of companies nontrivial and nonscalable. Content providers like
Yahoo! Finance [35] typically organize online business news by
company. A news story belonging to a company often mentions
several other companies. The company and any of the mentioned
companies may have a relation, such as a partnership, which is
covered by the news. Alternatively, the companies may simply co-occur
in the same piece of news and have no relation at all. In this paper we
identify company citations from a large number of news stories,
construct an intercompany network from the company citations,
and examine whether such a network can be used to infer some
meaningful relations. To explore the suggested methodology, we
experiment with a company revenue relation (CRR) between two
companies. For a directed company pair (i.e., source to target), their
CRR is positive if the target company's revenue measure is not lower
than the source's and negative otherwise. Therefore, CRR is a binary
value simply indicating which company in the pair is more “powerful”
in terms of their revenues. Because revenue-based comparisons of
companies are common to investment and business analysis, we
choose to study this paired revenue-based measurement of CRR as an
example of business relationships to test our methodology.
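The CRR definition above can be made concrete with a minimal sketch. The company names and revenue figures below are hypothetical and serve only as an illustration; the encoding of positive as 1 and negative as 0 is likewise an assumption, not something fixed by the text.

```python
def crr(source_revenue: float, target_revenue: float) -> int:
    """Return 1 (positive CRR) if the target's revenue is not lower
    than the source's, and 0 (negative CRR) otherwise."""
    return 1 if target_revenue >= source_revenue else 0


# Hypothetical revenue figures, for illustration only.
revenue = {"A": 120.0, "B": 45.0, "C": 120.0}

# Label every directed (source, target) pair of distinct companies.
labels = {
    (source, target): crr(revenue[source], revenue[target])
    for source in revenue
    for target in revenue
    if source != target
}
```

Because the pair is directed, (A, B) and (B, A) receive opposite labels unless the two revenues are equal, in which case both directions are positive.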
Using news we build the intercompany network in which each
node is a company and a link between two companies indicates that a
news story pertaining to one company cites/mentions the other. The
intercompany network is viewed as a social network [33,28] whose
structure can be quantified through graph-theoretic attributes. We
employ and extend a set of graph-based measurements from social
network analysis (SNA) literature, report their distributions, and
measure how well CRR between two companies can be predicted by
those graph-based measurements.
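As a rough illustration of the network construction described above, the sketch below builds a directed citation network from a handful of hypothetical news stories and computes the two simplest graph-based measurements, indegree and outdegree. The input format, the direction of each link (from the company a story pertains to, toward the cited company), and the choice to count each cited company once per story are assumptions made for this example.

```python
from collections import Counter

# Hypothetical input: (company the story pertains to, companies cited in it).
stories = [
    ("A", ["B", "C"]),
    ("B", ["C"]),
    ("D", ["C", "A"]),
    ("A", ["C"]),
]


def build_network(stories):
    """Map each directed link (source, target) to the number of stories
    pertaining to source that cite target."""
    edges = Counter()
    for source, cited in stories:
        for target in set(cited):  # count a cited company once per story
            if target != source:   # ignore self-citations
                edges[(source, target)] += 1
    return edges


def degrees(edges):
    """Return (indegree, outdegree) counters over distinct links."""
    indegree, outdegree = Counter(), Counter()
    for source, target in edges:
        outdegree[source] += 1
        indegree[target] += 1
    return indegree, outdegree


edges = build_network(stories)
indegree, outdegree = degrees(edges)
```

In this toy network company C is cited by three different companies, so it has the highest indegree, while D is never cited and has an indegree of zero.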
Our approach is based on prior findings about graph-based
attributes. Literature in different domains (e.g., sociology and
computer science) finds that graph-based attributes reflect certain
properties of nodes in the network. For example, outdegree is a simple
measure of centrality [33] and indegree represents a prestige [33] or
authority measure [20]. Hence an intuition is that when company A is
mentioned many times in news stories pertaining to other companies,
A is likely to be powerful (e.g., high revenue). Even though we expect a
lot of noise in the company citations due to meaningless co-occurrence,
we hope that by deriving data from a large number of news stories over
a certain period and for thousands of companies, the effect of noise may
be diminished. The novelty of this research thus lies in the use of structural
attributes of networks derived from seemingly irrelevant data
(company citations) to discover knowledge (i.e., CRR) given the fact
Decision Support Systems 47 (2009) 408–414
⁎ Corresponding author.
E-mail addresses: zma@csupomona.edu (Z. Ma), olivia.sheng@business.utah.edu
(O.R.L. Sheng), gautam.pant@business.utah.edu (G. Pant).
0167-9236/$ – see front matter © 2009 Elsevier B.V. All rights reserved.
doi:10.1016/j.dss.2009.04.007