Improving Traffic Prediction with Tweet Semantics

Jingrui He*, Wei Shen, Phani Divakaruni, Laura Wynter, Rick Lawrence

* Computer Science Department, Stevens Institute of Technology, jingrui.he@gmail.com
@Walmartlabs, Walmart eCommerce, wshen@walmartlabs.com
BAMS, IBM Research, {ricklawr, phanid, lwynter}@us.ibm.com

Abstract

Road traffic prediction is a critical component in modern smart transportation systems. It provides the basis for traffic management agencies to generate proactive traffic operation strategies for alleviating congestion. Existing work on near-term traffic prediction (forecasting horizons in the range of 5 minutes to 1 hour) relies on the past and current traffic conditions. However, once the forecasting horizon is beyond 1 hour, i.e., in longer-term traffic prediction, these techniques do not work well since additional factors other than the past and current traffic conditions start to play important roles. To address this problem, in this paper, for the first time, we examine whether it is possible to use the rich information in online social media to improve longer-term traffic prediction. To this end, we first analyze the correlation between traffic volume and tweet counts with various granularities. Then we propose an optimization framework to extract traffic indicators based on tweet semantics using a transformation matrix, and incorporate them into traffic prediction via linear regression. Experimental results using traffic and Twitter data originated from the San Francisco Bay area of California demonstrate the effectiveness of our proposed framework.

1 Introduction

With the steadily increasing number of motor vehicles in the United States, road traffic prediction becomes a critical component in modern smart transportation systems. Accurate prediction of both near-term and longer-term traffic conditions can greatly help traffic management agencies generate proactive strategies to alleviate congestion.
It can also help road users better plan their trips by avoiding road segments expected to be congested soon. Existing work on road traffic prediction largely focuses on forecasting horizons in the range of 5 minutes to 1 hour by using past and current traffic conditions [Al-Deek et al., 2001; Smith et al., 2002; Kamarianakis and Prastacos, 2003; Min and Wynter, 2011]. The proposed techniques do not generalize well to forecasting horizons beyond 1 hour due to the impact of additional factors, such as scheduled events [Maze et al., 2006; Mahmassani et al., 2009].

With the rapid growth of online social media, more and more people are using Twitter, Facebook, etc. to communicate their mood, activities, and plans, as well as to exchange news and ideas. This creates a huge repository containing information not accessible from conventional media. In particular, many people use their mobile devices to access social media web sites via web applications, generating a large number of messages on the go. Many of these messages are related to the current traffic conditions, such as 'Traffic jam on new preedy street, near Parking Plaza Saddar, cars unmoved for last 20 mins', 'Big road block intersection of Rondebult and Commissioner street Boksburg', etc. It is also common for people to announce their travel plans in the near future, such as 'This SUNDAY !!!! We will be playing at Di Piazzas in Long Beach', 'good night! getting up early tomorrow to pack and then off to the airport for our flight @ 5PM', etc.

Motivated by the uniqueness of the information contained in online social media, and the close relationship between traffic and tweets, in this paper we answer the following question: can we extract tweet-based semantics to help improve longer-term traffic prediction? To answer this question, we first establish the correlation between traffic measurements and tweet counts at various granularities.
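As a rough illustration of the correlation check described above, the following minimal sketch computes the Pearson correlation between tweet counts and traffic volume, optionally after coarsening the time bins and shifting the traffic series by a lag. It is not the analysis used in this paper: the function names (pearson, aggregate, lagged_correlation), the choice of Pearson correlation, and the sample numbers are all assumptions made for illustration, and the data are synthetic.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)


def aggregate(series, width):
    """Sum consecutive bins to coarsen granularity (e.g. hourly to 3-hourly).

    A trailing partial bin is dropped.
    """
    if width < 1:
        raise ValueError("width must be a positive integer")
    return [sum(series[i:i + width])
            for i in range(0, len(series) - width + 1, width)]


def lagged_correlation(tweets, traffic, lag):
    """Correlate tweet counts at time t with traffic volume at time t + lag."""
    if len(tweets) != len(traffic):
        raise ValueError("series must have the same length")
    if lag < 0 or lag > len(traffic) - 2:
        raise ValueError("lag must leave at least 2 overlapping points")
    return pearson(tweets[:len(tweets) - lag], traffic[lag:])


# Synthetic (made-up) hourly series, NOT data from the paper: traffic is
# constructed to follow the tweet counts one hour later.
tweets = [3, 1, 4, 1, 5, 9, 2, 6]
traffic = [0] + [10 * x for x in tweets[:-1]]

same_hour = lagged_correlation(tweets, traffic, 0)
one_hour_ahead = lagged_correlation(tweets, traffic, 1)
two_hour_bins = pearson(aggregate(tweets, 2), aggregate(traffic, 2))
```

Comparing same_hour, one_hour_ahead, and two_hour_bins shows how the measured relationship depends on both the lag and the bin width, which is the kind of granularity question the correlation study addresses.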
Then we directly extract semantics from tweets via a sparse matrix, and incorporate the semantics into the auto-regression model used in traffic prediction. Finally, the sparse matrix is obtained by solving an optimization problem whose goal is to minimize the prediction error in the traffic measurements.

The rest of the paper is organized as follows. In Section 2, we briefly review existing work on traffic prediction and social media aided analysis. Then we study the correlation between traffic measurements and tweet counts in Section 3. This leads, in Section 4, to the optimization framework for systematically incorporating tweet semantics in traffic prediction and an iterative algorithm for solving it. The experimental results are presented in Section 5. Finally, we conclude the paper in Section 6.

2 Related Work

In this section, we review the existing work from two perspectives, namely traffic prediction and social media aided analysis.

Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence 1387