International Journal of Computer Applications (0975 – 8887)
Volume 51 – No. 1, August 2012

Financial Trading System using Combination of Textual and Numerical Data

Shital N. Dange
Computer Science Department, Walchand Institute of Technology, Solapur, India

Rajesh V. Argiddi
Assistant Prof., Computer Science Department, Walchand Institute of Technology, Solapur, India

S. S. Apte
Professor and Head of Department of Computer Science and Engineering, Walchand Institute of Technology, Solapur, India

ABSTRACT
Large amounts of financial data are generated and evaluated at high speed. These data arrive continuously, change over time, and may be unpredictable. There is therefore a critical need for automated approaches that use large amounts of data effectively and efficiently to support decision-making by companies and individuals. Data mining techniques can be used to uncover hidden patterns, to discover the behavior of the stock market, to find trends in financial markets, and so on. This paper presents a new model for predicting stock trends and making financial trading decisions. It is based on a combination of data mining and text mining techniques: it takes the textual content of time-stamped web documents along with numerical time series data and predicts future movements. We will show that using this model improves the accuracy of the results.

Keywords
Data mining, pre-processing, feature extraction.

1. INTRODUCTION
Over the past decades, many attempts have been made to understand and predict the future using data mining methods. Among them, forecasting price movements in the stock market and making trading decisions is considered a major challenge. However, most methods suffer from serious drawbacks: their results are hard to understand and their predictions are inaccurate. Predicting stock price movements therefore remains difficult.
Data mining techniques can be used to detect future trends and behaviors in financial markets. The Efficient Market Hypothesis (EMH), as stated by Fama ([2], [3], [4]), assumes that stock prices fully reflect all relevant information at any given point in time and that everyone has some degree of access to that information. In the 'Random Walk' theory ([4]), stock prediction is considered impossible because stock prices change randomly. Most financial specialists try to exploit the time the market takes to adjust to new information when making their own predictions. They do this by combining technical and fundamental analysis strategies. Technical analysis bases its prediction on past prices, while fundamental analysis bases it on real-economy factors, such as trading volume, organizational changes in the company, etc. [1]. Stock market data and financial news articles can therefore be used to obtain the data required by these two strategies.

The conventional approach to modeling stock market returns is to model the univariate time series with autoregressive (AR) and moving average (MA) models. The autoregressive conditional heteroskedasticity (ARCH) class of models was originally introduced by Engle (1982) and has become a core part of empirical finance. Recently, Engle [9] and Bollerslev [10] provided a new and very powerful tool for modeling financial data in general and stock market returns in particular. The process suggested by Engle and Bollerslev [11] differs from earlier conventional time series models in that, instead of assuming that the variances are constant, it allows the conditional variances to change over time as functions of past errors. These models are deterministic in the sense that they attempt to use mathematical equations to describe the process that generates the time series.
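As a rough illustration of how a conditional variance can change over time as a function of past errors, the following sketch simulates the simplest member of the ARCH family, ARCH(1). The function name simulate_arch1, the parameter names alpha0 and alpha1, and the example values are assumptions made for illustration only; they are not taken from the cited works, and no parameter estimation is shown.

```python
import random


def simulate_arch1(n, alpha0, alpha1, seed=0):
    """Simulate n error terms from an ARCH(1) process (illustrative sketch).

    sigma2[t] = alpha0 + alpha1 * eps[t-1] ** 2    (conditional variance)
    eps[t]    = sqrt(sigma2[t]) * z[t],  z[t] ~ N(0, 1)
    """
    if alpha0 <= 0 or not (0 <= alpha1 < 1):
        raise ValueError("need alpha0 > 0 and 0 <= alpha1 < 1")

    rng = random.Random(seed)  # local generator, so runs are reproducible
    eps, sigma2 = [], []

    # Start from the unconditional variance alpha0 / (1 - alpha1).
    var = alpha0 / (1.0 - alpha1)
    for _ in range(n):
        sigma2.append(var)
        e = (var ** 0.5) * rng.gauss(0.0, 1.0)
        eps.append(e)
        # A large past error raises the next period's conditional variance.
        var = alpha0 + alpha1 * e * e
    return eps, sigma2


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for the demonstration.
    errors, variances = simulate_arch1(10, alpha0=0.2, alpha1=0.5, seed=1)
    for e, v in zip(errors, variances):
        print(f"eps={e:+.4f}  sigma2={v:.4f}")
```

Setting alpha1 to 0 reduces the sketch to the constant-variance case assumed by the earlier conventional models, which makes the contrast described above easy to see.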
A disadvantage of these models is that the trader or financial analyst must determine the appropriate number of lags, and successful analysis sometimes depends on experience with an enormous variety of time series econometric models. Their advantage lies in their interpretability. Different similarity queries on time series have been introduced ([12], [13]). Mining different queries from huge time-series data is an important issue for researchers, and handling time-series data in useful data mining techniques such as classification and clustering is a stimulating research problem.

Given a set of cases with class labels as a training set, classification builds a model (classifier) to predict the class of future data objects whose class label is unknown. It identifies the essential features of the different classes from the training data and then assigns unseen instances to the appropriate classes. Decision trees [5] have been found very effective for classifying huge and frequently changing databases, e.g., stock market or shopping mall data. Decision trees are analytical tools used to discover rules and relationships by systematically breaking down and subdividing the information contained in a data set.

Nowadays, more and more important and commercially valuable information is becoming available on the World Wide Web. Financial services companies are also making their products increasingly available on the Web. There are various types of financial information sources on the Web. The Internet provides extensive information on stock markets worldwide through websites such as Google Finance, Financial Times, Yahoo Finance and many more. The reliability of the information depends on the reputation and quality of the source sites. The timing of news on web pages is also tied to the local time in the publisher's country.
All these sources of information contain global and regional political and economic news, as well as recommendations from financial analysts. This is the kind of information that moves bond, stock and currency markets in Asia. This rich variety of information and news makes the Web an attractive resource from which to mine knowledge. Techniques are presented that make it possible to predict the movements of major stock market indices from up-to-date textual financial analysis and research