Complexity and entropy density analysis of the Korean stock market

J. B. Park*, J. W. Lee*, H.-H. Jo, J.-S. Yang, and H.-T. Moon
Department of Physics, Korea Advanced Institute of Science and Technology, Guseong-dong, Yuseong-gu, Daejeon 305-701, Republic of Korea

*These authors contributed equally.

Abstract

In this paper, we study the complexity and entropy density of the stock market by modeling the ε-machine of the Korea Composite Stock Price Index (KOSPI) from 1992 to 2003 using the causal-state splitting reconstruction (CSSR) algorithm.

Keywords: Econophysics, computational mechanics, ε-machine, statistical complexity, entropy density.

1. Introduction

Computational mechanics (CM) has been studied in various fields of science. It has been applied to abstract models such as cellular automata [1,2] and the Ising spin system [3], as well as to natural data in geomagnetism [4]. In this paper, we analyze a financial time series using CM to find the statistical complexity and the entropy density of the Korean stock market.

Empirical time series in financial markets have been investigated with various methods, such as rescaled range (R/S) analysis to test for the presence of correlations [5] and detrended fluctuation analysis (DFA) to detect long-range correlations embedded in seemingly non-stationary time series [6]. We believe that CM enables the complexities and structures of different data sets to be compared quantitatively, and that it directly discovers the intrinsic causal structure within the data [4]. To study the statistical complexity and the entropy density within CM, we used the causal-state splitting reconstruction (CSSR) algorithm [7] to model the ε-machine of the Korea Composite Stock Price Index (KOSPI) from 1992 to 2003. From this, we interpret the results in relation to the efficient market hypothesis (EMH).

2. Principles

2.1. Information theory

Claude Shannon introduced a quantity called the entropy,

$H[X] = -\sum_{x} \Pr(x)\log_2 \Pr(x)$,

of a discrete random variable $X$ with probability mass function $\Pr(x)$, which formalizes the intuitive notion of measuring information [8]. Let $A$ be a countable alphabet of symbols of the time series, let $S$ be a random variable taking values in $A$, and let $s$ denote its realization. If a block of $L$ consecutive variables is denoted as $S^L = S_1, \ldots, S_L$, then the Shannon entropy of length-$L$ blocks is defined as

$H(L) \equiv -\sum_{s_1 \in A} \cdots \sum_{s_L \in A} \Pr(s_1, \ldots, s_L)\log_2 \Pr(s_1, \ldots, s_L)$.   (1)

As the block length $L$ increases, $H(L)$ can be assumed to increase, since more information content is shared among the variables. The entropy density is therefore defined as

$h_\mu \equiv \lim_{L \to \infty} \left[ H(L+1) - H(L) \right]$.   (2)

For finite length $L$, the entropy density is defined as

$h_\mu(L) \equiv H(L) - H(L-1), \quad L = 1, 2, \ldots$   (3)

The entropy density quantifies how random the next symbol is, given the symbols that precede it [9].
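To make these quantities concrete, the following Python sketch estimates the block entropy $H(L)$ of Eq. (1) and the finite-$L$ entropy density $h_\mu(L)$ of Eq. (3) from a symbolized time series. This is a minimal illustration, not the paper's code: the binarization rule (1 for a positive return, 0 otherwise), the placeholder Gaussian returns, and all function names are assumptions.

```python
import numpy as np
from collections import Counter

def block_entropy(symbols, L):
    """Shannon entropy H(L) of length-L blocks, Eq. (1), in bits."""
    blocks = [tuple(symbols[i:i + L]) for i in range(len(symbols) - L + 1)]
    counts = Counter(blocks)
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return float(-np.sum(p * np.log2(p)))

def entropy_density(symbols, L):
    """Finite-L entropy density h_mu(L) = H(L) - H(L-1), Eq. (3)."""
    if L == 1:
        return block_entropy(symbols, 1)   # H(0) = 0 by convention
    return block_entropy(symbols, L) - block_entropy(symbols, L - 1)

# Hypothetical usage: binarize returns (1 = up move, 0 = down move) and
# watch h_mu(L) settle toward the entropy density as L grows.
rng = np.random.default_rng(0)
returns = rng.normal(size=2000)            # stand-in for KOSPI log-returns
symbols = (returns > 0).astype(int)
for L in range(1, 7):
    print(f"h_mu({L}) = {entropy_density(symbols, L):.3f}")
```

In practice the estimates for large $L$ are limited by the sample size, since the number of possible blocks grows exponentially with $L$.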
2.2. ε-machine

An infinite symbol string $\overleftrightarrow{S}$ can be divided into two semi-infinite halves: a future $\overrightarrow{S}$ and a history $\overleftarrow{S}$. A causal state is defined as a set of histories that share the same conditional distribution over all futures. The function $\varepsilon$ maps each history to such a set of histories, each of which corresponds to a causal state:

$\varepsilon(\overleftarrow{s}) = \{\, \overleftarrow{s}' \mid \Pr(\overrightarrow{S}^L = \overrightarrow{s} \mid \overleftarrow{S} = \overleftarrow{s}) = \Pr(\overrightarrow{S}^L = \overrightarrow{s} \mid \overleftarrow{S} = \overleftarrow{s}'),\ \forall \overrightarrow{s} \in \overrightarrow{S}^L,\ \overleftarrow{s}' \in \overleftarrow{S},\ L \in \mathbb{Z}^+ \,\}$.   (4)

The transition probability $T_{ij}^{(a)}$ denotes the probability of generating a symbol $a$ when making the transition from causal state $\mathcal{S}_i$ to causal state $\mathcal{S}_j$ [9,10]. The combination of the function $\varepsilon$ from histories to causal states with the labeled transition probabilities $T_{ij}^{(a)}$ is called the ε-machine [10], which represents a computational model underlying the given time series.

2.3. Statistical complexity

Given the ε-machine, the statistical complexity $C_\mu$ is defined as

$C_\mu \equiv -\sum_{\{\mathcal{S}_i\}} \Pr(\mathcal{S}_i)\log_2 \Pr(\mathcal{S}_i)$.   (5)

This quantity measures, in bits, the minimum amount of historical information required to make optimal forecasts [9,11]. The logarithm of the number of causal states, called the topological complexity, is defined as

$C_0 \equiv \log_2 \left| \{\mathcal{S}_i\} \right|$.   (6)
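As an illustration of Eqs. (5) and (6), the sketch below computes $C_\mu$ from the stationary distribution over the causal states of a small hand-written ε-machine. The two-state labeled transition matrices are hypothetical and merely stand in for a machine reconstructed by CSSR from data.

```python
import numpy as np

def stationary_distribution(T):
    """Stationary distribution over causal states: the left eigenvector of
    the state-to-state transition matrix T for eigenvalue 1, normalized."""
    vals, vecs = np.linalg.eig(T.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()

def statistical_complexity(T):
    """C_mu = -sum_i Pr(S_i) log2 Pr(S_i), Eq. (5), in bits."""
    pi = stationary_distribution(T)
    pi = pi[pi > 0]                      # drop zero (or numerically tiny) states
    return float(-np.sum(pi * np.log2(pi)))

# Hypothetical two-state epsilon-machine: T_sym[a][i, j] is the probability
# of emitting symbol a while moving from causal state i to causal state j.
T_sym = {
    0: np.array([[0.0, 0.5], [0.2, 0.0]]),
    1: np.array([[0.5, 0.0], [0.0, 0.8]]),
}
T = sum(T_sym.values())                  # marginalize over emitted symbols
print("C_mu =", statistical_complexity(T))
print("C_0  =", np.log2(T.shape[0]))    # topological complexity, Eq. (6)
```

Note that $C_\mu \le C_0$ always holds, with equality only when the causal states are visited with equal probability.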