IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 58, NO. 4, MAY 2009, p. 1823

Index Policies for Resource Allocation in Wireless Networks

Nomesh Bolia and Vidyadhar Kulkarni

Abstract—We consider the problem of resource allocation for data transfer between the base station and the users within a cell of a wireless telecommunication network with infinite data queues for each user. The aim is to study the tradeoff between the conflicting objectives of maximizing the system throughput and the quality of service (QoS) to an individual user. Using a policy improvement approach based on Markov decision processes, we develop an intuitive and easy-to-implement index policy. We also demonstrate its superior performance over the existing proportional fair metric algorithm through simulation experiments.

Index Terms—Data communication, index policies, Markov decision processes (MDPs), resource allocation, scheduling.

NOMENCLATURE

N                            Total number of users.
u                            Label for the users, u = 1, 2, ..., N.
R^n_u                        Channel rate of user u at time n.
R^n                          Vector [R^n_u : u = 1, 2, ..., N].
Q_u                          Exponentially filtered average data rate, updated according to (1).
τ                            Key parameter of the PFA; acts as a damping coefficient in (1); the PFA throughput increases as τ decreases.
X^n_u                        State of user u at time n.
M                            Number of states in the state space of the DTMC {X^n_u, n ≥ 0} for u = 1, 2, ..., N.
P_u                          Transition probability matrix of the Markov chain {X^n_u, n ≥ 0}; has elements [p^u_{i_u, j_u}].
X^n = [X^n_1, ..., X^n_N]    State vector of all users.
i = [i_1, i_2, ..., i_N]     Realized value of the state vector X^n.
Y^n_u                        "Starvation age" (or "age") of user u at time n.
Y^n = [Y^n_1, ..., Y^n_N]    Age vector at time n.
t = [t_1, t_2, ..., t_N]     Realized value of the age vector Y^n.
v(n)                         User served in the nth time slot.
Ω                            State space of the DTMC {X^n_u, n ≥ 0}; the same for all users u = 1, 2, ..., N.
r = [r_1, r_2, ..., r_M]     Constant vector of data rates; when X^n_u = k, R^n_u = r_k.
D_l(y)                       Cost of not serving user l of age y in slot n.
V_T(i, t)                    Optimal reward starting from state [X^0, Y^0] = [i, t] at time 0 over time periods 0, 1, 2, ..., T.
g                            Long-run average throughput.
w(i, t)                      Bias function starting in state (i, t).
q = [q_1, q_2, ..., q_N]     Initial policy vector.
g_q                          Constant g under the initial policy q.
w_q(i, t)                    Bias function w(i, t) under the initial policy q.
π^u = [π^u_1, ..., π^u_M]    Steady-state distribution of the Markov chain {X^n_u : n ≥ 0}.
φ_u(q_u)                     Long-run cost per slot for user u under policy q.
A_u                          Mean reward earned by user u if served in every slot.
K_u                          Parameter of the LIP for user u, so that D_u(n) = K_u n.
I_u(i, t)                    Index for user u in state (i, t).
L_q                          Lagrangian used to compute the optimal initial policy.
θ                            Lagrange multiplier for optimizing L_q.
B                            Long-run expected throughput per time slot.
ζ                            Long-run expected starvation age of a user.
ρ_d                          Long-run probability that a user is starved for more than d time slots.
B̂                            Estimate of B obtained from the simulation.
ζ̂                            Estimate of ζ obtained from the simulation.
ρ̂_d                          Estimate of ρ_d obtained from the simulation.
K                            Common value of K_u, assumed the same for all users u.
N(t)                         Number of users in the cell at time t in the dynamic case.
λ                            Arrival rate of users in the dynamic cell.
a                            Mean of the exponentially distributed sojourn time of a user in a dynamic cell.

Manuscript received October 10, 2007; revised March 6, 2008, June 27, 2008, and July 31, 2008. First published September 3, 2008; current version published April 22, 2009. The review of this paper was coordinated by Dr. H. Jiang.

The authors are with the Department of Statistics and Operations Research, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA (e-mail: nomesh@unc.edu; vkulkarn@email.unc.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TVT.2008.2005101
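The PFA mentioned above is the baseline against which the proposed index policy is compared. Equation (1), which defines the filtered average Q_u and the role of τ, is not reproduced in this excerpt, so the following is a minimal sketch assuming the standard proportional fair rule: serve the user maximizing R^n_u / Q_u, then damp Q_u with weight 1/τ. All function names, the rate values, and the choice of initialization are illustrative, not taken from the paper.

```python
import random


def pf_schedule(rates, Q):
    """Pick the user maximizing the proportional fair metric R_u / Q_u."""
    return max(range(len(rates)), key=lambda u: rates[u] / Q[u])


def pf_update(Q, rates, served, tau):
    """Exponentially filtered average rate with damping coefficient tau:
    the served user contributes its realized rate, all others contribute 0."""
    return [(1 - 1 / tau) * Q[u]
            + (1 / tau) * (rates[u] if u == served else 0.0)
            for u in range(len(Q))]


if __name__ == "__main__":
    # Toy run: 3 users with i.i.d. channel rates drawn from a small set.
    random.seed(0)
    N, tau = 3, 100.0
    Q = [1e-6] * N  # small positive init avoids division by zero
    throughput = 0.0
    for n in range(1000):
        rates = [random.choice([0.5, 1.0, 2.0]) for _ in range(N)]
        u = pf_schedule(rates, Q)
        throughput += rates[u]
        Q = pf_update(Q, rates, u, tau)
    print("average throughput per slot:", round(throughput / 1000, 3))
```

Note that a smaller τ makes Q_u track the recent service rate more aggressively, which is consistent with the nomenclature's remark that PFA throughput increases as τ decreases (fairness degrades correspondingly).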
0018-9545/$25.00 © 2008 IEEE