Index Policies for Resource Allocation
in Wireless Networks
Nomesh Bolia and Vidyadhar Kulkarni
Abstract—We consider the problem of resource allocation for
data transfer between the base station and the users within a
cell of a wireless telecommunication network with infinite data
queues for each user. The aim is to study the tradeoff between the
conflicting objectives of maximizing the system throughput and
the quality of service (QoS) to an individual user. Using a policy
improvement approach based on Markov decision processes, we
develop an intuitive and easy-to-implement index policy. We also
demonstrate its superior performance over the existing proportional
fair algorithm (PFA) through simulation experiments.
Index Terms—Data communication, index policies, Markov
decision processes (MDPs), resource allocation, scheduling.
NOMENCLATURE
N                  Total number of users.
u                  Label for the users, u = 1, 2, ..., N.
R_u^n              Channel rate of user u.
R^n                Vector [R_u^n : u = 1, 2, ..., N].
Q_u                Exponentially filtered average data rate, updated according to (1).
τ                  Key parameter of the PFA; acts as a damping coefficient in (1); the PFA throughput increases as τ decreases (a sketch of the PFA update appears after this nomenclature).
X_u^n              State of user u at time n.
M                  Number of states in the state space of the discrete-time Markov chain (DTMC) {X_u^n, n ≥ 0} for u = 1, 2, ..., N.
P_u                Transition probability matrix of the Markov chain {X_u^n, n ≥ 0}; has elements [p^u_{i_u, j_u}].
X^n = [X_1^n, ..., X_N^n]    State vector of all users.
i = [i_1, i_2, ..., i_N]     Realized value of state vector X^n.
Y_u^n              "Starvation age" (or "age") of user u at time n.
Y^n = [Y_1^n, ..., Y_N^n]    Age vector at time n.
t = [t_1, t_2, ..., t_N]     Realized value of age vector Y^n.
v(n)               User served in the nth time slot.
Ω                  State space of the DTMC {X_u^n, n ≥ 0}; the same for all users u = 1, 2, ..., N.
r = [r_1, r_2, ..., r_M]     Constant vector of data rates; when X_u^n = k, R_u^n = r_k.
D_l(y)             Cost of not serving user l of age y in slot n.
V_T(i, t)          Optimal reward starting from state [X^0, Y^0] = [i, t] at time 0 over time periods 0, 1, 2, ..., T.
g                  Long-run average throughput.
w(i, t)            Bias function starting in state (i, t).
q                  Initial policy vector = [q_1, q_2, ..., q_N].
g_q                Constant g under initial policy q.
w_q(i, t)          Bias function w(i, t) under initial policy q.
π^u = [π^u_1, ..., π^u_M]    Steady-state distribution of the Markov chain {X_u^n : n ≥ 0}.
φ_u(q_u)           Long-run cost per slot for user u under policy q.
A_u                Mean reward earned by user u if served in every slot.
K_u                Parameter of the LIP (linear index policy) for user u, so that D_u(n) = K_u n.
I_u(i, t)          Index for user u in state (i, t) (an illustrative LIP sketch appears after this nomenclature).
L_q                Lagrangian used to compute the optimal initial policy.
θ                  Lagrange multiplier for optimizing L_q.
B                  Long-run expected throughput per time slot.
ζ                  Long-run expected starvation age of a user.
ρ_d                Long-run probability that a user is starved for more than d time slots.
B̂                  Estimate of B obtained from the simulation.
ζ̂                  Estimate of ζ obtained from the simulation.
ρ̂_d                Estimate of ρ_d obtained from the simulation.
K                  Common value of K_u when it is assumed the same (K) for all users u.
N(t)               Number of users in the cell at time t in the dynamic case.
λ                  Arrival rate of users in the dynamic cell.
a                  Mean sojourn time of a user in a dynamic cell; sojourn times are exponentially distributed with mean a.
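
For concreteness, the following is a minimal sketch of a proportional fair scheduler. It assumes the standard PFA rule, in which slot n goes to the user maximizing R_u^n / Q_u, after which every Q_u is updated by the exponential filter Q_u ← (1 − 1/τ)Q_u + (1/τ)R_u^n · 1{u served}; the paper's exact update (1) may differ in detail, and the rate values in the example are arbitrary.

import random

def pfa_slot(rates, Q, tau):
    """Run one slot of a standard proportional fair scheduler.

    rates[u] is the current channel rate R_u^n, Q[u] is the exponentially
    filtered average data rate, and tau is the damping coefficient.
    """
    # Serve the user whose current rate is largest relative to its average.
    served = max(range(len(rates)), key=lambda u: rates[u] / Q[u])
    # Update every user's filtered average; only the served user earns rate.
    for u in range(len(Q)):
        realized = rates[u] if u == served else 0.0
        Q[u] = (1.0 - 1.0 / tau) * Q[u] + (1.0 / tau) * realized
    return served

# Example: three users, random channel rates, tau = 100.
random.seed(1)
Q = [1.0, 1.0, 1.0]
for n in range(5):
    rates = [random.choice([38.4, 76.8, 153.6]) for _ in range(3)]
    print(n, pfa_slot(rates, Q, tau=100.0))

A smaller τ weights the most recent slots more heavily, which is why τ acts as a damping coefficient on the averages Q_u.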
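Similarly, here is a sketch of a linear index policy. The nomenclature fixes the starvation cost at D_u(n) = K_u n; the index used below, the current rate plus the age-weighted cost K_u t_u, is only an illustrative choice and is not necessarily the paper's index I_u(i, t), which is derived later via policy improvement.

def lip_slot(states, ages, r, K):
    """Run one slot of an illustrative linear index policy (LIP).

    states[u] is the channel state i_u of user u (an index into the rate
    vector r), ages[u] is the starvation age t_u, and K[u] is the cost
    slope, so that the starvation cost is D_u(n) = K[u] * n.
    """
    # Illustrative index: current rate plus age-weighted starvation cost.
    # (The paper's actual index I_u(i, t) need not equal this expression.)
    served = max(range(len(states)),
                 key=lambda u: r[states[u]] + K[u] * ages[u])
    # The served user's age resets to zero; all others age by one slot.
    for u in range(len(ages)):
        ages[u] = 0 if u == served else ages[u] + 1
    return served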