Learning equilibria in constrained Nash-Cournot games with misspecified demand functions

Hao Jiang, Uday V. Shanbhag and Sean P. Meyn

Abstract— We consider a constrained Nash-Cournot oligopoly where the demand function is linear. While cost functions and capacities are public information, firms only have partial information regarding the demand function. Specifically, firms either know the intercept or the slope of the demand function and cannot observe aggregate output. We consider a learning process in which firms update their profit-maximizing quantities and their beliefs regarding the unknown demand function parameters, based on disparities between observed and estimated prices. A characterization of the mappings, corresponding to the fixed point of the learning process, is provided. This result paves the way for developing a Tikhonov regularization scheme that is shown to learn the correct equilibrium, in spite of the multiplicity of equilibria. Despite the absence of monotonicity of the gradient maps, we prove the convergence of constant and diminishing steplength distributed gradient schemes under a suitable caveat on the starting points. Notably, precise rate of convergence estimates are provided for the constant steplength schemes.

I. INTRODUCTION

The Nash solution concept [7] has been extensively analyzed and applied in economics, engineering and applied sciences and finds relevance in the examination of strategic behavior in noncooperative games. In such settings, the Nash equilibrium is a tuple of strategies from which no player can profit from unilaterally deviating.

In this paper, we consider a deterministic Nash-Cournot game, in which a common homogeneous commodity is being produced by several firms and its price is specified completely by a function of the aggregate output.
In such a game, the $i$th player solves $\mathrm{Opt}(x_{-i})$, defined as

$$\min_{x_i} \; f_i(x; \theta) \triangleq c_i(x_i) - p(X; \theta)\, x_i \quad \text{subject to} \quad x_i \in K_i,$$

where $x \triangleq (x_1, \ldots, x_N)^T$, $x_i$ denotes the output of firm $i$, $c_i(\cdot)$ denotes firm $i$'s cost function, and $K_i \triangleq [0, \mathrm{Cap}_i]$ with $\mathrm{Cap}_i$ being the capacity of firm $i$. The price function of the commodity, denoted by $p(X; \theta)$, is defined as

$$p(X; \theta) \triangleq a^* - b^* X,$$

where $X = \sum_{i=1}^{N} x_i$ and $\theta = (a^*, b^*)$. The associated Nash-Cournot equilibrium is given by a tuple $x^* = (x_i^*)_{i=1}^{N}$, where

$$x_i^* \in \mathrm{SOL}(\mathrm{Opt}(x_{-i}^*)) \quad \text{for } i = 1, \ldots, N,$$

$\mathrm{SOL}(\mathrm{Opt}(x_{-i}^*))$ denotes the solution set of $\mathrm{Opt}(x_{-i}^*)$, and $x_{-i} = (x_j)_{j \neq i}$.

Jiang and Shanbhag are with the Department of Industrial and Enterprise Systems Engineering, while Meyn is with the Department of Electrical and Computer Engineering, both at the University of Illinois, Urbana IL 61801. They can be reached by email at {jiang23,udaybag,meyn}@illinois.edu. This work has been supported by DOE award DE-SC0003879.

Cournot models predate the Nash solution concept and a host of variants have been analyzed [8], [9]. An oft-used assumption in game-theoretic models requires that player payoffs are public knowledge and that every player is able to forecast the choices of his adversaries. As noted by Kirman [5], a firm's information sets may be incomplete, as manifested by a regime where firms have imperfect information of the payoffs of their adversaries. In a Cournot setting, firms may have an incorrect specification of the demand function. Naturally, firms can ascertain that their estimates differ from observations, leading to an adjustment process. In effect, firms learn the parameters of the game while participating in the game.

Our work is inspired by a series of papers by Szidarovszky, Bischi and their coauthors [2], [3], [10] where firms competing in a Nash-Cournot game attempt to learn a parameter of the demand function while playing the game.
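For concreteness, the player problem Opt(x_{-i}) and the equilibrium condition defined above can be illustrated numerically. The following is a minimal sketch, not the learning scheme studied in this paper: it assumes that both demand parameters are known, that costs are linear, c_i(x_i) = c_i x_i (an assumption made only for this example), and that every firm takes a simultaneous projected gradient step with a constant steplength of 1/(b(N+1)). The function name `cournot_projected_gradient` and its parameters are hypothetical.

```python
def cournot_projected_gradient(a, b, costs, caps, steps=500):
    """Projected gradient iteration for a linear-demand Nash-Cournot game.

    Assumptions (illustrative only): price p(X) = a - b*X with known a, b;
    linear costs c_i * x_i; capacity sets K_i = [0, caps[i]].
    """
    n = len(costs)
    # Constant steplength; 1/(b*(n+1)) is the reciprocal of the largest
    # eigenvalue of the (symmetric) Jacobian b*(I + 11^T) of the gradient map.
    gamma = 1.0 / (b * (n + 1))
    x = [0.0] * n
    for _ in range(steps):
        total = sum(x)  # aggregate output X
        new_x = []
        for i in range(n):
            # d f_i / d x_i for f_i = c_i*x_i - (a - b*X)*x_i
            grad = costs[i] - a + b * (total + x[i])
            step = x[i] - gamma * grad
            # Projection onto K_i = [0, Cap_i]
            new_x.append(min(max(step, 0.0), caps[i]))
        x = new_x  # all firms update simultaneously
    return x
```

With a = 10, b = 1, two firms with unit marginal cost and non-binding capacities, the iterates settle at 3 units per firm, which agrees with the closed form (a - c)/(b(N+1)) for the unconstrained symmetric game. Tightening the first firm's capacity to 1 moves the limit to (1, 4), so the capacity constraint changes the other firm's equilibrium output as well.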
In [1], in an unconstrained regime with linear costs, the authors examine the stability of learning the equilibrium together with one unknown parameter of $\theta$ (either $a^*$ or $b^*$). In particular, they consider two cases:

(Case 1): The slope $b^*$ is known, but the intercept $a^*$ is unknown.
(Case 2): The intercept $a^*$ is known, but the slope $b^*$ is unknown.

It is shown that this process is globally stable in Case 1 and unstable in Case 2.

In this paper, we consider the learning of equilibria when one component of $\theta$ is unknown and the aggregate output $X$ is unobservable by the firms. In particular, if $b^*$ and $X$ are unknown, then our goal lies in developing algorithms that construct a sequence $z_k = (x_k, b_k)$ such that $\lim_{k \to \infty} z_k = z^*$, where $z^* = (x^*, b^*)$.

Broadly speaking, our focus is on constrained Nash-Cournot problems; this extension is not a trivial one, since gradient-based learning now requires the use of a projection operator. In such a regime, we prove that the mappings associated with the variational problems are $P$ and $P_0$ maps for Cases 1 and 2, respectively. Notably, the variational problem has a unique solution in Case 1, whereas such uniqueness cannot be claimed when learning $b^*$ (Case 2). Despite this lack of uniqueness, we develop a Tikhonov regularization scheme that is guaranteed to converge to the correct equilibrium, under suitable conditions. The convergence of standard gradient-based distributed schemes cannot be immediately claimed since the mappings are not monotone (but admit a weaker

2011 50th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC) Orlando, FL, USA, December 12-15, 2011 978-1-61284-799-3/11/$26.00 ©2011 IEEE 1018