IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 20, NO. 1, FEBRUARY 2005 525
Combining a Stability and a Performance-Oriented
Control in Power Systems
Mevludin Glavic, Member, IEEE, Damien Ernst, Member, IEEE, and
Louis Wehenkel, Member, IEEE
Abstract—This paper suggests that the appropriate combination of
a stability-oriented and a performance-oriented control technique is a
promising way to implement advanced control schemes in power systems.
The particular approach considered combines control Lyapunov functions
(CLF) and reinforcement learning. The capabilities of the resulting
controller are illustrated on a control problem involving a thyristor-controlled
series capacitor (TCSC) device for damping oscillations in a four-machine
power system.
Index Terms—Control Lyapunov functions, power system stability con-
trol, reinforcement learning.
I. INTRODUCTION
The concept of control Lyapunov functions (CLF) provides a powerful
tool for studying stabilization problems [1]. Most nonlinear control
design methods based on CLF provide strong guarantees of stability
but do not directly address important issues of control performance.
Reinforcement learning (RL) emerges as an attractive learning paradigm
that offers a range of methods allowing controllers to learn a
goal-oriented control law from interactions with a system or its
simulation model [2], [3]. These methods, however, do not provide
stability guarantees when employed online.
In principle, a feedback control system should try to optimize some
mix of stability and performance [4], [5], and this paper suggests a
combination of RL and CLF as a way to implement an advanced control
scheme for system stability control. Results of controlling a
thyristor-controlled series capacitor (TCSC) to damp power system
oscillations are presented.
II. RL AND CLF: A SHORT DESCRIPTION
RL is a general algorithmic approach to solve stochastic optimal control
problems by trial and error. Dynamic programming is the underlying
formal framework [2]. An RL agent (controller) interacts with its
environment (system) as follows [2], [3]: At each discrete time step $t$,
the agent receives (through measurements) $s_t$, which is a representation
of the system state, and the agent selects an action $a_t$ from the
set of actions $A(s_t)$ available in the state. As a consequence of taking
the action, the agent receives (through measurements) a numerical reward
$r_{t+1}$ that the agent tries to maximize (minimize) over time.
In the usual setting, the discounted sum of rewards
$\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}$, with discount factor $0 \le \gamma < 1$,
is to be maximized. A model-based RL method, known as prioritized
sweeping [2], is used here.
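The letter does not detail its implementation of prioritized sweeping, so the following is only a minimal sketch of the deterministic-model variant described in [2]. The five-state chain environment, the function names, and all parameter values are hypothetical and chosen purely for illustration; they are not taken from the TCSC study.

```python
import heapq
import random
from collections import defaultdict

GOAL = 4              # hypothetical terminal state of a toy 5-state chain
ACTIONS = (-1, +1)    # move left or right along the chain


def step(s, a):
    """Toy deterministic environment: reward 1 on reaching GOAL, else 0."""
    s_next = min(max(s + a, 0), GOAL)
    return (1.0 if s_next == GOAL else 0.0), s_next


def prioritized_sweeping(episodes=50, horizon=50, gamma=0.9, theta=1e-6,
                         n_planning=10, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)      # action values, default 0
    model = {}                  # learned model: (s, a) -> (r, s_next)
    preds = defaultdict(set)    # s_next -> {(s, a)} observed to lead there
    queue = []                  # max-priority queue (priorities negated)

    def best(s):
        return max(Q[(s, a)] for a in ACTIONS)

    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                top = best(s)
                a = rng.choice([b for b in ACTIONS if Q[(s, b)] == top])
            r, s_next = step(s, a)
            # Update the learned model from the real transition.
            model[(s, a)] = (r, s_next)
            preds[s_next].add((s, a))
            p = abs(r + gamma * best(s_next) - Q[(s, a)])
            if p > theta:
                heapq.heappush(queue, (-p, s, a))
            # Planning: back up the highest-priority pairs first.
            for _ in range(n_planning):
                if not queue:
                    break
                _, ps, pa = heapq.heappop(queue)
                pr, pn = model[(ps, pa)]
                Q[(ps, pa)] = pr + gamma * best(pn)   # full backup (deterministic model)
                # Re-prioritize every pair known to lead into ps.
                for bs, ba in preds[ps]:
                    br, _ = model[(bs, ba)]
                    bp = abs(br + gamma * best(ps) - Q[(bs, ba)])
                    if bp > theta:
                        heapq.heappush(queue, (-bp, bs, ba))
            if s_next == GOAL:
                break
            s = s_next
    return Q
```

Calling `prioritized_sweeping()` returns a table of action values from which a greedy policy can be read. The priority queue is what distinguishes the method from plain model-based backups: value changes are propagated backward to predecessor state-action pairs in order of their expected impact.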
In the concept of CLF, for a nonlinear system $\dot{x} = f(x, u)$, one
selects a Lyapunov function candidate $V(x)$ and finds a stabilizing
feedback control law $u(x)$ that renders $\dot{V}(x)$ negative definite. In short, the
CLF approach, assuming a $V(x)$ has been found, allows the search for
stabilizing inputs. The results from [6] on application of a CLF concept
to control a TCSC device are largely followed.
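As a small illustration of the CLF idea, consider a sketch under stated assumptions: the scalar plant $\dot{x} = x + u$, the candidate $V(x) = x^2/2$, and the feedback $u = -kx$ are all hypothetical and unrelated to the TCSC model of [6]. For this plant, $\dot{V} = x(x + u) = (1 - k)x^2$, which is negative definite for any $k > 1$.

```python
def f(x, u):
    """Hypothetical open-loop-unstable scalar plant: dx/dt = x + u."""
    return x + u


def V(x):
    """Lyapunov function candidate V(x) = x^2 / 2."""
    return 0.5 * x * x


def V_dot(x, u):
    """Time derivative of V along trajectories: dV/dx * f(x, u)."""
    return x * f(x, u)


def clf_law(x, k=2.0):
    """Linear feedback u = -k*x; gives V_dot = (1 - k) * x^2 < 0 for k > 1."""
    return -k * x


def simulate(x0, law, dt=0.01, steps=500):
    """Forward-Euler simulation of the closed loop; returns the state trajectory."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + dt * f(x, law(x)))
    return xs
```

With `xs = simulate(1.0, clf_law)`, the values `V(x)` along `xs` decrease monotonically, whereas `V_dot(1.0, 0.0)` is positive, confirming that the uncontrolled plant is unstable. Any gain $k > 1$ stabilizes this toy plant, which mirrors the remark that the CLF approach yields a set of stabilizing inputs rather than a single one.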
Manuscript received July 2, 2004. Paper no. PESL-00143-2003.
The authors are with the Electrical Engineering and Computer Science
Department, University of Liège, 4000 Liège, Belgium (e-mail:
glavic@montefiore.ulg.ac.be; ernst@montefiore.ulg.ac.be; lwh@montefiore.ulg.ac.be).
Digital Object Identifier 10.1109/TPWRS.2004.841146
Fig. 1. Conceptual diagram of the proposed control.
III. PROPOSED CONTROL SCHEME
Learning by trial and error is not justified when one intends to
apply it online and when the primary concern is system stability. In
this case, the “exploration-stability” tradeoff is to be resolved. On the
other hand, in the CLF concept, assuming a Lyapunov function candidate
has been found, there are many control laws that render its derivative
negative definite, and the problem is to choose the most appropriate
one. This paper introduces a control scheme that tries to make use of
the advantages of the two control techniques and, at the same time,
to overcome the mentioned difficulties in applying them alone: it assures
safe learning in online mode by limiting the available control actions to
stabilizing ones and improves over the stabilizing controls by
minimizing a prespecified cost.
The idea is to employ an RL method (in online mode) in order to compute
an approximation of the optimal sequence comprising basic control
laws derived from the concept of CLF (for stability guarantees of
the sequence of stabilizing control laws, see [7]), as illustrated in Fig. 1.
For the particular case considered, each individual control law is derived
as a stabilizing continuous control law that renders a common (global)
Lyapunov function candidate decreasing. The aim of having the active
control signal as an input to each of the control laws is to assure a
“jumpless” switch from one control law to another. Furthermore, the control
scheme as a whole is assumed to rely on strictly locally measurable
information. Although only the combination of the RL and CLF concepts
is considered in this short note, the proposed control scheme is not
limited to these control techniques. Any control with stability guarantees
can be combined with RL, and any heuristic search technique can be
combined with the CLF control in a similar way.
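The structure of the scheme can be sketched on a toy problem. Everything below is an assumption made for illustration: the scalar plant $\dot{x} = x + u$, the family of laws $u = -kx$ with the listed gains (each decreasing the common candidate $V(x) = x^2/2$), the quadratic cost, and the use of tabular Q-learning in place of the prioritized sweeping method used in the letter. The “jumpless” switching mechanism is not modeled.

```python
import random

GAINS = (1.5, 2.0, 4.0, 8.0)   # hypothetical stabilizing laws u = -k*x, all k > 1
DT = 0.05                      # Euler step; keeps |1 + DT*(1 - k)| < 1 for all gains
N_BINS = 5                     # coarse discretization of |x| in [0, 1]


def V(x):
    """Common Lyapunov function candidate."""
    return 0.5 * x * x


def plant_step(x, u):
    """One Euler step of the toy plant dx/dt = x + u."""
    return x + DT * (x + u)


def bin_of(x):
    """Map |x| to a discrete state index for the Q-table."""
    return min(int(abs(x) * N_BINS), N_BINS - 1)


def learn(episodes=200, steps=40, alpha=0.2, gamma=0.95, epsilon=0.2,
          rho=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * len(GAINS) for _ in range(N_BINS)]   # expected discounted cost
    v_increased = False        # records any violation of the stability constraint
    for _ in range(episodes):
        x = rng.uniform(-1.0, 1.0)
        for _ in range(steps):
            s = bin_of(x)
            # Exploration is restricted to the set of stabilizing laws.
            if rng.random() < epsilon:
                a = rng.randrange(len(GAINS))
            else:
                a = min(range(len(GAINS)), key=lambda i: Q[s][i])
            u = -GAINS[a] * x
            x_next = plant_step(x, u)
            cost = DT * (x * x + rho * u * u)          # prespecified cost to minimize
            if V(x_next) > V(x):
                v_increased = True
            # Q-learning update, written for cost minimization.
            target = cost + gamma * min(Q[bin_of(x_next)])
            Q[s][a] += alpha * (target - Q[s][a])
            x = x_next
    return Q, v_increased
```

Because every action available to the learner is a stabilizing law, `learn()` returns `v_increased == False` regardless of how the agent explores: the Lyapunov candidate never grows during learning, and the learner only decides which stabilizing law incurs the lowest cost in each region of the state space.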
IV. TEST RESULTS
To illustrate the capabilities of the proposed control scheme in
controlling a TCSC aimed at improving the damping of power system
oscillations, a four-machine power system model, as described in [8], is
used. For simulation purposes, four linear, continuous with “hard” limits,
basic control laws [6]
(1)