IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 20, NO. 1, FEBRUARY 2005 525

Combining a Stability and a Performance-Oriented Control in Power Systems

Mevludin Glavic, Member, IEEE, Damien Ernst, Member, IEEE, and Louis Wehenkel, Member, IEEE

Abstract—This paper suggests that the appropriate combination of a stability-oriented and a performance-oriented control technique is a promising way to implement advanced control schemes in power systems. The particular approach considered combines control Lyapunov functions (CLF) and reinforcement learning. The capabilities of the resulting controller are illustrated on a control problem involving a thyristor-controlled series capacitor (TCSC) device for damping oscillations in a four-machine power system.

Index Terms—Control Lyapunov functions, power system stability control, reinforcement learning.

I. INTRODUCTION

The concept of control Lyapunov functions (CLF) provides a powerful tool for studying stabilization problems [1]. Most nonlinear control design methods based on CLF provide strong guarantees of stability but do not directly address important issues of control performance.

Reinforcement learning (RL) has emerged as an attractive learning paradigm that offers a range of methods allowing controllers to learn a goal-oriented control law from interactions with a system or its simulation model [2], [3]; however, these methods do not provide stability guarantees when employed online.

In principle, a feedback control system should try to optimize some mix of stability and performance [4], [5], and this paper suggests a combination of RL and CLF as a way to implement an advanced control scheme for system stability control. Results of controlling a thyristor-controlled series capacitor (TCSC) to damp power system oscillations are presented.

II. RL AND CLF: A SHORT DESCRIPTION

RL is a general algorithmic approach to solving stochastic optimal control problems by trial and error.
Dynamic programming is the underlying formal framework [2]. An RL agent (controller) interacts with its environment (system) as follows [2], [3]: at each discrete time step $t$, the agent receives (through measurements) $s_t$, a representation of the system state, and selects an action $a_t$ from the set $A(s_t)$ of actions available in that state. As a consequence of taking the action, the agent receives (through measurements) a numerical reward $r_{t+1}$, which it tries to maximize (or, for a cost, minimize) over time. In the usual setting, the discounted sum of rewards is to be maximized. A model-based RL method known as prioritized sweeping [2] is used here.

In the CLF concept, for a nonlinear system $\dot{x} = f(x, u)$, one selects a Lyapunov function candidate $V(x)$ and finds a stabilizing feedback control law $u = k(x)$ that renders $\dot{V}(x)$ negative definite. In short, once such a $V(x)$ has been found, the CLF approach allows the search for stabilizing inputs. The results of [6] on applying the CLF concept to control a TCSC device are largely followed here.

Manuscript received July 2, 2004. Paper no. PESL-00143-2003.

The authors are with the Electrical Engineering and Computer Science Department, University of Liège, 4000 Liège, Belgium (e-mail: glavic@montefiore.ulg.ac.be; ernst@montefiore.ulg.ac.be; lwh@montefiore.ulg.ac.be).

Digital Object Identifier 10.1109/TPWRS.2004.841146

Fig. 1. Conceptual diagram of the proposed control.

III. PROPOSED CONTROL SCHEME

Learning by trial and error is not justified when one intends to apply it online and the primary concern is system stability. In this case, the "exploration-stability" tradeoff has to be resolved. On the other hand, in the CLF concept, once a Lyapunov function candidate has been found, there are many control laws that render its derivative negative definite, and the problem is to choose the most appropriate one.
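To make this last point concrete, the Python sketch below uses a hypothetical scalar plant $\dot{x} = x + u$ (not the TCSC model of this paper) with the Lyapunov function candidate $V(x) = x^2/2$, so that $\dot{V} = x(x + u)$. Every linear law $u = -kx$ with $k > 1$ gives $\dot{V} = (1 - k)x^2 < 0$ for $x \neq 0$, so several candidate laws are stabilizing. The sketch spot-checks this decrease condition on a grid of sample states and then chooses among the surviving laws by a simulated quadratic cost $\int (x^2 + \rho u^2)\,dt$. The gains, the weight $\rho = 0.1$, the initial state $x(0) = 1$, and all function names are illustrative assumptions. The brute-force cost comparison is only a stand-in for the RL-based selection proposed here, not a reproduction of it.

```python
# Hypothetical toy plant (NOT the TCSC model): dx/dt = x + u, open-loop unstable.
# Lyapunov function candidate V(x) = x**2 / 2, hence dV/dt = x * (x + u).


def plant(x: float, u: float) -> float:
    """Right-hand side of the hypothetical plant dx/dt = x + u."""
    return x + u


def v_dot(x: float, u: float) -> float:
    """Derivative of V(x) = x**2 / 2 along the plant trajectories."""
    return x * plant(x, u)


def is_stabilizing(k: float, states: list[float]) -> bool:
    """True if u = -k*x gives dV/dt < 0 at every nonzero sample state.

    A grid check is only a spot check, not a proof of negative definiteness.
    """
    return all(v_dot(x, -k * x) < 0.0 for x in states if x != 0.0)


def quadratic_cost(k: float, x0: float = 1.0, rho: float = 0.1,
                   dt: float = 1e-3, horizon: float = 10.0) -> float:
    """Forward-Euler estimate of J = integral of (x^2 + rho*u^2) dt under u = -k*x."""
    x, cost = x0, 0.0
    for _ in range(round(horizon / dt)):
        u = -k * x
        cost += (x * x + rho * u * u) * dt
        x += plant(x, u) * dt
    return cost


# Sample states in [-2, 2] used to spot-check the Lyapunov decrease condition.
states = [i / 10.0 for i in range(-20, 21)]
# Hypothetical candidate gains for the linear laws u = -k*x.
candidate_gains = [0.5, 1.5, 2.0, 3.0, 5.0, 8.0]

# Keep only laws that decrease V, then pick the cheapest one by simulated cost.
stabilizing = [k for k in candidate_gains if is_stabilizing(k, states)]
costs = {k: quadratic_cost(k) for k in stabilizing}
best = min(costs, key=costs.get)

print("stabilizing gains:", stabilizing)
print(f"selected gain: {best}, estimated cost: {costs[best]:.4f}")
```

For this plant the exact cost is $(1 + \rho k^2)/(2(k - 1))$, minimized at $k = 1 + \sqrt{11} \approx 4.32$. Among the sample gains, $k = 5$ is therefore selected, and its Euler estimate lies close to the exact value $0.4375$. The gain $k = 0.5$ is rejected because it makes $\dot{V}$ positive.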
This paper introduces a control scheme that tries to exploit the advantages of the two control techniques while overcoming the difficulties mentioned above in applying either of them alone: it ensures safe learning in online mode by limiting the available control actions to stabilizing ones and, at the same time, improves on stabilizing controls by minimizing a prespecified cost.

The idea is to employ an RL method (in online mode) to compute an approximation of the optimal sequence of basic control laws derived from the CLF concept (for stability guarantees of a sequence of stabilizing control laws, see [7]), as illustrated in Fig. 1. For the particular case considered, each individual control law is derived as a stabilizing continuous control law that renders a common (global) Lyapunov function candidate decreasing. The active control signal is fed as an input to each of the control laws in order to ensure a "jumpless" switch from one control law to another. Furthermore, the control scheme as a whole is assumed to rely strictly on locally measurable information. Although only the combination of RL and CLF is considered in this short note, the proposed control scheme is not limited to these control techniques: any control with stability guarantees can be combined with RL, and any heuristic search technique can be combined with CLF control, in a similar way.

IV. TEST RESULTS

To illustrate the capabilities of the proposed control scheme in controlling a TCSC aimed at improving the damping of power system oscillations, a four-machine power system model, as described in [8], is used. For simulation purposes, four linear, continuous (with "hard" limits) basic control laws [6]

(1)

0885-8950/$20.00 © 2005 IEEE