Copyright © 2018 Authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

International Journal of Engineering & Technology, 7 (1.5) (2018) 274-278
Website: www.sciencepubco.com/index.php/IJET
Research paper

Implementation of modified SARSA learning technique in EMCAP

D. Ganesha 1, Vijayakumar Maragal Venkatamuni 2

1 Bharathiar University, Department of ISE, PVP Polytechnic, Dr. AIT campus, Outer Ring Road, Malathahalli, Nagarabhavi, Bangalore 560056, Karnataka, India
2 Department of Computer Science, Research Progress Review Committee (RPRC), Dr. Ambedkar Institute of Technology, Visvesvaraya Technological University, Bengaluru 560056, Karnataka, India
*Corresponding author E-mail: ganesh207d@gmail.com

Abstract

This research work presents an analysis of a modified SARSA learning algorithm. State-Action-Reward-State-Action (SARSA) is a technique for learning a Markov decision process (MDP) policy, used for reinforcement learning in the fields of artificial intelligence (AI) and machine learning (ML). The modified SARSA algorithm selects better actions in order to obtain better rewards. Experiments were conducted to evaluate the performance of each agent individually, and the same statistics were collected for every agent so that results could be compared. This work considers several kinds of agents at different levels of the architecture in the experimental analysis. The Fungus World testbed, implemented in SWI-Prolog 5.4.6, was used for the experiments. Fixed obstacles give the Fungus World environment a specific spatial layout, and various parameters are introduced into the environment to test an agent's performance. The modified SARSA learning algorithm is well suited to the EMCAP architecture: in the experiments conducted, the modified SARSA learning system obtains more rewards than the existing SARSA algorithm.

Keywords: Self-Learning, Cognitive Control, SARSA Learning.

1. Introduction

In reinforcement learning (RL) [1], an agent searches for an optimal control policy for a sequential decision problem; unlike supervised learning, it must discover good actions from reward feedback rather than from labelled examples. Because many practical problems (e.g., robot control, system optimization, and game playing) fall into this category, developing efficient reinforcement learning techniques is essential to the progress of AI. When the sequential decision problem is modelled as an MDP [2], the agent's policy is represented as a mapping from every state it may encounter to a probability distribution over the available actions. In some cases, the agent can use its experience interacting with the environment to estimate a model of the MDP and then compute an optimal policy off-line using dynamic programming techniques [3]. When learning a model is not feasible, the agent can still find an optimal policy using temporal difference (TD) methods [4]. Each time the agent takes an action, the feedback received is used to update the estimate of its action-value function, which predicts the expected discounted future reward the agent will receive if it takes a given action in a given state.
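To make the role of the action-value function concrete, the following Python sketch shows a minimal tabular on-policy TD (SARSA-style) learning loop. The environment interface (env.reset, env.step), the action set, and the parameter values are illustrative assumptions for a small grid world; they are not the paper's EMCAP/Fungus World implementation, which is written in SWI-Prolog.

import random
from collections import defaultdict

# Illustrative parameters (assumptions, not the paper's settings):
# step size, discount factor, and exploration rate.
alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = defaultdict(float)                   # action-value estimates Q[(state, action)]
ACTIONS = ["north", "south", "east", "west"]   # hypothetical action set

def choose_action(state):
    """Epsilon-greedy behavior policy over the current estimates."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def sarsa_episode(env):
    """Run one episode; env.reset()/env.step() are an assumed interface."""
    state = env.reset()
    action = choose_action(state)
    done = False
    while not done:
        next_state, reward, done = env.step(action)
        next_action = choose_action(next_state)
        # TD update: move Q(s, a) toward r + gamma * Q(s', a').
        target = reward + (0.0 if done else gamma * Q[(next_state, next_action)])
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state, action = next_state, next_action

Each step performs exactly the update described above: the feedback (the reward plus the discounted value of the next state-action pair) moves the estimate Q(s, a) a small step toward the observed return.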
Under certain conditions, temporal difference methods are guaranteed to converge in the limit to the optimal action-value function, from which an optimal policy can easily be derived. In off-policy temporal difference methods, such as Q-learning [5], the behavior policy, used to control the agent during learning, is different from the estimation policy, whose values are learned. The benefit of this approach is that the agent can use an exploratory behavior policy to ensure that it gathers sufficiently diverse data. The on-policy approach, where the estimation and behavior policies are identical, also offers important advantages. In particular, it provides stronger convergence guarantees when combined with function approximation, since off-policy methods can diverge in that case [6], and it often outperforms off-policy methods in practice. In on-policy methods, the estimation policy, which is iteratively improved, is also the policy used to control the agent's behavior. By annealing exploration over time, on-policy methods can converge in the limit to the same policies as off-policy techniques. The classic on-policy method is Sarsa [7], named after the five elements used in its update rule: the current state and action, s_t and a_t, the immediate reward r, and the next state and action, s_{t+1} and a_{t+1}. The use of a_{t+1} introduces extra variance into the update whenever the estimation policy is stochastic, as is normally the case for on-policy methods such as Sarsa. This variance can be avoided by basing the update on the expected value of the next state's action values under the estimation policy, rather than on the single sampled action a_{t+1}. Although the resulting algorithm, which we call Modified Sarsa, may provide significant benefits over Sarsa, it has never been systematically studied and is not widely used in practice.
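Since the passage above motivates the modification by the extra variance contributed by the sampled next action a_{t+1}, one plausible reading of the modified update (akin to Expected Sarsa) replaces the sampled value Q(s', a') with its expectation under the estimation policy. The Python sketch below contrasts the two targets; the function names and the policy_probs argument are hypothetical illustrations under that assumption, not the paper's definitions.

def sarsa_target(Q, reward, next_state, next_action, gamma):
    # Classic Sarsa: the target uses the single sampled next action
    # a_{t+1}, which adds variance when the estimation policy is stochastic.
    return reward + gamma * Q[(next_state, next_action)]

def modified_target(Q, reward, next_state, policy_probs, gamma):
    # Expectation-based variant (an assumption, akin to Expected Sarsa):
    # average Q(s', a) over the estimation policy's action probabilities
    # policy_probs = {action: probability}, removing the sampling variance.
    expected_q = sum(p * Q[(next_state, a)] for a, p in policy_probs.items())
    return reward + gamma * expected_q

Both targets have the same expected value under the estimation policy; the second simply eliminates the variance due to sampling a_{t+1}, which is the property the discussion above suggests should translate into more reward per episode.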