SSRG International Journal of Computer Science and Engineering (SSRG-IJCSE) – volume 3 Issue 5 – May 2016, ISSN: 2348 – 8387, www.internationaljournalssrg.org, Page 46

Single Agent Learning Algorithms for Decision Making in Diagnostic Applications

Deepak A. Vidhate #1, Dr. Parag Kulkarni *2
# Research Scholar, Department of Computer Engineering, College of Engineering, Pune, India
* EKLaT Research Lab, Shivajinagar, Pune, Maharashtra, India

Abstract – In some applications, the output of the system is a sequence of actions. There is no such thing as the best action in an intermediate state; an action is good if it is part of a good policy. What matters is not a single action but the policy, that is, the sequence of correct actions that reaches the goal. To generate a policy, a machine learning program should be able to assess the quality of policies and learn from past good action sequences. Learning is the basic capacity of intelligent agents: through learning, an agent changes its behaviour based on its previous experiences. An intelligent agent must be formalized by knowledge and be able to act on this knowledge. Reinforcement learning techniques have been applied successfully in many single-agent systems to learn the policy of an agent in uncertain environments. Many existing single-agent models for sequential decision making are derived from a general model and are distinguished by their assumptions. Q-learning algorithms are used for this purpose. A single-agent learning model is given in this paper. Four single-agent reinforcement learning algorithms are implemented and their results are compared. The single-agent Q-learning algorithm and the Sarsa learning algorithm give some results for the problem. However, adding eligibility traces to these algorithms, i.e. Q(λ) learning and Sarsa(λ) learning, performs better than the algorithms without traces. The paper shows the results of all four algorithms and compares their performance.
Keywords – Q-learning, Reinforcement learning, Sarsa Learning, Single Agent

I. INTRODUCTION

Consider, for example, a market chain that has hundreds of stores all over a country, selling thousands of goods to millions of customers. The point-of-sale terminals record the details of each transaction, i.e. date, customer identification code, goods bought and their amount, total money spent, and so forth. This typically generates gigabytes of data every day. What the market chain wants is to be able to predict who the likely customers for a product are. The algorithm for this is not evident; it changes over time and by geographic location. If stored data is analyzed and turned into information, it becomes useful: we can use it, for example, to make predictions [1].

We do not know exactly which people are likely to buy this product or another product. If we knew this already, we would not need any analysis of the data. But because we do not know, we can only collect data and hope to extract the answers to our questions from it. We do believe that there is a process that explains the data we observe. Though we do not know the details of the process underlying the generation of the data (for example, customer behavior), we know that it is not completely random. People do not go to markets and buy things at random. When they buy beer, they buy chips; they buy ice cream in summer and spices for wine in winter. There are certain patterns in the data.

We may not be able to identify the process completely, but we can still construct a good and useful approximation. That approximation may not explain everything, but it may still account for some part of the data. Though identifying the complete process may not be possible, patterns or regularities can still be detected. Such patterns may help us to understand the process or to make predictions.
Assuming that the near future will not be much different from the past, future predictions can also be expected to be right.

Many real-world problems involve more than one entity in the maximization of an outcome. For example, consider a scenario of retail shops in which shop A sells clothes, shop B sells jewelry, shop C sells footwear, and D is a wedding house. In order to build a single system to automate (certain aspects of) the marketing process, the internals of all shops A, B, C, and D can be modeled. The only feasible solution is