IOSR Journal of Computer Engineering (IOSR-JCE) e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. III (May-Jun. 2016), PP 18-25, www.iosrjournals.org, DOI: 10.9790/0661-1803031825

Enhancement in Decision Making with Improved Performance by Multiagent Learning Algorithms

Deepak A. Vidhate 1, Dr. Parag Kulkarni 2
1 (Research Scholar, Department of Computer Engineering, College of Engineering, Pune, India)
2 (EKLaT Research, Shivajinagar, Pune, Maharashtra, India)

Abstract: In some applications, the output of the system is a sequence of actions. There is no measure of the best action in any intermediate state; an action is good only if it is part of a good policy. A single action is not important in itself; what matters is the policy, that is, the sequence of correct actions that reaches the goal. In such cases, a machine learning program should be able to assess the goodness of policies and learn from past good action sequences in order to generate a policy. A multi-agent environment is one in which there is more than one agent, the agents interact with one another, and there are restrictions on the environment such that agents may not, at any given time, know everything about the world that other agents know. Two features of multi-agent learning establish its study as a field separate from ordinary machine learning. Parallelism, scalability, simpler construction, and cost effectiveness are the main characteristics of multi-agent systems. This paper presents a multiagent learning model and implements two multiagent learning algorithms, i.e. the Strategy Sharing and Joint Rewards algorithms. In the Strategy Sharing algorithm, a simple average of Q-tables is taken: each Q-learning agent learns from all of its teammates by averaging their Q-tables. The Joint Rewards learning algorithm combines Q-learning with the idea of joint rewards. The paper presents results and a performance comparison of the two multiagent learning algorithms.
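The two algorithms summarized in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the table shapes, learning rate, discount factor, and equal reward weighting are assumptions made for the example.

```python
import numpy as np

ALPHA, GAMMA = 0.1, 0.9  # assumed learning rate and discount factor

def strategy_sharing(q_tables):
    """Strategy Sharing: each agent replaces its Q-table with the
    element-wise average of all teammates' Q-tables."""
    avg = np.mean(q_tables, axis=0)
    return [avg.copy() for _ in q_tables]

def joint_reward_update(q, state, action, next_state, rewards):
    """Joint Rewards: a standard Q-learning update in which the scalar
    reward is replaced by the (equally weighted) average of all agents'
    rewards for the joint step."""
    joint_r = sum(rewards) / len(rewards)
    q[state, action] += ALPHA * (
        joint_r + GAMMA * q[next_state].max() - q[state, action]
    )
    return q
```

In this sketch, Strategy Sharing is applied periodically across the team after independent Q-learning episodes, while the joint-reward update runs at every step; both operate on ordinary state-by-action Q-tables.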
Keywords: Joint Rewards, Multiagent, Q Learning, Reinforcement Learning, Strategy Sharing

I. Introduction

Consider the example of a market chain that has hundreds of stores all over a country selling thousands of goods to millions of customers. The point-of-sale terminals record the details of each transaction, i.e. date, customer identification code, goods bought and their amount, total money spent, and so forth. This typically generates gigabytes of data every day. What the market chain wants is to be able to predict who the likely customers for a product are. Again, the algorithm for this is not evident; it changes over time and by geographic location. Stored data becomes useful only when it is analyzed and turned into information that we can use, for example, to make predictions. We do not know exactly which people are likely to buy this product or another product. We would not need any analysis of the data if we knew it already. But because we do not know, we can only collect data and hope to extract the answers to our questions from it. We do believe that there is a process that explains the data we observe. Though we do not know the details of the process underlying the generation of the data, for example customer behavior, we know that it is not completely random. People do not go to markets and buy things at random. When they buy beer, they buy chips; they buy ice cream in summer and spices for wine in winter. There are certain patterns in the data. We may not be able to identify the process completely, but we can still construct a good and useful approximation. That approximation may not explain everything, but it may still account for some part of the data. Though identifying the complete process may not be possible, patterns or regularities can still be detected. Such patterns may help us to understand the process, or to make predictions.
Assuming that the near future will not be much different from the past, future predictions can also be expected to be right. There are many real-world problems that involve more than one entity in maximizing an outcome. For example, consider a scenario of retail shops in which shop A sells clothes, shop B sells jewelry, shop C sells footwear, and shop D is a wedding house. To build a single system that automates (certain aspects of) the marketing process, the internals of all shops A, B, C, and D could be modeled. The only feasible solution, however, is to allow the various stores to create their own policies that accurately represent their goals and interests. These policies must then be combined into the system with the aid of suitable techniques. The goal of each shop is to maximize profit by increasing sales, i.e. yield maximization. Different parameters need to be considered here: variation in seasons, the dependency between items, special schemes, discounts, market conditions, etc. Different shops can cooperate with each other for yield maximization in different situations. Several independent tasks that can be handled by separate agents could benefit from the cooperative nature of agents. Another example of a domain that requires cooperative learning is hospital scheduling. It requires different agents to represent the interests of different people within the hospital. Hospital employees have different outlooks. X-ray operators may want to maximize the throughput of their machines. Nurses in the