Decision Making with Hybrid Models: the Case of Collective and Individual Motivations

Paulo Trigo*
Departamento de Eng. de Electrónica e Telecomunicações e de Computadores, Instituto Superior de Engenharia de Lisboa, Portugal
E-mail: ptrigo@deetc.isel.ipl.pt
*Corresponding author

Helder Coelho
Departamento de Informática, Faculdade de Ciências da Universidade de Lisboa, Portugal
E-mail: hcoelho@di.fc.ul.pt

Abstract: In the aftermath of a large-scale disaster, agents' decisions range from individual (e.g. survival) to collective (e.g. victims' rescue or fire extinction) attitudes, thus shaping a 2-strata decision model. However, current decision-theoretic models are either purely individual or purely collective and find it difficult to deal with motivational attitudes; on the other hand, mental-state based models find it difficult to deal with uncertainty. We describe two hybrid decision models: i) the collective 'versus' individual (CvI) model, which integrates the quantitative evaluation of decision making at both strata, and ii) the CvI-JI model, which extends the CvI model, using the joint-intentions formulation of teamwork, to deal with collective mental-state motivational attitudes. Both models are evaluated from an experimental, case-study based outlook that explores the tradeoff between cost reduction and loss of optimality while learning coordination skills in a partially observable stochastic domain.

Keywords: multi-agent, distributed and hierarchical decision-making; mental-state based reasoning with uncertainty; hierarchical reinforcement learning.

1 Introduction

The agents that cooperate to mitigate the effects of a large-scale disaster, e.g. an earthquake or a terrorist incident, take decisions that follow two large behavioral classes: the individual (ground) activity and the collective (institutional) coordination of such activity. Additionally, agents are motivated to form teams and jointly commit to goals that supersede their individual capabilities.
Despite such motivation, communication is usually insufficient to ensure that decision-making is supported by a single and coherent world perspective. In general, the search for a coordination policy that responds to a large-scale disaster is a process beyond individual skills, where optimality is non-existent or too expensive to compute (Kitano et al., 1999).

However, despite the intuition of a 2-strata (collective and individual) decision process, research on multi-agent coordination often proposes a single model that amalgamates those strata and searches for optimality within that model. The approaches based on the multi-agent Markov decision process (MMDP) (Boutilier, 1999) are purely collective and centralized, and thus too complex to coordinate, while also requiring unconstrained communication. The multi-agent semi-Markov decision process (MSMDP) (Ghavamzadeh et al., 2006), although decentralized, requires each individual agent to represent the whole decision space (states and actions), which may become very large, thus causing the individual policy learning to be slow and highly dependent on up-to-date information about the decisions of all other agents. The game-theoretic approaches usually require each agent to compute the utility of all combinations of actions executed by all other agents (the payoff matrix), which is then used to search for Nash equilibria (where no agent can increase its payoff by unilaterally changing its policy); thus, when several equilibria exist, agents may adhere to purely individual policies and never be pulled toward a collective perspective.
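To make the game-theoretic point concrete, the following minimal sketch enumerates the pure-strategy Nash equilibria of a small payoff matrix. It is only an illustration under stated assumptions: the two-agent "stag hunt" game, its payoff values, and the function name `pure_nash_equilibria` are hypothetical and are not taken from the models or experiments described in this paper.

```python
from itertools import product


def pure_nash_equilibria(payoffs, n_actions):
    """Return all pure-strategy Nash equilibria of a finite game.

    payoffs   -- dict mapping each joint action (a tuple with one action
                 index per agent) to a tuple with one payoff per agent
    n_actions -- tuple with the number of actions available to each agent
    """
    equilibria = []
    # The payoff matrix has one entry per joint action, so this loop
    # visits every combination of actions of all agents.
    for joint in product(*(range(n) for n in n_actions)):
        stable = True
        for agent, n in enumerate(n_actions):
            current = payoffs[joint][agent]
            # Check every unilateral deviation of this agent.
            for alternative in range(n):
                deviation = joint[:agent] + (alternative,) + joint[agent + 1:]
                if payoffs[deviation][agent] > current:
                    stable = False
                    break
            if not stable:
                break
        if stable:
            equilibria.append(joint)
    return equilibria


# Hypothetical two-agent game (illustrative values only):
# STAG is the collective action, HARE is the safe individual action.
STAG, HARE = 0, 1
stag_hunt = {
    (STAG, STAG): (4, 4),  # both cooperate: best collective outcome
    (STAG, HARE): (0, 3),  # the cooperating agent is left alone
    (HARE, STAG): (3, 0),
    (HARE, HARE): (3, 3),  # both act individually: safe but worse
}

equilibria = pure_nash_equilibria(stag_hunt, (2, 2))
print(equilibria)  # [(0, 0), (1, 1)]
```

In this assumed game both (STAG, STAG) and (HARE, HARE) are equilibria, so nothing in the equilibrium concept itself pulls the agents away from the purely individual (HARE, HARE) policy. The sketch also shows why the approach scales poorly: the payoff dictionary needs one entry per joint action, a number that grows as the product of the agents' action counts.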