Parallel generation of samples for simulation techniques applied to Stochastic Automata Networks

Ricardo M. Czekster, Paulo Fernandes, Afonso Sales, Dione Taschetto and Thais Webber *

Pontifícia Universidade Católica do Rio Grande do Sul
Av. Ipiranga, 6681 – Prédio 32 – 90619-900 – Porto Alegre, Brazil
{ricardo.czekster, paulo.fernandes, afonso.sales, dione.taschetto, thais.webber}@pucrs.br

Abstract

The Stochastic Automata Networks (SAN) formalism provides a compact and modular description for Markovian models. Moreover, SAN is suitable for deriving performance indices for systems analysis and interpretation using iterative numerical solutions based on a descriptor and a state-space-sized probability vector. Depending on the size of the model, this operation is computationally onerous and sometimes impracticable. An alternative method to compute indices from a model is simulation, mainly because it requires only the definition of a pseudorandom generator and transition functions for states, which together enable the creation of a trajectory. The sampling process can differ for each technique, each establishing its own rules to collect samples for further statistical analysis. Simulation techniques often demand large numbers of samples in order to compute statistically relevant performance indices. We focus our attention on the parallelization of sampling techniques to generate more samples in less time, drawing considerations about the impact on the accuracy of results.

1. Introduction

Analytical modeling of complex systems is crucial to detect error conditions or misbehaviors that arise from different realities such as bottlenecks, capacity planning problems and scalability issues, to name a few [17, 18, 6, 3]. It is possible to represent a system using stochastic modeling formalisms such as Markov Chains [23] or more structured approaches such as Petri Nets [1], Markovian Process Algebras [14] or Stochastic Automata Networks (SAN) [19, 5].
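As the abstract notes, simulation requires little more than a pseudorandom generator and per-state transition functions to build a trajectory. A minimal sketch for a discrete-time Markov chain follows; the two-state chain, its state names and its probabilities are illustrative assumptions, not taken from the paper:

```python
import random

# Illustrative two-state discrete-time Markov chain; the states and
# transition probabilities below are arbitrary, not from the paper.
TRANSITIONS = {
    "idle": [("idle", 0.7), ("busy", 0.3)],
    "busy": [("idle", 0.4), ("busy", 0.6)],
}

def next_state(state, rng):
    """Transition function: draw a successor of `state` using `rng`."""
    u = rng.random()
    acc = 0.0
    for succ, p in TRANSITIONS[state]:
        acc += p
        if u < acc:
            return succ
    return TRANSITIONS[state][-1][0]  # guard against rounding

def trajectory(initial, steps, seed=0):
    """Generate one trajectory: a sequence of `steps` sampled transitions."""
    rng = random.Random(seed)  # the pseudorandom generator
    states = [initial]
    for _ in range(steps):
        states.append(next_state(states[-1], rng))
    return states
```

Samples for statistical analysis (e.g., the fraction of time spent in each state) would then be collected along such trajectories.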
We direct our attention to the SAN formalism since, like other structured formalisms, it provides simple means to depict components and the communications among their elements (synchronous and asynchronous). Since its first definition, SAN has been used to create modular representations of systems with local and independent behavior that occasionally interacts and synchronizes with other modules. As with other Markovian-based formalisms, SAN is used to derive performance indices for analysis and interpretation. Briefly, the process multiplies an initial probability vector by a non-trivial data structure called the descriptor, i.e., a representation of the underlying Markovian transition matrix [11]. SAN uses state-of-the-art algorithms to efficiently compute the probability vector and return the performance indices to its modelers [9, 8].

However, SAN solutions are bound by specific limits. The current SAN solver, the PEPS software tool [4], works with fewer than 65 million states. In the context of Markovian modeling, this limit is quite low, since even small realities sometimes require massive numbers of states to represent all possibilities. This problem is frequently referred to as the state space explosion problem, and several approaches are used to mitigate its harmful effects. One valid technique is simulation [15, 21], where solution approximations can be successfully derived at an associated computational cost. Simulation itself has several drawbacks, such as burn-in time, initial state definition and halting problems; however, it allows the solution of huge models without storage bounds.

* Authors receive grants from Petrobras (0050.0048664.09.9). The order of authors is merely alphabetical. Paulo Fernandes is also funded by CNPq-Brazil (PQ 307272/2007-9). Afonso Sales receives grants from CAPES-Brazil (PNPD 02388/09-0).
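The vector-descriptor multiplication at the heart of the iterative numerical solution can be sketched as a power iteration; here a plain dense matrix stands in for the Kronecker-structured descriptor, and the two-state chain and tolerance are illustrative assumptions, not taken from the paper:

```python
# Power-iteration sketch of the repeated vector-matrix product used by
# iterative numerical solutions. A small dense matrix stands in for the
# SAN descriptor, which in practice is a structured (Kronecker)
# representation that is never stored as a flat matrix.
def solve_steady_state(P, pi, max_iter=10000, tol=1e-12):
    """Iterate pi <- pi * P until successive vectors differ by less than tol."""
    n = len(P)
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(new[j] - pi[j]) for j in range(n)) < tol:
            return new
        pi = new
    return pi

# Illustrative 2-state transition matrix (each row sums to 1).
P = [[0.7, 0.3],
     [0.4, 0.6]]
pi = solve_steady_state(P, [1.0, 0.0])
```

The resulting vector `pi` approximates the stationary probability of each state, from which performance indices are derived; the state space explosion problem arises because, for real models, both `P` and `pi` grow with the full product state space.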
This advantage justifies its usage in several contexts where the associated precision must be measured in relation to the amount of samples produced, i.e., a process known as sampling. Sequential sampling of large Markovian models has a high computational cost, requiring huge amounts of resources to produce the desired number of samples. In light of this problem, recent years have witnessed the adoption of parallel sampling techniques, where the task of producing large amounts of samples is divided among several workers, all synchronized by a master entity. This computational model helps produce more samples in less time. Given the set of different simulation techniques available for modelers, we