A Comparative Analysis of Load Balancing Algorithms Applied to a Weather Forecast Model

Eduardo R. Rodrigues, Philippe O. A. Navaux
Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre - Brazil
{errodrigues, navaux}@inf.ufrgs.br

Celso L. Mendes, Laxmikant V. Kale
Parallel Programming Laboratory, University of Illinois at Urbana-Champaign, Urbana - USA
{cmendes, kale}@illinois.edu

Jairo Panetta
Center for Weather Forecast and Climate Studies - INPE, Cachoeira Paulista - Brazil
panetta@cptec.inpe.br

Alvaro Fazenda
Science and Technology Department, Federal University of Sao Paulo, Sao Jose dos Campos - Brazil
alvaro.fazenda@unifesp.br

Abstract

Among the many reasons for load imbalance in weather forecasting models, the dynamic imbalance caused by localized variations in the state of the atmosphere is the hardest one to handle. As an example, active thunderstorms may substantially increase load at a certain timestep with respect to previous timesteps in an unpredictable manner; after all, tracking storms is one of the reasons for running a weather forecasting model. In this paper, we present a comparative analysis of different load balancing algorithms to deal with this kind of load imbalance. We analyze the impact of these strategies on computation and communication, and the effect on execution time of the frequency at which the load balancer is invoked. This is done without any code modification, employing the concept of processor virtualization, which basically means that the domain is over-decomposed and the unit of rebalance is a sub-domain. With this approach, we were able to reduce the execution time of a full, real-world weather model.

1. Introduction

Currently, there is an increasing demand for higher resolution weather forecasting simulations. Some weather forecast centers are already running models at resolutions of a few kilometers, and those resolutions are expected to increase further soon.
However, increasing resolution is not just a matter of running the same model code with a finer mesh. As resolution increases, the executed code changes to simulate new phenomena that were previously at a sub-grid scale. Higher resolution allows the representation of localized phenomena that cannot be explicitly treated at larger scales. One example is small to medium scale cloud formation, which is treated by statistical methods at scales larger than the cloud itself and by explicit methods at finer scales.

This work was partially supported by grants from the National Council for Scientific and Technological Development (CNPq-Brazil) and from the US Department of Energy (#DE-SC0001845). Our tests used NSF's TeraGrid machines, under grants TG-ASC050039N and TG-ASC050040N. The author Eduardo R. Rodrigues was supported by the Brazilian Ministry of Education - CAPES, grant 1080-09-1.

A concrete instance of this fact is cumulus convection. At lower resolution, this phenomenon is usually parameterized [8]. Meanwhile, at resolutions of a few kilometers, it is possible to use cloud microphysics. This component is concerned with the formation, growth and precipitation of raindrops and snowflakes. This atmospheric process does not have horizontal data dependences, but it may suffer from load imbalance. Indeed, it is well known that thunderstorms cause this problem. Other sources of load imbalance are chemical and biological processes, such as those involved in the burning of biomass. Therefore, as a consequence of increasing resolution and complexity, weather forecast models face load imbalance.

There has been some research on the usage of load balancing strategies in meteorological models, but virtually no production code has this feature. The reason is that it is hard to implement load balancing in legacy codes. Consequently, comparing load balancing algorithms in the context of weather models is difficult.
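To make the effect of a localized storm concrete, the following minimal Python sketch estimates the imbalance it produces under a fixed block decomposition. The 8x8 grid, the 4x4 processor layout, the cost factor of 5 for storm columns, and the function names `block_loads` and `imbalance` are illustrative assumptions, not values or code taken from the model discussed here.

```python
def block_loads(cost, px, py):
    """Sum the per-column cost over a px-by-py block decomposition."""
    ny, nx = len(cost), len(cost[0])
    loads = [[0.0] * px for _ in range(py)]
    for j in range(ny):
        for i in range(nx):
            # Map grid column (j, i) to the processor block that owns it.
            loads[j * py // ny][i * px // nx] += cost[j][i]
    return [load for row in loads for load in row]


def imbalance(loads):
    """Ratio of the maximum processor load to the average load (1.0 is ideal)."""
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical domain: unit cost per column, except a 2x2 "storm" region
# where the physics is assumed to be 5 times more expensive.
nx = ny = 8
cost = [[1.0] * nx for _ in range(ny)]
for j in range(2):
    for i in range(2):
        cost[j][i] = 5.0

loads = block_loads(cost, px=4, py=4)
print(imbalance(loads))  # prints 4.0
```

In this synthetic case one processor carries a load of 20 while the average is 5, so a timestep that waits for the slowest processor takes roughly four times longer than a perfectly balanced one would.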
In this paper, we take a different approach: we use the concept of processor virtualization. Instead of inserting the load balancer into the application code, we use a virtualized implementation of MPI to decouple the load balancing strategy from the model itself. The domain is over-decomposed into more sub-domains than there are physical processors, and each sub-domain is assigned to a "virtual processor". Each physical processor handles a set of virtual processors. Load imbalance is addressed by moving virtual processors from overloaded physical processors to underloaded ones. In this virtualized environment, we are able to compare different algorithms to deal with load imbalance of a real-