PRZEGLĄD ELEKTROTECHNICZNY (Electrical Review), ISSN 0033-2097, R. 88 NR 10b/2012

Piotr LATOSIŃSKI (1), Janusz SOSNOWSKI (1)
Warsaw University of Technology (1)

Monitoring Dependability of a Mail Server

Abstract. This paper presents a methodology for monitoring a mail server in order to assess its dependability and detect various anomalies. It is based on collecting and analysing events stored in system logs and on continuous monitoring of system resource usage. A special program has been developed and verified in practice on the server that handles mail within the Institute.

Summary. The article presents a methodology for monitoring a mail server, aimed at assessing its dependability and detecting various anomalies. It is based on collecting and analysing event logs and on monitoring system resource usage. A special program was developed and used to monitor the Institute's mail server. (Monitoring the dependability of a mail server).

Keywords: event and performance monitoring, dependability, security, mailing system.
Key words: event and performance monitoring, dependability, electronic mail system.

Introduction

Dependability is a term which combines system features related to reliability, availability, safety, maintainability, etc. It is gaining more and more attention in most computer systems, ranging from those used in critical applications (e.g. banking, flight control, e-government) to simple systems used by ordinary people [1,2]. Classical dependability targets handling errors (so as to block their propagation to failures: a reactive approach) and scheduling efficient maintenance. In contemporary complex systems we also have to take other problems into account. In particular, they relate to ongoing system patches and updates, component-based system design (including commercial off-the-shelf components), etc.
This makes it necessary to introduce more general and extended notions of errors and failures, namely anomaly (abnormal system behaviour) and resilience (the capability of adapting to changes). Hence new proactive approaches to dependability are attracting much interest; they involve taking action before a problem appears.

To address these problems we can rely on runtime monitoring of the systems. System behaviour can be observed from different perspectives, e.g. that of the user, administrator, application or operating system. Most computer systems provide various logs comprising reports on their operation [2,3]. In general we can distinguish event logs and performance logs. Event logs usually comprise a huge amount of data describing specific events which occur at run time: the status of system components, operational changes related to start-ups or shutdowns of services, configuration modifications, execution errors, etc. Performance logs give a view of system resource usage (CPU, RAM, discs, network, etc.) and load.

Most research on monitoring relates to system availability and reliability at the hardware or operating system level ([1,4,5] and references therein). Available monitoring tools are targeted at specific problems, e.g. spam or cyber-attacks [6]. Flexible system analysis from the application perspective seems to be neglected. We have faced this problem in the case of the mail server used in the Institute. Hence we have adapted monitoring techniques to this perspective. To evaluate the quality of mail services, as well as to identify normal operational profiles and anomalies, we have developed a special tool, SyslogAnalyser, which uses uniquely defined regular expressions describing event classes related to various images of system behaviour in different logs. Event log analysis has been complemented with performance log analysis based on data collected with the standard munin program.
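The idea of describing event classes with regular expressions can be illustrated with a minimal sketch. It is not the SyslogAnalyser implementation: the class names, the patterns and the sample log lines below are hypothetical and were chosen only to show how each log line can be assigned to the first event class whose expression matches it, and how the classes can then be counted.

```python
import re
from collections import Counter

# Hypothetical event classes: names and patterns are illustrative
# assumptions, not the expressions used by SyslogAnalyser.
EVENT_CLASSES = {
    "auth_failure": re.compile(r"authentication failure|Failed password", re.IGNORECASE),
    "auth_success": re.compile(r"Accepted password|session opened", re.IGNORECASE),
    "mail_delivered": re.compile(r"status=sent"),
    "mail_deferred": re.compile(r"status=(deferred|bounced)"),
}


def classify(line):
    """Return the first event class whose pattern matches the line."""
    for name, pattern in EVENT_CLASSES.items():
        if pattern.search(line):
            return name
    return "unclassified"


def count_events(lines):
    """Count how many lines fall into each event class."""
    return Counter(classify(line) for line in lines)


# Made-up log lines in a syslog-like layout (assumed format).
SAMPLE_LOG = [
    "Mar  1 10:00:01 host sshd[101]: Failed password for alice from 192.0.2.10",
    "Mar  1 10:00:05 host sshd[102]: Accepted password for bob from 192.0.2.11",
    "Mar  1 10:01:00 host mta[200]: A1B2C3: to=<carol@example.org>, status=sent",
    "Mar  1 10:02:00 host mta[201]: D4E5F6: to=<dave@example.org>, status=deferred",
    "Mar  1 10:03:00 host cron[300]: job started",
]

if __name__ == "__main__":
    for event_class, count in sorted(count_events(SAMPLE_LOG).items()):
        print(event_class, count)
```

Lines that match no expression are counted as "unclassified"; in such a scheme a growing share of unclassified lines could itself be treated as a hint that a new event class needs to be defined.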
Section 2 describes basic features of the mail server, including the space of possible monitoring. Sections 3 and 4 present the developed monitoring schemes related to event and performance analysis (a multidimensional approach); they are illustrated with practical results.

Basic features of the mail server

When analysing the operation of the mail server, it is reasonable to take a broader view of the whole mailing system. To prepare a message for transmission, a user composes an e-mail in a mail client (MUA, Mail User Agent) on the user's computer. The message is then sent with SMTP (Simple Mail Transfer Protocol) to the MTA (Mail Transfer Agent), which is the most common agent in the mail servers in use. The message is relayed further via several MTAs to the destination mail server, which knows where the target MDA (Mail Delivery Agent) is located; this agent supports the user mailbox. All messages delivered to the mailbox are available to the user; however, to read them the user has to retrieve them from the MDA on the mail server, using the MUA (on the user's computer) and the POP3 (Post Office Protocol ver. 3) or IMAP (Internet Message Access Protocol) protocol. In this scheme message routes are fully symmetric: the recipient can in turn act as a sender and send a message in the same way.

Our Institute has one mail server (bolek). This server is a virtual machine configured within the hardware platform IBM pSeries 550 (9133-55A):
- physical memory: 32 GB;
- physical processors: 2 x 4-core IBM Power5+ 1.6 GHz;
- Hitachi disc array: 4 TB.

The platform is virtualized with IBM PowerVM (Logical Partitions). The server bolek runs in a Logical Partition (LPAR):
- allocated memory: 2 GB;
- allocated CPU: 0.2 processing units (1/5 of one physical core);
- virtual processors: 1, with SMT (simultaneous multithreading) enabled (2 logical CPUs);
- operating system: AIX 5.3 64-bit;
- allocated logical discs: 3 (50 GB each).

The mail server handles all mailboxes of the staff and of some students. It performs all actions needed for sending and receiving messages; moreover, it performs backups (and, in the case of crashes, recovery). For this activity it uses CPU, RAM, disc and network resources. This activity can be evaluated indirectly via the event and performance logs collected on the server. To assure dependable and resilient operation of the mail server, we have to monitor it, taking into account such issues as detection or prediction of emerging problems, evaluation of system resource usage and trends, characteristics of operational profiles, etc. Hence an important issue is collecting and analysing event and performance logs, which is discussed in the sequel. In Unix systems many event logs are generated by syslog or other programs. We concentrate on the following logs: auth.log (gives information on correct or erroneous