International Journal of Computer Applications (0975 – 8887), Volume 124 – No.8, August 2015

Algorithm to Generate DFA for AND-operator in Regular Expression

Mirzakhmet Syzdykov
Institute of Information and Computing Technologies
125 Pushkin Str., 050010 Almaty, Republic of Kazakhstan

ABSTRACT
In the past, a number of algorithms have been presented to produce a deterministic finite automaton (DFA) for a regular expression. These algorithms can be classified by the initial data from which they produce the DFA. The method of producing a DFA from a non-deterministic finite automaton (NFA) by subset construction can be generalized to extended regular expressions, which include intersection, negation and subtraction of regular languages. This article presents a modified subset construction algorithm that produces a unigram DFA for a regular expression with extensions (specifically, the AND-operator).

General Terms
Pattern Recognition; Finite Automata; Algorithms.

Keywords
Algorithm; Deterministic; Automaton; Extension; Intersection; Subset Construction.

1. INTRODUCTION
This paper states a common technical and theoretical approach to building a finite automaton for extended regular expressions, which include operators such as intersection (presented in this article), negation and subtraction. The latter two operators (negation and subtraction) are left for further research and discussion. This article defines only the cross-product of the control vector for transitions in the NFA, together with a way to use the vector values in subset construction in order to build a DFA for more effective practical and technical use. This method could be generalized to the extended operations (intersection, negation, subtraction) over automata for pattern matching and subset construction.

2. REGULAR EXPRESSIONS
A regular expression is a method of describing, verbally and syntactically, a set of words, which is further defined as a language, in a more readable, understandable and technically simple way. The expression is described by a grammar, which in turn consists of a set of rules. Let us describe regular expressions in BNF form under the following assumptions:
1) a regular expression describes a language (a finite or infinite set of words) specified by the grammar below;
2) $R$ and $R_i$ denote regular expressions, and $A$ denotes the set of alphabetic symbols from a to z ($A = \{a, \ldots, z\}$);
3) $L(R)$ denotes the language of a regular expression $R$.

A regular expression can then be defined recursively as follows (in order of precedence from highest to lowest):
1. $R = \varepsilon$ (the empty word, $L(R) = \{\varepsilon\}$);
2. $R = A$ (a single symbol from the alphabet $A$, $L(R) = \{a : a \in A\}$);
3. $R = R_1^{+}$ (a generally infinite language, $L(R) = L(R_1) \cup L(R_1 R_1) \cup L(R_1 R_1 R_1) \cup \ldots$);
4. $R = R_1^{*} = \varepsilon \mid R_1^{+}$ (a generally infinite language, the Kleene closure: $L(R) = \{\varepsilon\} \cup L(R_1^{+})$);
5. $R = R_1?$ (a set of words, $L(R) = \{\varepsilon\} \cup L(R_1)$);
6. $R = R_1 R_2$ (a set of words, $L(R) = \{ab : a \in L(R_1),\ b \in L(R_2)\}$);
7. $R = R_1 \mid R_2$ (a set of words, $L(R) = L(R_1) \cup L(R_2)$).

2.1 Definition of Extended Regular Expression
An extended regular expression is a regular expression that supports one more operation on languages, the AND-operator. The AND-operator can be described in the regular expression grammar with an additional definition: $R = R_1 \,\&\, R_2$ (a set of words, $L(R) = L(R_1) \cap L(R_2)$). This operation is the intersection of the languages produced by the sub-expressions (the conjunction operator). It has the lowest precedence in a regular expression.

Extended regular expressions also include the subtraction and negation operators, which are technically equivalent, because negation is the subtraction of the operand from the closure of the alphabet: $\sim R = A^{*} - R$. The subtraction, or MINUS-operator, can be defined as follows: $R = R_1 - R_2$.
The language of the subtraction can also be described as $L(R) = \{w : w \in L(R_1) \wedge w \notin L(R_2)\}$.
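The set definitions of Section 2 and Section 2.1 can be checked on small cases by brute force. The Python sketch below is not the subset construction algorithm of this article; it only enumerates the words of length at most n in L(R) directly from the definitions. The tuple encoding of expressions, the function names `words` and `all_words`, and the two-letter alphabet `ALPHABET` are assumptions made for illustration.

```python
from itertools import product

ALPHABET = "ab"  # assumption: a small alphabet keeps the enumeration short


def all_words(n):
    """All words over ALPHABET of length <= n (a finite slice of A*)."""
    return {"".join(t) for k in range(n + 1) for t in product(ALPHABET, repeat=k)}


def words(r, n):
    """Return the set of words of length <= n in L(r).

    r is a tuple (hypothetical encoding): ("eps",), ("sym", c),
    ("plus", r1), ("star", r1), ("opt", r1), ("cat", r1, r2),
    ("alt", r1, r2), ("and", r1, r2), ("minus", r1, r2), ("not", r1).
    """
    op = r[0]
    if op == "eps":
        return {""}
    if op == "sym":
        return {r[1]} if n >= 1 else set()
    if op == "cat":
        left, right = words(r[1], n), words(r[2], n)
        return {u + v for u in left for v in right if len(u) + len(v) <= n}
    if op == "alt":      # union of languages
        return words(r[1], n) | words(r[2], n)
    if op == "and":      # AND-operator: intersection of languages
        return words(r[1], n) & words(r[2], n)
    if op == "minus":    # MINUS-operator: in L(R1) and not in L(R2)
        return words(r[1], n) - words(r[2], n)
    if op == "not":      # negation as A* - R, restricted to length <= n
        return all_words(n) - words(r[1], n)
    if op == "opt":
        return {""} | words(r[1], n)
    if op == "plus":
        base = words(r[1], n)
        result, frontier = set(base), set(base)
        while frontier:  # append one more factor until nothing new fits
            frontier = {u + v for u in frontier for v in base
                        if len(u) + len(v) <= n} - result
            result |= frontier
        return result
    if op == "star":
        return {""} | words(("plus", r[1]), n)
    raise ValueError(f"unknown operator: {op}")


# Example: (a|b)*a & a(a|b)*  (words that both end and start with 'a')
ANY = ("star", ("alt", ("sym", "a"), ("sym", "b")))
R1 = ("cat", ANY, ("sym", "a"))
R2 = ("cat", ("sym", "a"), ANY)
print(sorted(words(("and", R1, R2), 2)))    # ['a', 'aa']
print(sorted(words(("minus", R1, R2), 2)))  # ['ba']
```

Truncating every sub-language at length n is safe here because a word of length at most n can only be built from, or compared against, words of length at most n. The enumeration grows exponentially with n, so it is useful only as a check of the definitions, not as a matching procedure.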