International Journal of Computer Applications (0975-8887)
Volume 43, No. 15, April 2012, p. 44

Expanded Grammar for Detecting Equivalence in Math Expressions

Mohammed Q. Shatnawi
Department of Computer Information Systems
Jordan University of Science and Technology
Jordan

Marwan T. Alquran
Department of Mathematics & Statistics
Jordan University of Science and Technology
Jordan

Fatima M. Quiam
Department of Computer Science
Jordan University of Science and Technology
Jordan

ABSTRACT
A huge amount of information of different types is posted on the web every day; therefore, search capabilities should be provided to help users find the information they request. Locating a specific type of information within large repositories of disparate data becomes difficult, if not impossible, without specialized information retrieval systems. Traditional, text-based search engines do not achieve the level of success that users seek when retrieving structured information (e.g. mathematical information). For example, when a user searches for x(y+z) using Google, Google retrieves documents that contain xyz, x+y=z, (x+y+z)=xyz, or any other document that contains x, y, and/or z, but not x(y+z) as a standalone expression. The reason is that Google uses text-based search capabilities and algorithms that depend mostly on matching techniques and on the probabilities of occurrence of x, y, and z. The major obstacle to math search in current text search systems is that they do not differentiate between a user query that contains a mathematical expression and a query that contains only text terms. Therefore, text-based search systems process mathematical expressions like any other text, regardless of whether they are well structured.
In this context, the text search process is refined so that it can be applied to searching for mathematical expressions, by implementing a system that is responsible for detecting equivalent math expressions. More algorithms are added to the information retrieval system to make it suitable for searching for mathematical expressions as well as other forms of text.

General Terms
Information retrieval, Math search.

Keywords
Math search, expression's equivalent forms, mathematical expression, detecting equivalency, grammar, text-based search systems.

1. INTRODUCTION
Web information consists of two main types [1]:

Structured Web Information, which is defined as information ordered in a particular way, such as mathematical expressions, database tables, transactions, and math documents.

Unstructured Web Information, which is defined as information in random pieces. This type includes bitmap objects such as images, video, or audio files, and textual objects such as text, the body of an e-mail message, Web pages, or word processor documents.

This research focuses on processing mathematical content, which is an example of structured information. Math content is structured in such a way that the meaning of a math expression depends on the structure of that expression.

2. EQUIVALENCE IN MATHEMATICAL EXPRESSION
A mathematical expression can be written in many, and sometimes infinitely many, equivalent forms. For example, 0.5 is mathematically the same as ½, and x*y is the same as y*x. Searching for x*y using traditional search engines does not retrieve documents that contain the expression y*x, because those search engines use techniques that are not suited to locating math content accurately. Therefore, there is a need for tools that help users locate the requested math expression and all of its equivalent forms.
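The two equivalences mentioned above (0.5 versus ½, and x*y versus y*x) can be illustrated with a small sketch. It is not the grammar proposed in this paper; it is a minimal example that assumes expressions are written in Python syntax and reduces them to a canonical form before comparing. The function names canonical and equivalent are hypothetical, and only constants, variables, +, * and / are supported. Associativity and identities involving function calls are out of scope for this sketch.

```python
import ast
from fractions import Fraction


def canonical(node):
    """Return a hashable canonical form for a small arithmetic expression tree."""
    if isinstance(node, ast.Expression):
        return canonical(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        # Convert through str() so that 0.5 becomes exactly 1/2.
        return ("num", Fraction(str(node.value)))
    if isinstance(node, ast.Name):
        return ("var", node.id)
    if isinstance(node, ast.BinOp):
        left, right = canonical(node.left), canonical(node.right)
        if isinstance(node.op, ast.Div):
            # Fold a constant division such as 1/2 into one rational number.
            if left[0] == "num" and right[0] == "num" and right[1] != 0:
                return ("num", left[1] / right[1])
            return ("div", left, right)
        if isinstance(node.op, (ast.Add, ast.Mult)):
            tag = "add" if isinstance(node.op, ast.Add) else "mul"
            # Commutative operators: put the operands in a fixed order.
            return (tag,) + tuple(sorted((left, right), key=repr))
    raise ValueError("unsupported expression")


def equivalent(a, b):
    """Compare two expression strings by their canonical forms."""
    return canonical(ast.parse(a, mode="eval")) == canonical(ast.parse(b, mode="eval"))


print(equivalent("x*y", "y*x"))
print(equivalent("0.5", "1/2"))
print(equivalent("x*(y+z)", "(z+y)*x"))
```

Comparing canonical forms rather than raw strings is one simple way to treat reordered operands and different spellings of the same number as a single search term.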
For example, when a user searches for the expression tan(x), the search engine must retrieve the documents that contain the expression itself as well as the documents that contain the expression sin(x)/cos(x), because both expressions are mathematically equivalent.

3. MATHEMATICAL EXPRESSION AS SEARCH TERMS
Mathematical expressions are a distinct type of information. Searching the Web for a mathematical expression is not a well-defined process, and the results are unexpected most of the time. The inaccuracy stems from the nature of the mathematical expression search process, which is not based on clear and structured rules. In addition, the available techniques are not suited to searching for such expressions; they are designed and tailored for ordinary text and for other kinds of documents (e.g. multimedia documents).

4. SEARCHING FOR MATHEMATICAL EXPRESSION USING TRADITIONAL SEARCH ENGINES
Information retrieval systems have been under development for several decades [18]. Mathematical materials such as formulas and equations are symbolic and highly structured. Current search systems do not provide the means to search for such entities or to understand math queries that contain non-alphabetic symbols. The text retrieved by current search systems is unstructured, with neither data type definitions nor conceptual definitions. Mathematical expressions, in contrast, are well structured, and the structure conveys their correct interpretation. This is an important reason why current search engines fail to retrieve items that contain mathematical expressions [2]. The same mathematical expression can be represented in many equivalent ways. For this reason, a thesaurus structure (i.e. a finite set of concrete per-term definitions) is not an effective way to search for all equivalent expressions.
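The structural blindness of term matching described in this section can be illustrated with a deliberately simplified sketch. It assumes a hypothetical bag-of-terms matcher that keeps only alphanumeric runs and ignores operators and parentheses; it is not a description of how Google or any other real search engine works, and the function names bag_of_terms and text_match are invented for illustration.

```python
import re
from collections import Counter


def bag_of_terms(text):
    """Lowercase the text and keep only alphanumeric runs, dropping all symbols."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def text_match(query, document):
    """True when every query term occurs in the document at least as often."""
    q, d = bag_of_terms(query), bag_of_terms(document)
    return all(d[term] >= count for term, count in q.items())


query = "x(y+z)"
for doc in ["x+y=z", "the sum (x+y+z) is large", "a*b"]:
    # The first two documents match although neither contains x(y+z).
    print(doc, text_match(query, doc))
```

Because the operators and parentheses are discarded, the query x(y+z) and the unrelated equation x+y=z reduce to the same set of terms, so a matcher of this kind cannot tell them apart.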
Even if current search engines are enhanced to retrieve a specific type of mathematical expression, they will still fail to retrieve the documents containing