Pakistan J. Zool., vol. 47(6), pp. 1783-1795, 2015.

A Systematic Approach to Generate Query Model for DNA

Mehwish Nadeem¹, Muhammad Usman Ghani Khan², Abad Ali Shah² and Abdul Nasir²

¹ Embedded Systems and Enterprise Software Solutions Lab, KICS, UET, Lahore, Pakistan.
² University of Engineering and Technology, Lahore, Pakistan.

Abstract.- The molecular structure of deoxyribonucleic acid (DNA) contains a great deal of important information that is unique to every human being. The storage, processing and manipulation of DNA's structural information in computer systems are still in their infancy. Handling this type of data properly requires a Database Management System (DBMS). The research described in this paper arose from the observation that existing data models and query languages (and the database systems that implement them) do not offer sufficient support for modelling DNA structure. This paper attempts to find a good representation for DNA structure and proposes a solution in the form of an object-oriented data model based on the idea of bar code technology. It is shown that the chemical structure of DNA can be encoded as a bar code, which makes storing DNA structure much simpler than existing approaches. Finally, we propose a query language (DNA-QL) to store, retrieve and manipulate biological data. To achieve these objectives, we propose a data model that represents DNA structures in a uniform fashion. Developing this model and query language enables the development of a DBMS for storing the structural biological information of DNA.

Keywords: Growing database, Query language, Indexed, DNA, Data model, DNA-QL, Constraints, SQL, Biological data, Data types, Sub-sequence, Information.

INTRODUCTION

DNA is an essential part of all living organisms, and biologists are researching the functions of DNA. There is an enormous amount of complex DNA structure data that needs to be stored efficiently.
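To illustrate why a compact encoding eases the storage of sequence data, the sketch below packs a nucleotide string at 2 bits per base. This is a generic, commonly used technique and is NOT the bar-code encoding proposed in this paper, which is not described in this section. It assumes the sequence contains only A, C, G and T; the names `pack`, `unpack`, `CODE` and `BASE` are hypothetical.

```python
# Hypothetical sketch: 2-bit packing of a DNA sequence (A, C, G, T only).
# This is NOT the paper's bar-code encoding; it only illustrates compact storage.

CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASE = "ACGT"  # index -> nucleotide, the inverse of CODE


def pack(seq: str) -> tuple[bytes, int]:
    """Pack a nucleotide string into bytes, four bases per byte.

    Returns the packed bytes and the original length, which is needed
    to discard the padding bits when unpacking.
    """
    seq = seq.upper()
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        if base not in CODE:
            raise ValueError(f"unsupported symbol {base!r} at position {i}")
        # Place each 2-bit code in its slot, most significant bits first.
        out[i // 4] |= CODE[base] << (6 - 2 * (i % 4))
    return bytes(out), len(seq)


def unpack(data: bytes, length: int) -> str:
    """Inverse of pack(): recover the nucleotide string."""
    bases = []
    for i in range(length):
        code = (data[i // 4] >> (6 - 2 * (i % 4))) & 0b11
        bases.append(BASE[code])
    return "".join(bases)


packed, n = pack("GATTACA")
print(len(packed), unpack(packed, n))  # 2 GATTACA
```

Four bases fit in one byte instead of one byte per base as plain text. Symbols outside A, C, G and T (for example N for an unknown base) are rejected here and would need a wider code.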
In recent times, new species have been discovered at a very rapid pace (Mukhtar, 2015). These discoveries have brought an explosion in the amount of molecular biological data available to the research community. Existing data modelling techniques are incapable of modelling these complex structures, so new techniques are required for modelling DNA structures. In addition, the exponential growth of new DNA data from wet laboratories adds difficulty and complexity both to data management (data modelling, storage, retrieval and manipulation) and to the software development methods used in bioinformatics. To overcome these issues, several generic object-oriented data models and DBMSs have been developed, such as Orion (Kim et al., 1990), O2 (Deux et al., 1990) and Iris (Wilkinson et al., 1990). However, these DBMSs are unsuitable for using and managing non-standard data such as DNA and protein data.

* Corresponding author: nasirbhutta1@gmail.com
0030-9923/2015/0006-1783 $ 8.00/0
Copyright 2015 Zoological Society of Pakistan

There are two types of non-standard data inside a DNA structure, given below:

Sequence data

Core data supports a range of standard data types, including string, date and number. Sometimes, however, an attribute's value must be of a type that is not supported directly. Existing general-purpose OODBs provide neither built-in data types nor domain-specific functional operations for biological research (Wang, 2007). In biological data, the DNA sequence and its corresponding data are of non-standard types. A common current practice is to store the metadata of each sequence, together with its data, in a relational DBMS. The serious problem with this practice is that relational DBMSs do not support approximate or partial sequence-matching queries.
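To make concrete what partial and approximate sequence-matching queries ask for, the following minimal sketch runs both kinds over an in-memory collection of sequences. It assumes that "partial" means an exact substring match and "approximate" means a substring with at most k mismatched positions (Hamming distance); the paper may define these terms differently, and the record identifiers and function names are hypothetical.

```python
# Hypothetical sketch of partial and approximate sequence-matching queries.
# "Partial" = exact substring match; "approximate" = substring with at most
# k mismatched positions (Hamming distance). Names and data are illustrative.


def partial_matches(seq: str, pattern: str) -> list[int]:
    """Return every start position where pattern occurs exactly in seq."""
    m = len(pattern)
    return [i for i in range(len(seq) - m + 1) if seq[i:i + m] == pattern]


def approximate_matches(seq: str, pattern: str, k: int) -> list[int]:
    """Return every start position where pattern occurs with <= k mismatches."""
    m = len(pattern)
    hits = []
    for i in range(len(seq) - m + 1):
        mismatches = sum(a != b for a, b in zip(seq[i:i + m], pattern))
        if mismatches <= k:
            hits.append(i)
    return hits


def query(records: dict[str, str], pattern: str, k: int = 0) -> dict[str, list[int]]:
    """Scan every stored sequence and report those containing a match.

    k == 0 requests a partial (exact substring) match; k > 0 an approximate one.
    """
    result = {}
    for rec_id, seq in records.items():
        if k:
            hits = approximate_matches(seq, pattern, k)
        else:
            hits = partial_matches(seq, pattern)
        if hits:
            result[rec_id] = hits
    return result


records = {"seq1": "ACGTTGCA", "seq2": "ACGATGCA", "seq3": "TTTTTTTT"}
print(query(records, "GTTG"))       # {'seq1': [2]}
print(query(records, "GTTG", k=1))  # {'seq1': [2], 'seq2': [2]}
```

This brute-force scan costs O(n·m) per sequence of length n and pattern of length m; practical systems rely on indices over the sequence data to avoid scanning every record.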
Authors' Contributions: AAS conceived the project and supervised the work; MN performed the experiments, developed the software and tested the code; AN helped in data collection and analysis; MUGK wrote the article and helped in the mathematical formulations.

Another approach in practice stores the sequence data in a flat file and creates external indices that are processed by