Mining and Analysis of Tandem Repeated Patterns in Oncogenic Sequences involved in Cancer progression

Satish Kumar*, Dharminder Kumar*, Ashok Chaudhury**
*Deptt. of Computer Science & Engineering
**Centre for Bio & Nano Sciences
Guru Jambheshwar University of Science & Technology, Hisar (Haryana)

Abstract- Tandem repeat patterns are very useful to biologists because they describe a pattern that helps to determine an individual's inherited traits. Tandem repeats can also be very useful in determining parentage. These repeated nucleotides play an important role in analysing and understanding the various disorders associated with cancer. Data mining techniques such as clustering, association analysis and classification can be used to analyse these repeated nucleotides.

Keywords – Tandem Repeat Patterns, BWtrs, BioPHP

Introduction

Biological sequences consist of four nucleotide bases: adenine (A), guanine (G), cytosine (C) and thymine (T). Together these form the complete DNA sequence of an organism. Often the bases are repeated in a definite order, forming a tract of repeated units. Each of these units can range from 1 to 60 nucleotides in length. These repeats can be divided into two main types: micro-satellites and mini-satellites. When 10 to 60 nucleotides are repeated, the repeats are called mini-satellites; repeats with fewer nucleotides are called micro-satellites. When exactly two nucleotides are repeated, it is called a dinucleotide repeat (for example: AGAGAG or (AG)3). When three nucleotides are repeated, it is called a trinucleotide repeat (for example: CAGCAGCAGCAG or (CAG)4), and abnormalities in such regions can give rise to trinucleotide repeat disorders. When the number of copies is not known or is variable, the repeat is referred to as a variable number tandem repeat (VNTR) [1,2].
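
The repeat types described above can be illustrated with a minimal Python sketch. It is not the method or tooling used in this paper: the function names (`find_tandem_repeats`, `classify_repeat`, `is_primitive`) and the `min_copies` threshold are hypothetical choices made for illustration, and only perfect (exact) repeats are detected. The classification assumes that units of 10 to 60 nucleotides are mini-satellites and shorter units are micro-satellites.

```python
import re


def is_primitive(unit):
    """True if `unit` is not itself a repetition of a shorter unit.

    Example: "AGAG" is two copies of "AG", so it is not primitive.
    """
    return unit not in (unit + unit)[1:-1]


def classify_repeat(unit_len):
    """Label a repeat by unit length (assumed cut-off: 10 nucleotides)."""
    if 10 <= unit_len <= 60:
        return "minisatellite"
    if 1 <= unit_len < 10:
        return "microsatellite"
    return "unclassified"


def find_tandem_repeats(seq, min_unit=1, max_unit=60, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats in `seq`.

    `start` is a 0-based index. `min_copies` is an illustrative
    threshold, not a value taken from the paper.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # A unit of `unit_len` bases followed by at least
        # (min_copies - 1) further exact copies of itself.
        pattern = re.compile(
            r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1)
        )
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units such as "AGAG" that are already reported
            # under their shorter unit "AG".
            if not is_primitive(unit):
                continue
            copies = len(match.group(0)) // unit_len
            hits.append((match.start(), unit, copies))
    return hits


if __name__ == "__main__":
    for start, unit, copies in find_tandem_repeats("TTCAGCAGCAGCAGTT",
                                                   max_unit=6):
        print(start, "(%s)%d" % (unit, copies), classify_repeat(len(unit)))
```

For the trinucleotide example from the text, the sketch reports the unit CAG with four copies. Because the search is a simple regular-expression scan, it does not handle approximate repeats or overlapping candidates, which dedicated tandem repeat tools are designed to address.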
IJCSI International Journal of Computer Science Issues, Vol. 10, Issue 6, No 2, November 2013 ISSN (Print): 1694-0814 | ISSN (Online): 1694-0784 www.IJCSI.org 250 Copyright (c) 2013 International Journal of Computer Science Issues. All Rights Reserved.

These repeats are of great importance: they help in determining the parentage of a child in legal cases, an individual's inherited traits can be determined through them, and they help in developing primers for the sequencing and amplification of biological characters [3]. These repeats are also responsible for particular functions of the proteins encoded by genes that carry codon repeats for a particular amino acid, as in the case of DNA-binding proteins. Many human disorders and diseases are also associated with these repetitive elements, such as Huntington's disease [4] and certain forms of Fragile X syndrome [5]. A change in the frequency of particular repeats can result in the development of diseases such as cancer [6]. Genes consist of both coding and non-coding regions. The changes in the repeats in the coding region are of more importance as that