Incomplete nucleic acid sequences visualization: A case study in virus sequences

Thitiwat Piyatamrong 1,a, Anan Kamolphanus 2,b, Gasydech Lergchinnaboot 3,c, Krittin Suphakarn 4,d, and Chivalai Temiyasathit 5,e*

1,2,3,4,5 International College, King Mongkut’s Institute of Technology Ladkrabang, Chalongkrung Rd., Ladkrabang, Bangkok, Thailand 10520

a s4090009@kmitl.ac.th, b s4090037@kmitl.ac.th, c s4090003@kmitl.ac.th, d s4090001@kmitl.ac.th, e ktchival@kmitl.ac.th

Keywords: Incomplete nucleic acid sequence, Reference Sequence, UnitX, Majority voting, Sequences, Visualization.

Abstract. Dengue, caused by the Dengue virus (DENV) and transmitted by mosquitoes, is one of the most widespread infectious diseases in the world, especially in Southeast Asia. Virus sequences are assembled as series of nucleic acids, which makes diagnosing them burdensome. Graphical representations have therefore been proposed to support studies in Dengue virus sequence diagnosis. However, graphically representing sequences remains a challenging task, especially for incomplete genome sequences, because of the missing nucleic acids. Although a number of studies provide methodologies for virus sequence visualization, in Dengue virus research those methodologies visualize only complete genome sequences and neglect incomplete ones. Given the limited availability of research inputs, our study proposes a methodology for graphically representing incomplete as well as complete Dengue virus sequences by imputing the missing part of a sequence from created reference sequences. The proposed methodology uses database technology and a majority voting technique to create a reference sequence for each serotype of Dengue.
Experimental results show that incomplete sequences are visualized realistically according to their respective serotypes, which gives Dengue virus research the flexibility to use incomplete sequences as inputs.

Introduction

Dengue virus is a major medical concern throughout the tropical regions of the world today. Over 2.5 billion people, 40% of the world's population, are now at risk from Dengue virus. According to the World Health Organization (WHO) [1], there were more than 500,000 reported cases of Dengue infection throughout Southeast Asia in 2013. Of the reported cases in this region, 150,000 originated in Thailand. To help reduce the Dengue infection rate, many studies aim to improve the understanding of Dengue sequences, and one way to do so is to provide better visualization of Dengue virus sequences.

A number of studies have proposed algorithms and tools for visualizing and analyzing genomic sequences. One example is the Basic Local Alignment Search Tool (BLAST), one of the most popular tools for DNA and genomic sequence analysis. Approaches to visualizing and analyzing genomic sequences can be categorized into four groups: sequential, Fourier transform (FT), Z-curve, and base vector approaches. Because its visualizations are simple for end users, the base vector approach has been widely investigated. In 1983, Hamori and Ruskin [2] used a three-dimensional H-curve, one of the base vector approaches, to represent a DNA sequence. Afterwards, Gates [3] suggested a simpler two-dimensional graphical representation of a DNA sequence, which represents the four nucleotide bases adenine (A), thymine (T), cytosine (C), and guanine (G) by unit vectors on the Cartesian coordinate system, where adenine (A) lies on the