Criminal Network Mining by Web Structure and Content Mining

JAVAD HOSSEINKHANI 1, SURIAYATI CHAPRUT 1, HAMED TAHERDOOST 2
1 Advanced Informatics School, Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia
2 Department of Computer Engineering, Islamic Azad University, Semnan Branch, Semnan, Iran
1 jhkhani@gmail.com, 1 suria@ic.utm.my, 2 hamed.taherdoost@gmail.com

Abstract: - Criminal web data continuously provide previously unknown and valuable information for law enforcement agencies. The digital data used in forensic analysis include pieces of information about the suspects’ social networks. However, analysing these pieces of information is challenging: an investigator has to manually extract the useful information from the text of a website, establish connections between the different pieces of information, and categorise them into a structured database before the data set is ready for examination with criminal network analysis tools. Preparing data for analysis manually in this way is considered inefficient because it is prone to error. Moreover, because the quality of the analysed data depends on the experience and expertise of the investigator, its reliability is not constant: the more experienced the operator, the better the result. The main objective of this paper is to address the procedure for investigating criminal suspects in forensic data analysis and to close this reliability gap by proposing a framework.

Key-Words: - Crime Web Mining, Terrorist Network, Criminal Network, Social Network, Forensics Analysis, Framework.

1 Introduction
Criminal web data always provide unknown and valuable information for law enforcement agencies.
Analysing large volumes of comprehensive criminal web data for an area over a period of time is very complicated, and it is one of the most significant tasks for law enforcement. Crimes may be as extreme as murder and rape, where advanced analytical methods are required to extract useful information from the data. Web mining comes in as a solution [1, 2].

In many criminal cases, suspects possess computers, including notebooks, desktops and smart phones, which are the main target of criminal attacks and hold important information about the suspect's social networks. The FBI Regional Computer Forensics Laboratory (RCFL) conducted about 6000 examinations for 689 law enforcement organizations across the United States in one year. In 2009, the volume of data handled in these examinations reached 2334 terabytes (TB), twice the volume in 2007. Better resources are therefore required to meet the increasing demand and to help investigators collect data legally [15].

September 11th, for instance, drew the attention of the American public to the value of information collected from within terrorist cells. At least a portion of these terrorist activities takes place online [4].

Most collected digital evidence is textual, such as e-mails, chat logs, blogs and web pages. The data are usually unstructured, which requires the investigator to use novel techniques to extract information from them. Data entry is manual and therefore laborious. The completeness of the information may vary with the collector’s expertise, and the criminal can usually hide whatever information he wishes [15].

Web crawlers have many applications. One is to browse the Internet and visit web sites on a user's behalf, notifying the user when new information is published. Malicious applications of crawlers also exist, such as those of spammers or identity-theft attackers who harvest e-mail addresses to collect personal information.
However, supporting search engines is the most common use of crawlers. In fact, crawlers are the main consumers of Internet bandwidth, because they help search engines gather pages and build their indexes. For example, efficient universal crawlers are designed for search engines such as Google, Yahoo and MSN to collect all pages regardless of their content. Other crawlers, called preferential crawlers, attempt to

Advances in Remote Sensing, Finite Differences and Information Security, ISBN: 978-1-61804-127-2, p. 210