Jurnal Informatika Universitas Pamulang, Vol. 6, No. 1, March 2021 (202-209)
ISSN: 2541-1004, e-ISSN: 2622-4615
Publisher: Program Studi Teknik Informatika Universitas Pamulang
DOI: 10.32493/informatika.v6i1.10077
http://openjournal.unpam.ac.id/index.php/informatika

Web Harvesting for Data Retrieval on Scientific Journal Sites

I Gede Surya Rahayuda (1), Ni Putu Linda Santiari (2)
(1, 2) Faculty of Informatics and Computers, Institute of Technology and Business STIKOM Bali, Jalan Raya Puputan Renon No. 86, Denpasar, Bali, Indonesia, 80234
e-mail: (1) surya_rahayuda@stikom-bali.ac.id, (2) linda_santiari@stikom-bali.ac.id

Submitted Date: March 27th, 2021; Reviewed Date: June 02nd, 2021; Revised Date: June 05th, 2021; Accepted Date: June 15th, 2021

Abstract

Publishing scientific articles online in journals is a must for researchers and academics. When choosing a target journal, researchers must examine important information on the journal's website, such as indexing, scope, fees, quartile, and other details. This information is generally not gathered on a single page but spread across several pages of the journal's website. This becomes complicated when researchers must compare information across several journals; moreover, the information in those journals may change at any time. In this research, a web harvesting design is developed to retrieve information from journal websites. With web harvesting, information spread across several pages can be collected in one place, and researchers need not worry about changes, because the information collected is always the most recent. The harvesting technique works by taking a page's URL together with a start marker in the source code, from which information retrieval begins, and an end marker, at which retrieval stops. The harvesting technique was successfully implemented as a web application built on the Bootstrap framework. The test data were taken from several scientific journal websites.
The information collected includes the journal name, description, accreditation, indexing, scope, publication rate, publication charge, template, and quartile. Based on black box testing, all of the implemented features behave as expected.

Keywords: web harvesting; web mining; parsing; bootstrap; journal

1. Introduction

Advances in information technology, especially the internet, have led people to store data and everything digital online. Online storage media are considered safe and offer a large capacity that generally cannot be matched by storing data offline on personal devices. This ranges from unofficial publications such as file sharing, videos, and social media to official publications such as open datasets, government data, conferences, journals, and others. Currently, almost all researchers, lecturers, students, and other academics choose to publish their scientific papers online in journals.

On the internet, all information can change quickly. When choosing a target journal, researchers must examine important information on the journal's website, such as indexing, scope, fees, quartile, and other details. This information is generally not gathered on a single page but spread across several pages of the journal's website. This becomes complicated when researchers must compare information across several journals; moreover, the information in those journals may change at any time.

In this study, the authors discuss the use of web harvesting methods for data collection from several scientific journals (Josi et al., 2014). Using this method, researchers can collect the desired information from several journals or journal pages, and they need not worry if the information on a journal's website changes. Web harvesting is a technique for extracting data and information from a website and then storing it in a specific format (Sahria, 2020; Chifu & Leţia, 2015).
Web harvesting is performed by taking a page's URL together with a start marker in the source code, from which information retrieval begins, and an end marker, at which retrieval stops. The harvester will be implemented as a web application with the Bootstrap front-end framework. The test data consist of several scientific journal websites. The information collected is the journal name, description, accreditation, indexing, scope, publication rate, publication charge, template, and quartile (Johnson & Sieber, 2012; Josi et al., 2014). The website will be tested using the black box method. The authors hope that this research will make it easier for researchers and academics to choose a target journal.
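The marker-based harvesting step described above can be sketched as follows. This is a minimal illustration, not the authors' actual implementation: the function names (`fetch_source`, `extract_between`) and the example markers are assumptions introduced here, and the fetched source is treated as plain text.

```python
import urllib.request


def fetch_source(url: str) -> str:
    """Download the raw HTML source of a journal page (assumed helper)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_between(html: str, start_marker: str, end_marker: str) -> str:
    """Return the fragment between the start and end source-code markers.

    Retrieval begins just after start_marker and stops at end_marker,
    mirroring the harvesting technique described in the text.
    """
    start = html.find(start_marker)
    if start == -1:
        return ""
    start += len(start_marker)
    end = html.find(end_marker, start)
    return html[start:end].strip() if end == -1 else html[start:end].strip() if end != -1 else ""


# Example on an inline snippet; in practice the HTML would come from
# fetch_source(url) for each journal page. The markers are hypothetical.
snippet = '<div id="indexing">Scopus, DOAJ</div>'
print(extract_between(snippet, '<div id="indexing">', '</div>'))  # Scopus, DOAJ
```

In the application, one (URL, start marker, end marker) triple would be stored per piece of information (indexing, scope, fees, and so on), so each field can be re-harvested whenever the journal's pages change.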