Machine Learning and Natural Language Processing in Domain Classification of Scientific Knowledge Objects – Review

Samuel Machado and Jorge Oliveira e Sá¹

¹ Algoritmi Center, University of Minho, Portugal

Abstract. The domain classification of scientific knowledge objects has been continuously improved over the years. Systems that automatically classify a scientific knowledge object, using artificial intelligence, machine learning algorithms, natural language processing, and other techniques, have been adopted in most scientific knowledge databases to maintain internal classification consistency and to simplify the arrangement of information. However, the amount of available data has grown exponentially in the last few years, and the same data can now be found on multiple platforms under different classifications because each platform implements a different classification system. As a result, searching for and selecting relevant data in research studies and projects has become more complex, and the time needed to find the right information has grown continuously. Machine learning and natural language processing therefore play an important role in developing automatic and standardized classification systems that will aid researchers in their work.

Keywords: Natural Language Processing, Machine Learning, Domain Classification, Scientific Knowledge Objects.

1 Introduction

The process of searching for and selecting relevant data in research studies and projects has become more complex due to the huge amount of available data. When searching, researchers may use various filters to restrict the amount of data that the platform returns. These filters classify the data and rearrange it under certain labels to simplify the search process.
Publication date, document type, and scientific domain are the most common filters applied during a search and to the data itself. The scientific domain filter is applied in almost every search, allowing the researcher to specify the scientific field on which the search focuses. However, the same data can often be found on different search platforms with different domains associated. This happens because different platforms use different classification systems, which causes data to be labeled differently, or even being possible