Image Binarization Using a Local Thresholding with Variable Window Size Approach

Costin-Anton Boiangiu, Alexandra Olteanu, Alexandru Ștefănescu, Daniel Rosner, Nicolae Țăpuș and Mugurel Ionuț Andreica

Computer Science Department, Faculty of Automatic Control and Computers, University Politehnica of Bucharest, Romania
costin.boiangiu@cs.pub.ro, {alexandra.olteanu, alexandru.stefanescu1708}@cti.pub.ro, {daniel.rosner, nicolae.tapus, mugurel.andreica}@cs.pub.ro

Abstract. In an automatic document conversion system, which builds digital documents from scanned articles, various adjustments must be performed before the scanned image is fed to the layout analysis system. This is because the layout detection system is sensitive to errors when the page elements are not properly identified, represented, denoised, etc. One such adjustment is the separation of foreground from background, known simply as document image binarization. This paper presents a new approach to the common problems that may occur during the binarization phase: a parameter-free local binarization algorithm that dynamically computes the window size after setting a threshold for the standard deviation of the window. This approach proved to offer consistent results for a wide variety of scanned documents, consisting of various old newspapers and old library documents in different languages, both handwritten and textual.

Keywords: Binarization, Local Thresholding, Variable Window, Niblack, Otsu

1 Introduction

In recent years, the problem of converting scanned documents into electronic files, especially for large electronic libraries, has been studied more and more, since the documents to be processed exhibit a significant number of degradation categories.
An automatic content conversion system, based on optical character recognition (OCR), enables operations such as editing, word searching, easy document storage and duplication, and the application of a large set of text techniques, including text-to-speech and text mining, to be performed on the digitized document. In addition, it both ensures better preservation of the original documents, by minimizing the need for physical use, and makes their content suitable for automatic data processing or for use on a wide range of devices, including mobile phones and other mobile devices.

Content conversion systems involve a number of basic steps that must be performed to obtain an output in which the scanned document elements can be classified correctly. These steps are: image binarization, which distinguishes be-