Speech Synthesis for Gender Classification
https://doi.org/10.3991/ijes.v5i1.6690
Kawther A. Al-Dhlan
University of Hail, Hail, Kingdom of Saudi Arabia
kawthar.al.dahlan@gmail.com

Abstract: This paper presents a gender identification system for call forwarding in health-related communications. The system listens to the caller and then, using speech synthesis, image processing, and a linear support vector machine (SVM), identifies whether the caller is male or female. Such a solution is imperative in a conservative country such as the Kingdom of Saudi Arabia, so that the call can be forwarded to a male or a female practitioner. The originality of the approach is that no transcription is used to train the SVM models. To identify the caller's gender, the SVM models trained on the reference clips are compared with transcripts of the audio recording using the Levenshtein distance. For gender identification, we obtain an accuracy of 94% on a test stream containing 449 speech clips.

Keywords: Linear SVM; Machine Learning; Spectrogram

1 Introduction

Audio content identification consists of retrieving metadata (artist, album name, song name, advertisement name, etc.) from an unknown audio clip. Audio identification has many potential applications, the most popular being automatic monitoring of radio streams and identification of an unknown audio clip captured by a mobile device. Performing audio identification manually is tedious and slow. Two main approaches address this problem: audio watermarking (tattooing) and fingerprint extraction. Watermarking hides the identifying information (artist name, album ...) inside the audio document itself. The aim of this approach is to embed the desired information without altering the audio quality of the document.
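To make the watermarking idea concrete, the sketch below shows one of the simplest possible schemes, least-significant-bit (LSB) embedding. It is an illustration under stated assumptions, not a technique taken from this paper (which follows the fingerprint route): the audio is assumed to be a Python list of signed 16-bit PCM integers, and `embed_lsb` and `extract_lsb` are hypothetical helper names.

```python
# Minimal least-significant-bit (LSB) audio watermarking sketch.
# Assumption: audio is a list of signed 16-bit PCM integers; this is an
# illustrative scheme, not the method used in the paper.


def embed_lsb(samples, message):
    """Return a copy of `samples` with the UTF-8 bytes of `message` hidden in the LSBs."""
    bits = []
    for byte in message.encode("utf-8"):
        for shift in range(7, -1, -1):  # most significant bit first
            bits.append((byte >> shift) & 1)
    if len(bits) > len(samples):
        raise ValueError("audio is too short to hold the message")
    marked = list(samples)
    for k, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit: each sample
        # changes by at most one quantization step.
        marked[k] = (marked[k] & ~1) | bit
    return marked


def extract_lsb(samples, n_bytes):
    """Read `n_bytes` bytes back from the LSBs of the first 8 * n_bytes samples."""
    data = bytearray()
    for b in range(n_bytes):
        value = 0
        for k in range(8):
            value = (value << 1) | (samples[8 * b + k] & 1)
        data.append(value)
    return data.decode("utf-8")


if __name__ == "__main__":
    audio = [1000, -1000, 523, 0, -7] * 20  # 100 made-up PCM samples
    marked = embed_lsb(audio, "Artist")
    print(extract_lsb(marked, len("Artist".encode("utf-8"))))  # prints Artist
    print(max(abs(a - b) for a, b in zip(audio, marked)))  # at most 1
```

Because only the lowest bit of each sample changes, the alteration is usually imperceptible, which matches the stated goal of not degrading audio quality. The trade-off is fragility: lossy compression, resampling, or added noise can easily destroy the hidden bits, so practical watermarking schemes use more robust embeddings.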
In the fingerprint extraction approach, by contrast, a signature is extracted from the unknown content and compared with the reference fingerprints stored in a database. An acoustic fingerprint is a compact representation of the audio content. We are interested in methods based on audio fingerprint extraction, which are better suited to the automatic monitoring of radio broadcasts. Audio identification by fingerprint extraction consists of two modules: a fingerprint extraction module and a comparison module. The first step in an audio identification system based on fingerprint extraction is to build a fingerprint database from a reference database. The reference database contains the audio documents (music, advertisements, jingles) that the sys-
iJES, Vol. 5, No. 1, 2017