Published By:
Blue Eyes Intelligence Engineering &
Sciences Publication
International Journal of Innovative Technology and Exploring Engineering (IJITEE)
ISSN: 2278-3075, Volume-9 Issue-2, December 2019
Retrieval Number: B7647129219/2019©BEIESP
DOI: 10.35940/ijitee.B7647.129219
Abstract: State-of-the-art speaker recognition systems use
acoustic microphone speech to identify or verify a speaker. A
multimodal speaker recognition system takes input data
recorded from sources such as an acoustic microphone, array
microphone, throat microphone, bone microphone and video
recorder. In this paper we implement a multimodal speaker
identification system with three modalities of speech as input,
recorded from an air microphone, a throat microphone and a
bone microphone. We propose an alternate way of recording
bone-conducted speech using a throat microphone, and present
the results of a speaker recognition system implemented using
a CNN and spectrograms. The results support our claim that the
throat microphone is a suitable microphone for recording
bone-conducted speech. The accuracy of the speaker recognition
system using only speech recorded from the air microphone
improves by about 10% when the throat and bone speech
modalities are included along with the air-conducted speech.
Keywords: Throat Speech, Bone Speech, Speaker
Identification, CNN, Multi-modal Speaker Recognition.
I. INTRODUCTION
Automatic speaker recognition (ASR) is the use of machines
to identify or recognize a speaking person from the
information in the speech signal. ASR has been a research
interest for many decades, and the continual transition of the
technologies used in ASR keeps the research challenging. The
challenges lie in the feature extraction techniques, the
speaker modeling and the decision-making techniques. The
features depict the identity of the speaking person; modeling
the features produces a representation of the speaker, and
these models are used to identify or recognize the speaker.
The pipeline of the ASR system involves speech data
collection, feature extraction, model training, model testing
and evaluation, as shown in Fig. 1. The performance of the
ASR system depends on the techniques and technologies used
in each step of the pipeline. The quality of the speech
depends on the recording
device and the ambience of the recording environment. The
air microphone (AM) picks up the sound vibrations in the air,
whereas the throat microphone (TM) picks up the sound
vibrations near the vocal cords and the bone microphone picks
up the sound vibrations from bones such as the skull. The AM
signals contain the environmental background noise. The TM
and bone-conducted (BC) signals are picked up in contact with
the skin surface and are therefore free from the background
noise.
Revised Manuscript Received on December 05, 2019.
* Correspondence Author
Khadar Nawas K*, SCSE, Vellore Institute of Technology,
Chennai, India.
A Nayeemulla Khan, SCSE, Vellore Institute of Technology,
Chennai, India.
Fig. 1: ASR System Pipeline
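The feature extraction stage of the pipeline can be illustrated with a
minimal sketch that turns a waveform into a log-magnitude spectrogram,
the kind of time-frequency image a CNN can take as input. The frame
length, hop size, Hann window, sample rate and the synthetic test tone
are all assumptions chosen for illustration; they are not the settings
used in this paper, and the function names are hypothetical.

```python
import cmath
import math


def hann(n):
    """Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def log_spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames; each frame holds the log-magnitudes
    of DFT bins 0 .. frame_len // 2 (illustrative parameters)."""
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Apply the window to one frame of the signal.
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Direct DFT of the windowed frame at bin k.
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            # Small offset avoids log(0) on silent bins.
            row.append(math.log(abs(acc) + 1e-10))
        frames.append(row)
    return frames


# Synthetic 1 kHz tone standing in for a recorded utterance (assumption).
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

spec = log_spectrogram(signal)
# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

With these illustrative settings each bin spans 8000 / 64 = 125 Hz, so
the 1 kHz tone peaks in bin 8. In practice a library FFT would replace
the direct DFT loop, which is written out here only to keep the sketch
self-contained.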
Air Microphone (AM)
Speech recorded with a condenser microphone is commonly used
in speech processing studies. Such data are referred to as
air-conducted speech: a condenser microphone captures the
vibrations through the air medium and converts them to speech
signals. The intelligibility of the AM speech signal is
degraded by background noise, but AM speech contains all the
information from the lower to the higher frequencies.
Throat Microphone (TM)
The throat microphone uses a piezoelectric transducer,
positioned near the larynx in contact with the skin of the
throat, to sense the vocal cord vibration. It collects the
speech signals transferred by the sound vibrations along with
the larynx tone. Because of its skin contact, it is less prone
to environmental noise than the conventional microphone, which
senses differences in air pressure and therefore also captures
the environmental noise. Throat microphone speech is less
intelligible because the skin and muscles in the larynx region
filter out the higher frequencies, though the signal still
carries the speaker's characteristic features. The spectral
features of some sound units differ from those of the same
sound units in normal microphone speech, and a few distinctive
spectral features exist in TM speech compared to AM speech.
The presence of such spectral characteristics in TM speech
could be used to construct a speaker recognition system [1].
In TM and AM speech, the spectral characteristics of certain
sounds appear to complement one another. The existence of such
complementary speaker-specific spectral features in both
signals can improve the performance of speaker recognition
systems.
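One common way to exploit complementary modalities is score-level
(late) fusion, in which each modality's classifier scores are
normalized and averaged before the decision is made. The sketch below
illustrates that idea only; it is an assumption and not necessarily the
combination scheme used in this paper. The function names, the equal
weights and the example scores are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(per_modality_scores, weights=None):
    """Weighted average of per-modality speaker posteriors.

    per_modality_scores maps a modality name to a list of raw scores,
    one per enrolled speaker. Returns (best_speaker_index, fused).
    """
    names = list(per_modality_scores)
    if weights is None:
        # Equal weighting is an illustrative default.
        weights = {name: 1.0 / len(names) for name in names}
    n_speakers = len(per_modality_scores[names[0]])
    fused = [0.0] * n_speakers
    for name in names:
        probs = softmax(per_modality_scores[name])
        for i, p in enumerate(probs):
            fused[i] += weights[name] * p
    best = max(range(n_speakers), key=lambda i: fused[i])
    return best, fused


# Hypothetical raw scores for three enrolled speakers.
scores = {
    "AM": [2.0, 2.1, 0.5],  # noisy air mic: speakers 0 and 1 nearly tied
    "TM": [3.0, 0.5, 0.2],  # throat mic clearly favours speaker 0
}

am_only = max(range(3), key=lambda i: scores["AM"][i])
best, fused = fuse_scores(scores)
```

In this made-up example the air microphone alone narrowly picks
speaker 1, while the fused decision picks speaker 0 because the throat
microphone scores are more decisive.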
Bone Speech
A CNN based Speaker Recognition System
using an Alternate Bone Microphone
Khadar Nawas K, A Nayeemulla Khan