IFAC PapersOnLine 51-6 (2018) 202–207
ScienceDirect
Available online at www.sciencedirect.com
2405-8963 © 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Peer review under responsibility of International Federation of Automatic Control.
10.1016/j.ifacol.2018.07.154
1. INTRODUCTION
At present, new technologies are being developed in the area of
building automation. Reasons for applying these standardized
systems in private housing include the comfort and safety of
inhabitants and the economical operation of the Smart Home (SH).
With regard to comfort, the control elements of building
automation can be adapted to the SH user's requirements
(Hajovsky 2015), (Pies 2018), (Slanina 2018). One way to make
control easier is voice control. Controlling
the building in this way presents a great advantage for a
certain group of people, which in particular includes seniors
and disabled persons. Using voice commands, for instance,
one can simply switch the light on and off, control window
blinds or switch on the heating or air-conditioning. Voice
commands can be implemented for any selected
operational-technical function in the building. Voice control,
however, has its drawbacks. Every room in the building has its own
acoustics, which significantly distort the uttered command.
Another, more serious problem is that various types of additive
interference may occur in some rooms and substantially reduce the
recognition success rate of the uttered commands. This is a
problem for persons who, for health reasons, are unable to
manually operate the controllers for, e.g., the lighting or
blinds. At present, various methods and applications are used for
voice control of systems affected by additive noise. The work of
Agarwalla (2016) is focused on
the techniques of automatic speaker recognition (ASR) and
addresses machine learning for extracting relevant samples from
large volumes of data and their application to ASR. Asano (2003)
proposes a method for detecting speech from multiple sound
sources in a real environment by combining sound and visual
information in a Bayesian network. Czyżewski and Królikowski
(2001) address the processing of digital audio signals by means
of rough-neuro hybridization. Besides that, they describe the
application of soft computing methods to reduce non-stationary
noise. Du (2006) discusses methods based on conventional data
processing, which are computationally demanding and require
specialist knowledge, and system modeling by means of neural
networks with subsequent use in signal or speech processing,
image processing, data analysis, and artificial intelligence.
Genaro (2009) describes the use of artificial
neural networks (ANN) for modeling urban noise. He applied the
approach at several acoustically different sites in Spain and
compared the results with mathematical models. The ANN system
was found to predict the occurrence of noise with high precision,
which led to an improvement of these models. Gil-Pita (2012) uses
soft computing methods to create energy-efficient algorithms for
binaural hearing aids that can recognize speech and separate it
from other, undesirable sounds. The work of Kasabov (1998) deals
with fuzzy neural
networks that combine structure optimization by a genetic
algorithm with learning with forgetting for phoneme-based speech
recognition. Malcangi and Grew (2015) address the improvement of
automatic speech recognition systems. Machacek (2011) dealt with
intelligent adaptive techniques. In the present work, the ANFIS
(Adaptive Neuro-Fuzzy Inference System) structure is used to
suppress additive noise in the speech signal. The paper
introduces the design, development, and verification of a
methodology for assessing the quality of speech signal processing
by means of the PESQ algorithm (standard ITU-T P.863) within
voice control of operational-technical
Assessment of the Quality of Speech Signal Processing Within Voice Control of
Operational-Technical Functions in the Smart Home by Means of the PESQ
Algorithm
J. Vanus, T. Weiper, R. Martinek, J. Nedoma, M. Fajkus, L. Koval, R. Hrbac
Faculty of Electrical Engineering and Computer Science,
VSB-Technical University of Ostrava
17 Listopadu 15, Ostrava 70833 Czech Republic
(jan.vanus@vsb.cz, tomas.weiper.st@vsb.cz, radek.martinek@vsb.cz,
jan.nedoma@vsb.cz, marcel.fajkus@vsb.cz, ludvik.koval@vsb.cz, roman.hrbac@vsb.cz)
Abstract: This paper describes the use of the ITU-T P.863 standard to assess the quality of speech
signal processing within voice control of operational-technical functions in the Smart Home (SH) by
means of the PESQ algorithm. To suppress additive noise in the speech signal in a real SH
environment, the ANFIS structure is used within SH voice control by means of the KNX bus system.
Voice control of the operational-technical functions of this communication bus system is a prerequisite
for simpler household management, eliminating the otherwise necessary manual handling of the control
device, particularly for seniors or disabled persons.
Keywords: voice control; additive noise; Smart Home (SH); ANFIS; KNX.
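The additive-noise setting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' ANFIS implementation: a plain LMS adaptive filter is swapped in as a simpler stand-in for the adaptive noise canceller, and the signal-to-noise ratio is used as a crude proxy for a perceptual score such as PESQ. The 8 kHz sampling rate, the 200 Hz tone standing in for a voice command, the two-tap room path, and all function names are assumptions made for illustration only.

```python
import math
import random


def snr_db(clean, processed):
    """Signal-to-noise ratio of `processed` relative to `clean`, in dB."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((p - s) ** 2 for s, p in zip(clean, processed))
    return 10.0 * math.log10(signal_power / noise_power)


def lms_cancel(primary, reference, taps=8, mu=0.005):
    """LMS adaptive noise canceller (stand-in for ANFIS, see lead-in).

    `primary` is speech plus additive noise; `reference` is a noise-only
    signal correlated with that noise. The filter estimates the noise from
    the reference and subtracts it; the error signal is the cleaned output.
    """
    weights = [0.0] * taps
    history = [0.0] * taps
    output = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]              # shift in newest reference sample
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate                      # cleaned sample
        weights = [w + 2.0 * mu * error * h for w, h in zip(weights, history)]
        output.append(error)
    return output


rng = random.Random(0)                            # fixed seed for repeatability
n = 4000
fs = 8000                                         # assumed sampling rate in Hz

# Hypothetical "command": a 200 Hz tone instead of real speech.
clean = [math.sin(2.0 * math.pi * 200.0 * t / fs) for t in range(n)]

# Noise source and an assumed two-tap acoustic path to the microphone.
reference = [rng.gauss(0.0, 1.0) for _ in range(n)]
noise = [0.6 * reference[t] + (0.3 * reference[t - 1] if t > 0 else 0.0)
         for t in range(n)]

noisy = [s + v for s, v in zip(clean, noise)]     # additive noise model
cleaned = lms_cancel(noisy, reference)

# Compare on the second half only, after the filter has had time to adapt.
half = n // 2
snr_before = snr_db(clean[half:], noisy[half:])
snr_after = snr_db(clean[half:], cleaned[half:])
print(f"SNR before: {snr_before:.1f} dB, after: {snr_after:.1f} dB")
```

In an evaluation along the lines the paper describes, the SNR comparison would be replaced by a perceptual score computed between the clean and the processed signal, and the LMS filter by the trained ANFIS structure.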