Monitoring the security of audio biomedical signals communications in
wearable IoT healthcare
Saeid Yazdanpanah a, Saman Shojae Chaeikar b,*, Alireza Jolfaei c
a Department of Computer Engineering, Khorramabad Branch, Islamic Azad University, Khorramabad, Iran
b Australian Institute of Higher Education, Sydney, Australia
c College of Science and Engineering, Flinders University, Adelaide, Australia
ARTICLE INFO
Keywords:
Audio security
Audio signal processing
Data hiding
Healthcare data
IoT security
ABSTRACT
The COVID-19 pandemic has imposed new challenges on the healthcare industry, as hospital staff are exposed to a
massive coronavirus load when registering new patients, taking temperatures, and providing care. The Ebola
epidemic of 2014 is another example of an outbreak during which a hospital in New York decided to use an
audio-based communication system to protect nurses. This idea quickly turned into an Internet of Things (IoT)
healthcare solution that helps staff communicate with patients remotely. However, it has also attracted the
attention of criminals, who use this medium as a cover for secret communication. The merging of signal
processing and machine-learning techniques has led to the development of steganalyzers with very high
efficiency, but since the statistical properties of normal audio files differ from those of purely speech audio
files, current steganalysis practices are not efficient enough for this type of content. This research considers
the Percent of Equal Adjacent Samples (PEAS) feature for speech steganalysis. This feature efficiently
discriminates least significant bit (LSB) stego speech samples from clean ones with a single analysis dimension.
A sensitivity of 99.82% was achieved for the steganalysis of 50% embedded stego instances using a classifier
based on the Gaussian membership function.
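As a rough illustration of the idea summarized above, the following Python sketch assumes that PEAS is the percentage of consecutive sample pairs with identical values (the precise definition is not given in this part of the paper) and that LSB embedding overwrites the least significant bit of a randomly chosen fraction of samples. The function names, the synthetic cover signal, and the Gaussian membership helper are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def peas(samples):
    """Percent of adjacent sample pairs with equal values (assumed definition)."""
    x = np.asarray(samples)
    if x.size < 2:
        return 0.0
    return 100.0 * float(np.mean(x[1:] == x[:-1]))


def lsb_embed(cover, rate, rng):
    """Overwrite the LSB of a random fraction `rate` of samples with random bits."""
    stego = np.array(cover, dtype=np.int16)
    n = int(round(rate * stego.size))
    idx = rng.choice(stego.size, size=n, replace=False)
    bits = rng.integers(0, 2, size=n, dtype=np.int16)
    # `& -2` clears the least significant bit, `| bits` writes the payload bit.
    stego[idx] = (stego[idx] & -2) | bits
    return stego


def gaussian_membership(x, mean, sigma):
    """Gaussian membership value in (0, 1]; equals 1 when x == mean."""
    return float(np.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical cover: piecewise-constant signal with many equal neighbours,
    # standing in for the flat or silent stretches found in speech recordings.
    cover = np.repeat(rng.integers(-1000, 1000, size=500), 20).astype(np.int16)
    stego = lsb_embed(cover, rate=0.5, rng=rng)
    print("PEAS clean:", peas(cover))
    print("PEAS stego:", peas(stego))
```

In this toy setting, embedding perturbs some samples and breaks runs of equal neighbours, so the PEAS value of the stego signal falls below that of the cover. A one-dimensional classifier could then compare the Gaussian membership of an observed PEAS value under a "clean" and a "stego" class, with the class means and spreads estimated from training data.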
1. Introduction
Cryptography is the science of making data unintelligible to unauthorized
people. It does not, however, conceal from adversaries the existence of the
encrypted data or other critical information about it, such as the duration
and frequency of communications, the message size, and the identities of
the sender and recipient [1,2]. Considering the need for concealed
communications, steganography was designed as another security method that
can conceal secrets within the body of digital media, enhance privacy,
prevent traffic analysis, and allow the transfer of secrets in an
imperceptible manner. However, considering the applications of
steganography, digital forensic scientists have designed a countermeasure,
steganalysis, that allows them to supervise potential secret data exchanges
that may be performed by hackers, terrorists, and lawbreakers [3,4].
Image, video, and audio are the digital media formats that have been
widely used by steganographers as cover media [4,5]. Among these
formats, audio files account for a large proportion of today's media
transfer and therefore create ample opportunities for illegal data
transfer without attracting attention. Nevertheless, digital forensic
scientists have paid less attention to audio than to images when
developing state-of-the-art steganalysis algorithms, and this domain
requires modern steganalyzers that exploit the statistical
characteristics of audio formats [6].
As shown in Fig. 1, steganalysis methods can be classified according
to the signal characteristics.
Multiplicative, phase-encoding, and echo-embedding are the three
targeted steganalysis subclasses. The multiplicative subclass may be
described as s[n] = c[n](1 + m[n]), where c[n], m[n], and s[n] are the
cover, secret message, and stego signals, respectively [7]. Normal
steganalyzers cannot efficiently detect multiplicative steganography, and
hence, this class adds absolute value logarithms of audio samples to the
steganalysis discrimination factor [1]. Phase coding, the second targeted
class, is based on the fact that relative phases between blocks are
preserved, while those amongst consecutive blocks are changed [8].
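One way to see why logarithms of absolute sample values help with the multiplicative model is that they turn the product into a sum: log|s[n]| = log|c[n]| + log|1 + m[n]|. The Python sketch below illustrates this identity only. The function names, the small constant `eps` guarding against log(0), and the toy signals are assumptions made for illustration and are not taken from the cited works.

```python
import numpy as np


def multiplicative_embed(cover, message):
    """Stego signal under the multiplicative model s[n] = c[n] * (1 + m[n])."""
    c = np.asarray(cover, dtype=float)
    m = np.asarray(message, dtype=float)
    return c * (1.0 + m)


def log_abs(x, eps=1e-12):
    """Logarithm of absolute sample values; eps (assumed) avoids log(0)."""
    return np.log(np.abs(np.asarray(x, dtype=float)) + eps)


if __name__ == "__main__":
    cover = np.array([0.5, -0.25, 0.8, -0.6])
    message = 0.01 * np.array([1.0, -1.0, 1.0, -1.0])  # small relative changes
    stego = multiplicative_embed(cover, message)
    # In the log-magnitude domain the message appears as an additive term.
    additive_term = log_abs(stego) - log_abs(cover)
    print(additive_term)
    print(np.log(np.abs(1.0 + message)))
```

In the log-magnitude domain the embedded message becomes an additive component, which is the form most additive-noise steganalysis features are built to detect.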
The third class, namely, echo embedding, uses a bank of kernels, from
which one selected kernel is convolved with each segment of the cover
audio file [9]. Positive-Negative (PN) and Forward-Backward (FB) are
two examples of echo-embedding kernels.
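The kernel-bank idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the bank holds two single-echo kernels (one per bit value) that differ only in delay, each segment carries one bit, and the echo tail that would spill into the next segment is simply truncated. The delays, gain, and function names are hypothetical. PN and FB kernels would be alternative kernel shapes placed in the same bank, and they are not implemented here.

```python
import numpy as np


def echo_kernel(delay, gain):
    """Single-echo kernel: delta[n] + gain * delta[n - delay]."""
    k = np.zeros(delay + 1)
    k[0] = 1.0
    k[delay] = gain
    return k


def echo_embed(cover, bits, seg_len, kernels):
    """Convolve each cover segment with the kernel selected by its bit."""
    c = np.asarray(cover, dtype=float)
    stego = c.copy()
    for i, bit in enumerate(bits):
        start = i * seg_len
        seg = c[start:start + seg_len]
        if seg.size == 0:
            break  # more bits than segments; the remaining bits are not embedded
        # Keep only the first len(seg) output samples (echo tail is truncated).
        stego[start:start + seg.size] = np.convolve(seg, kernels[bit])[:seg.size]
    return stego


if __name__ == "__main__":
    # Kernel bank: index 0 encodes bit 0, index 1 encodes bit 1 (assumed delays).
    bank = [echo_kernel(delay=1, gain=0.5), echo_kernel(delay=2, gain=0.5)]
    cover = np.zeros(8)
    cover[0] = cover[4] = 1.0  # one impulse per segment makes the echo visible
    print(echo_embed(cover, bits=[0, 1], seg_len=4, kernels=bank))
```

With an impulse at the start of each segment, the output shows the echo appearing one sample later for bit 0 and two samples later for bit 1, which is the delay difference a detector would look for.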
The universal class can be classified into calibrated and non-
calibrated branches. In the calibrated methods, the challenge is to find
features that have been formed mostly based on the embedded message
instead of the original signals [1]. The first subclass in calibrated methods
* Corresponding author.
E-mail address: s.chaeikar@aih.nsw.edu.au (S. Shojae Chaeikar).
Digital Communications and Networks
journal homepage: www.keaipublishing.com/dcan
https://doi.org/10.1016/j.dcan.2022.11.002
Received 22 December 2021; Received in revised form 27 October 2022; Accepted 1 November 2022
Available online 14 November 2022
2352-8648/© 2022 Chongqing University of Posts and Telecommunications. Publishing Services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. This is an
open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Digital Communications and Networks 9 (2023) 393–399