AN ATTENTION-AWARE BIDIRECTIONAL MULTI-RESIDUAL RECURRENT NEURAL
NETWORK (ABMRNN): A STUDY ABOUT BETTER SHORT-TERM TEXT
CLASSIFICATION
Ye Wang¹, Han Wang¹, Xinxiang Zhang², Theodora Chaspari¹, Yoonsuck Choe¹ and Mi Lu¹
¹Texas A&M University, College Station, Texas, 77843, USA
²Southern Methodist University, Dallas, Texas, 75025, USA
{wangye0523, hanwang, chaspari, choe}@tamu.edu, xinxiang@smu.edu, mlu@ece.tamu.edu
ABSTRACT
Long Short-Term Memory (LSTM) has proven to be an efficient way to model sequential data, because of its ability to overcome the vanishing-gradient problem during training. However, due to the limited memory capacity of LSTM cells, LSTM is weak at capturing long-range dependencies in sequential data. To address this challenge, we propose an Attention-aware Bidirectional Multi-residual Recurrent Neural Network (ABMRNN) to overcome this deficiency. Our model considers both past and future information at every time step through omniscient attention built on LSTM. In addition, a multi-residual mechanism is leveraged in our model, which aims to model the relationship between the current time step and more distant time steps, rather than just the immediately preceding one. Experimental results show that our model achieves state-of-the-art performance on classification tasks.
Index Terms— Long Short-Term Memory, recurrent neural network, attention model, natural language processing, residual network
1. INTRODUCTION
Compared with Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs) are widely applied to sequential data such as natural language processing [1] and speech processing [2], while CNNs are employed more often in image processing [3–5]. Among existing RNN models, LSTM is one of the most widely used approaches, since it first addressed the gradient vanishing and exploding problems during RNN training [6] by introducing a forget gate and a memory cell. Numerous RNN variants [6–8] have been proposed in previous literature to achieve state-of-the-art performance on different tasks, with LSTM as the cornerstone of those structures. As network depth increases, residual networks have proved their advantages in both CNNs [9] and RNNs [10]. Residual networks complement LSTMs by connecting current and distant time steps during training.
In this paper, we propose an Attention-aware Bidirectional Multi-residual Recurrent Neural Network (ABMRNN) and show improved performance on existing sequential classification tasks. To summarize our contributions:
• We propose an algorithm that updates the weights by combining information from both previous and future time steps.
• We bring a multi-residual mechanism from existing residual networks into recurrent networks for sequence learning, through which we achieve state-of-the-art performance in classification tasks.
• We provide a comprehensive analysis of the advantages and disadvantages of current cutting-edge models, including RNNs and CNNs, for sequence learning, especially in short-term text classification tasks.
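As a rough illustration of the bidirectional idea in the first contribution, the sketch below runs a toy sequence through a forward pass and a backward pass and concatenates the two hidden states at each step, so every position sees both past and future context. This is a minimal sketch only: a vanilla tanh RNN cell stands in for the paper's LSTM cell, and all function names, weights, and dimensions are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def rnn_step(x, h, W_x, W_h):
    """One vanilla tanh RNN step (a stand-in for the paper's LSTM cell)."""
    return np.tanh(x @ W_x + h @ W_h)

def bidirectional_pass(xs, W_x, W_h, hidden):
    """Run the sequence forward and backward, then concatenate the two
    hidden states at each time step so each position carries context
    from both directions."""
    T = len(xs)
    fwd, bwd = [None] * T, [None] * T
    h_f = np.zeros(hidden)
    for t in range(T):                    # left-to-right pass
        h_f = rnn_step(xs[t], h_f, W_x, W_h)
        fwd[t] = h_f
    h_b = np.zeros(hidden)
    for t in reversed(range(T)):          # right-to-left pass
        h_b = rnn_step(xs[t], h_b, W_x, W_h)
        bwd[t] = h_b
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

# Toy usage: a 5-step sequence of 3-dim inputs with a 4-dim hidden state;
# each output state is the 8-dim concatenation of forward and backward states.
rng = np.random.default_rng(0)
xs = rng.standard_normal((5, 3))
W_x = rng.standard_normal((3, 4)) * 0.1
W_h = rng.standard_normal((4, 4)) * 0.1
states = bidirectional_pass(xs, W_x, W_h, hidden=4)
```

In a trained model the concatenated states would feed the attention and output layers; here both directions share weights only to keep the sketch short, whereas a real bidirectional network learns separate forward and backward parameters.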
2. RELATED WORK
Regarding improving the performance of classification tasks, there are several directions for network exploration. First, an increasing number of layers is employed to capture features. Second, various feature extraction methods such as word2vec [11] and doc2vec [12] have been invented for learning better word representations. Third, variations of the interior structural units, such as LSTM and GRU [7], have been proposed. With the development of neural networks, a novel trend is to combine deeper networks with multiple neural network variations.
Since general CNN or RNN architectures do not fit well for some tasks, such as short-term text classification, the contribution of this work lies in integrating the advantages of residual networks for the tasks of interest.
3. RESIDUAL LSTM PRELIMINARIES
LSTM mitigates the gradient vanishing and exploding problems. However, if the time sequence is too long, the dependency between earlier and later information is neglected in LSTM, because the current time step depends only on the immediately previous one. To strengthen such distant relationships, residual networks based on LSTM have been proposed [10, 13]. Figure 1 shows the general structure of a residual network. The basic
3582 978-1-5386-4658-8/18/$31.00 ©2019 IEEE ICASSP 2019
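The residual idea described above can be sketched in a few lines: at each step, the hidden states from several earlier time steps are added to the cell output, giving gradients a short path back to distant positions. This is a hedged sketch, not the paper's architecture: a vanilla tanh RNN cell stands in for the LSTM cell, and the skip distances `(2, 4)` are illustrative assumptions.

```python
import numpy as np

def rnn_step(x, h, W_x, W_h):
    """One vanilla tanh RNN step (a stand-in for an LSTM cell)."""
    return np.tanh(x @ W_x + h @ W_h)

def multi_residual_pass(xs, W_x, W_h, hidden, skips=(2, 4)):
    """At each time step, add the hidden states from several earlier
    steps (here t-2 and t-4) to the cell output via identity skip
    connections, connecting the current step to distant ones."""
    hs = []
    h = np.zeros(hidden)
    for t, x in enumerate(xs):
        h = rnn_step(x, h, W_x, W_h)
        for k in skips:
            if t - k >= 0:
                h = h + hs[t - k]   # identity skip from step t-k
        hs.append(h)
    return hs

# Toy usage: an 8-step sequence of 3-dim inputs with a 4-dim hidden state.
rng = np.random.default_rng(1)
xs = rng.standard_normal((8, 3))
W_x = rng.standard_normal((3, 4)) * 0.1
W_h = rng.standard_normal((4, 4)) * 0.1
hs = multi_residual_pass(xs, W_x, W_h, hidden=4)
```

The identity skips mean the gradient of a loss at step t reaches step t-k through a single addition rather than through k chained cell transitions, which is what lets such networks model dependencies over longer spans than a plain LSTM.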