Frequent Itemset Mining in High Dimensional Data: A Review
Fatimah Audah Md. Zaki and Nurul Fariza Zulkurnain
Department of Electrical and Computer Engineering,
International Islamic University Malaysia.
fatimah.audah@gmail.com, nurulfariza@iium.edu.my
Abstract. This paper provides a brief overview of the techniques used in frequent
itemset mining. It discusses the search strategies used, i.e. depth-first vs.
breadth-first, and the dataset representations, i.e. horizontal vs. vertical
representation. In addition, it reviews the techniques that several algorithms use
to make frequent itemset mining more efficient. These algorithms are discussed
according to their proposed search strategies, which include row enumeration vs.
column enumeration, bottom-up vs. top-down traversal, and a number of new data
structures. Finally, the paper reviews the latest algorithms for mining colossal
frequent itemsets/patterns, which are currently the most relevant to mining
high-dimensional datasets.
Keywords: Data mining, High-dimensional data.
1 Introduction
Data mining is the process of uncovering hidden knowledge, identifying trends and
patterns, and discovering new rules and relationships in large databases [1]. It is
regarded as the most important step in a process called knowledge discovery in
databases (KDD), which is closely related to another important process, data
warehousing, where operational data from a few databases are first cleaned and then
integrated into a data warehouse. Frequent pattern mining has four main types,
including sequential pattern mining, frequent itemset mining, and graph mining.
Frequent itemset mining is an important type of data mining that was initially
developed for market basket analysis [2], which examines customer behaviour and
identifies sets of items that are often purchased together. This information may be
used to maximize an organization's profit, for example by arranging products
efficiently on shelves or deciding which related products to discount, which
encourages customers to spend more. The algorithm finds the sets of items that occur
together at least a minimum number of times, and the results are represented in the
form of association rules. Association rules that satisfy both the minimum support
and minimum confidence thresholds are considered interesting.
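The roles of minimum support and minimum confidence can be illustrated with a minimal Python sketch. It is not an algorithm from the reviewed literature: it enumerates every candidate itemset by brute force, which is feasible only for toy data. The transactions, the function names (`support`, `frequent_itemsets`, `association_rules`) and the threshold values are all assumptions made for illustration. Support is taken here as the fraction of transactions containing an itemset, and the confidence of a rule X -> Y as support(X ∪ Y) / support(X).

```python
from itertools import combinations

# Hypothetical market-basket data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all itemsets meeting `min_support`."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


def association_rules(frequent, transactions, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, s in frequent.items():
        # Split each frequent itemset into every non-empty antecedent
        # and the remaining items as the consequent.
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                confidence = s / support(antecedent, transactions)
                if confidence >= min_confidence:
                    rules.append(
                        (antecedent, itemset - antecedent, s, confidence)
                    )
    return rules


frequent = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(frequent, transactions, min_confidence=0.8)
```

With these assumed thresholds, five itemsets are frequent, and only the rule {butter} -> {bread} is retained, with support 0.5 and confidence 1.0. The candidate space grows exponentially with the number of distinct items, which is why exhaustive enumeration of this kind does not scale.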
However, mining frequent itemsets from a large data set has a major drawback. The
number of generated itemsets that satisfy the minimum support can be too large to
store in computer memory for further processing. Therefore, the concepts of closed
frequent itemset (CFI) and maximal frequent itemset (MFI) were introduced. A CFI is
an itemset whose support satisfies the minimum support and does
© Springer Nature Singapore Pte Ltd. 2019
R. Alfred et al. (eds.), Computational Science and Technology, Lecture Notes
in Electrical Engineering 481, https://doi.org/10.1007/978-981-13-2622-6_32