Comparing ML Inference on RDBMS against Industry-leading Deep Learning Serving Systems

CSE598: Group 22: Term Project: Final Report

Amrit Bhaskar, Sumair Bashir, Venkatesh Gunda

I. PROBLEM STATEMENT

Model inference serving is a vital stage in the lifecycle of a machine learning application and is responsible for significant operational costs. Therefore, any optimizations to this process are highly beneficial. One important reason current DL serving systems (TensorFlow Serving, Rafiki, etc.) suffer from high operational costs is that they are stand-alone systems decoupled from data management systems. This physical decoupling of data and model serving adds management complexity and incurs latency in transferring input features from databases to deep learning frameworks.

Another problem with current DL frameworks is the memory constraint. Current DL serving frameworks are compute-focused and require that the model, input features, and intermediate feature maps all fit in memory; failing to do so results in a system failure. Large models with huge numbers of features are common in NLP and extreme multi-label classification, and this constraint impacts the availability of the model serving system.

Therefore, we aim to study a system called NetsDB that tackles these concerns, and to conduct a comparative analysis against existing DL serving frameworks with respect to time and memory optimizations, along with any model performance compromises observed in the process.

II. LITERATURE REVIEW

Existing work [6] proposes a framework for serving deep learning models from relational databases, which our work builds upon. It implements a system equipped with several synergistic storage optimization techniques, covering indexing, page packing, and caching, and evaluates these techniques by serving multiple word embedding models, multiple text classification models, and multiple classification models. Our work draws parallels by serving a binary classification logistic regression model.
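The model we serve is a standard binary logistic regression. As a minimal NumPy sketch of its inference step, with hypothetical weights and inputs (not the actual parameters used in our experiments):

```python
import numpy as np

def logistic_inference(W, X, b):
    """Binary logistic regression inference: sigmoid(W^T X + b)."""
    z = W.T @ X + b                  # linear scores for each input column
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid maps scores to probabilities

# Hypothetical example: 3 features, 2 input samples (columns of X).
W = np.array([0.5, -0.25, 1.0])      # weight vector, shape (3,)
X = np.array([[1.0, 0.0],
              [2.0, 4.0],
              [0.5, 1.0]])           # input features, shape (3, 2)
b = 0.1                              # scalar bias

probs = logistic_inference(W, X, b)  # one probability per sample
preds = (probs >= 0.5).astype(int)   # threshold at 0.5 for binary labels
```

The same linear step, W^T X + b, is what later maps onto relational operators when the model is served inside the database.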
[2] expands on tensor manipulation by abstracting a tensor as a set of tensor blocks, encoding the local linear algebra computations that manipulate a single tensor block or a pair of tensor blocks, and nesting these linear algebra computations together using relational algebra operators. Our logistic regression model, W^T X + b, translates on the tensor relational algebra framework to a join followed by an aggregation for the matrix multiplication, and a join operation for the matrix addition, as illustrated in Figure 1 below.

[6] employs magnitude-aware duplicate detection. Deduplication allows certain pages to be shared by multiple tensors instead of being physically stored in an array of pages of equivalent size. Magnitude-aware duplicate detection processes blocks of smaller magnitude first, and the model accuracy is periodically validated after deduplicating every k blocks. It further eliminates overheads by leveraging Locality Sensitive Hashing to detect similar tensor blocks.

Fig. 1. Mapping linear algebra to relational algebra

III. DATASET AND TOOLS USED

TPC Express Benchmark AI (TPCx-AI) is a set of benchmarks that can be used to assess a wide range of system topologies and implementation methodologies in a rigorous, comparable, and unbiased manner. TPCx-AI models an end-to-end, data-intensive AI and machine learning system and imitates the activities of retail businesses and datacenters. The established schema contains business information such as customer, order, financial, and product data.

The use case for our dataset was derived from an e-commerce business challenge involving financial transactions in the retail industry, expressed in the form of (transactionID, amount, IBAN, senderID, receiverID, timestamp) tuples. We dropped the IBAN (International Bank Account Number) field, as it is a fixed-length 18-character alphanumeric string and does not appear to add much semantic value.
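The preprocessing step above can be sketched as follows, using a small hypothetical in-memory sample of the transaction tuples rather than the actual TPCx-AI data files (the IBAN strings below are dummy 18-character placeholders, not real account numbers):

```python
import pandas as pd

# Hypothetical sample of the (transactionID, amount, IBAN, senderID,
# receiverID, timestamp) tuples described above; not actual TPCx-AI data.
transactions = pd.DataFrame({
    "transactionID": [1001, 1002],
    "amount": [250.00, 79.99],
    "IBAN": ["XX00000000000001AA", "YY00000000000002BB"],
    "senderID": [11, 42],
    "receiverID": [77, 13],
    "timestamp": ["2020-01-01T10:00:00", "2020-01-01T10:05:00"],
})

# Drop the fixed-length alphanumeric IBAN column, which carries little
# semantic value for the downstream classification task.
features = transactions.drop(columns=["IBAN"])
```

The remaining columns (transactionID, amount, senderID, receiverID, timestamp) are the ones carried forward as model inputs.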