Deep Matrix Factorization with Spectral Geometric Regularization

Amit Boyarski *1  Sanketh Vedula *1  Alex Bronstein 1

*Equal contribution. 1 Department of Computer Science, Technion, Israel Institute of Technology. Correspondence to: Amit Boyarski <amitboy@cs.technion.ac.il>. Preliminary version.

Abstract

Deep Matrix Factorization (DMF) is an emerging approach to the problem of reconstructing a matrix from a subset of its entries. Recent works have established that gradient descent applied to a DMF model induces an implicit regularization on the rank of the recovered matrix. Despite these promising theoretical results, empirical evaluation of vanilla DMF on real benchmarks exhibits poor reconstructions, which we attribute to the extremely low number of available samples. We propose an explicit spectral regularization scheme that makes DMF models competitive on real benchmarks while still maintaining the implicit regularization induced by gradient descent, thus enjoying the best of both worlds.

1. Introduction

Matrix completion deals with the recovery of the missing values of a matrix from a subset of its entries,

    Find X   s.t.   X ⊙ S = M ⊙ S.                              (1)

Here X stands for the unknown matrix, M ∈ R^{m×n} for the ground truth matrix, S is a binary mask representing the input support, and ⊙ denotes the Hadamard product. Since problem (1) is ill-posed, it is common to assume that M belongs to some low-dimensional subspace. Under this assumption, the matrix completion problem can be cast via the least-squares variant,

    min_X  rank(X) + (µ/2) ‖(X − M) ⊙ S‖²_F.                    (2)

Relaxing the intractable rank penalty to its convex envelope, namely the nuclear norm, leads to a convex problem whose solution coincides with that of (2) under some technical conditions (Candès & Recht, 2009). Another way to enforce low rank is to explicitly parametrize X in factorized form, X = X_1 X_2; the rank of X is then upper-bounded by the minimal dimension of X_1 and X_2.
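The nuclear-norm relaxation of (2) mentioned above can be minimized with a standard proximal-gradient (singular value thresholding) loop. The following NumPy sketch is purely illustrative: the function name `svt_complete` and the values of `tau`, `step`, and `n_iters` are our assumptions, not taken from the paper.

```python
import numpy as np

def svt_complete(M, S, tau=0.1, step=1.0, n_iters=500):
    """Proximal-gradient sketch for the nuclear-norm relaxation of (2):
        min_X  tau * ||X||_*  +  0.5 * ||(X - M) * S||_F^2.
    All hyperparameters are illustrative choices."""
    X = np.zeros_like(M)
    for _ in range(n_iters):
        G = (X - M) * S                               # gradient of the masked data term
        U, s, Vt = np.linalg.svd(X - step * G, full_matrices=False)
        s = np.maximum(s - step * tau, 0.0)           # soft-threshold the singular values
        X = (U * s) @ Vt                              # rescale columns of U by s
    return X
```

The soft-thresholding step is what implements the nuclear-norm penalty: it is the proximal operator of tau·‖·‖_*, shrinking every singular value toward zero and driving the small ones exactly to zero.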
Further developing this idea, X can be parametrized as a product of several matrices, X = ∏_{i=1}^{N} X_i, a model we denote as deep matrix factorization (DMF). Gunasekar et al. (2017); Arora et al. (2019) investigated the minimization of overparametrized DMF models using gradient descent, and came to the following conclusion (which we formally state in Section 2): whereas in some restrictive settings minimizing a DMF model with gradient descent is equivalent to nuclear norm minimization (i.e., the convex relaxation of (2)), in general the two produce different results, with the former enforcing a stronger regularization on the rank of X. This regularization gets stronger as the depth N increases. In light of these results, we shall henceforth use "DMF" to refer to the aforementioned model coupled with the specific algorithm used for its minimization, namely gradient descent.

Oftentimes, additional information is available in the form of a graph that neatly encodes structural (geometric) information about X. For example, we can constrain X to belong to a subspace spanned by the eigenvectors of some graph Laplacian, i.e., to be band-limited on the graph. Such information is generally overlooked by purely algebraic quantities (e.g., rank), and becomes invaluable in the data-poor regime, where the theorems governing reconstruction guarantees (e.g., (Candès & Recht, 2009)) do not hold. Our work leverages recent advances in DMF theory to marry the two concepts: a framework for matrix completion that is explicitly motivated by geometric considerations, while implicitly promoting low rank via its DMF structure.

Contributions. Our contributions are as follows:

• We propose task-specific DMF models that follow from geometric considerations, and study their dynamics.
• We show that with our proposed models it is possible to obtain state-of-the-art results on various recommendation systems datasets, making this one of the first successful applications of deep linear networks to real problems.

• Our findings challenge the quality of the side information available in various recommendation systems datasets, and the ability of contemporary methods to utilize it in a meaningful and efficient way.

arXiv:1911.07255v2 [cs.LG] 23 Feb 2020
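The vanilla DMF model described above — gradient descent on a product of N factor matrices, fit only to the observed entries — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `dmf_complete` and the depth, width, learning rate, and initialization scale are all our assumptions.

```python
import numpy as np

def dmf_complete(M, S, depth=3, lr=0.02, n_iters=3000, init_scale=0.1, seed=0):
    """Gradient descent on X = X_1 @ X_2 @ ... @ X_N, fitting only the
    observed entries (S == 1). All hyperparameters are illustrative."""
    m, n = M.shape
    width = min(m, n)                    # full-width (overparametrized) inner factors
    dims = [m] + [width] * (depth - 1) + [n]
    rng = np.random.default_rng(seed)
    Xs = [init_scale * rng.standard_normal((dims[i], dims[i + 1]))
          for i in range(depth)]
    for _ in range(n_iters):
        # prefix products P[i] = X_1 ... X_i, with P[0] = I
        P = [np.eye(m)]
        for Xi in Xs:
            P.append(P[-1] @ Xi)
        R = (P[-1] - M) * S              # masked residual drives the loss
        # suffix products Q[i] = X_{i+1} ... X_N, with Q[N] = I
        Q = [np.eye(n)]
        for Xi in reversed(Xs):
            Q.append(Xi @ Q[-1])
        Q.reverse()
        # dL/dX_i = (X_1...X_{i-1})^T R (X_{i+1}...X_N)^T
        grads = [P[i].T @ R @ Q[i + 1].T for i in range(depth)]
        for i in range(depth):
            Xs[i] -= lr * grads[i]
    X = Xs[0]
    for Xi in Xs[1:]:
        X = X @ Xi
    return X
```

Note that no rank constraint appears anywhere: the inner factors are full-width, and any bias toward low rank comes only from the gradient-descent dynamics with small initialization, which is exactly the implicit regularization the paper builds on.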