Efficient Parameter Estimation for Semi-Continuous Data: An Application to Independent Component Analysis

Sai K. Popuri (1), Zois Boukouvalas (2)
(1) Walmart Labs, San Bruno, CA 94066
(2) American University, Department of Mathematics and Statistics, Washington, DC 20016

Introduction

Semi-continuous data have a point mass at zero and are otherwise continuous on the positive real line. Such data arise naturally in many settings, for example daily rainfall at a location or sales of durable goods. Efficient estimation of the underlying probability density function (PDF) is therefore of significant interest.

Contribution

◮ We present an estimation method for semi-continuous data based on the maximum entropy principle.
◮ We demonstrate its successful application in developing a new Independent Component Analysis (ICA) algorithm, ICA-Semi-continuous Entropy Maximization (ICA-SCEM).
◮ We present a theoretical analysis of the proposed estimation technique and, using simulated data, demonstrate the superior performance of ICA-SCEM over classical ICA algorithms.

Estimation using entropy maximization

The PDF of a semi-continuous random variable $Y$ can be written as [1]

$$p(y \mid \gamma, \theta) = \gamma\,\delta(y) + (1 - \gamma)\,\delta^{*}(y)\,g(y \mid \theta),$$

where $g(y \mid \theta)$ is the PDF of a continuous random variable with support on $(0, \infty)$, $\gamma$ is the point mass at zero, $\delta(y)$ is the indicator function of $y = 0$, and $\delta^{*}(y) = 1 - \delta(y)$.

Maximum Entropy Principle:

$$\max_{p(y)} \; H(p(y)) = -\int p(y) \log p(y)\, \mu(dy) \quad \text{s.t.} \quad \int h_i(y)\, p(y)\, \mu(dy) = \alpha_i, \quad i = 1, \dots, K,$$

where $h_i(y)$ are measuring functions, $\alpha_i = \sum_{t=1}^{T} h_i(t)/T$ are the sample averages of the measuring functions over the $T$ samples, and $K$ denotes the total number of measuring functions.

Using the maximum entropy principle, we estimate the distribution that maximizes the entropy of $Y$.
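To make the two-part density concrete, here is a minimal Python sketch that draws from $p(y \mid \gamma, \theta)$ and evaluates its log-density with respect to the mixed measure $\mu$ (a point mass at zero plus Lebesgue measure on the positive half-line). The helper names `sample_two_part` and `log_density`, and the choice of a unit-mean exponential for $g$, are illustrative assumptions and not part of the method itself.

```python
import math
import random

def sample_two_part(n, gamma, draw_positive, rng):
    """Draw n values: zero with probability gamma, otherwise a positive draw from g."""
    return [0.0 if rng.random() < gamma else draw_positive(rng) for _ in range(n)]

def log_density(y, gamma, log_g):
    """log p(y) for p(y) = gamma*delta(y) + (1 - gamma)*delta*(y)*g(y)."""
    if y == 0.0:
        return math.log(gamma)                # point mass at zero
    return math.log(1.0 - gamma) + log_g(y)   # continuous part on (0, inf)

rng = random.Random(0)
# Assumed for illustration: g is an exponential PDF with mean 1.
ys = sample_two_part(10_000, 0.6, lambda r: r.expovariate(1.0), rng)

# The point mass gamma can be estimated by the proportion of exact zeros.
gamma_hat = sum(1 for y in ys if y == 0.0) / len(ys)
print(round(gamma_hat, 3))
```

The proportion of exact zeros is the natural estimate of $\gamma$, which leaves only the continuous part $g$ to be determined by the moment constraints.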
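For known $\gamma$, the distribution that maximizes the entropy of $Y$ is given by [1]

$$p(y) = \gamma\,\delta(y) + (1 - \gamma)\,\delta^{*}(y)\,g^{*}(y),$$

where $g^{*}$ maximizes the entropy of a continuous random variable with support $(0, \infty)$ subject to the constraints

$$\int_{0}^{\infty} h_i(z)\, g(z)\, dz = \frac{\alpha_i}{1 - \gamma}, \quad i = 1, \dots, K.$$

[1] S. K. Popuri, “Prediction Methods for Semi-continuous Data with Applications in Climate Science,” Ph.D. thesis, University of Maryland, Baltimore County, 2017.

Example

Suppose we have a sample of size $n$ of semi-continuous data $\{y_1, \dots, y_n\}$, $\gamma$ is set to the proportion of zeroes in the data, and

$$\alpha_2 = \frac{1}{n} \sum_{i:\, y_i > 0} y_i, \qquad \alpha_3 = \frac{1}{n} \sum_{i:\, y_i > 0} \log(y_i).$$

The resulting MaxEnt distribution is given by

$$f(y \mid \gamma, \kappa, \theta) = \gamma\,\delta(y) + (1 - \gamma)\,\delta^{*}(y)\, \frac{y^{\kappa - 1} e^{-y/\theta}}{\theta^{\kappa}\, \Gamma(\kappa)},$$

where $\theta$ and $\kappa$ are solutions to $\alpha_2 = \kappa\theta$ and $\alpha_3 = \psi(\kappa) + \log(\theta)$, and $\psi(\cdot)$ is the digamma function.

Application to ICA

Generative model: $\mathbf{x} = \mathbf{A}\mathbf{s}$, where $\mathbf{x}$ are the observations and $\mathbf{s}$ are the latent sources linearly mixed by the matrix $\mathbf{A}$. In matrix form,

$$\mathbf{X}_{N \times T} = \mathbf{A}_{N \times N}\, \mathbf{S}_{N \times T},$$

where row $i$ of $\mathbf{S}$ holds source $s_i$.

ICA can separate mixed sources, subject to scaling and permutation ambiguities, by assuming source independence.

[Diagram: s → A → x → W → y]

To estimate $\mathbf{W}$, we minimize the mutual information (MI) among the source estimates $y_1, \dots, y_N$ [2]:

$$J_{\mathrm{ICA}}(\mathbf{W}) = \sum_{n=1}^{N} H(y_n) - \log|\det(\mathbf{W})| - H(\mathbf{x}),$$

where $H(y_n) = -E\{\log(p(y_n))\}$. Each $y_n$ is assumed to be semi-continuous.

Development of ICA-SCEM algorithm

Decoupling the MI cost function enables the development of effective algorithms [2]. This is achieved by expressing the volume of the parallelepiped, $|\det(\mathbf{W})|$, as the product of the area of its base and its height [3]. The cost function with respect to each $\mathbf{w}_n$ is given by

$$J_{\mathrm{ICA}}(\mathbf{W}) = \sum_{n=1}^{N} H(y_n) - \log|\mathbf{h}_n^{\top} \mathbf{w}_n| - \log|\det(\mathbf{W}_n \mathbf{W}_n^{\top})| - H(\mathbf{x}), \qquad (1)$$

where $\mathbf{W}_n$ denotes $\mathbf{W}$ with its $n$th row removed and $\mathbf{h}_n$ is a unit vector orthogonal to every row of $\mathbf{W}_n$ [2].

The gradient of (1) can be written in the decoupled form

$$\frac{\partial J(\mathbf{W})}{\partial \mathbf{w}_n} = -E\{\phi(y_n)\, \mathbf{x}\} - \frac{\mathbf{h}_n}{\mathbf{h}_n^{\top} \mathbf{w}_n}, \qquad (2)$$

where $\phi(y_n) = \dfrac{\partial \log p(y_n)}{\partial y_n}$.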
As can be seen in (2), each gradient direction depends directly on the corresponding estimated source PDF, with

$$\frac{\partial \log p(y_{n,t})}{\partial y_{n,t}} =
\begin{cases}
0, & \text{if } y_{n,t} = 0, \\[4pt]
\dfrac{\partial \log g(y_{n,t} \mid \theta_{n,t})}{\partial y_{n,t}}, & \text{if } y_{n,t} > 0,
\end{cases}$$

and

$$\phi(y_n) = \left[ \frac{\partial \log p(y_{n,1})}{\partial y_{n,1}}, \dots, \frac{\partial \log p(y_{n,T})}{\partial y_{n,T}} \right]^{\top}$$

is a vector of partial derivatives of dimension $T$.

Experimental results

Simulation 1: Data for each source are generated using the two-part gamma distribution

$$f(y \mid \gamma, \kappa, \theta) = \gamma\,\delta(y) + (1 - \gamma)\,\delta^{*}(y)\, \frac{y^{\kappa - 1} e^{-y/\theta}}{\theta^{\kappa}\, \Gamma(\kappa)},$$

where $\gamma = 0.6$, $\theta = 1$, and $\kappa = 1$.

Simulation 2: Data for the first two of the five sources are generated using the two-part gamma model, and data for the remaining three sources are generated using the following two-part lognormal distribution

$$f(y \mid \gamma, \mu, \sigma) = \gamma\,\delta(y) + (1 - \gamma)\,\delta^{*}(y)\, \frac{1}{y\,\sigma}\, \phi\!\left( \frac{\log(y) - \mu}{\sigma} \right),$$

where $\phi$ here denotes the standard normal PDF (not the score function in (2)). Data for the five sources are generated according to the following parameter choices:

Source   γ     κ    θ    μ     σ
1        0.6   1    1    -     -
2        0.4   1    2    -     -
3        0.6   -    -    0     1
4        0.5   -    -    0.5   0.5
5        0.4   -    -    1     2

ICA-SCEM achieves the best separation performance among the well-known ICA algorithms compared.

Conclusion and future directions

An efficient density estimation method for semi-continuous data was presented, and a new ICA algorithm for semi-continuous data, ICA-SCEM, was proposed.

Future Directions:

◮ Comparisons of ICA-SCEM with ICA algorithms that exploit the sparsity of the data, as well as with methods based on non-negative source separation.
◮ Multivariate extensions could be developed by considering multivariate distributions for the continuous part, with element-wise Bernoulli probabilities determining the presence of zeros.

[2] T. Adalı, M. Anderson, and G.-S. Fu, “Diversity in independent component and vector analyses: Identifiability, algorithms, and applications in medical imaging,” IEEE Signal Processing Magazine, vol. 31, no. 3, pp. 18-33, May 2014.
[3] Z. Boukouvalas, Y. Levin-Schwartz, R. Mowakeaa, G.-S. Fu, and T. Adalı, “Independent Component Analysis Using Semi-Parametric Density Estimation Via Entropy Maximization,” in 2018 IEEE Statistical Signal Processing Workshop (SSP), pp. 403-407, 2018.

MLSP 2019, Pittsburgh, PA, USA
E-mail: boukouva@american.edu
https://zoisboukouvalas.github.io/
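
As a worked illustration of the two-part gamma Example and of the score function that enters the gradient (2), the following minimal Python sketch fits $(\gamma, \kappa, \theta)$ to simulated data and evaluates $\phi$. It is a sketch under stated assumptions, not the authors' implementation. The moment conditions are applied to the positive observations only, meaning the averages are taken per positive observation, in line with the $\alpha_i / (1 - \gamma)$ constraint on $g$. The helpers `digamma`, `fit_two_part_gamma`, and `score` are hypothetical names introduced here, and the digamma function is approximated with a standard recurrence plus asymptotic series so that only the standard library is needed.

```python
import math
import random

def digamma(x):
    """Digamma psi(x) for x > 0: upward recurrence, then an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x            # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return acc + math.log(x) - 0.5 / x - series

def fit_two_part_gamma(ys):
    """Return (gamma, kappa, theta) matching the zero share, mean and mean-log."""
    pos = [y for y in ys if y > 0.0]
    gamma = 1.0 - len(pos) / len(ys)          # proportion of zeros
    mean = sum(pos) / len(pos)                # assumed: average over positives
    mean_log = sum(math.log(y) for y in pos) / len(pos)
    # Eliminating theta from  mean = kappa*theta,  mean_log = psi(kappa) + log(theta)
    # leaves  log(kappa) - psi(kappa) = log(mean) - mean_log, decreasing in kappa.
    target = math.log(mean) - mean_log
    lo, hi = 1e-6, 1e6
    for _ in range(200):
        mid = math.sqrt(lo * hi)              # bisection on a log scale
        if math.log(mid) - digamma(mid) > target:
            lo = mid                          # kappa must be larger
        else:
            hi = mid
    kappa = math.sqrt(lo * hi)
    return gamma, kappa, mean / kappa

def score(y, kappa, theta):
    """phi(y) = d log p / dy: zero at y = 0, gamma log-density slope for y > 0."""
    return 0.0 if y == 0.0 else (kappa - 1.0) / y - 1.0 / theta

rng = random.Random(1)
# Simulation 1 settings from the text: gamma = 0.6, kappa = 1, theta = 1.
data = [0.0 if rng.random() < 0.6 else rng.gammavariate(1.0, 1.0)
        for _ in range(20000)]
gamma_hat, kappa_hat, theta_hat = fit_two_part_gamma(data)
print(gamma_hat, kappa_hat, theta_hat)
```

Reducing the two moment equations to a single monotone equation in $\kappa$ keeps the root search one-dimensional. In an ICA iteration, `score` would be applied element-wise to each estimated source to form the vector $\phi(y_n)$.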