Machine Learning Panel Data Regressions with an Application to Nowcasting Price Earnings Ratios

Andrii Babii∗   Ryan T. Ball†   Eric Ghysels‡   Jonas Striaukas§

August 11, 2020

Abstract

This paper introduces structured machine learning regressions for prediction and nowcasting with panel data consisting of series sampled at different frequencies. Motivated by the empirical problem of predicting corporate earnings for a large cross-section of firms with macroeconomic, financial, and news time series sampled at different frequencies, we focus on the sparse-group LASSO regularization. This type of regularization can take advantage of the mixed frequency time series panel data structures, and we find that it empirically outperforms the unstructured machine learning methods. We obtain oracle inequalities for the pooled and fixed effects sparse-group LASSO panel data estimators, recognizing that financial and economic data exhibit heavier than Gaussian tails. To that end, we leverage a novel Fuk-Nagaev concentration inequality for panel data consisting of heavy-tailed τ-mixing processes, which may be of independent interest in other high-dimensional panel data settings.

Keywords: corporate earnings, nowcasting, high-dimensional panels, mixed frequency data, text data, sparse-group LASSO, heavy-tailed τ-mixing processes, Fuk-Nagaev inequality.

∗ University of North Carolina at Chapel Hill - Gardner Hall, CB 3305, Chapel Hill, NC 27599-3305. Email: babii.andrii@gmail.com
† Stephen M. Ross School of Business, University of Michigan, 701 Tappan Street, Ann Arbor, MI 48109. Email: rtball@umich.edu
‡ Department of Economics and Kenan-Flagler Business School, University of North Carolina–Chapel Hill. Email: eghysels@unc.edu
§ LIDAM UC Louvain and FRS-FNRS Research Fellow. Email: jonas.striaukas@gmail.com

arXiv:2008.03600v1 [econ.EM] 8 Aug 2020