The Data-Driven Newsvendor with Censored Observations

Anna-Lena Beutel, Stefan Minner
Dept of Business Administration, University of Vienna, Brünner Straße 72, A-1210 Vienna, Austria
anna-lena.beutel@univie.ac.at, stefan.minner@univie.ac.at

Abstract

Motivated by data from a large European retail chain, we tackle the newsvendor problem with censored demand observations using a distribution-free approach based on the data-driven newsvendor model. The model estimates optimal inventory levels as a linear function of exogenous variables, e.g. price or temperature. To improve forecast accuracy, we simultaneously estimate unobservable lost sales, determine the coefficients of the exogenous variables that drive demand, and calculate the optimal order quantity. Since demand exceeding supply is not recorded, we use the timing of (hourly) sales occurrences to establish (daily) sales patterns. These sales patterns allow us to draw conclusions about the amount of unsatisfied demand and thus about true customer demand. To determine the coefficients of the inventory function, we formulate a linear programming model that balances inventory holding and penalty costs based on the censored demand observations. In a numerical study with data generated from the normal and the negative binomial distributions, we compare our model to other parametric and non-parametric estimation approaches. We evaluate the performance of the models in terms of inventory and service levels for price-dependent and non-price-dependent demand and for different censoring levels. We find that the data-driven newsvendor model copes especially well with highly censored data and price-dependent demand. In most settings with price-dependent demand, it achieves similar or higher service levels while holding lower inventories than other benchmark approaches from the literature.

Keywords: Inventory, Unobservable Lost Sales, Newsvendor, Linear Programming.

1. Introduction

In a lost sales inventory system, demand exceeding supply is usually not recorded. Studies have shown that out-of-stock (OOS) rates for non-perishable products amount to 8.3 per cent of SKUs per category worldwide (Corsten and Gruen 2003). OOS situations are estimated to occur even more frequently for perishable products, whose short shelf-lives make full product availability less desirable (ECR Europe 2003). This leaves the store manager without sufficient knowledge of the additional sales that could have been made had the inventory level been higher. Ignoring excess demand and sticking to the same order-up-to level results, over successive periods, in demand misspecification and more lost sales (Nahmias 1994). Empirical studies of customer reactions to stockouts indicate that poor availability, when customers encounter it in a store on a regular basis, not only causes short-term lost sales but also leads customers to stop returning to that store in the long run (Anderson et al. 2006). Therefore, estimating unobservable lost sales is a key factor in inventory planning to determine optimal order quantities. Existing approaches can be categorized into parametric and non-parametric approaches. Parametric approaches assume an underlying demand distribution, which raises the question of whether this distribution is appropriate in practice. Non-parametric approaches do not require any distributional assumptions but instead rely on additional information, which may not be readily available and depends on the specific estimation problem. As a result, retailers often apply rules of thumb to determine the inventory level, thereby underestimating forecast errors and the required inventory levels, which may affect the whole supply chain (Wagner 2002; Tiwari and Gavirneni 2007; Hosoda and Disney 2009).
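The misspecification argument above (Nahmias 1994) can be made concrete with a small simulation. The following Python sketch assumes a hypothetical store that stocks up to a fixed level every day and records only sales, i.e. min(demand, stock). The normal demand parameters, the cost values, and the function names `censoring_demo` and `empirical_quantile` are illustrative assumptions rather than part of the study. The sketch shows why a newsvendor quantile estimated from censored sales can never exceed the current stocking level, whereas the quantile of true demand can.

```python
import math
import random
import statistics


def empirical_quantile(values, q):
    """Return the smallest observation x such that at least a fraction q of values are <= x."""
    ordered = sorted(values)
    k = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[k]


def censoring_demo(mean=100.0, sd=20.0, order_up_to=100.0,
                   underage=3.0, overage=1.0, days=10_000, seed=1):
    """Illustrate the bias from treating recorded sales as demand.

    Hypothetical setting (an assumption, not the paper's data): i.i.d. normal
    daily demand truncated at zero, a fixed order-up-to level, and a store that
    only records sales = min(demand, stock).
    """
    rng = random.Random(seed)
    demand = [max(0.0, rng.gauss(mean, sd)) for _ in range(days)]
    sales = [min(d, order_up_to) for d in demand]  # what the point-of-sale system records

    critical_ratio = underage / (underage + overage)  # newsvendor critical fractile
    return {
        "stockout_rate": sum(d > order_up_to for d in demand) / days,
        "mean_demand": statistics.fmean(demand),
        "mean_sales": statistics.fmean(sales),
        # Quantile of the true (unobservable) demand: the order a planner should place.
        "true_order": empirical_quantile(demand, critical_ratio),
        # Quantile of censored sales: bounded above by the current order-up-to level.
        "naive_order": empirical_quantile(sales, critical_ratio),
    }


if __name__ == "__main__":
    for key, value in censoring_demo().items():
        print(f"{key}: {value:.3f}")
```

With these hypothetical defaults (critical ratio 0.75, stocking level equal to mean demand), roughly half of all days stock out. The naive order stays at exactly 100, while the true order lands close to 113.5, the theoretical 75 per cent quantile of N(100, 20²). A planner who ignores lost sales would therefore never raise the order-up-to level.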
It is thus desirable to formulate a non-parametric estimation approach that relies on data that is readily available to retailers and takes into account external factors with a strong impact on demand. One such factor is price, which is usually stored and linked to the sales quantity. Even