2.2 Ensemble Based Probabilistic Forecast Verification

Yuejian Zhu and Zoltan Toth
Environmental Modeling Center, NCEP, NWS/NOAA, Washington DC 20233
E-Mail: Yuejian.Zhu@noaa.gov

1. INTRODUCTION

The NCEP ensemble verification system was developed in the 1990s to evaluate ensemble based probabilistic forecasts (Zhu et al., 1996). The system focuses mainly on two attributes of the NCEP ensemble based probabilistic forecasts: reliability and resolution (Toth et al., 2003, 2006). It also computes traditional verification measures, such as the Pattern Anomaly Correlation (PAC) and Root Mean Square (RMS) error of the ensemble mean, the rank histogram and outlier statistics (Zhu, 2004; Toth et al., 2003), and the Perturbation versus Error Correlation Analysis (PECA) (Wei and Toth, 2003). For precipitation verification, the Equitable Threat Score (ETS), True Skill Statistics (TSS), and Bias (BI) are used to evaluate the ensemble mean (Zhu, 2007). In this ensemble based probabilistic verification system, events are defined by 1) user defined thresholds, 2) climatological percentiles, and 3) the ensemble members themselves. In practice at NCEP, climatological percentiles (10 climatologically equally likely bins) are used for daily verification of the NCEP/GEFS (Global Ensemble Forecast System). The probabilistic skill scores for current NCEP/GEFS forecasts are therefore based on the NCEP/NCAR 40-year reanalysis climatology (references).
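The event definitions based on climatological percentiles can be sketched as follows. This is a minimal NumPy illustration of the idea of 10 climatologically equally likely bins, not the operational NCEP code; the sample sizes and function names are hypothetical:

```python
import numpy as np

def decile_bin_edges(climatology):
    """Interior edges of 10 climatologically equally likely bins (deciles),
    from a 1-D climatological sample for one grid point and date."""
    return np.percentile(climatology, np.arange(10, 100, 10))  # 9 edges

def ensemble_bin_probabilities(members, edges):
    """Forecast probability of each bin: the fraction of ensemble
    members whose forecast value falls into that bin."""
    all_edges = np.concatenate(([-np.inf], edges, [np.inf]))  # 10 bins
    counts, _ = np.histogram(members, bins=all_edges)
    return counts / members.size

# Toy example with stand-in numbers (not real climatology or forecasts)
rng = np.random.default_rng(0)
clim = rng.normal(0.0, 1.0, size=3000)    # stand-in climatological sample
members = rng.normal(0.5, 0.8, size=20)   # stand-in 20-member ensemble
edges = decile_bin_edges(clim)
probs = ensemble_bin_probabilities(members, edges)
```

By construction, a climatological forecast assigns probability 0.1 to every bin, which is what makes these bins a natural reference for skill scores.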
On a routine basis, the system generates the Brier Score (BS), the Brier Skill Score (BSS) with its decomposition into reliability and resolution, the Ranked Probability Skill Score (RPSS), the Continuous Ranked Probability Skill Score (CRPSS), the Relative Operating Characteristics (ROC) area score, and the Relative Economic Value (REV) score for selected cost/loss ratios. These are applied to upper-air variables such as 500 hPa geopotential height and 850 hPa temperature, and to near-surface variables such as 1000 hPa geopotential height, 2-meter temperature, and 10-meter wind (u and v). For the ensemble mean, treated as a deterministic forecast, ensemble spread and RMS error are compared, and rank histogram (or Talagrand) distributions and outlier statistics are generated to measure the ensemble's reliability and consistency. The system was recently upgraded and applied to the North American Ensemble Forecast System (NAEFS), which combines the NCEP and CMC ensemble forecasts. This article summarizes this verification system.

2. METHODOLOGY OF VERIFICATION

a. RMS error and SPRD (ensemble spread)

The RMS error of the ensemble mean measures the distance between forecasts and analyses (or observations). The ensemble spread (SPRD) measures the deviation of the ensemble forecasts from their mean (Zhu, 2005). Figure 1 is an example of a display of RMS error and ensemble spread (SPRD) for a 15-day lead-time forecast. SPRD is usually defined as

SPRD = \sqrt{ \frac{1}{N-1} \sum_{n=1}^{N} \left( f(n) - \bar{f} \right)^2 }

where \bar{f} = \frac{1}{N} \sum_{n=1}^{N} f(n) is the ensemble mean and f(n) is the n-th of the N ensemble forecasts. In general, an ideal ensemble forecast is expected to have ensemble spread equal to its RMS error at the same lead time, in order to represent the full forecast uncertainty (Zhu, 2005; Buizza et al., 2005). However, most ensemble systems are underdispersive (too little spread) at longer lead times, due to imperfections in the model (for example, in the physical parameterizations) and other factors.
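The two quantities defined above can be sketched in a few lines of NumPy, consistent with the SPRD definition (including the 1/(N-1) factor); the array shapes and function names are illustrative, not the operational code:

```python
import numpy as np

def ensemble_mean_rmse(members, analysis):
    """RMS error of the ensemble mean against the verifying analysis.
    members: (N, npoints) array of N member forecasts; analysis: (npoints,)."""
    mean = members.mean(axis=0)
    return np.sqrt(np.mean((mean - analysis) ** 2))

def ensemble_spread(members):
    """SPRD: deviation of members about the ensemble mean, using the
    1/(N-1) factor at each grid point, then averaged over grid points."""
    n = members.shape[0]
    var = ((members - members.mean(axis=0)) ** 2).sum(axis=0) / (n - 1)
    return np.sqrt(var.mean())

# Toy check: two members at one grid point, analysis between them.
members = np.array([[0.0], [2.0]])
analysis = np.array([1.0])
rmse = ensemble_mean_rmse(members, analysis)   # ensemble mean is exactly 1.0
sprd = ensemble_spread(members)                # sqrt(((0-1)^2 + (2-1)^2)/1)
```

Comparing these two curves as a function of lead time (as in Figure 1) is the standard spread-skill consistency check described in the text.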
Therefore, a stochastic process has been introduced to increase the ensemble spread at longer forecast lead times (Hou et al., 2008). On the other hand, the ensemble mean consistently performs better than the high-resolution deterministic GFS forecast (T382L64) after a 2-day lead time, while the high resolution
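The effect of a stochastic process on ensemble spread can be illustrated with a minimal additive-noise sketch. This is only a toy illustration of why stochastic perturbations inflate spread, not the actual scheme of Hou et al. (2008); the noise amplitude `sigma` is a hypothetical tuning parameter:

```python
import numpy as np

def perturb_members(members, sigma, rng):
    """Add independent Gaussian noise to each member, as would be done
    at a model time step; this inflates spread about the ensemble mean.
    members: (N, npoints) array of the N member states."""
    return members + rng.normal(0.0, sigma, size=members.shape)

def ensemble_spread(members):
    """SPRD as defined in Section 2a (1/(N-1) factor, averaged over points)."""
    n = members.shape[0]
    var = ((members - members.mean(axis=0)) ** 2).sum(axis=0) / (n - 1)
    return np.sqrt(var.mean())

rng = np.random.default_rng(1)
m = rng.normal(0.0, 0.1, size=(20, 100))       # underdispersive toy ensemble
m_pert = perturb_members(m, sigma=0.5, rng=rng)  # after stochastic perturbation
```

In an actual forecast system the perturbations are applied repeatedly during the model integration and are designed to represent model uncertainty, so the spread grows with lead time rather than in a single step as in this sketch.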