Bayesian Analysis (2011) 6, Number 1, pp. 25–30
DOI: 10.1214/11-BA601A

Comment on Article by Polson and Scott

Bani K. Mallick (Department of Statistics, Texas A&M University, College Station, TX, bmallick@stat.tamu.edu), Sounak Chakraborty (Department of Statistics, University of Missouri, Columbia, MO, chakrabortys@missouri.edu), and Malay Ghosh (Department of Statistics, University of Florida, Gainesville, FL, ghoshm@stat.ufl.edu)

© 2011 International Society for Bayesian Analysis

We congratulate the authors for a very interesting article. The key contribution of this paper, as we see it and as suggested in the title, is the introduction of latent parameters to carry out Bayesian analysis with support vector machines. The basic identities (4) and (6) are particularly useful in this regard: they enable one to overcome much of the complexity of a non-smooth loss, which would otherwise result in a non-smooth likelihood. Incidentally, from a Bayesian angle, it is extremely convenient to view the loss as the negative of the log-likelihood (for example, associating squared error loss with the normal likelihood) and the penalty part with the prior. This is the approach taken in this paper, and also earlier in Mallick et al. (2005).

Based on a loss function, we can obtain either the normalized or the non-normalized (or pseudo) likelihood. The authors considered the non-normalized likelihood and thereby obtained the pseudo posterior distribution. This pseudo posterior distribution may not be suitable for probabilistic inference. Mallick et al. (2005) considered both the normalized and the non-normalized likelihoods, and the classification performances were comparable. It will be interesting to see how the proposed method can be adapted to the model with the normalized likelihood.

The introduction of latent parameters facilitates Bayesian variable selection with the LASSO (Park and Casella, 2008; Bae and Mallick, 2007) and its generalizations such as the grouped LASSO, the fused LASSO, and the elastic net (Kyung et al., 2010; Chakraborty and Guo, 2010). Not surprisingly, it also helps in classification problems whose penalty function is the same as in the LASSO. One interesting feature of this paper is the consideration of a general alpha in (6) rather than the conventional alpha = 1 or 2. Corollary 4 seems to be an especially interesting result because of its importance in developing the necessary algorithm.

The major emphasis of this paper seems to be on posterior inference. In many real problems, the emphasis should be on prediction rather than estimation. In particular, the predictive distribution is useful for comparing different classification models. The latent-variable development of this paper should also be exploited in that framework. Specifically, for classification, this amounts to estimating probabilities of misclassification of future observations.

The authors have considered only the linear SVM model. The nonlinear SVM model will require more complex analysis due to the presence of parameters in the X (design) matrix.
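
We close with a few sketches that make some of the points above concrete; these are our own illustrations under stated assumptions, not the authors' code. First, the loss-as-negative-log-likelihood view: a minimal sketch of the SVM pseudo log-posterior, pairing the hinge loss (scaled by 2 to match the exponent in the paper's identity (4)) with a LASSO-type log-prior. The names (nu for the penalty weight) are illustrative, and the normalizing constant is ignored, which is precisely what makes the result a pseudo posterior rather than a proper one.

    import numpy as np

    def pseudo_log_posterior(beta, X, y, nu):
        # Hinge loss read as a negative log-likelihood (the factor 2 matches
        # the exponent in identity (4)); LASSO penalty read as a log-prior.
        # y is coded +/-1; nu is an illustrative penalty weight.
        margins = y * (X @ beta)
        hinge = np.maximum(1.0 - margins, 0.0).sum()
        penalty = nu * np.abs(beta).sum()
        return -2.0 * hinge - penalty  # unnormalized: a pseudo log-posterior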
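
The augmentation itself rests on a normal scale-mixture representation of each hinge term, which we recall from the paper's identity (4) as: exp(-2 max(1 - u, 0)) equals the integral over lambda > 0 of (2 pi lambda)^(-1/2) exp(-(1 + lambda - u)^2 / (2 lambda)). A quick quadrature check of this form as we have stated it:

    import numpy as np
    from scipy.integrate import quad

    def lhs(u):
        # Hinge pseudo-likelihood contribution, exp(-2 * max(1 - u, 0)).
        return np.exp(-2.0 * max(1.0 - u, 0.0))

    def rhs(u):
        # Normal scale mixture over the latent parameter lambda.
        integrand = lambda lam: (np.exp(-(1.0 + lam - u) ** 2 / (2.0 * lam))
                                 / np.sqrt(2.0 * np.pi * lam))
        value, _ = quad(integrand, 0.0, np.inf)
        return value

    for u in (-1.0, 0.0, 0.5, 1.5):
        print(u, lhs(u), rhs(u))  # the two columns agree to quadrature accuracy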
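
The same device underlies the Bayesian LASSO connection: the double-exponential prior of Park and Casella (2008) is a normal scale mixture with exponential mixing, beta_j given tau_j^2 distributed N(0, tau_j^2) with tau_j^2 distributed Exp(rate = lambda^2 / 2), which marginalizes to a Laplace(lambda) prior. A Monte Carlo check (note that numpy parameterizes the exponential by its mean, hence scale = 2 / lambda^2):

    import numpy as np

    rng = np.random.default_rng(0)
    lam, n = 1.5, 200_000

    # Latent scales: tau2 ~ Exp(rate = lam^2 / 2), i.e. mean 2 / lam^2.
    tau2 = rng.exponential(scale=2.0 / lam**2, size=n)
    beta = rng.normal(0.0, np.sqrt(tau2))  # beta | tau2 ~ N(0, tau2)

    # The marginal should be Laplace with density (lam / 2) * exp(-lam * |b|).
    hist, edges = np.histogram(beta, bins=200, range=(-4.0, 4.0), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    laplace = 0.5 * lam * np.exp(-lam * np.abs(centers))
    print(np.max(np.abs(hist - laplace)))  # small for large n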
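
On prediction: given posterior draws of beta from a sampler such as the authors' data-augmentation scheme, the predictive machinery we have in mind is a plain Monte Carlo average over draws. In the sketch below, beta_draws is a hypothetical stand-in for such MCMC output.

    import numpy as np

    def predictive_class_prob(x_new, beta_draws):
        # Posterior predictive probability of class +1 for a future x_new:
        # each posterior draw classifies by the sign of the linear score.
        # beta_draws: (S, p) array of draws; x_new: (p,) feature vector.
        scores = beta_draws @ x_new
        return np.mean(scores > 0)

    def misclassification_prob(x_new, y_new, beta_draws):
        # Estimated probability of misclassifying a future observation
        # whose true label y_new (+/-1) is known.
        p_plus = predictive_class_prob(x_new, beta_draws)
        return 1.0 - (p_plus if y_new == 1 else 1.0 - p_plus)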
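
Finally, on the nonlinear case: there the design matrix is itself built from a kernel with its own unknown parameters, so the "X" of the linear analysis changes whenever those parameters are updated inside the sampler. A Gaussian kernel with bandwidth theta (our illustrative choice, not the authors' specification) shows where such a parameter enters:

    import numpy as np

    def gaussian_kernel_design(X, theta):
        # Kernel design matrix with entries exp(-||x_i - x_j||^2 / (2 theta^2)).
        # Because the unknown bandwidth theta enters the design matrix, it must
        # be sampled along with beta; this is the added complexity we allude to.
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq / (2.0 * theta**2))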