MODE EFFECTS AND CONSUMER ASSESSMENTS OF HEALTH PLANS

Floyd Jackson Fowler, Jr., Patricia M. Gallagher, University of Massachusetts-Boston
Floyd J. Fowler, Jr., Center for Survey Research, 100 Morrissey Blvd., Boston, MA 02125-3393

Key Words: Mode effects, health plan surveys

Background

In the fall of 1995, the Agency for Health Care Policy and Research let three cooperative agreements, with the Research Triangle Institute, RAND, and the Harvard Medical School, to work together to develop an instrument to measure consumer assessments of their health care plans. The goal was an instrument that would work across various kinds of plans, more and less managed, to provide a basis for comparing consumer experiences. Among the more challenging standards for the instrument were that it produce comparable data by mail and by telephone, that it be usable in Spanish or in English, and, most of all, that it provide data helpful to consumers in making choices among plans.

During the past year and a half, the three organizations, and their subcontractors, have been working together to develop this instrument. The first public version was released this April. Over that period, candidate questions and survey instruments have been subjected to extensive cognitive testing and field testing, using different modes, with different populations, and with different kinds of health care plans.

This paper addresses one particularly pervasive substantive challenge for those developing such an instrument, the way that challenge interacts with efforts to design comparable instruments for mail and telephone administration, and the results to date of our tests of efforts to solve these problems.

The Inapplicable Problem

When we first started testing questions, it immediately became apparent that a major challenge was that some questions do not apply to all respondents.
The most obvious, and possibly simplest, problem is that asking people to rate medical care within a specific reference period (we chose six months) does not work for people who have not received any medical care during that period. The problems, however, are much more pervasive, and sometimes much more difficult. For example, if we want to ask people whether or not they participate in medical decision making, we have to identify people who have actually had a medical decision to make. If we want to ask about emergency medical care, we have to identify people who have experienced an emergency. If we want to find out whether health plans approve needed tests and treatments or seeing specialists, we have to identify people who think they have needed tests and treatments or have tried to get specialist care.

There are basically four ways that researchers who have tried to assess health care experiences have dealt with the problem of potentially inapplicable questions (Figure 1):

1. They have ignored it, and had everyone answer all the questions.
2. They have offered a "does not apply" option, without exactly specifying what the criteria for applicability were.
3. They have offered an inapplicable alternative to questions, which explicitly describes what inapplicable means.
4. Prior to the focus question, they have asked respondents explicitly whether or not they have had the kind of experience that the follow-up question is designed to measure.

Issues Related to Mode of Data Collection

Being able to collect data both by mail and by phone is very important to having a universally useful instrument. Depending on the available information about sampled individuals, and on the characteristics of samples, one approach or the other may be best for carrying out a survey with an adequate response rate. Indeed, the potential for using combinations of modes, in order to maximize the rate of response, is a particularly desirable feature.
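The four strategies above amount to different skip-logic rules in the instrument. A minimal sketch of those rules, using hypothetical item names and answer labels (this is an illustration, not the actual instrument's specification):

```python
def administer(item, answers):
    """Return the recorded response for one focus item under each of the
    four strategies for handling potentially inapplicable questions.
    `item` and `answers` use hypothetical field names for illustration."""
    strategy = item["strategy"]
    if strategy == "ask_everyone":
        # 1. Ignore applicability: everyone answers the focus question.
        return answers[item["id"]]
    if strategy == "dna_option":
        # 2. Offer "does not apply" without defining applicability criteria.
        return answers.get(item["id"], "does not apply")
    if strategy == "explicit_inapplicable":
        # 3. Offer an inapplicable alternative that spells out what it means.
        return answers.get(item["id"], item["inapplicable_label"])
    if strategy == "screener":
        # 4. Ask a prior screening question; only "yes" gets the focus item.
        if answers.get(item["screener_id"]) == "yes":
            return answers[item["id"]]
        return "skipped (screened out)"
    raise ValueError(f"unknown strategy: {strategy}")
```

For example, a screener-based specialist item would skip respondents who report never having tried to see a specialist, so inapplicable respondents never see the focus question at all.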
In order to have the option of collecting data by either mode, however, it is important that the results be comparable. There is an extensive literature comparing data collected by various modes. When Hochstim (1967) did one of the earliest such studies, he made over 1,000 between-mode comparisons and found only 51 differences in aggregate answers. Many subsequent researchers have found that comparable data emerge from different modes of data collection. When
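To put Hochstim's figure in context: if roughly 1,000 comparisons were each tested at the conventional .05 significance level, about 50 "significant" differences would be expected by chance alone, so 51 observed differences is consistent with essentially no real mode effects. A minimal calculation (the 1,000-test count and .05 level are the conventional reading; the original study's exact figures may differ):

```python
import math

n_tests = 1000   # approximate number of between-mode comparisons
alpha = 0.05     # conventional significance level

# Under the null of no mode differences, the count of chance "hits"
# is roughly binomial(n_tests, alpha).
expected = n_tests * alpha                       # mean chance differences
sd = math.sqrt(n_tests * alpha * (1 - alpha))    # binomial standard deviation

print(expected)        # 50.0
print(round(sd, 1))    # 6.9
```

Observing 51 differences is well within one standard deviation of the chance expectation of 50.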