LONG PAPER

Validating the effectiveness of EvalAccess when deploying WCAG 2.0 tests

Amaia Aizpurua · Myriam Arrue · Markel Vigo · Julio Abascal

Published online: 3 April 2011
© Springer-Verlag 2011

Abstract While automatic tools are not intended to replace human judgment, they are crucial for developing accessible websites. The release of WCAG 2.0 has raised great expectations, as it is supposed to be precisely testable with automated review tools. Therefore, more effective tools could be developed. However, few tools applying WCAG 2.0 have been developed so far. This paper presents an evaluation framework which has been updated in order to evaluate the new tests. In addition, it describes a validation process carried out in order to verify the effectiveness of the new version of the evaluation tool. The effectiveness is validated by conducting a quantitative and qualitative analysis of the results obtained by applying both versions of the tool (the one implementing WCAG 1.0 and the one implementing WCAG 2.0) to a set of selected web pages, as well as by an expert's manual evaluation to detect the possible false positives and false negatives produced by each tool.

Keywords Web accessibility guidelines · Tool effectiveness · Automatic evaluation

1 Introduction

The development of the WCAG 2.0 guidelines [8] has raised great expectations, as WCAG 2.0 is supposed to overcome some of the limitations of the previously developed sets of guidelines. It is highly probable that this relatively new set of guidelines will assume the role played by its predecessor, WCAG 1.0 [9]. Therefore, it is expected that this new version will soon be considered a de facto accessibility standard. The latest version of WCAG significantly extends the previous one, as it covers more advanced and emerging web technologies (SMIL, Client-side Scripting, Server-side Scripting, ARIA).
In addition, it includes navigability as an essential element of web accessibility, highlights the importance of the semantic structure of content, deals with errors arising from incorrect input, etc. The W3C recommendation document itself is more structured and easier to understand than the previous version, as more material has been added, such as examples and steps of the evaluation process. Hence, accessibility evaluations based on WCAG 2.0 are supposed to be less ambiguous, as well as more consistent and accurate.

However, much debate has taken place on the release of the WCAG 2.0 guidelines: before they became a W3C recommendation, some claimed that the WCAG 2.0 documents were unmanageable due to their lack of readability, and also pointed to difficulties in guideline applicability [10]. Later, empirical experiments brought to light that, to some extent, WCAG 2.0 is neither more valid nor more reliable than WCAG 1.0 [7].

A. Aizpurua · M. Arrue · J. Abascal (✉)
Informatika Fakultatea, University of the Basque Country, Manuel Lardizabal 1, 20018 Donostia, Spain
e-mail: julio.abascal@ehu.es
A. Aizpurua
e-mail: amaia.aizpurua@ehu.es
M. Arrue
e-mail: myriam.arrue@ehu.es

M. Vigo
Computer Science School, University of Manchester, Kilburn Building, Oxford Road, Manchester M13 9PL, UK
e-mail: markel.vigo@manchester.ac.uk

Univ Access Inf Soc (2011) 10:425–441
DOI 10.1007/s10209-011-0226-z

From the point of view of automated review tools, WCAG 2.0 strives to be more precisely testable, since most ambiguous statements have been removed. The guidelines are supposed to be less subjective, and most of them