Special Section: Gender Typicality and Development

Contextual variance and invariance in self-perceived gender typicality and pressure to conform to gender role expectations

Melisa Castellanos,1 Lina Saldarriaga,2 Luz Stella Lopez,3 and William M. Bukowski1

Abstract
Evidence from cross-cultural comparisons of gender identity measurement scales is scarce. The present study aims to assess the scalar invariance of two dimensions of a widely used gender identity scale (Egan and Perry's Multidimensional Gender Identity Inventory) across two cultural contexts. Fourth, fifth, and sixth graders from Barranquilla (Colombia) and Montréal (Canada) (n = 351) completed an abbreviated, revised self-report version of Egan and Perry's scale. A confirmatory factor analysis demonstrated that typicality and pressure to conform to traditional gender roles are distinct factors and that both tend to be stable over time. Furthermore, a multi-group comparison showed that the measurement model did not vary significantly as a function of cultural context. Our study adds evidence supporting the use of a reliable and valid instrument that is invariant across cultural settings, allowing comparisons of gender identity during childhood that do not depend on contextual variations in assessment.

Keywords
Gender identity, gender typicality, gender pressure, cross-culture, elementary school children, variability

Introduction
Research literatures in developmental psychology thrive on several intersecting but distinct conditions. Two critical conditions that promote progress in our understanding of development-related experiences and outcomes are (a) the availability of strong measures and (b) the capacity to study phenomena across contexts, especially cultural contexts. These fundamental conditions intersect in the sense that the second cannot occur in the absence of the first.
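The requirement that a measure function equivalently across contexts is usually formalized, in the multi-group confirmatory factor analysis tradition, as a hierarchy of invariance constraints. The block below states that conventional hierarchy as general background only; it is not presented as the exact specification used in this study, and the symbols are introduced here for illustration: $x_{ijg}$ is the response of child $i$ to item $j$ in context $g$, $\eta_{ig}$ is the latent construct (e.g., typicality or pressure), $\lambda_{jg}$ a factor loading, $\tau_{jg}$ an item intercept, and $\kappa_g$ the latent mean in context $g$. Configural invariance requires only the same pattern of zero and nonzero loadings in every context; the stronger levels add equality constraints.

```latex
\begin{aligned}
x_{ijg} &= \tau_{jg} + \lambda_{jg}\,\eta_{ig} + \varepsilon_{ijg},
  \qquad \mathbb{E}[\varepsilon_{ijg}] = 0,\\
\text{metric (weak):}\quad \lambda_{jg} &= \lambda_j \quad \text{for all } g,\\
\text{scalar (strong):}\quad \lambda_{jg} &= \lambda_j,\;\; \tau_{jg} = \tau_j \quad \text{for all } g,\\
\Rightarrow\quad \mathbb{E}[x_{ijg}] &= \tau_j + \lambda_j\,\kappa_g,
  \qquad \kappa_g = \mathbb{E}[\eta_{ig}].
\end{aligned}
```

Under scalar invariance, differences in item means between contexts can be attributed to differences in the latent means $\kappa_g$ rather than to context-specific item functioning, which is why this level is typically required before latent means are compared.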
In this paper we assess the properties of two subscales of a well-known measure of gender identity, Egan and Perry's Multidimensional Gender Identity Inventory (2001). The two subscales are the measure of self-perceived gender typicality and the measure of self-perceived pressure to conform to gender role expectations. The goal of our analysis is to examine whether the internal structure and means of these scales vary across two cultural contexts, and whether the longitudinal associations between these measures vary as a function of context.

Our assessment of these measures comes from two perspectives. Each perspective combines a concern with both of the conditions identified above (i.e., the availability of strong measures and the capacity to study phenomena across contexts). The first perspective is largely psychometric. It is concerned with the contextual invariance of the factor structure of the latent measures of typicality and pressure. Establishing invariance across contexts is important because it demonstrates the structural equivalence of a measure across those contexts (Asparouhov & Muthén, 2014). Without evidence of invariance across two contexts, one cannot tell whether the same construct has been measured in each place (Marsh et al., 2017). The second perspective is predicated on the first. It considers whether the stability of the latent measures of the two constructs is the same in the two cultural contexts. From this point of view, one can assess whether structurally equivalent measures function in the same manner in both contexts. This perspective provides basic information about relative differences in how these measures function in different places.

The psychometric properties of the measures of typicality and pressure were presented in the initial description of the scale (Egan & Perry, 2001).
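Comparisons of this kind (whether the factor structure, or the stability of the latent measures from one time point to the next, is the same in both contexts) are commonly tested as a likelihood-ratio, or chi-square difference, test between nested models: a less constrained model in which parameters are free to differ between Barranquilla and Montréal, and a more constrained one in which they are held equal. The sketch below is a generic illustration under stated assumptions, not the authors' analysis. It assumes ordinary maximum-likelihood chi-square values; with robust estimators (e.g., MLR or WLSMV) a scaled difference test is needed instead. The names `FitResult`, `chi2_sf`, and `chi2_difference_test` are hypothetical, and the fit values in the usage block are invented.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FitResult:
    """Hypothetical container for a fitted model's ML chi-square and degrees of freedom."""
    chisq: float
    df: int


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: Q(m, y) = exp(-y) * sum_{k=0}^{m-1} y^k / k!, each term computed in log space
        return sum(math.exp(k * math.log(y) - y - math.lgamma(k + 1)) for k in range(df // 2))
    # Odd df = 2m + 1: Q(m + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{k=1}^{m} y^(k - 1/2) / Gamma(k + 1/2)
    total = math.erfc(math.sqrt(y))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp((k - 0.5) * math.log(y) - y - math.lgamma(k + 0.5))
    return total


def chi2_difference_test(constrained: FitResult, free: FitResult) -> tuple[float, int, float]:
    """Likelihood-ratio test of a constrained model nested within a less constrained one.

    Returns (delta_chisq, delta_df, p_value). A non-significant p suggests that the
    equality constraints (e.g., equal loadings and intercepts across groups) do not
    significantly worsen model fit.
    """
    delta_df = constrained.df - free.df
    if delta_df <= 0:
        raise ValueError("constrained model must have more degrees of freedom than the free model")
    delta_chisq = constrained.chisq - free.chisq
    return delta_chisq, delta_df, chi2_sf(delta_chisq, delta_df)


if __name__ == "__main__":
    # Invented fit values, for illustration only (not results from this study).
    free = FitResult(chisq=50.0, df=40)         # e.g., parameters free across the two contexts
    constrained = FitResult(chisq=54.0, df=42)  # e.g., two parameters held equal across contexts
    d_chi, d_df, p = chi2_difference_test(constrained, free)
    print(f"delta chi2 = {d_chi:.2f}, delta df = {d_df}, p = {p:.4f}")
```

Run as a script, the example prints `delta chi2 = 4.00, delta df = 2, p = 0.1353`, i.e., in this invented case the equality constraints would not be rejected. The survival function is written with the standard library only, using the closed forms of the regularized upper incomplete gamma function for integer degrees of freedom, so the sketch runs without SciPy.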
1 Department of Psychology and Centre de Recherche en Développement Humain, Concordia University, Canada
2 Corporación Colombiana de Padres y Madres, Red PaPaz, Colombia
3 Universidad del Norte, Colombia

Corresponding author:
William M. Bukowski, Department of Psychology and Centre de Recherche en Développement Humain, Concordia University, 7141 Sherbrooke Street West, Montréal, Québec, H4B 1R6, Canada.
Email: william.bukowski@concordia.ca

International Journal of Behavioral Development 1–4
© The Author(s) 2019
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/0165025419844037
journals.sagepub.com/home/jbd

There is no reason to doubt that the levels of reliability reported in this original paper are a basic feature of these measures; they have been replicated across multiple studies (Aoyagi, Santos, & Updegraff, 2017; Corby, Hodges, & Perry, 2007; Smith & Leaper, 2006). Our assessment differs from that of the original paper in two critical ways. Both differences derive from our interest in using these measures as latent rather than observed variables. The first difference is that we used a smaller set of items than those available in the initial measures