Latent Factors of Language Disturbance and Relationships to Quantitative Speech Features

Sunny X. Tang, M.D.,1* Katrin Hänsel, Ph.D.,2 Yan Cong, Ph.D.,1 Amir H. Nikzad, M.D.,1 Aarush Mehta,1 Sunghye Cho, Ph.D.,3 Sarah Berretta, B.A.,1 Leily Behbehani, B.S.,1 Sameer Pradhan, Ph.D.,3 Majnu John, Ph.D.,1 Mark Y. Liberman, Ph.D.3

1. Feinstein Institutes for Medical Research, Institute of Behavioral Science
2. Yale University, Department of Laboratory Medicine
3. University of Pennsylvania, Linguistic Data Consortium

* Corresponding Author: stang3@northwell.edu

Abstract

Background and Hypothesis
Quantitative acoustic and textual measures derived from speech ("speech features") may provide valuable biomarkers for psychiatric disorders, particularly schizophrenia spectrum disorders (SSD). We sought to identify cross-diagnostic latent factors for speech disturbance with relevance for SSD and computational modeling.

Study Design
Clinical ratings for speech disturbance were generated across 14 items for a cross-diagnostic sample (N=343), including people with SSD (n=97). Speech features were quantified from brief recorded samples of free speech using an automated pipeline. Factor models for the clinical ratings were generated using exploratory factor analysis and then tested with confirmatory factor analysis in the cross-diagnostic and SSD groups. Relationships among factor scores, speech features and other clinical characteristics were examined using network analysis.

Study Results
We found a 3-factor model with good fit in the cross-diagnostic group and acceptable fit in the SSD subsample. The model identifies an impaired expressivity factor and two interrelated disorganization factors for inefficient and incoherent speech. Incoherent speech was specific to the psychosis groups. Inefficient speech and impaired expressivity showed intermediate effects in people with nonpsychotic disorders.
Network analysis showed that the factors had distinct relationships with speech features, and that these patterns differed between the cross-diagnostic and SSD groups.

Conclusions
We report a cross-diagnostic 3-factor model for speech disturbance that is statistically well supported, intuitive, applicable to SSD, and relatable to linguistic theories. It provides a valuable framework for understanding speech disturbance and identifies appropriate targets for modeling with quantitative speech features.

medRxiv preprint doi: https://doi.org/10.1101/2022.03.31.22273263; this version posted April 1, 2022. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.
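The Study Design derives a factor model for 14 rating items with exploratory factor analysis and then tests it with confirmatory factor analysis. The sketch below illustrates only one preliminary step that is common in exploratory factor analysis: choosing how many factors to retain.

It uses the eigenvalue-greater-than-one (Kaiser) heuristic on the item correlation matrix. It is not the authors' procedure, and several choices are assumptions for illustration only:

- The data are simulated.
- The item-to-factor assignment (5, 5 and 4 items) and the loading of 0.8 are assumed.
- The helper names `simulate_ratings` and `kaiser_factor_count` are hypothetical.

Only the sample size (343) and the item count (14) come from the abstract.

```python
import numpy as np


def simulate_ratings(n_subjects=343, loading=0.8, seed=0):
    """Hypothetical data: 14 rating items driven by 3 independent latent factors.

    The block sizes (5, 5, 4) and the common loading are assumptions, not study values.
    Each item has unit variance: loading**2 from its factor plus unique noise.
    """
    rng = np.random.default_rng(seed)
    block_sizes = [5, 5, 4]
    factors = rng.standard_normal((n_subjects, len(block_sizes)))
    unique_sd = np.sqrt(1.0 - loading**2)
    columns = []
    for f, size in enumerate(block_sizes):
        for _ in range(size):
            noise = rng.standard_normal(n_subjects) * unique_sd
            columns.append(loading * factors[:, f] + noise)
    return np.column_stack(columns)


def kaiser_factor_count(data):
    """Count eigenvalues of the item correlation matrix that exceed 1 (Kaiser heuristic)."""
    corr = np.corrcoef(data, rowvar=False)         # items are columns
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # eigvalsh is ascending; reverse it
    return int(np.sum(eigenvalues > 1.0)), eigenvalues


ratings = simulate_ratings()
n_factors, eigenvalues = kaiser_factor_count(ratings)
print(ratings.shape, n_factors)  # expected: (343, 14) 3
```

The Kaiser rule is only a rough starting point. Parallel analysis and model fit indices are often preferred. The confirmatory step, which tests a pre-specified loading pattern in a new group such as the SSD subsample, is usually run in dedicated structural equation modeling software rather than implemented by hand.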
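The abstract relates factor scores, speech features and clinical characteristics through network analysis. A common building block for such networks is the partial correlation matrix, in which each edge is the association between two variables after conditioning on all the others.

The sketch below computes unregularized partial correlations from the inverse covariance (precision) matrix. Published psychological networks often use regularized estimators such as the graphical lasso, which this sketch omits. The three variables and their chain structure are hypothetical, and so is the helper name `partial_correlations`.

```python
import numpy as np


def partial_correlations(data):
    """Partial correlation matrix from the precision (inverse covariance) matrix.

    Entry (i, j) is the correlation of columns i and j after conditioning on
    all remaining columns; the diagonal is set to 1.
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Hypothetical chain: factor score -> speech feature -> clinical rating.
rng = np.random.default_rng(1)
n = 2000
factor_score = rng.standard_normal(n)
speech_feature = factor_score + rng.standard_normal(n)
clinical_rating = speech_feature + rng.standard_normal(n)
data = np.column_stack([factor_score, speech_feature, clinical_rating])

marginal = np.corrcoef(data, rowvar=False)
pcor = partial_correlations(data)
print(round(marginal[0, 2], 2), round(pcor[0, 2], 2))
```

In this simulated chain, the factor score and the clinical rating are clearly correlated marginally: the population value is 1/sqrt(3), about 0.58. Their partial correlation should be close to 0, because the speech feature carries the whole association. This is why a network can separate direct from indirect relationships among variables.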