Lexical richness in research articles: Corpus-based comparative study among advanced Chinese learners of English, English native beginner students and experts

Siyu Lei, Ruiying Yang *

School of Foreign Studies, Xi'an Jiaotong University, No. 28 Xianning West Road, Xi'an, 710049, Shaanxi, PR China

* Corresponding author. E-mail address: yangryd@xjtu.edu.cn (R. Yang).

Journal of English for Academic Purposes 47 (2020) 100894
Journal homepage: www.elsevier.com/locate/jeap
https://doi.org/10.1016/j.jeap.2020.100894
ISSN 1475-1585

Article info
Article history: Received 15 February 2020; received in revised form 28 June 2020; accepted 29 June 2020
Keywords: Lexical richness; Research articles; Chinese PhD candidates; Nativeness; Expertise

Abstract
The current study compares lexical richness, i.e. lexical diversity, density and sophistication, across three groups of texts: research article manuscripts by Chinese PhD candidates (CPhD), unpublished research papers by native final-year undergraduates and master's-level students (Native Beginner Students, NBS), and published research articles (RAs) by native experts (NE). It aims to sketch the profile of CPhDs' vocabulary use in terms of the three measures of lexical richness, in comparison with the NBS and the NE. Our data consisted of 142 RA manuscripts by CPhD, 71 unpublished research papers by NBS, and 128 published RAs by NE, all in the field of science and engineering. The results showed that CPhDs' lexical richness levels fall between those of NBS and NE. Moreover, the three measures are unevenly developed in CPhD writing: their lexical diversity is the lowest, similar to that of NBS; their lexical sophistication is intermediate; and their lexical density is similar to that of NE. Comparison of the three groups indicates that academic expertise may play a more important role than nativeness in the writing of RAs. Integrating EAP instruction with discipline-related research activities would be an important way to develop students' ability to use vocabulary in RA writing.
© 2020 Elsevier Ltd. All rights reserved.

1. Introduction

Vocabulary knowledge has been considered a significant indicator of the quality of L2 academic writing (Nation, 2013), and it is traditionally operationalized as lexical richness, comprising lexical diversity, lexical density, and lexical sophistication (Read, 2000). Lexical diversity generally refers to the number of different word types divided by the total number of tokens in a text or in a sample of standardized length, i.e. the Type-Token Ratio (TTR). Lexical density refers to the ratio of content words (nouns, adjectives, verbs, and adverbs) to the total number of words, and lexical sophistication is the proportion of relatively unusual or advanced words in a text (Read, 2000). Each of these measures has been shown to be positively related to writing proficiency or quality: lexical diversity by Gebril and Plakans (2016), lexical density by Gregori-Signes and Clavel-Arroitia (2015), and lexical sophistication by Zheng (2016) and Higginbotham and Reid (2019).

According to a large-scale survey of Hong Kong Chinese academics concerning their publication in English-language international refereed journals (Flowerdew, 1999), the biggest difficulty these academics encountered was the