CHIMERA. Romance Corpora and Linguistic Studies 5.1 (2018), 131-140. ISSN 2386-2629
© 2018 The Author; distributed under the Creative Commons Attribution License
“HARTA” de noveles
A corpus of academic Spanish
Milka Villayandre Llamazares
Universidad de León
The aim of this review is to account for the process of compilation and codification of the
corpus HARTA-Noveles. This corpus was created as part of the research project titled
“Corpus-based study of lexical combinations of academic Spanish for the development of
a computational tool for academic writing assistance” (HARTA)¹, under the direction of
Margarita Alonso Ramos (University of La Coruña). The corpus consists of representative
samples of essays produced by Spanish university students and gathered with the purpose
of studying academic lexical combinations (CLA)², i.e., recurrent segments specific to the
academic domain, along with collocations, discourse markers and other multiword
expressions. Inspired by the BAWE corpus (British Academic Written English), our corpus
consists exclusively of final degree projects and master's dissertations, selected from
various public repositories of Spanish universities and drawn from various scientific
domains. These texts have been annotated with a specific system adapted from the one
followed by the Spanish Royal Academy (RAE) in CORPES XXI³.
Keywords: corpus linguistics, learner corpus, academic writing, lexical combinations,
writing assistant
1. Introduction
Although learner corpora are a young field within corpus linguistics, in the case
of Spanish it must be said that we are still witnessing their first steps.
M. Alonso-Ramos (2016) brings together in a monographic volume the main
research studies and projects that have been or are being carried
¹ Acronym of the project's name in Spanish: “Estudio de las combinaciones léxicas del
español académico basado en corpus para una Herramienta de Ayuda a la Redacción de Textos
Académicos” (HARTA).
² In Spanish, combinaciones léxicas académicas (CLA).
³ Corpus del Español del Siglo XXI (CORPES).