In October 2014, two EAL Grade 9 students took a brave step: for their Applied Linguistics Project, they chose a method that was completely new to them (and fairly new to their teachers, too), namely corpus analysis. As of March 2018, over 75% of the topics chosen by this year’s EAL and Academic Writing students are corpus-based or corpus-assisted, drawing either on existing corpora or on corpora the students compiled specifically for their projects. Corpora have also become a standard tool for answering questions about language in EAL and Academic Writing classes.
To learn more about the Applied Linguistics Project, see a previous post or contact us.
Why corpus-assisted studies?
Data vs. intuition
What is the most frequently used conditional in academic English? Is “can” more commonly used to refer to ability or to express modality? And is it grammatical to use the word “English” in the plural? Surprisingly, our gut feeling about these questions may not match what data retrieved from authentic texts tell us. The power of corpora lies in their capacity to reveal how language is actually used, rather than squeezing it into the neat categories of a grammar book. The moment we face the raw data churned out by the machine and discover that it contradicts what we believed about language is the moment our journey of discovery truly begins.
“Language analysis is just guesswork.”
Interpreting language is often seen as subjective, slippery and hard to pin down. Corpora provide us with hard data and allow us to quantify our findings about language. Electronic corpora also let us process large amounts of data, which reduces the risk of a few outliers swaying the findings. Last but not least, because the researcher is removed from the process of data retrieval, there is less room for confirmation bias.
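To show what “hard data” looks like in practice, here is a minimal Python sketch of the two most basic corpus operations: a word-frequency count and a keyword-in-context (KWIC) concordance, the kind of listing a student would scan to decide, line by line, how “can” is being used. The mini-corpus, the tokenizer and the function names are hypothetical illustrations; tools such as AntConc or LancsBox perform these operations (and much more) over far larger collections of texts.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, keyword, width=3):
    """Return one concordance line per hit: `width` tokens of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


# Hypothetical mini-corpus; a real study would load many texts from files.
corpus = (
    "Students can use corpora to test their intuitions. "
    "If the data contradict the grammar book, we can revise our rules. "
    "A corpus can reveal patterns that intuition misses."
)

tokens = tokenize(corpus)
freq = Counter(tokens)          # raw frequency of every word form
print(freq.most_common(3))      # the three most frequent word forms
for line in kwic(tokens, "can"):
    print(line)                 # each occurrence of "can" with its context
```

The counting is done by the machine; the interpretation (ability or possibility?) is then done by the researcher on every retrieved line, not only on the examples that happen to come to mind.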
Beyond the classroom: ideas worth exploring
The tangible outcome of the Applied Linguistics Project is an academic paper and a presentation to an expert panel at Charles University. The intangible, but perhaps more significant, outcome is the insight students gain into their own linguistic situation and learning, as well as the questions their findings raise. These questions spark classroom discussions and travel beyond the EAL classroom: students revisit ALP topics in their TOK and IB Language and Literature classes, or expand on them in their Extended Essays. Each year, selected ALP topics are also developed into talks presented at the TEDxYouth conference hosted at ISP, reaching an even broader audience.
Beyond the classroom: corpora as an administrative tool
Corpora can also prove useful as an assessment or placement tool. Lexical density, lexical richness and vocabulary idiosyncrasies can all help us evaluate a student’s proficiency level. We can compare a student’s work from two points in time to track his or her language development, compare in-class work with take-home assignments to check for authenticity, or compare the work against exemplars to assist with placement.
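As an illustration of the first two measures, here is a minimal Python sketch that computes lexical richness as a type-token ratio (distinct word forms divided by total words) and lexical density as the share of content words, then compares two writing samples from the same student. The samples, the function names and the small function-word list are hypothetical; separating content words from function words with a stoplist is a rough approximation of what a part-of-speech tagger would do.

```python
import re

# Hypothetical, deliberately small list of function words. Any token not listed
# here is treated as a content word, which is only an approximation.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "if", "of", "to", "in", "on", "at",
    "for", "with", "is", "are", "was", "were", "be", "it", "this", "that",
    "i", "we", "you", "he", "she", "they", "my", "our", "their", "not",
}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Lexical richness: distinct word forms (types) / running words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def lexical_density(tokens):
    """Share of tokens that are content words (here: not in FUNCTION_WORDS)."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


def profile(text):
    """Summarise one text with its length and both measures."""
    tokens = tokenize(text)
    return {
        "tokens": len(tokens),
        "ttr": round(type_token_ratio(tokens), 2),
        "density": round(lexical_density(tokens), 2),
    }


# Two hypothetical samples from the same student, written months apart.
september = "I think that the book is good and the book is interesting and I like it."
march = "The novel explores memory and loss through a fragmented, unreliable narrator."

for label, text in [("September", september), ("March", march)]:
    print(label, profile(text))
```

A word of caution on the design: the type-token ratio drops as texts get longer, so it is only meaningful when comparing samples of similar length (or when using a standardised variant computed over fixed-size chunks). Such figures are indicators to inform a teacher’s judgement, not a verdict on proficiency or authenticity.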
For an overview of the individual corpus analysis tools and their potential in assessment, please see the ISP corpus website.
Corpus linguistics is nothing new. Major publishers now offer corpus-based textbooks and dictionaries, studies using corpus tools to explore the rhetoric of mass media and public figures abound, and software is being developed to enable anyone to learn and use corpus analysis. In our work with students so far, we have had excellent experience with the following tools:
Available free online
- Czech National Corpus (Charles University)
- LancsBox (Lancaster University, Lancaster, UK)
- Wordandphrase.info (COCA-based, by Mark Davies)
- Sketch Engine (access to BAWE, COCA, parallel corpora, etc.)
- ISP Corpus of High-School Academic Texts: more information about the corpus, including examples of classroom exercises that use it
- AntConc (by Laurence Anthony): software for analysing your own corpora, such as those compiled by students
For basic training in corpus linguistics, we recommend the free online course offered by Lancaster University on FutureLearn.