The power of corpora: how and why

In October 2014, two EAL Grade 9 students took a brave step: for their Applied Linguistics Project, they chose a method that was completely new to them (and quite new to their teachers, too) – corpus analysis. As of March 2018, over 75% of the topics chosen by this year’s EAL and Academic Writing students are corpus-based or corpus-assisted, using either existing corpora or corpora compiled by the students specifically for their projects. Corpora have also become a standard tool for answering questions about language in the EAL and Academic Writing classes.

To learn more about the Applied Linguistics Project, see a previous post or contact us.

The first corpus-assisted project at ISP: a Grade 9 student uses parallel corpora (InterCorp) to identify the various translations of the word “dobry” (good) depending on the context.

Why corpus-assisted studies?
Data vs. intuition
What is the most frequently used conditional in academic English? Is it more common to use “can” to refer to ability or to express modality? And is it grammatical to use the word “English” in the plural? Surprisingly, our gut feeling about these questions might not correspond to what data retrieved from authentic texts tell us. The power of corpora lies in their capacity to reveal how language is actually used, rather than squeezing language into the neat organization of a grammar book. The moment we are faced with the raw data churned out by the machine, discovering it contradicts what we believed about language, is the moment our journey to discovery truly starts.

A Grade 10 student discovers, contrary to her hypothesis, which types of the conditional are most commonly found in academic English, and uses her findings to suggest conditionals should be taught according to this order, rather than in the sequential order usually found in grammar books.

“Language analysis is just guesswork.”
Interpreting language is often seen as something subjective, slippery, hard to capture. Corpora provide us with hard data and allow us to quantify our findings about language. Using electronic corpora also enables us to process large amounts of data, which reduces the risk of outliers swaying the findings. Last but not least, as the researcher is removed from the process of data retrieval, there is less opportunity for confirmation bias.
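
As a rough illustration of what such quantification can look like, the sketch below counts how often a word occurs relative to text size and lists its keyword-in-context (concordance) lines. It is a minimal sketch under stated assumptions, not the procedure used in the student projects: the sample text and the function names `tokenize`, `relative_frequency` and `concordance` are hypothetical, and dedicated corpus tools do considerably more.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequency(tokens, word, per=1000):
    """Occurrences of `word` per `per` tokens, so texts of different sizes can be compared."""
    if not tokens:
        return 0.0
    return Counter(tokens)[word] * per / len(tokens)


def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines: up to `width` tokens on each side of every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Hypothetical sample text, standing in for a real corpus.
sample = "If it rains, we stay in. If it had rained, we would have stayed in. We can stay."
tokens = tokenize(sample)
print(relative_frequency(tokens, "if"))
for line in concordance(tokens, "if"):
    print(line)
```

Normalising the count by text length is what makes figures from corpora of different sizes comparable; the concordance lines then let the researcher check each hit in its context before interpreting the numbers.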

Beyond the classroom: ideas worth exploring
The tangible outcome of the Applied Linguistics Project is an academic paper and a presentation in front of an expert panel at Charles University. The intangible, but perhaps more significant, impact is the insight students gain into their own linguistic situation and learning, as well as the questions that arise from their findings. These trigger classroom discussions and spread outside the EAL classroom: students revisit ALP topics in their work in TOK and IB Language and Literature classes, or expand on them in their Extended Essay. Annually, selected ALP topics are developed into talks presented at the TEDxYouth conference hosted at ISP, reaching an even broader audience.

Beyond the classroom: corpora as an administrative tool
Corpora can also prove useful as an assessment or placement tool. Lexical density, lexical richness or vocabulary idiosyncrasies can all help us evaluate the student’s language proficiency level. We can compare a given student’s work from two points in time, tracking his or her language development, compare in-class work to take-home assignments to check for authenticity, or compare the work against exemplars to assist with placement.
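
To make two of these measures concrete, the sketch below computes a type-token ratio (one simple indicator of lexical richness) and an approximate lexical density for two short samples. It is a minimal sketch under stated assumptions: the `FUNCTION_WORDS` list is a deliberately tiny, hypothetical stand-in for proper part-of-speech tagging, the two sample sentences are invented, and none of this is presented as the procedure used at ISP.

```python
import re

# A deliberately tiny, hypothetical function-word list; a real study would use
# a part-of-speech tagger or a much fuller list.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "if", "of", "in", "on", "at", "to",
    "for", "with", "by", "from", "is", "are", "was", "were", "be", "been",
    "it", "this", "that", "he", "she", "we", "they", "i", "you", "not",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Distinct words (types) divided by all running words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def lexical_density(tokens):
    """Share of tokens not in FUNCTION_WORDS, a rough proxy for content words."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


# Invented samples standing in for one student's work at two points in time.
earlier = "The dog is big and the dog is good."
later = "The enormous dog guarded the farmhouse with remarkable patience."
for label, text in (("earlier", earlier), ("later", later)):
    toks = tokenize(text)
    print(label, round(type_token_ratio(toks), 2), round(lexical_density(toks), 2))
```

Because the type-token ratio tends to fall as a text gets longer, such comparisons are more meaningful between samples of similar length.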

For an overview of the individual corpus analysis tools and their potential in assessment, please see the ISP corpus website.

Getting started? 
Corpus linguistics is nothing new. Major publishers now offer textbooks and dictionaries based on corpora, studies abound that use corpus tools to explore the rhetoric of mass media and public figures, and software is being developed to enable anyone to learn and use corpus analysis. In our work with students so far, we have had excellent experience with the following tools:

Available free online

Paid subscription

Custom-made corpora

  • ISP corpus of high school academic texts: more information about the ISP Corpus of High-School Academic Texts, incl. examples of classroom exercises using the corpus
  • AntConc (by Laurence Anthony, for student-created corpora): software for analysing your own corpora

To receive basic training in corpus linguistics, we recommend a free online course offered by Lancaster University and FutureLearn.

Happy concordancing!

Corpus delecti: empowering student learning with ISP language corpus

In February 2015, the Upper School EAL department started developing ISP’s own language corpus – a collection of electronic texts that allows for data-driven language analysis. The project was inspired by student projects using parallel and monolingual corpora (InterCorp, BNC) carried out in 2013 and 2014 and began in earnest after training received from the Lancaster University Centre for Computer Corpus Research. The unique value of the ISP corpus lies in the understanding of a learner corpus not as the basis for error and L2 interference detection, but rather as a powerful tool to enhance student-led, inquiry-based research into desirable language patterns for a variety of text types at a high school level, with both non-native and native speakers of English as the target audience.

For more information, see the ISP corpus website.

“I am fourteen and I am a published linguist.”


The final student presentations will be held on December 4, 14:00-17:30, and December 8, 14:15-17:30, at Charles University, Faculty of Education. Please see the invitation for more details: ALPinvitation2014.

  • Contact: Róbert Bohát; Nina Horáková; Beata Rödlingová

2013-2014 (PILOT) REPORT

This research project confirms that high school students – with appropriate scaffolding – are fully capable of solid academic research, metacognition, and analysis.


“Mother tongues thrive in truly international schools.”
(Eithne Gallagher, ECIS ESL & MT Committee)

“When children learn language, they are not simply engaging in one kind of learning among many; rather, they are learning the foundation of learning itself. The distinctive characteristic of human learning is that it is a process of making meaning.” (Halliday 1993) In other words: almost all learning is a form of language learning.

Thus, if we understand better how language works and how Mother Tongue (MT) and Academic Language (English) interact, we will be in a position to understand better how learning works – and how to make it more effective. That was the goal of ISP’s Grade 9 EAL students, who carried out small-scale linguistic research ‘on themselves’, examining the relationship between their MT and the English language and how this affects their learning and communication. In this way the students became producers of knowledge, not just its passive consumers. A unique aspect of this learning experience was that the results of the research were submitted in the format of an academic paper and presented in front of an expert panel at Charles University in Prague. These high school students experienced the way research is done in linguistics and had a chance to present it in a real academic institution – a true celebration of authentic learning.

(Continued after the insert.)

Watch a summary of the students’ presentations delivered at the Faculty of Education, Charles University, on May 20 and June 5, 2014.

Read selected student papers and see their presentations:

Read more about the project (continued from above):

The ESL & Mother Tongue Committee of The European Council of International Schools (ECIS) aims “to inspire school communities to nurture the linguistic and cultural identities of their students in order to create successful multilingual, multi-literate global citizens.” (ESL and MT Committee) To achieve this, it is important to counteract the tendency in international school students to gradually “lose” their mother tongue skills. Such “nurturing of their linguistic identity” has many cultural and educational benefits, but it cannot be achieved from the “outside” by command from the teachers or the administration; it must spring from the multilingual students’ own awareness and appreciation of the value of their native languages.

Naturally, multilingualism also has its challenges, but the advantages far outweigh the disadvantages, as the present publication demonstrates. This is especially true in the context of education, where trying to understand an abstract concept in a foreign language is a double challenge, as it requires overcoming two “unknowns”: unknown ideas in an unknown language. Adding the mother tongue into the equation removes the “unknown language” element, making the study of the “unknown concepts” smoother, faster, and easier. Once the idea is understood, the new academic vocabulary is typically easier to master. In this way the student “keeps in touch” with his or her home cultural context and identity, learns faster, and increases the probability of academic success, not to speak of the abundance of opportunities for metacognition and comparative analysis of terminology, perspectives, biases, etc. These are some of the reflections triggered by ISP students’ research.


“You work like real scientists!”
(Dr. Mark Frankel, ISP US Principal,
to the participating students)

The Applied Linguistics Project was inspired by a mother tongue text activity presented by Professor Jim Cummins at the ECIS ESL & MT Conference in Düsseldorf, Germany, in 2011. It harmonizes with our school’s Mission Statement, as it is an example of “authentic education” – mimicking how real-life research is done and shared at universities and academic institutions. The project also empowers students to “think critically and creatively” about their language and learning, and promotes “intercultural understanding: valuing and understanding the perspective and origins of other people by actively engaging with their language, culture and history.” (ISP Mission)

What made the event even more special was the fact that three days before the second round of presentations, on Monday, June 2, 2014, the great linguist Noam Chomsky gave a speech in Prague on the topic: “What Can We Understand?” On Thursday, June 5, 2014, a group of young linguists presented their original language research on what we can understand about MT and learning.


“I have never seen these students as proud of their achievements as when they spoke about the Applied Linguistics Project.”
(Nathan Heilmann, Learning Support Specialist at ISP)

The students’ papers were subsequently gathered in a publication, documenting that high school students – with appropriate scaffolding – are fully capable of solid academic research, metacognition, and analysis. The achievement level varies from paper to paper, but overall these young researchers demonstrate a high quality of achievement. It is indeed true what is often said about EAL students: one should not “judge” them by what they cannot do, but by what they can add to the class experience.

When Latin was the academic language of Europe, all its students were EAL (or rather LAL) students. The great 17th-century Czech educator Jan Amos Komenský (Comenius) wrote: “Translation into the mother tongue will make the process of acquisition of the [academic] Latin language easier and more pleasant.” It is our hope that the results of this research project contribute towards making the learning of Academic English (the modern day Latin) easier and more pleasant.

Róbert Bohát, Nina Horáková, Beata Rödlingová, ISP Upper School EAL teachers

Language Week: reflecting on nine years of celebrating ISP’s cultural and linguistic diversity

The time has come to reflect on an event that brings forth the incredible diversity of cultures and languages among ISP’s students and faculty. Having developed from the customary “International Day” of food and music, Language Week has now become a permanent fixture in the ISP calendar, an event that allows us to discover the deeper layers of the cultural iceberg. Over the years we have seen performances and presentations given by students and teachers as well as guest presenters, exploring the way in which language relates to our identity and shapes our perception of the world and our way of thinking. We had the opportunity to gain an insight into other cultures by learning some of their languages and perhaps discover something new about ourselves.

You can read more about this event in a CEESA Newsletter (Spring 2011) or watch selected parts of past events published on our YouTube channel.

On March 16, 2013, we gave a short presentation on Language Week at the CEESA conference in Prague. You can download the handout or watch our presentation.

And here is a brief summary of this year’s Language Week: