Pre-conference workshops


Wednesday 11 July 2012

morning session
9 a.m. – 12 p.m.

Workshop 1
An introduction to developing and using micro-corpora in language education
Christopher Tribble
Workshop 2
On-line correction systems: concept and illustration
Susan Verhulst, Ariane Ruyffelaert

afternoon session
2 p.m. – 5 p.m.

Workshop 3
Word Sketches – the structured company that words keep
James Thomas
Workshop 4
Multi-layered Annotation of Learner Corpora for Analysis of Lexis, Grammar and Discourse
Huaqing Hong, Yukio Tono





Workshop 1
An introduction to developing and using micro-corpora in language education
Dr Christopher Tribble, King’s College London, U.K.

Corpus applications in language education are often associated with large-scale corpus projects such as the British National Corpus (2001) or the Corpus of Contemporary American English (Davies, 2010). However, while these large corpora have been invaluable for the elaboration of lexicographic and grammatical accounts of language, they have proved problematic for many language learning and language teaching applications: they often provide material that is too plentiful and too complex, or offer too little that is relevant to the needs of specific groups of learners.

A response to this concern can be found in the development of small or specialist corpora (Tribble, 1997; Ghadessy, Henry & Roseberry, 2001; Nesi & Gardner, 2012), and their exploitation for pedagogic purposes. Through the analysis of such small corpora, it is possible for teachers to begin to develop curriculum specifications for ESP/EAP courses, and to develop supplementary materials to support learners on specialist programmes, or on general programmes where there is a need to expand students' knowledge of, and ability to use, the grammar and lexis of a language.

In this workshop, you will have the opportunity to develop your own pedagogic corpus and to develop learning / teaching materials for classroom purposes. No previous experience of classroom applications of corpora is required, but it will be important to bring with you an idea of the kinds of students you wish to support, and, if possible, to bring a collection of texts which can be worked on during the session.

Participants should, ideally, bring with them a collection of electronic texts which can be used as a micro-corpus. These might include:
•    examples of student writing
•    collections of specialist texts (e.g. research articles, administrative documents, informational documents etc.)
•    print journalism
•    fiction texts
If you are not able to bring a collection yourself, I will be able to provide a collection of UK and US journalism (good for advanced general English learners), a fiction collection (surprisingly good for intermediate learners), and a collection of science and social science research articles (good for advanced EAP).
Participants should bring their own USB drive (at least 2 GB of available storage).

We will provide participants with a Windows computer with WordSmith Tools v6 (commercial software) and AntConc (freely available) installed, along with basic Office applications (Word / Excel).

By the end of the 3-hour workshop, participants will be able to generate wordlists, n-gram lists and edited concordances which can be used as the basis for classroom materials.
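
For readers unfamiliar with these three outputs, the following Python sketch illustrates what a wordlist, an n-gram list and a concordance are. It is not how WordSmith Tools or AntConc work internally; the tokenisation rule, the function names and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def wordlist(tokens):
    """Return (word, frequency) pairs, most frequent first."""
    return Counter(tokens).most_common()


def ngrams(tokens, n=2):
    """Return (n-gram, frequency) pairs for every run of n consecutive tokens."""
    runs = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(run) for run in runs).most_common()


def concordance(tokens, node, width=4):
    """Return key-word-in-context lines: `width` tokens either side of each hit."""
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30}  [{token}]  {right}")
    return lines


# Invented sample text standing in for a micro-corpus.
sample = ("The corpus is small. A small corpus can still be useful, "
          "and a small corpus is easy to build.")
tokens = tokenize(sample)

if __name__ == "__main__":
    print(wordlist(tokens)[:5])
    print(ngrams(tokens, 2)[:5])
    print("\n".join(concordance(tokens, "corpus")))
```

In classroom use, the "edited" step mentioned above would typically mean selecting and trimming the concordance lines by hand before turning them into an exercise.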

The British National Corpus, version 2 (BNC World). (2001). Distributed by Oxford University Computing Services on behalf of the BNC Consortium. URL:
Davies, Mark (2010). “The Corpus of Contemporary American English as the First Reliable Monitor Corpus of English”. Literary and Linguistic Computing 25 (4): 447–65.
Ghadessy, M., Henry, A. and R. Roseberry (eds.) (2001). Small corpus studies and ELT. Amsterdam / Philadelphia: John Benjamins.
Nesi, H. and S. Gardner (2012). Genres across the disciplines: student writing in Higher Education. Cambridge: Cambridge University Press.
Tribble, C. (1997). Improvising corpora for ELT: quick-and-dirty ways of developing corpora for language teaching. In Melia, J. and B. Lewandowska-Tomaszczyk (eds.) PALC 97: practical applications in language corpora. Lodz: Lodz University Press.





Workshop 2
On-line correction systems: concept and illustration
Susan Verhulst, Ariane Ruyffelaert, Ghent University, Belgium

This workshop aims to demonstrate the importance of a standardized correction tool for language teachers and students and to provide the participants with a concrete example of such a tool.

1. Importance of a standardized correction tool
In traditional foreign language teaching in general, and in the teaching of writing skills in particular, teachers are for various reasons often limited to analyzing a model text in order to show students the characteristics of a particular text genre, such as the essay, dissertation, report, letter, literature review, and the like. This in itself reveals the need for a well-structured digital set of screened and tagged reference corpora which students can consult during their own individualized learning process. Moreover, since the discrepancy between receptive and productive language skills, especially in L2 at an intermediate or advanced level, can only be remedied by more practice in oral and writing skills, there is a growing need for a more efficient, uniform and, hence, objective correction method. There is also an increasing need to archive these learner texts digitally and to find ways of making the resulting learner corpora accessible.

1.1 Correction tool: added value for teachers
Teachers can detect problem areas more quickly and can therefore offer tailored teaching and individual feedback, easily guiding students to one or more webpages with more theoretical information on the matter. They can also collect common mistakes far more easily while correcting. Finally, the error corpus offers a wealth of data for, e.g., comparative research.

1.2 Correction tool: added value for students
Students consult their corrected assignments online, so these are available anytime and anywhere. Each error can be directly linked to one or more relevant theoretical webpages. Through self-reflection and automated advice (an upcoming feature), students can detect problems more quickly and start a remedial learning path.

2. Example
The Corpuscript project aims to make the existing theoretical writing environment (supported languages are Dutch, German, English, French, Italian, Spanish and Swedish) more efficient through the implementation of an online correction tool, also called Corpuscript. In the hands-on part of our workshop we will guide you through our online correction platform so that you can experience its different features (such as making assignments, correcting, building an error corpus, etc.) and learn what a difference it can make in your own teaching practice. The road to Corpuscript as it exists today has been an interesting exploration of linguistic questions, methodological issues and, of course, technological matters, all of which we will gladly address in our workshop.

Participants are requested to bring some samples of student writing (students’ written assignments preferably in English or in French) that they would like to correct during the workshop.





Workshop 3
Word Sketches – the structured company that words keep
James Thomas, Masaryk University, Czech Republic

This 3-hour workshop is primarily concerned with the exploitation of several tools that surpass collocation listings by returning collocate tables containing columns of words separated not only by part of speech but by syntactic role. For example, when studying a particular noun, the verbs of which it is the subject appear in a different column from those of which it is the object. Attributive and predicative adjectives are in separate columns, as are pre- and post-nouns.
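
The idea of separating collocates by syntactic role can be sketched in a few lines of Python. This is only an illustration: the (headword, relation, collocate) triples below are invented by hand, the relation names are hypothetical, and raw counts stand in for whatever ranking the actual tools apply. In practice the triples would come from a tagged or parsed corpus.

```python
from collections import Counter, defaultdict

# Invented triples: (headword, grammatical relation, collocate).
triples = [
    ("corpus", "object_of", "build"),
    ("corpus", "object_of", "build"),
    ("corpus", "object_of", "annotate"),
    ("corpus", "subject_of", "contain"),
    ("corpus", "subject_of", "show"),
    ("corpus", "subject_of", "contain"),
    ("corpus", "modifier", "small"),
    ("corpus", "modifier", "learner"),
    ("text", "object_of", "read"),
]


def sketch(triples, headword):
    """Group the collocates of `headword` into one column per grammatical relation."""
    columns = defaultdict(Counter)
    for head, relation, collocate in triples:
        if head == headword:
            columns[relation][collocate] += 1
    # Within each column, list collocates by descending frequency.
    return {relation: counts.most_common() for relation, counts in columns.items()}


def format_sketch(table):
    """Render the columns as plain text, one relation per block."""
    blocks = []
    for relation, collocates in table.items():
        rows = "\n".join(f"  {word:<10} {freq}" for word, freq in collocates)
        blocks.append(f"{relation}\n{rows}")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(format_sketch(sketch(triples, "corpus")))
```

A flat collocate list would mix "build" and "contain" together; keeping them in separate columns shows at a glance which verbs take the noun as object and which take it as subject.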

Originally designed for lexicographical purposes, these tools were quickly found to be of value to anyone studying vocabulary for academic and pedagogical purposes. Such listings can be exploited in ways very different from lists of collocates.

Some of these tools use the BNC but, as this workshop demonstrates, these so-called Word Sketches can also be generated on home-grown corpora. Once you have created your own corpus from the web or your own uploaded documents within the Sketch Engine, its Word Sketches are generated automatically for analysis and practical use.

The workshop will also demonstrate a related tool that generates Word Sketches for pairs of words. A single page displays the similarities and differences in the company that the two words keep.

Some of these tools are freely available, others are paid for.





Workshop 4
Multi-layered Annotation of Learner Corpora for Analysis of Lexis, Grammar and Discourse
Dr. Huaqing Hong, Nanyang Technological University, Singapore
Dr. Yukio Tono, Tokyo University of Foreign Studies, Japan

In this workshop, we will present how multiple-layered annotation can be applied to the study of lexical, grammatical and discourse features in learner corpora. Taking the recently completed International Corpus of Crosslinguistic Interlanguage (ICCI) as an example, we will introduce the approach and rationale for multiple-layered annotation of student writing, the data manipulation protocols designed to process corpus data, the in-house toolkit developed for sophisticated annotation, and the robust offline and online query package providing reliable statistical results of the annotation. A hands-on session will allow participants to try the tools using sample data and their own data. The workshop thus aims to give participants the opportunity to learn:

  • Why and when is a multiple-layered annotation necessary?
  • What linguistic and discourse features can be annotated?
  • What computer tools can be used to do the multiple-layered analysis?
  • How can multiple-layered annotation and discourse analysis be done?
  • In what way can the annotation results be quickly retrieved and queried?
  • What are the advantages and disadvantages of using the toolkit for multiple-layered annotation?
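
As a rough illustration of what "multiple layers" can mean in practice, the Python sketch below stores each annotation as a span over token positions, tagged with the layer it belongs to, so that layers can be queried separately or against each other. It does not reproduce the ICCI annotation scheme or toolkit; the layer names, labels and sample sentence are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    layer: str   # which annotation layer this belongs to, e.g. "pos" or "error"
    start: int   # index of the first token covered
    end: int     # index one past the last token covered
    label: str   # the tag assigned on that layer


# Invented learner sentence, already split into tokens.
tokens = ["He", "go", "to", "school", "yesterday", "."]

# Independent layers over the same tokens (labels are hypothetical).
annotations = [
    Span("pos", 0, 1, "PRON"),
    Span("pos", 1, 2, "VERB"),
    Span("pos", 3, 4, "NOUN"),
    Span("error", 1, 2, "verb-tense"),
    Span("discourse", 0, 6, "narrative-statement"),
]


def query(annotations, layer, label=None):
    """Return the annotations on one layer, optionally filtered by label."""
    return [a for a in annotations
            if a.layer == layer and (label is None or a.label == label)]


def overlapping(annotations, span, layer):
    """Return the annotations on `layer` that share at least one token with `span`."""
    return [a for a in annotations
            if a.layer == layer and a.start < span.end and span.start < a.end]


def text_of(span):
    """Return the token text covered by a span."""
    return " ".join(tokens[span.start:span.end])


if __name__ == "__main__":
    for error in query(annotations, "error"):
        pos_tags = [a.label for a in overlapping(annotations, error, "pos")]
        print(text_of(error), error.label, pos_tags)
```

Because the layers never modify the tokens themselves, a new layer can be added later without disturbing the existing ones, which is one common motivation for annotating in layers.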

More importantly, the workshop will encourage participants to interact with the ICCI research team members. The hands-on exercises will show participants how to use the multiple-layered annotation toolkit to explore their own data.

Participants are required to bring their corpus data to the workshop. Sample data and tools will be distributed in the workshop.