The Oxford English Corpus
The Oxford English Corpus is based mainly on material collected from pages on the World Wide Web, supplemented in certain subject areas by printed texts such as academic journals. It represents all types of English, from literary novels and specialist journals to everyday newspapers and magazines, and even the language of blogs, emails, and social media. Because English is a global language, the Oxford English Corpus also contains language from all parts of the world – not only from the UK and the United States but also from Ireland, Australia, New Zealand, the Caribbean, Canada, India, Singapore, and South Africa.
The extensive use of web pages has allowed us to build a corpus of unprecedented scale and variety – it contains nearly 2.5 billion words of real 21st-century English, and new text is continuously being collected.
As the corpus develops and more text is added, it becomes possible to trace language change over time: words becoming more or less common, features spreading from one region to another, and the emergence of new meanings.
Oxford University Press grants research access to the Corpus for academic projects that can demonstrate a strong practical need for this data. To apply for research access to the Corpus, please fill in and return the application form.