Corpora and corpus linguistic approaches to studying business language
- 1. Introduction
- 2. Corpus linguistics: principles and practice
- 3. Corpora of business language
- 4. Corpus research on business language
- 5. Conclusion
Corpus linguistics is primarily concerned with studying language on the basis of large collections of real-life linguistic data normally known as corpora. The term corpus has been used in linguistics generally for some time to describe a sample of a language or language variety. However, in corpus linguistics, the term is understood more specifically as a compilation of naturally-occurring texts stored electronically and available for quantitative and qualitative analysis (McEnery and Hardie 2012). The development of corpus linguistics has been largely fuelled by advances in computer technology and the availability of linguistic software that allows linguists to search through corpora rapidly and reliably. Insights derived in this way have significantly increased our understanding of language use by providing empirical evidence for the existence of regularities and patterns that are not immediately visible to the naked eye or simply defy linguists’ intuition. As John Sinclair, the father of corpus linguistics, pointedly remarked: “The language looks rather different when you look at a lot of it at once” (Sinclair 1991:100).
Most work in corpus linguistics, particularly in the early stages, was concerned with the development and study of large reference corpora of national varieties including spoken and written registers. The British National Corpus (BNC), which contains 100 million words of English, is a good example of such large compilations, sometimes also referred to as mega-corpora (Flowerdew 2004). Currently, in the era of Big Data, we are in a phase of giga-corpora, with compilations reaching billions of tokens, such as the enTenTen12 corpus of English (4.65 billion tokens) available on Sketch Engine, a web-based corpus analysis tool (Kilgarriff et al. 2004). Yet, while large reference corpora have proven invaluable in linguistic analyses, they are less suitable for investigating language use in specific professional domains.
This is largely because they are compiled with a view to being a representative sample of a language, and their composition is carefully balanced to include texts and genres that are seen as important in a particular culture (Flowerdew 2004). Hence, text types or genres that are typical only of a specific domain are likely to be excluded from such compilations. Also, certain texts may not be included because they belong to the category of occluded genres that are, for example, difficult to access due to confidentiality or intellectual property concerns. As a result, researchers who are interested in studying language patterns in specific contexts gain little from large reference corpora.
It is for this reason that, since the mid-1990s, many scholars have started using the tools and techniques of corpus linguistics to build and interrogate smaller specialised corpora focusing on selected genres, registers and domains. The impetus for this kind of work, perhaps not surprisingly, came from the fields of English for Specific Purposes (ESP) and English for Academic Purposes (EAP), where, given the growing importance of English in professional and academic communication globally, there was an urgent need to create teaching resources (Tribble 1997, 2002; Flowerdew 1998; Ghadessy, Henry, and Roseberry 2001). It is within this realm that the first corpora of business language were created, leading to corpus-based research on language use in a variety of business genres, registers and domains.
Before discussing the main corpus resources of business language and the ways in which corpus-based research has enhanced our understanding of business communication, Section 2 outlines the key analytical tools and procedures commonly adopted in corpus research. These include frequency lists, concordancing, collocation, keyword and cluster analysis, and corpus annotation. To demonstrate the benefits and limitations of these analytical tools, examples will be drawn from CORES, a one-million-word corpus of Corporate Social Responsibility (CSR) reports obtained from ten major oil companies and published over the last five years. CSR reports form part of corporate disclosure, and are the most public and visible documents offering insights into organisations’ actions and goals in relation to their role in society and their stakeholders. Issues involved in building such specialised corpora, including those of business language, have been extensively discussed elsewhere (Warren 2004; O’Keeffe, McCarthy, and Carter 2007; Koester 2010) and for reasons of space will not be considered here. Section 3 outlines the main corpus resources of business language compiled to date, while Section 4 reports on the major studies that have used corpus linguistic tools and methods to investigate aspects of business language. The final section summarises the contribution of corpus research to business language and discusses areas for further research.
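As a rough illustration of two of the procedures named above, frequency counting and concordancing, the following minimal Python sketch may be helpful. It is not the software or workflow used in this chapter: the tokeniser is deliberately crude, the function names (`tokenize`, `frequency_list`, `concordance`) are hypothetical, and the sample sentences are invented for the example rather than taken from CORES.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokeniser)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def frequency_list(tokens, top_n=5):
    """Return the top_n most frequent tokens with their raw counts."""
    return Counter(tokens).most_common(top_n)


def concordance(tokens, node, window=3):
    """Return key-word-in-context tuples: up to `window` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


# Invented sample text for illustration only; not taken from CORES or any real report.
SAMPLE = (
    "We are committed to reducing emissions. "
    "Our commitment to local communities remains strong. "
    "We work with communities to protect the environment."
)

if __name__ == "__main__":
    tokens = tokenize(SAMPLE)
    print(frequency_list(tokens))
    for left, node, right in concordance(tokens, "communities"):
        print(f"{left:>30} [{node}] {right}")
```

On a real corpus, dedicated tools would normally handle tokenisation, normalisation of frequencies and sorting of concordance lines far more carefully; the sketch only shows the basic logic of counting word forms and displaying a search word in its immediate context.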
- For further information, see the official BNC website at: http://www.natcorp.ox.ac.uk/
DOI 10.1515/9781614514862-024