On the right-hand side of Fig. 2.1 is a set of tools or methods which can be used singly or in combination to investigate aspects of language use.
While corpus linguistics has always been concerned with the study of large data sets, or corpora, the advent of computer technology has enabled work to be carried out on millions of items quickly and accurately, and corpus analysis can now be undertaken by any researcher with computer access through accessible software programmes. The key benefit of corpus approaches to language is that large-scale analysis can uncover patterns of language use that are not easy to see with the human eye. Some of these prove to be counterintuitive, and challenge what we believe we see from experience or instinct. In this way, quantitative studies can provide fresh insights as well as locating our qualitative observations within patterns we can demonstrate. Corpora are of three main types: general linguistic corpora (such as the British National Corpus), specific linguistic corpora which contain examples from particular genres of language (for example, journal articles or medical writing) and self-compiled corpora (of which my set of BP texts could be an example). Evidence from specific or self-compiled corpora can be compared with that from general linguistic corpora to indicate whether a particular pattern of use conforms to a general norm, or appears to be somehow different. This kind of work is often carried out to identify language differences between registers and genres, for example Biber, Johansson, Leech, Conrad and Finegan's (1999) study, mentioned already, of four types of language use: conversation, academic, fiction and newspaper language. Biber et al. proposed clusters of features which differentiated the four registers, placing them along six dimensions (involved vs informational, primarily narrative vs primarily non-narrative, and so on). Biber et al.'s work highlights not only that it is sets of co-occurring features, rather than individual features, which can differentiate text types, but also that such differences are tendential rather than binary absolutes.
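The comparison of a specialised corpus against a general norm is often operationalised with a keyness statistic such as Dunning's log-likelihood. The sketch below is purely illustrative (the word, counts and corpus sizes are invented, and this is not drawn from the BP study itself); it shows how a word's frequency in a small specialised corpus might be tested against a large reference corpus:

```python
import math

def log_likelihood(freq_spec, size_spec, freq_ref, size_ref):
    """Dunning's log-likelihood keyness statistic for a single word,
    comparing its frequency in a specialised corpus against a reference corpus."""
    total_freq = freq_spec + freq_ref
    total_size = size_spec + size_ref
    # Expected frequencies if the word were evenly distributed across both corpora.
    expected_spec = size_spec * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    ll = 0.0
    if freq_spec:
        ll += freq_spec * math.log(freq_spec / expected_spec)
    if freq_ref:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll

# Invented figures: a word occurring 120 times in a 50,000-word specialised
# corpus versus 300 times in a 1,000,000-word reference corpus.
print(round(log_likelihood(120, 50_000, 300, 1_000_000), 2))  # prints 257.41
```

A high value (here well above the conventional significance thresholds) would suggest the word is markedly over-represented in the specialised corpus relative to the general norm.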
Common patterns of language use that are easily obtained using corpus tools include word frequencies (assessed for significance in comparison with a norm), concordance lists (lists which show a word of interest, or keyword, with a number of words either side) and collocations (the most common words which co-occur with the keyword). Such quantitative output of a corpus analysis requires qualitative interpretation of which findings are significant for the research question and why. Corpus scholars also make a distinction between research which is corpus-driven and research which is corpus-based. In corpus-driven research the findings are emergent; for example, certain words or word classes prove to be significantly more represented than predicted by general corpus norms. In corpus-based research, particular terms, words or grammatical features are investigated according to a research question or hypothesis.
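The three outputs described above (frequencies, concordances, collocations) can be sketched with only a few lines of general-purpose code. The toy sentence and the node word below are invented for illustration, and real corpus software adds tokenisation, lemmatisation and significance testing on top of this basic logic:

```python
from collections import Counter

# A toy "corpus" for illustration only.
text = ("the patient reported pain and the nurse recorded the pain score "
        "before the patient was discharged").split()

# 1. Word frequencies across the corpus.
frequencies = Counter(text)

def concordance(tokens, node, span=3):
    """KWIC-style lines: each occurrence of the node word with
    `span` words of context on either side."""
    return [" ".join(tokens[max(0, i - span): i + span + 1])
            for i, w in enumerate(tokens) if w == node]

def collocates(tokens, node, span=3):
    """Words co-occurring within `span` positions of the node word,
    most frequent first."""
    window = Counter()
    for i, w in enumerate(tokens):
        if w == node:
            window.update(tokens[max(0, i - span): i] + tokens[i + 1: i + span + 1])
    return window.most_common()

print(frequencies.most_common(3))
print(concordance(text, "pain"))
print(collocates(text, "pain", span=2))
```

Each function returns raw counts or context lines; as the text above notes, deciding which of these patterns matter for a research question remains a qualitative act of interpretation.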
Corpus linguistics is a field of study in its own right, but such analysis is also a tool which can be used in multi-method linguistic work. It has been used in both constructivist and more positivist paradigms as a robust starting point for other, qualitative, analyses. For the purposes of the BP research I chose to ground my qualitative findings in quantitative work of a different kind, a quantitative overview of the data set and feature counting by hand, but a corpus approach would have been possible and potentially effective in scoping out the broad traits of my data.