Corpus analysis

As spoken workplace corpora are usually small and specialised (see Section 3), corpus analysis tends to be used in combination with qualitative methods, such as genre analysis or discourse analysis. Even with relatively small corpora, the quantification carried out with corpus tools, such as frequency lists, keyword lists and concordancing, can provide unique insights into the data that supplement, or may even provide a starting point for, qualitative analysis, as will be illustrated in this section. As corpus linguistic approaches are dealt with in a separate chapter in this volume, corpus methods will not be reviewed in detail here; instead, some key studies of spoken workplace corpora will be reviewed, with examples of how corpus methods have been applied and what insights into the data have been gained (see also Koester 2010 for a more detailed discussion of corpus methods for workplace discourse).

Handford’s (2010) study of the 1-million-word Cambridge and Nottingham Business English Corpus (CANBEC) has provided some key insights into the nature of spoken workplace and business discourse. Comparisons of the most frequent words and keywords in CANBEC with a corpus of general spoken English revealed not only that typical “business” nouns, such as customer(s), sales or order, are more frequent in business contexts, but also, more surprisingly, that certain grammatical words are too. Thus we is the top keyword, which means that it is unusually frequent in business talk compared with its frequency in the language in general. This reflects the emphasis in business and workplace contexts on the group, i.e. the company or organisation, rather than the individual. The discourse markers if and so are also among the top 50 keywords, indexing, as Handford argues, key discursive practices such as hypothesising and summarising. Other keywords, including need, issue and problem, can be linked to the prevalence of the genre of problem-solving/decision-making, as already discussed in Section 4.1 above.
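The notion of a keyword, a word that is unusually frequent in one corpus relative to a reference corpus, can be illustrated with a short sketch. The example below assumes the log-likelihood statistic, which is one commonly used keyness measure; the source does not state which statistic was used for CANBEC. The two token lists are invented toy data, not extracts from any corpus, and the function names are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Keyness score for a word that occurs a times in a study corpus of
    c tokens and b times in a reference corpus of d tokens (assumed measure)."""
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_study)
    if b > 0:
        score += b * math.log(b / expected_ref)
    return 2 * score


def keywords(study_tokens, reference_tokens, top_n=5):
    """Rank words that are relatively more frequent in the study corpus."""
    study = Counter(study_tokens)
    reference = Counter(reference_tokens)
    n_study, n_ref = len(study_tokens), len(reference_tokens)
    scored = []
    for word, freq in study.items():
        ref_freq = reference.get(word, 0)
        # Keep only "positive" keywords: higher relative frequency in the study corpus.
        if freq / n_study > ref_freq / n_ref:
            scored.append((word, log_likelihood(freq, ref_freq, n_study, n_ref)))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


# Invented toy data, for illustration only.
business = "we need to check the order so we can tell the customer if we have a problem".split()
general = "i went to the shop and i saw a dog so i went home and i had tea".split()

for word, score in keywords(business, general):
    print(f"{word}\t{score:.2f}")
```

With real corpora of very different sizes, the same comparison would be run over normalised counts for every word type, and a significance threshold would normally be applied before a word is treated as key.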

Of course, studying frequency or keyword lists only provides a very limited view of the discourse. Handford (2010) also identified the most frequent phrasal strings or “clusters” (also called “lexical bundles” or “chunks”) occurring in the corpus. This revealed that many of the high-frequency words and keywords occurred mainly in such clusters of two or more words, which partly accounted for their frequent occurrence. Handford categorised these clusters of two to six words into two broad functional categories: discourse marking and interactional functions. Discourse marking categories include such functions as focusing, linking and summarising; interactional categories include a wide variety of functions, such as checking understanding, hedging, hypothesising and evaluating. Examples of clusters in some of these functional categories are:

  • Summarising/Reformulating: so I think, so I mean, in other words;
  • Signalling obligation: we need to, we should, I think you should;
  • Hypothesising/speculating: if you, so if, if I.
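Identifying clusters of this kind amounts to counting recurrent word sequences. The following is a minimal sketch, assuming whitespace tokenisation and clusters of two to six words counted within utterance boundaries; the utterances are invented for illustration, and the function name and frequency cut-off are hypothetical rather than taken from Handford's procedure.

```python
from collections import Counter


def clusters(utterances, min_n=2, max_n=6, min_freq=2):
    """Count recurrent word sequences (clusters) of min_n to max_n words."""
    counts = Counter()
    for utterance in utterances:
        tokens = utterance.lower().split()
        for n in range(min_n, max_n + 1):
            # Slide a window of n words across the utterance.
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return [(c, f) for c, f in counts.most_common() if f >= min_freq]


# Invented utterances, for illustration only.
sample = [
    "we need to look at the price",
    "so I think we need to wait",
    "if you want we need to call them",
]

for cluster, freq in clusters(sample):
    print(f"{freq}\t{cluster}")
```

Assigning each recurrent cluster to a functional category, such as signalling obligation, remains a qualitative step that the counting itself cannot perform.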

Handford (2010: 29-33) interprets the functions performed by such frequently occurring clusters as “discursive practices”, that is, conventionalised ways of achieving particular goals within a social or professional context. Thereby, Handford argues, the textual, interaction-order level at which corpus analysis operates can be linked back to broader institutional-order concerns relating to the social and institutional context.

While CANBEC is a corpus consisting mainly of UK-based business meetings involving English native speakers, the business sub-corpus of the Hong Kong Corpus of Spoken English (HKCSE-bus) covers a wider range of workplace contexts and activities, and involves communication in international and intercultural environments. A number of linguistic and pragmatic features have been examined in HKCSE-bus, including a range of speech acts, such as agreeing/disagreeing and expressing opinions (Cheng and Warren 2005, 2006), vague language (Cheng 2007) and language used to check understanding (Cheng and Warren 2007).

One site in which data were collected for the corpus was the reception area of hotels - a key site for cross-cultural interactions in an international city like Hong Kong. Cheng (2004) studied interactions between hotel staff working at the front desk and guests who were checking out. Her study is a good example of how linguistically minimal corpus findings, such as a frequency or keyword list, can provide a “systematic point of entry” (Adolphs et al. 2004: 12) into the data, which then opens up a path for further investigation. A frequency list generated with corpus software showed that the lexical item minibar was unexpectedly frequent and occurred in all the checking out interactions. The next step was to create concordances for the most frequent items. Cheng (2004: 145) notes that, with spoken corpora, concordances are useful not only for revealing common collocations, but also for showing whether any of the items are used predominantly by one particular speaker. In service encounters such as hotel front desk interactions, there is a clear role distinction between the participants: certain words, for example sir/madam, will be used more, or even exclusively, by hotel staff. Interestingly, the word minibar was used exclusively by hotel staff and never by a guest. This led to a qualitative examination of the discourse context of minibar, ranging from a study of politeness features used with this lexical item, through the intonation patterns in which the word occurred, to the positioning of the lexical item within the structural organisation of the discourse. The analysis found that the hotel’s corporate message of “customer care” was frequently at odds with what actually happened in checking out discourses, in particular in the way receptionists formulated questions about use of the minibar.
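The two uses of concordancing mentioned here, displaying a search word in its immediate context and checking which speaker role produces it, can be sketched as follows. This is an illustrative sketch only: the turns are invented rather than taken from Cheng's data, the speaker labels and function names are hypothetical, and tokenisation is a simple whitespace split.

```python
from collections import Counter


def concordance(turns, node, width=4):
    """Return key-word-in-context lines for node, tagged with the speaker."""
    lines = []
    for speaker, utterance in turns:
        tokens = utterance.lower().split()
        for i, token in enumerate(tokens):
            # Ignore trailing punctuation when matching the node word.
            if token.strip(".,?!") == node:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((speaker, left, token, right))
    return lines


def speaker_distribution(lines):
    """Count how many concordance lines each speaker role contributes."""
    return Counter(speaker for speaker, *_ in lines)


# Invented turns, for illustration only.
turns = [
    ("staff", "Did you take anything from the minibar last night?"),
    ("guest", "No nothing thank you"),
    ("staff", "Any minibar this morning sir?"),
    ("guest", "Just the breakfast"),
]

hits = concordance(turns, "minibar")
for speaker, left, token, right in hits:
    print(f"{speaker:>6} | {left:>25} [{token}] {right}")
print(speaker_distribution(hits))
```

A distribution in which every hit comes from one speaker role is the kind of pattern that would prompt the closer qualitative reading of each concordance line described above.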

Cheng’s (2004) investigation of the word minibar is a good example of a study that uses a corpus-driven approach combining quantitative with qualitative methods to analyse the characteristics of workplace discourse. In such an approach, the initial quantitative findings obtained using corpus methods (such as keywords and concordancing) reveal features of the data which are then further explored using qualitative methods. Furthermore, Cheng identifies the need to train staff in certain aspects of the checking out discourse, so the study has clear applications for professional training within the industry investigated.

While HKCSE-bus includes discourse from both L1 and L2 speakers, Handford and Matous (2011, 2015) compiled a small corpus of exclusively L2 interactions from the international construction industry. The construction project used for the study was a Japanese-Hong Kong joint venture, where spoken interactions involved Japanese engineers leading the project interacting with foremen on the site itself and in the site office. The lingua franca and multi-cultural nature of the discourse makes it particularly interesting to study. The interactions in the corpus are mainly between two Japanese engineers and two Hong Kong foremen. As the corpus was very small (consisting of only 12,000 words), the data were systematically compared with CANBEC, and similar features (keywords and clusters) were examined. Despite the fact that both the professional context and the L1 of the speakers in these two corpora are quite different, both keyword lists and frequent clusters were surprisingly similar; for example, signalling obligation (e.g., we need to, we have to) and hedging (e.g., I think, I don’t know) were highly frequent in the construction corpus, as well as in CANBEC (as already discussed above). From their findings, Handford and Matous (2011: 92) conclude that five interpersonal language categories previously identified by Handford (2010) as indexing specific social dimensions and discursive practices in business meetings also indexed these categories and practices in the construction site corpus:

  • Pronouns: signalling the social relationship;
  • Backchannels: signalling listener solidarity;
  • Vague language: signalling solidarity over knowledge;
  • Hedges: negotiating power over knowledge;
  • Deontic modality: negotiating power over actions.

Extract 2 from the corpus illustrates the use of a number of these interpersonal features by a Hong Kong foreman, TT, talking to a Japanese engineer, Arai:

Extract 2

TT: I think + er we don’t have to consider about the safety + in here

Arai: Hmm (Handford 2014: 373)

Here TT combines hedges (I think, er) with deontic modality (we don’t have to), notably avoiding the far more face-threatening alternative you must, which is also rarely used in CANBEC (Handford 2014: 373). Thus similar interpersonal concerns seem to be addressed with the same discursive practices across different institutional and professional settings, suggesting that such interpersonal language may be typically “institutional”. One area in which the two corpora diverged, however, was the much higher frequency of place deictics in the construction corpus, with the demonstratives this and here identified as keywords. This reflects the importance of visual and non-verbal communication, for example through drawings and gestures, in the engineering profession.

The studies of workplace corpora discussed in this section have shown that corpus methods not only provide powerful tools for identifying the linguistic features that characterise workplace discourse in general; they are also extremely useful in establishing the “generic fingerprint” (Farr 2007) of specific workplace genres and activities, and in generating insights into professional practices and thus into broader institutional and socio-cultural concerns.
