How to recognize chunks: the segmentation operations
The hypothesis of delayed evaluation in language processing not only relies on a specific organization of memory, but also requires a mechanism for identifying the elements to be stored in the buffer. Two important questions have to be answered here: what is the nature of these elements, and how can they be identified? Our hypothesis relies on the idea that no deep and precise linguistic analysis is done at a first stage. If so, the question is to describe the mechanisms, necessarily low-level ones, by which the stored elements are identified.
These questions relate to the more general problem of segmentation. Given an input flow (e.g. connected speech), what types of elements can be isolated, and how? Some mechanisms, specific to the audio signal, are at work in speech segmentation. Many works addressing this question ([MAT 05], [GOY 10], [NEW 11], [END 10]) identify different cues, at different levels, that are used in particular (but not only) for word segmentation tasks, among which:
- Prosodic level: stress, duration and pitch information can be associated, in some languages, with specific positions in the word (e.g. initial or final), helping to detect word boundaries.
- Allophonic level: phonemes are variable and their realization can depend on their position within words.
- Phonotactic level: constraints on the ordering of phonemes, giving information about the likelihood that a given phoneme is adjacent to another within and between words.
- Statistical/distributional properties: transitional probabilities between consecutive syllables.
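The statistical cue in the last point can be made concrete with a small sketch. The code below is not taken from the cited works; it is a minimal, hypothetical illustration of transitional-probability segmentation: within a frequently co-occurring unit, the probability of the next syllable is high, and it dips at unit boundaries, so a boundary is posited wherever the probability is lower than in its immediate neighborhood.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """TP(a -> b) = count(a immediately followed by b) / count(a)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    unigram_counts = Counter(syllables[:-1])
    return {pair: n / unigram_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(syllables):
    """Cut a boundary wherever the TP dips below both of its neighbors."""
    tp = transitional_probabilities(syllables)
    tps = [tp[(a, b)] for a, b in zip(syllables, syllables[1:])]
    words, current = [], [syllables[0]]
    for i in range(len(tps)):
        left = tps[i - 1] if i > 0 else 1.0
        right = tps[i + 1] if i + 1 < len(tps) else 1.0
        if tps[i] < left and tps[i] < right:  # local TP minimum = boundary
            words.append("".join(current))
            current = []
        current.append(syllables[i + 1])
    words.append("".join(current))
    return words

# Artificial stream built from two recurring "words", golabu and padoti:
stream = ("go la bu pa do ti pa do ti go la bu "
          "pa do ti go la bu go la bu pa do ti").split()
print(segment(stream))
# -> ['golabu', 'padoti', 'padoti', 'golabu', 'padoti', 'golabu', 'golabu', 'padoti']
```

On this toy stream, within-word transitions always have probability 1, while transitions across word boundaries are less predictable, which is enough to recover the two units without any lexicon.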
Word segmentation results from the satisfaction of multiple constraints encoding different types of information: phonetic, phonological, lexical, prosodic, syntactic, semantic, etc. (see [MCQ 10]). However, most of these segmentation cues are low-level and do not involve actual lexical access. What is interesting in this perspective is that some segmentation mechanisms do not depend on the notion of word and can therefore also be used in tasks other than word segmentation. This is very important because the notion of word is not always relevant (it involves rather high-level features, including semantic ones). In many cases, other types of segmentation are used that do not involve words, identifying instead larger segments (e.g. prosodic units) without entering into a deep linguistic analysis.
At a higher level, [DEH 15] has proposed five mechanisms by which sequence knowledge can be identified:
- Transition and timing knowledge: when a sequence of items (of any nature) is presented at a certain pace, the transition between two items is anticipated thanks to the approximate timing of the next item.
- Chunking: contiguous items can be grouped into a single unit, thanks to the identification of certain regularities. A chunk is simply defined here as a set of contiguous items that frequently co-occur and can therefore be encoded as a single unit.
- Ordinal knowledge: a recurrent linear order, independent of any timing, provides information for identifying an element and its position.
- Algebraic patterns: when several items share an internal regular pattern, they can be identified thanks to this information.
- Nested tree structures generated by symbolic rules: identification of a complex structure gathering several items into a single element (typically a phrase).
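The chunking mechanism in the second point can be sketched with a toy procedure, which is our own illustrative assumption rather than a model from [DEH 15]: repeatedly merge the most frequent adjacent pair of items into a single unit, in the spirit of byte-pair encoding, until no pair recurs often enough.

```python
from collections import Counter

def chunk_pass(items, min_count=2):
    """One chunking pass: merge every occurrence of the most frequent
    adjacent pair into a single unit, if that pair recurs often enough."""
    pairs = Counter(zip(items, items[1:]))
    if not pairs:
        return items
    (a, b), count = pairs.most_common(1)[0]
    if count < min_count:          # nothing recurrent left to chunk
        return items
    merged, i = [], 0
    while i < len(items):
        if i + 1 < len(items) and items[i] == a and items[i + 1] == b:
            merged.append(a + "+" + b)   # the pair is now a single unit
            i += 2
        else:
            merged.append(items[i])
            i += 1
    return merged

items = "the old man and the old dog".split()
print(chunk_pass(items))
# -> ['the+old', 'man', 'and', 'the+old', 'dog']
```

The recurring pair "the old" is encoded as one unit after a single pass; a second pass leaves the sequence unchanged, since no remaining pair repeats.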
What is important in these sequence identification systems (at least the first four) is that they apply to any type of information and rely on low-level mechanisms based on the detection of regularities and, when possible, their frequency. When applied to language, these systems explain how syllables, patterns or groups can be identified directly. For example, algebraic patterns are specific to a certain construction, as in the following example taken from a spoken language corpus: "Monday, washing, Tuesday, ironing, Wednesday, rest". In this case, without any syntactic or other high-level processing, and thanks to the regularity of the /date - action/ pattern, it is possible to segment the three subsequences and group them into a single general one. Here, a very basic mechanism, pattern identification, makes it possible to identify a construction (and to access its meaning directly).
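This /date - action/ pattern identification can be sketched as follows; the day list and the grouping helper are our illustrative assumptions, not a proposal from the literature. The point is that surface regularity alone, with no syntactic analysis, suffices to recover the pairs.

```python
DAYS = {"Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"}

def group_date_action(tokens, days):
    """Group a flat alternating sequence into (date, action) pairs,
    relying only on the surface regularity of the /date - action/ pattern."""
    pairs, i = [], 0
    while i + 1 < len(tokens):
        if tokens[i] in days and tokens[i + 1] not in days:
            pairs.append((tokens[i], tokens[i + 1]))
            i += 2
        else:
            i += 1
    return pairs

tokens = ["Monday", "washing", "Tuesday", "ironing", "Wednesday", "rest"]
print(group_date_action(tokens, DAYS))
# -> [('Monday', 'washing'), ('Tuesday', 'ironing'), ('Wednesday', 'rest')]
```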
When putting together the different mechanisms described in this section, we obtain a strong set of parameters that offers the possibility of segmenting the input into units. In some cases, when enough cues converge, the segments can be words. In other cases, they are larger units. For example, long pauses (longer than 200 ms) are a universal segmentation constraint in prosody: two such pauses identify the boundaries of a segment (which can correspond to a prosodic unit).
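The pause-based constraint can be sketched directly. In the snippet below, the timings are fabricated for illustration: each unit is a (label, onset, offset) interval in milliseconds, and any silence longer than 200 ms between two units is treated as a segment boundary.

```python
def segment_on_pauses(units, max_pause_ms=200):
    """units: list of (label, onset_ms, offset_ms) intervals in time order.
    A gap longer than max_pause_ms between two units closes a segment."""
    segments, current = [], [units[0]]
    for prev, cur in zip(units, units[1:]):
        gap = cur[1] - prev[2]          # silence between the two units
        if gap > max_pause_ms:
            segments.append([u[0] for u in current])
            current = []
        current.append(cur)
    segments.append([u[0] for u in current])
    return segments

# Illustrative (invented) timings: a 320 ms pause after "went".
units = [("so", 0, 180), ("I", 190, 240), ("went", 250, 480),
         ("to", 800, 900), ("the", 910, 990), ("market", 1000, 1420)]
print(segment_on_pauses(units))
# -> [['so', 'I', 'went'], ['to', 'the', 'market']]
```

The boundary is found from timing alone; whether the resulting segments are words or larger prosodic units is decided only later, when other cues are available.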
As a result, we can conclude that several basic mechanisms, which do not involve deep analysis, make it possible to segment the linguistic input, be it read or heard. Our hypothesis is that these segments are the basic units initially stored in the buffers. When possible, the stored units are words, but not necessarily. In the general case, they are sequences of characters or phonemes that can be retrieved later. This is what occurs when hearing a speaker without understanding them: the audio segment is stored and accessed later, when other sources of information (e.g. the context) become available and make it possible to refine the segmentation into words.