Assumptions concerning the building of a tool assisting the writing process
As mentioned already, authors tend to use different strategies when writing: they start from topics or goals (top-down), from initially unrelated data or ideas (bottom-up), or they combine these two strategies. Ideas activated bottom-up lead to the recognition of a subsuming category (Figure 7.3(b)), which in turn triggers the activation of more data (top-down again, see Figure 7.3(c)).

Figure 7.3. Three strategies of discourse planning: top-down, bottom-up or both
The first strategy is probably the most frequent one. Starting from a goal, authors seek relevant content (messages), organize it according to topical and rhetorical criteria, translate it into language and revise it. This is known as top-down planning. Note that revision can take place at any stage, and that during the initial stage of conceptual planning (brainstorming) little filtering takes place: authors are mainly collecting potentially interesting ideas. (Incidentally, this is why the term “brainstorming” captures the reality of this situation better than “idea planning”.) It is only at the next step that contents are examined thoroughly. This may lead to a modification of the message base: some ideas will be dropped, others added. The result is generally a so-called outline or text plan, i.e. a definition of what is to be expressed and in what order.
Another strategy involves going the opposite way. Starting from ideas coming spontaneously to mind (bottom-up planning), the author will try to group them into sets (topical trees) and to link these clusters. In this kind of bottom-up planning, the structure or topic emerges from the data. These topics may act as seeds, eventually triggering additional material (mixed strategy). Bottom-up planning is a very difficult problem (even for people), yet this is the one we are interested in. A question remains: on the basis of what knowledge do writers know which ideas cohere, i.e. what goes with what, and in what specific way? Suppose you have an assignment asking you to write a small document about “foxes” and their similarities and differences compared with “wolves” or “coyotes”, two animals with which they are sometimes confused. This might trigger a search for information concerning “foxes”, possibly yielding a set of messages like the one shown in Figure 7.3(a). For the time being, it does not really matter where these ideas come from (the author’s brain, external resources or elsewhere); what we are interested in here is finding answers to the following questions: (a) How does the author group these messages or ideas into topical categories? (b) How does s/he order them within each group? (c) How does s/he link and name these chunks? (d) How does s/he discover and name the relations between each sentence or chunk?
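Since Figure 7.3(a) is not reproduced here, the following sketch merely illustrates the kind of message base such an assignment might yield; the predicates and arguments are invented for illustration and are not taken from the figure. Questions (a)-(d) then amount to asking how such messages can be grouped, ordered, linked and labelled.

```python
# A hypothetical message base in the spirit of the fox/wolf assignment;
# predicates and arguments are invented, not taken from Figure 7.3(a).
from collections import namedtuple

Message = namedtuple("Message", ["predicate", "args"])

message_base = [
    Message("live_in", ("fox", "forest")),
    Message("hunt", ("fox", "rodents")),
    Message("live_in", ("wolf", "forest")),
    Message("hunt_in", ("wolf", "packs")),
    Message("confused_with", ("fox", "coyote")),
]

for m in message_base:
    print(f"{m.predicate}({', '.join(m.args)})")
```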
We will focus here only on the first question (topical clustering), assuming that (a) messages will be grouped if they have something in common, and (b) messages or message elements do indeed have something in common. The question is how to show this. Actually, this can be either hard or fairly trivial, as in the case of term identity. Imagine the following inputs: (a) give(I, dog_1, my_son) and (b) like_to_chase(dog_1, milkman). Since these two propositions share an argument (dog_1), they can be clustered, yielding two independent clauses (I’ve given my son a dog. He likes to chase the milkman.) or a subordinate, i.e. relative, clause (I’ve given my son a dog who likes to chase the milkman). Of course, you could also topicalize the “dog”, producing the following sentence: “The dog I’ve given to my son likes to chase the milkman”. Which of these forms is the most adequate depends, among other things, on the context (surrounding sentences) and the discourse goal: what do you want to stress or highlight?
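In the trivial case of term identity, the grouping criterion can be stated directly: two propositions belong to the same cluster if they share an argument, directly or transitively. The sketch below assumes the hypothetical proposition format used above and leaves aside the harder question of which surface form (two clauses, a relative clause, topicalization) to produce.

```python
# Group propositions that share at least one argument (term identity).
# A toy sketch; real input would also require coreference resolution
# (dog_1 vs. "a dog", "he", ...).

propositions = {
    "p1": ("give", ("I", "dog_1", "my_son")),
    "p2": ("like_to_chase", ("dog_1", "milkman")),
    "p3": ("rain", ("yesterday",)),
}

def shared_args(p, q):
    """Arguments two propositions have in common."""
    return set(p[1]) & set(q[1])

clusters = []                      # list of sets of proposition ids
for pid, prop in propositions.items():
    # Find every existing cluster this proposition is linked to ...
    linked = [c for c in clusters
              if any(shared_args(prop, propositions[q]) for q in c)]
    # ... and merge them all, so that linking is transitive.
    merged = {pid}
    for c in linked:
        merged |= c
        clusters.remove(c)
    clusters.append(merged)

print(clusters)                    # e.g. [{'p1', 'p2'}, {'p3'}]
```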
The question of how to reveal commonalities or links between data in the non-obvious cases remains. We can think of several methods. For example, we could try to enrich the input elements by adding information (features, attribute-value pairs, etc.) coming from external knowledge sources: corpora (cooccurrence data, word associations), dictionaries (definitions), etc. Another method consists of determining similarities between message elements (words). This is the one we have used, and we will explain it in more depth below (section 7.6). Once such a method has been applied, we should be able to cluster messages by category, even though we may not be able to give each category a name. The name may remain implicit, and name-giving may require other methods.
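The similarity measure we actually use is described in section 7.6; the toy sketch below only illustrates the general idea of deriving word similarity from external data, here via cosine similarity over invented cooccurrence counts.

```python
# Illustrative only: not the measure used in section 7.6.
# Word similarity approximated by cosine similarity over toy
# cooccurrence vectors (the counts are invented).
from math import sqrt

cooccurrence = {                   # word -> context counts
    "fox":   {"forest": 4, "hunt": 3, "tail": 2},
    "wolf":  {"forest": 5, "hunt": 4, "pack": 3},
    "spoon": {"kitchen": 5, "soup": 2},
}

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

print(round(cosine(cooccurrence["fox"], cooccurrence["wolf"]), 2))   # high (~0.84)
print(round(cosine(cooccurrence["fox"], cooccurrence["spoon"]), 2))  # 0.0
```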
The result of applying such a method will be one or several topic trees, grouping (ideally) all inputs. While different trees may achieve different rhetorical goals (the focus being different), all of them ensure coherent discourse. The effect of these variants can probably only be judged by a human user, who will pick the one best fitting his or her needs. While the software we have developed will not be able to achieve this goal, i.e. build a structure that conceptually and rhetorically matches the author’s goals, it should nevertheless be able to help the user perceive conceptual coherence, hence allowing him or her to create a structure (a topic tree) in which all messages cohere, something that not all grown-up human beings are able to do. Concerning goals and bottom-up planning, consider the following.
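As a rough illustration of why several topic trees may emerge from the same message base, the sketch below groups the same (invented) messages once by their first argument (focus on the animals) and once by their predicate (focus on the aspects compared); either grouping yields a candidate tree that the user could pick or rearrange.

```python
# Two alternative groupings of the same messages, yielding two candidate
# topic trees with a different focus; message content is invented.
from collections import defaultdict

messages = {
    "m1": ("live_in", ("fox", "forest")),
    "m2": ("live_in", ("wolf", "forest")),
    "m3": ("hunt", ("fox", "rodents")),
    "m4": ("hunt", ("wolf", "deer")),
}

def group_by(messages, key):
    """Group message ids by a key function; each group is a candidate topic node."""
    tree = defaultdict(list)
    for mid, (pred, args) in messages.items():
        tree[key(pred, args)].append(mid)
    return dict(tree)

# Tree 1: focus on the animals (group by first argument).
print(group_by(messages, lambda pred, args: args[0]))
# {'fox': ['m1', 'm3'], 'wolf': ['m2', 'm4']}

# Tree 2: focus on the aspects compared (group by predicate).
print(group_by(messages, lambda pred, args: pred))
# {'live_in': ['m1', 'm2'], 'hunt': ['m3', 'm4']}
```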
Goals can be of various sorts. They can be coarse-grained (“convince your father to lend you his car”) or more fine-grained, relating to a specific topic: describe an animal and show how it differs from another one with which it is often confused (alligator-crocodile; fox-coyote/wolf). Messages may feed back on the conceptual component, altering messages or goals (addition, deletion, modification). This cyclic interplay between top-down and bottom-up processing is very frequent in human writing. We will focus here only on the latter, confining ourselves to propositions composed of two-place predicates. These will be the inputs for which we try to check whether there is a commonality or link between them. Of course, even the linking of simple propositions may be a very complex problem. Think of causal relations, which can be viewed as a systematic correlation between two events, or between a state and an event[1]. Since these cases require a special approach, we will not deal with them here.
[1] The perception of the causal relationship between the underlined elements in “Be careful, the road may be dangerous. They’ve just announced a typhoon.” supposes that we know that typhoons are dangerous.