Related work

Let us start with the work done by computational linguists on text generation [REI 00, BAT 16]. Their ambition is the automatic production of texts based on messages and goals[1]. Since everyone seems to agree that texts are structured [MAN 87], this seems the right place to look. Alas, even there one will be disappointed. To avoid misunderstandings: the work produced by this community is important and impressive in many ways. Nevertheless, it seems to be based on assumptions incompatible with our goal, which is to assist a writer in text production, i.e. to help her or him organize a set of ideas that, prior to that point, were a more or less random bunch of thoughts (at least for the reader).

Here are some of the reasons why we believe that this kind of work is not compatible with our goal. First of all, interactive generation (our case) is quite different from automatic text generation. Next, most text generators are based on assumptions that hardly apply in normal writing: (a) all the messages to be included in the final document are available at the very moment of building the text plan [HOV 91]; (b) ideas are retrieved after a text plan has been determined [MCK 85], or the two are done more or less in parallel [MOO 93]; (c) the links between ideas (messages) or the topics to be addressed are all known at the outset of building the text plan. This last point applies both to Marcu’s work [MAR 97] and to data-based generators [REI 95].

Practically all these premises can be challenged, and none of them accounts for the psycholinguistic reality of composition, i.e. text production by human beings [DEB 84, BER 87, AND 96]. For example, authors often do not know the kind of links holding between ideas[2], nor do they always know the topical category of a given message[3]. Both have to be inferred: authors have to discover the link(s) between messages and the nature of the topical category to which a message or a set of messages belongs. Both tasks are complex and require a lot of practice before they yield the skill of good writing (coherent and cohesive discourse).

The above-mentioned work also fails to model the dynamic interaction between idea generation (messages) and text structure, [SIM 88] being arguably an exception. Indeed, a topic may trigger a set of ideas (top-down generation), just as ideas may evoke a certain topic (bottom-up), and of course, the two can be combined, a bottom-up strategy being followed by a top-down strategy (see Figure 7.3). This kind of interaction often occurs in spontaneous writing, where ideas lead to the recognition of a topical category, which in turn leads to the generation of new data of the same kind. Ideas or messages may also have to be dropped: lacking sufficient conceptual material, the author may decide not to mention a given fragment, to put it in a footnote, or to continue searching for additional material.
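The interplay between bottom-up and top-down generation described above can be illustrated with a small sketch. It is only a toy under stated assumptions, not the prototype presented in this chapter: the word-to-topic lexicon `TOPIC_OF`, the idea store `IDEAS_BY_TOPIC` and the function names are all hypothetical, and the fox example is borrowed from footnote [3].

```python
from collections import defaultdict

# Hypothetical toy lexicon mapping words to topical categories
# (an assumption made for illustration only).
TOPIC_OF = {
    "hide": "habits",
    "hunt": "habits",
    "underground": "habitat",
    "forest": "habitat",
}

# Hypothetical store of further ideas an author might retrieve
# once a topical category has been recognized.
IDEAS_BY_TOPIC = {
    "habits": ["foxes hunt at night"],
    "habitat": ["foxes live in the forest"],
}


def bottom_up(messages):
    """Group each message under every topic evoked by one of its words."""
    groups = defaultdict(list)
    for message in messages:
        topics = {TOPIC_OF[w] for w in message.split() if w in TOPIC_OF}
        for topic in sorted(topics):
            groups[topic].append(message)
    return dict(groups)


def top_down(topic, known):
    """Return the ideas triggered by a topic that are not already known."""
    return [idea for idea in IDEAS_BY_TOPIC.get(topic, []) if idea not in known]


def interleave(messages):
    """Bottom-up pass first, then a top-down pass for each recognized topic."""
    groups = bottom_up(messages)
    for topic, members in groups.items():
        members.extend(top_down(topic, members))
    return groups


if __name__ == "__main__":
    for topic, members in interleave(["foxes hide underground"]).items():
        print(topic, members)
```

In this sketch a single message evokes two candidate topics, and each recognized topic then triggers additional material of the same kind. A message whose words evoke no topic is simply left ungrouped, which loosely mirrors the case where a fragment is dropped or set aside.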

Another community interested in writing is that of psychologists. Clearly, a lot has been written on this subject[4]. Yet, despite the vast literature on composition and despite the recognition of the paramount role played by idea structuring (outlining) in yielding readable prose, little has been produced to clarify what it takes, concretely, to achieve this goal (i.e. to help authors). Even the book series “Studies in Writing” [RIJ 96b][5] will tell you next to nothing about the topic we are interested in: how to find commonalities between conceptual fragments (ideas) in order to group them into chunks, or how to “see” the hidden links between ideas.

In the remainder of this paper, we will present a small prototype that tries to emulate the first strategy mentioned above: to structure data, or to discover potential structures in data (messages). Yet before doing so, we would like to spell out in more detail some of the assumptions underlying our work and show how they relate to what is known about the natural writing process.

  • [1] This is often seen as a top-down process: goals triggering ideas, i.e. messages, which trigger words, which are inserted in some sentence frame, to be adjusted morphologically, etc.
  • [2] The following two messages [(a) get married (x), (b) become pregnant (x)] could be considered as a cause, consequence or as a natural sequence.
  • [3] What we mean by topic is the following. Suppose you were to write “foxes hide underground”. In this case, a reader may conclude that you try to convey something concerning the foxes’ “habits” (hide) or “habitat” (underground).
  • [4] Among others: [ALA 01, BER 87, FLO 80, KEL 99, LEV 13, MAT 87, OLI 01, RIJ 96a, RIJ 96b, TOR 99]. For more pointers, see:
  • [5]