Collective Intelligence and Digital Archives: Towards Knowledge Ecosystems

Limits and perspectives

The tools we have just presented offer numerous advantages, but they do not solve the fundamental issue concerning the exploitation of digital archives in social science research from the perspective of both their relevance and the epistemological consequences of their uses.

Epistemological conflicts

In light of these experiments, we observe that the dialogue between the “computer scientists” and the “humanists” remains difficult. Where the former tend to implement solutions and methods that show off the power of machines, the latter see in this power the means to free themselves from the wearisome tasks needed to explore their corpus and to find, as fast as possible, the proof that confirms their hypotheses. Hence the risk of fatal disenchantment when they realize that the machine only brings out already-known data; the “qualitative” expertise of a specialist often compensates, in fact, for the more empirical character of his approach. Hence also the risk of turning away from approaches that were first encountered through tools that are inadequate because they were initially designed for quantitative disciplines. The same Biolographes team was thus able to test Treeclouds[1] [GAM 14], a very seductive model that visualizes collocations in the form of a tree and that originates from genomics, where “language” sequences are devoid of connotation and free of the complexities of discourse pragmatics. Applied to literary texts that the researchers knew well, the results were, unsurprisingly, rather disappointing. If Treeclouds are ill-adapted to literature, this is not necessarily the case for critical articles; their use can be considered there instead and will be tested in the aforementioned Delille project.

It should be noted that the impression that Treeclouds were of limited use for their project, while widely shared, did not give rise to any formal criticism from the participants, which says a lot about avoidance strategies in multidisciplinary research groups and about the participants’ reluctance to appear reactionary in the face of the novelty proposed (and socially personified) by computer scientists. SHS researchers are also wary of the “positivism” of tool-based approaches, and they feel that they have more to lose than to gain by committing themselves to processes that they have not completely mastered. In most cases, they will have to rely on the skills of “others”, whose language they do not understand at all and whose quality and validity they cannot evaluate with certainty from within their own field. This hesitation is easily seen when researchers are offered computerized approaches based on very different epistemological presuppositions. We have seen recurring debates between supporters of TAL and TIC, one group valuing the tool and the infallible character of complicated systematic counts (but to what end?), the other demanding that everything rely on the researchers’ expertise in order to develop light, pragmatic approaches (not rigorous enough in the eyes of the former). Caught between the two, the literary specialists were torn between an approach that was tedious in their eyes but crowned with a scientism vouched for by the machine, and a more mixed and therefore more accessible approach, but one rendered suspect by its very efforts towards openness and communication (the seduction of the image against the precision of the number!). Uncertainty about how to evaluate work once we leave our own area of competence can be anxiety-inducing and can lead researchers to wait for others to open up the best routes.
Furthermore, vigilance about the limits of one’s own field of expertise is a major criterion of scientific identity: in a world of research largely structured by CNU disciplinary sections and by the obligation to specialize in order to have a viable career, and despite all the official appeals for interdisciplinarity, a good researcher is bound to hesitate before working on projects that do not fall within his field of expertise.

However, certain approaches, like the archive representations we mentioned, allow researchers to expand their field of analysis without falling into speculation, and this is undoubtedly one of the most striking contributions of digital archives. A researcher is familiar with his own corpus, but by leaning on well-exploited archives, he can find points of reference from which to propose a hypothesis outside his primary field of expertise without falling into dangerous speculation. It is a matter of using the digital archive to “equip a process of investigation”, to use the words of a sociologist who pioneered these practices [CHA 03]. Typically, being a Michelet specialist means knowing that he frequented science popularizers, and it may have been noted that some of them knew other writers from the same Norman circle; but it is necessary to visualize and compare the networks of scholars frequented by the writers, broken down by specialty, to discover that this position of scientific ambassador is very important for all the writers who dealt with science after 1850, undoubtedly more important than relationships with eminent figures. Does this importance grow over the century, as the writer moves away from the social poles associated with the bourgeoisie, salons and academies? This is a hypothesis that can be tested, as the archives provide a point of reference from which to step back without giving in to speculation and to propose a range of hypotheses without losing touch with the real.

This type of approach explains why the digital exploitation of archives is particularly well adapted to the exploration of vast corpora, particularly when we know little about them. This was the case for the Euterpe project, of which no member could initially claim to be a specialist in the subject, as it was a matter of unearthing texts that had been completely forgotten by literary history and whose existence was only a supposition. Over the course of the collective task, it became clear that the point of reference provided by the digital analysis also made it possible to keep received ideas at a distance, and even more surely the categories derived from theories that cover up the real more than they reveal it. One case arose in which two researchers worked in parallel on the characteristics of the authors of scientific poems, one using visual archive processing, the other defining “actor categories” a priori, based on presupposed strategies that could indeed be found in the biographies of some real and often remarkable poets in the corpus (their remarkable character very likely being what made them the model after which the categories had been crafted, a classic bias in category-making). This “Ancien Regime” method raised all the questions inherent in a typology (at what point is an individual considered to match a type? What criteria justify the aggregates produced to create these types? etc.). To this was added an embarrassing question: what happens if the practice of scientific poetry cannot be reduced to the Bourdieusian categories of strategy and distinction on which the analysis in question was largely founded?
The digital approach, for its part, gave results that differed in large measure: it suggested that the practice of scientific poetry had no real specificity, since the existence of scientific poems does not actually create scientific poets, so that the typology amounted to ontologizing a phenomenon that was undoubtedly just a “practice”, at a given moment, within the range of an individual’s practices [LOU 12]. The scientific subject was often just a particular case, a subsection of a wider literary movement that, for an individual, consisted of taking part in a community built around an event (emotionally, politically, etc.) by means of poetic inscription. This held whether the event was a sensational but one-off scientific experiment or the collective awareness of transformations resulting from technological progress; at other times, depending on the current state of the world or of the library, other poetic productions by the same authors dealt with national history or with the political incidents that then gave occasion for this participation. Another work in progress on the corpora of poems sent to academic cooperations has shown that the forms and authors overlap considerably, whatever the subjects may be.

A “sociology of scientific poems”, even for sociological reasons, could moreover have difficulty making sense at the scale of the century. By processing the mass of scientific poems through visualization, we could instead see configurations emerge that were specifically socio-historic: for example, the emergence of the medical poem, brought by doctors in search of renown around the 1860s, or the abandonment of the subject by professional men of letters in favor of non-specialized intellectuals, professors or lawyers from 1870 onwards, even as science was becoming a national debate. Another benefit of visual data processing, then, is that it frees phenomena from a reading based on prominent individuals who catch our attention or seduce us (for reasons that are often very far removed from their connection to the genre) and who lead to the ontologization of a practice. By offering an intellectual grasp detached from individual cases, the graph allows interpretations to emerge without a theoretical takeover. The digital approach is then a precious asset against the demon of theory, which is forced to wait, as in medicine, for the corpus to say what it can say in response to the practitioner’s questions and investigations.

Yet it is not uncommon for classical literary analysis to be based on a mixture of theory and intuition, and researchers may feel that the hermeneutical “proceduralization” of practices tied to digital processing cuts them off from the part of their work that is rightly considered creative. We also must not underestimate the strength of the reflex to protect individually produced data: to pool your reference collection on Zotero is to allow others, even within a limited team, to benefit from your work before the publication that marks your legitimate ownership and scientific recognition. The idea that each researcher will benefit from a wide opening up of research data, which saves human time and lets us focus on what really constitutes our expertise, i.e. personal training coupled with a singular vision, is not enough at this time to counterbalance these concerns. It is true that the rarity of these practices in literature condemns the early adopters to an unequal exchange at first, but it is telling that in the Biolographes team, only the doctoral and postdoctoral students, whether or not they are under contract, have shared their data. They are the ones with the most fragile institutional position, but also the ones who could benefit the most from an exchange and for whom collaboration is becoming a generational marker. Hence a very perceptible resistance among insiders and the temptation, if digital archives must exist, to compartmentalize the project’s different moments and actors. Against the threat of procedure and of the outside appropriation of personal work, computer science will gladly be made the researcher’s servant, one that is distrusted and kept at a distance at the same time.
