Collective Intelligence and Digital Archives: Towards Knowledge Ecosystems

What archives are we speaking of? Definition, issues and collective intelligence methods

Database archives, evolution of a concept and its functions

At the heart of historical practice, archives have long been defined as traces, as opposed to collections, which are assembled deliberately to serve an identified goal. Because archives are unconscious by-products of an activity, they required of the researcher the patient task of understanding how they had been constituted, for these mechanisms conditioned the truth value that could suitably be attached to each type of data.

With the appearance online of vast corpora of data, rarely structured and often aggregated, the nature of the archive is changing, as are its methods of use. Here, we could transpose the well-known bibliographical pair of “primary” and “secondary” sources: for archives, similarly, there is a “primary” archive, produced by the process or institution studied, and a “secondary” archive, which is the product of exploiting the first while studying this process or institution. Thus, the notion of archives now covers not only data produced by organizations in the course of their activity (for example, a publishing house’s publication data), but also vast corpora of data assembled within a research project that, without constituting collections in the proper sense, are already the result of a selection process and have a structure of their own. Nevertheless, these corpora can be said to “become archive”, in the sense that they are intended to be deposited afterward in the maelstrom of accessible databases: they will then become archives for others, in the sense of a by-product, this time of previous research activity.

Will these second-order “archives”, which aggregate data at various degrees of maturity, remain subject to the same usage norms as the others? For the historian, the researcher’s first gesture when faced with an archive is, in the words of Michel de Certeau, to “redistribute space”, that is, to set apart, reassemble and reorganize data that had been produced by others in another order. This restructuring of the archive remains necessary to the extent that our objects of study in the social sciences lend themselves only indirectly and partially to a quantitative approach: the categories are not self-evident; they are almost always constructed and, as such, arguable. A clear example is that of approaches by genre: the premise adopted by the ANR Euterpe team (2007-2011) in order to study the scientific poems published between 1792 and 1906 presupposes a stabilized genre, whereas in reality the genre evolved over the course of the century between didactic poetry with scientific content, poems inspired by technology and the philosophy of science. It was thus necessary to compile a list of generic criteria ensuring the coherence of the corpus, criteria that will necessarily bias later reuse by projects that do not deal closely with the question of genre but are concerned, for example, with popularization or with the relationships between science and the arts.

The first singularity of the digital exploitation of archives in the social sciences will therefore often be the highly constructed character of its objects, which limits their interchangeability. They do, however, have the advantage over primary archives of making their categories explicit, which facilitates the understanding of their structures and allows reuse strategies to be devised.

This broad redefinition of the notion of archives is not our doing: it follows a tendency particular to digital practices, where archiving, deposit and registration converge, to the point of becoming synonymous by the grace of Windows menus, and where the notion of information consequently seems to merge with the prestigious “archive”. Although this extension of the term under the pressure of digital uses can be lamented because it blunts its precision and thus its efficacy, we must admit that the power (“arkheion”) often identified with the archive lingers there, producing the same pragmatic effects. This recalls Derrida evoking the etymology of the archive, as much a commencement as a commandment, in his reflection on the dark side of archives, which are the sources of the historical account: “No one ever renounces - and this is the unconscious itself - the appropriation of power over the document, over its detention, its retention or its interpretation. But who ultimately has the authority over the institution of the archive?” [DER 95, pp. 15-16]. The question is singularly pertinent in France, where the National Archives are associated with the reorganization of the Nation, belonging both to the revolutionary break, in which they institute a new order, and to the “arkhe”, a real box in which the repressed of this young nation is locked up (we can think of the famous “Armoire de fer”, an iron cabinet in Louis XVI’s former apartments, the true heart of the National Archives, where the first laws of the Constituent Assembly and essential items from the Ancien Regime, notably those from Joan of Arc’s trial and Louis XVI, are preserved).

There is no better way to present the symbolic impact of archives and the articulation between a hidden past, repressed in the Derridean analysis, and the present power that mastery of this past confers. What remains of this power in “digital archives”? For Dietmar Schenk, archivist, historian and author of a theory of the archive that he recently extended to the digital world, secrecy and power continue to characterize the fantasy of the digital archive [SCH 14b, SCH 13]: “in the systems of digital data processing, power, technology, and organization are very strongly interconnected factors; the quantities of information thusly managed often surpass understanding - archives can become anxiogenic, as their contents, from a strictly quantitative point of view, exceed the limits of the imaginable and evade our sight; it is in this way that the old type of secret archives as they existed in the princely states at the start of our modernity has become the archetype of an imaginary approach of the most modern information storage companies. Picturing the archive in terms of ‘power’ is in tune with the experience that many people have with new technological media with which they are not yet familiarized [...]” [SCH 14a, p. 39].

It is precisely this anxiogenic character that seems to reappear in the uses of digital archives in the humanities, despite their open, democratized character, which would seem to dispel the very idea of control and power. In fact, what researchers fear is not the archive itself, but that these archives can be accessed only through the intercession of those who can make them speak. The interposition of a sort of computational “black box” poses numerous problems, both technical and epistemological, which we will try to approach through a practical case, that of our research within a multidisciplinary team, the ANR/DFG “Biolographes” team. Before dealing with special cases, however, it is worth completing this exploration of the notion of archives with an overview of the methods and tools available to researchers in the humanities for exploiting these mysterious archives.

