Challenges in Working with the Sources
Costa’s primary means of organizing the massive amount of data needed for his dissertation analysis was a set of large, alphabetized Word documents containing annotated data from all the different manuscripts. These Word documents, which exceed 1,300 pages, served as the organized data source for Costa’s work for more than 25 years. During the same period linguistic databases were being created, most notably the Summer Institute of Linguistics’ (SIL) Shoebox program.14 An early Shoebox database for myaamiaataweenki was constructed during the late 1980s by Myaamia Center researchers, but it was abandoned after SIL discontinued support for the Macintosh platform. To store, translate, and analyze the extensive corpus of documentation, a much more capable database tool was clearly needed.
The unique challenges of working with the Miami-Illinois corpus have directly dictated the form that any database for storing the data must take. Virtually all preexisting dictionary database programs are designed for cataloging phonemic data taken directly from speakers. Examples are LingSync, WeSay, Lexique Pro, Language Explorer, Online Linguistic Database, and Miromaa. The needs of a database for Miami-Illinois are far different. In a Miami-Illinois database, one cannot simply include glossed phonemic data with no commentary, since almost none of the primary data in the language is in phonemic form. In support of the phonemicized data, one must also include the original primary data and glosses, precisely as they appear in the sources, as well as whatever additional data is deemed necessary to further support the phonemicizations and glosses. For example, it is not enough to give the phonemically spelled Miami-Illinois word for “no, not,” moohci; it is also necessary to give the original forms, such as the one given in the early 18th-century French Jesuit sources, (m8tchi), as well as the translations given there, “seulement” (“only”) and “meme” (“even”).
Additionally, one needs to include later transcriptions of this word, such as Gatschet’s (mu’htchi) and Truman Michelson’s (m6‘tci), which both support the phonemic reconstruction moohci, as well as the fact that in all sources from the late 1700s onwards this word is translated simply as “not” or “no.” In further support of the phonemicizations, it is often helpful to include cognates from other Algonquian languages, so one must also have a dedicated field for related words, such as Meskwaki mo-hci and Shawnee mobci “even.” Essentially, the entries for the individual words must include not only all the data needed to interpret the words for language instruction but also all the primary evidence gathered to support how the translations, grammatical analysis, and corrected spellings were arrived at. The corrected spellings matter a great deal: in many verbs, the only phonological feature distinguishing the first person singular from the second person singular is vowel length. Compare meenaani “I drink” versus meenani “you drink.”15
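As a purely illustrative sketch, and not a description of any actual database schema, the entry requirements described above (a phonemic form, the original forms and glosses exactly as they appear in each source, and a dedicated field for cognates) might be modeled in Python as follows. All class names, field names, and source labels are hypothetical; the example values are taken from the discussion above, and per-source glosses are left empty where the text does not give them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attestation:
    """One original form, recorded exactly as written in a source (hypothetical)."""
    source: str                                        # label for the source
    original_form: str                                 # spelling as in the source
    glosses: List[str] = field(default_factory=list)   # glosses as given there


@dataclass
class Cognate:
    """A related word from another Algonquian language (hypothetical)."""
    language: str
    form: str


@dataclass
class Entry:
    """A phonemicized word together with the evidence that supports it."""
    phonemic_form: str
    translation: str
    attestations: List[Attestation] = field(default_factory=list)
    cognates: List[Cognate] = field(default_factory=list)

    def sources(self) -> List[str]:
        """List the sources that attest this word, in the order recorded."""
        return [a.source for a in self.attestations]


entry = Entry(
    phonemic_form="moohci",
    translation="no, not",
    attestations=[
        Attestation("French Jesuit sources (early 18th century)", "m8tchi",
                    ["seulement", "meme"]),
        # Per-source glosses for the later transcriptions are not given above.
        Attestation("Gatschet", "mu’htchi"),
        Attestation("Michelson", "m6‘tci"),
    ],
    cognates=[Cognate("Meskwaki", "mo-hci"), Cognate("Shawnee", "mobci")],
)

print(entry.sources())
```

Keeping each attestation as its own record, rather than folding everything into a single gloss field, is one way to preserve the original spellings and translations alongside the phonemicized headword.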
Toward a Technical Solution
Processing the older Jesuit-era materials proved challenging for many years. In 1999, myaamiaataweenki revivalists, who were interested in gaining access to these early sources, initiated the Illinois Project.16 The primary goal of this early project was simply to develop a process for systematically transcribing and translating the Jesuit-era manuscripts in order to gain access to new language materials for reclamation efforts. The Illinois Project was initially overseen through the joint effort of members of the language committees of the Miami Tribe of Oklahoma and the Miami Nation of Indians of the State of Indiana, Inc., under the guidance of a ten-year compact agreement signed in 1997. This agreement was intended to create a collaborative environment in which the two entities could work together and provide volunteer resources for the project and other language reclamation efforts. The agreement reached its end in 2007 and was not renewed. From that point, the Myaamia Project (now the Myaamia Center) assumed the responsibility of moving the Illinois Project’s goals forward. During this period we relied heavily on volunteer transcriptionists and project organizers, as there were no funds available to support full- or part-time research staff. Over several years, volunteers dedicated many hours to organizing this early project and created a great deal of transcription material that would be used in a later phase of this work. Transcription work to properly prepare and process the vast amount of early data was especially challenging. For these reasons, it became very difficult to move the project forward, and it eventually fell dormant for several years. Reflecting on this early struggle, the challenges became clear: How do we organize, store, and retrieve massive amounts of data as needed? What would an ideal database system look like? And what functionality would be available to us that would make linguistic analysis more efficient and easier to perform?
In a second attempt to address these problems, the Illinois Project was reinvigorated in 2012 when the Myaamia Center received an award from the National Endowment for the Humanities (NEH).17 The NEH award, together with technological advances, allowed the Illinois Project (now referred to as the Ilaatawaakani Project) to take a significant leap forward: we could re-examine ways of accessing Jesuit-era source materials and determine what type of digital archive and digital research tools were now possible. To test the waters, we began working with one of the three primary Jesuit manuscripts, the LeBoullenger manuscript.18 This document was selected as the first to be transcribed due to its numerous example sentences, its well-organized format, and the fact that it had never been edited before. The translated, annotated, and analyzed redaction of the LeBoullenger manuscript, with its vast amount of data, would constitute an invaluable source of readily accessible data and properly test our ideas for archival access and development. During the three years of the NEH-supported grant, the Ilaatawaakani Project would create the first-ever Miami-Illinois Digital Archive (MIDA).19
Before looking any further at developmental concepts, it should be noted that MIDA was never intended to be a language learning tool. The purpose of creating MIDA was to eliminate, to a large degree, the cumbersome need to work directly with original source materials, as well as to provide what current linguistic databases did not. MIDA has brought order to the large corpus of language data and allows us to filter out specific kinds of information. It is not uncommon for useful and important language data to be embedded within the manuscripts in such a way that a typical visual search of the physical pages (in whatever order the manuscript was written) does not easily allow a user to find the entry. For example, the interesting word alaamatayi, spelled (aramataye) in the original, is glossed by LeBoullenger “avant que de naitre dans le ventre de sa mere,” which translated literally into English is “before being born in his/her mother’s womb.” Upon further analysis, this word turns out to be an adverb, basically meaning “in the womb, in utero.” However, this word is not listed under any word for “womb,” nor in any kind of list of terms having to do with childbirth or even body parts, but is instead hidden under LeBoullenger’s keyword “avant,” which in English means “before.” This is a fairly typical example of how hundreds of interesting vocabulary items in this manuscript lurk in places where they cannot be “looked up” in any way until the manuscript is available in a searchable database.
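The difference between looking a word up under the manuscript’s own keyword and searching every field of a transcribed entry can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MIDA’s actual search implementation: the record layout, field names, and function names are all hypothetical, and the single record simply restates the example given above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Record:
    """One transcribed manuscript entry (hypothetical layout)."""
    keyword: str          # heading the manuscript files the entry under
    original_form: str    # spelling as in the manuscript
    original_gloss: str   # gloss as in the manuscript
    phonemic_form: str    # modern phonemic spelling
    english: str          # analyzed English translation


RECORDS = [
    Record(
        keyword="avant",
        original_form="aramataye",
        original_gloss="avant que de naitre dans le ventre de sa mere",
        phonemic_form="alaamatayi",
        english="in the womb, in utero",
    ),
]


def lookup_by_keyword(records, keyword):
    """Mimic paging through the manuscript: match only the filing keyword."""
    wanted = keyword.casefold()
    return [r for r in records if r.keyword.casefold() == wanted]


def search_all_fields(records, term):
    """Case-insensitive substring search across every field of each record."""
    needle = term.casefold()
    return [
        r for r in records
        if any(needle in getattr(r, f.name).casefold() for f in fields(r))
    ]


keyword_hits = lookup_by_keyword(RECORDS, "womb")    # finds nothing
fulltext_hits = search_all_fields(RECORDS, "womb")   # finds the entry
print(len(keyword_hits), [r.phonemic_form for r in fulltext_hits])
```

In this sketch a lookup restricted to the filing keyword misses the entry, while a search over all fields retrieves it through its English translation; a search for “ventre” would reach it through the original French gloss as well.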
To initiate development of the NEH-supported Ilaatawaakani Project, we pulled together a team of linguists, tribal researchers, computer programmers, and communication specialists to design and build a new software application that would address our identifiable needs. Initial design requirements for MIDA included the following stipulations:
- The database must be designed to function as simply as possible and to not do more than what was necessary to meet the Myaamia Center’s research needs.
- It must have a robust search function. Finding something within and among several manuscripts would hinge on a well-designed search application.
- It must be online and accessible to anyone interested for research purposes.
- Some language content created through MIDA, such as stem and morpheme lists, would be shared with the online Myaamia Dictionary, a separate online resource serving as a community language learning tool.20
Over the initial three years of developing the Ilaatawaakani Project, our understanding has evolved tremendously. This has led to the development of MIDA, which has already become one of the most significant research tools we have developed to date. Future versions of MIDA will include all known linguistic source materials. In the following sections we describe project organization in greater detail, including the digitization of the manuscript, the process for building the archive (including search and administrative tools), and initial research uses in different disciplines.