
Digital Humanities and Rabbinic Literature

The Talmud Blog is happy to be hosting a series on the interface of Digital Humanities and the study of Rabbinic Literature. Our first post comes from Prof. Michael Satlow, of Brown University. 

The other week I attended a workshop called Classical Philology Goes Digital in Potsdam, Germany. The major goal of the workshop, which was also tied to the Humboldt Chair of Digital Humanities, was to further the work of creating and analyzing open texts of the “classics”, broadly construed. We have been thinking about adding natural language processing (including morphological and syntactic tagging – or, as I learned at the workshop, more accurately “annotation”) to the Inscriptions of Israel/Palestine project. While we learned much and are now better positioned to add this functionality, I was most struck by how far the world of “digital classical philology,” focused mainly on texts, has progressed, and it got me thinking about the state of our own field.

A current running underneath the workshop was the uneasy sense that classical philology, as traditionally understood and practiced, is in sharp decline. Students, administrators, and funding agencies show less and less interest in supporting, for example, the creation of textual editions of Greek and Latin texts. The gambit at the heart of this workshop is that going digital provides a new and more exciting approach to classical philology. Instead of focusing on individual texts, philology becomes a collaborative exercise in “big data” and “distant reading.” Electronic editions of each text are prepared with this in mind and then enter a wider corpus to which a variety of digital tools can be applied. At the workshop several of the larger initiatives, such as Perseus, Open Philology, and the Digital Latin Library, were discussed. All of these projects, unlike the Thesaurus Linguae Graecae for example, are open access. Open access is a critical part of this vision because its value lies not simply in accessibility but in the availability of the texts for digital analysis, revision, and reuse.

Most of the presentations dealt with practical issues, especially standards and annotation: What does one do to a text (other than give free access to it) to maximize its scholarly utility? How does one annotate not only morphology, syntax, and named entities (e.g., proper names and places) but also actions and events? Canonical Text Services (CTS), an architecture for precise citation of digital texts, turns out to be particularly important because it facilitates the linking of lines of text in one manuscript to another (thus allowing for the automated production of synoptic editions), to parts of images, and to various translations. Tools like iAlign (being used in Leipzig) were particularly interesting in this respect. Other presentations focused on creating treebanks, that is, collections of parsed sentences that look something like the sentence diagrams I had to do in middle school. These can then be analyzed and compared across texts for rhetorical similarities and differences.
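To make the citation idea a little more concrete, here is a minimal Python sketch of how a CTS-style identifier might be split into its parts and used to link the same line across two manuscripts. It assumes the URN layout published by the Homer Multitext project (urn:cts:namespace:textgroup.work.version:passage). The names CtsUrn, parse_cts_urn, and same_passage are hypothetical, invented for this illustration and not taken from any existing library, and the sketch ignores passage ranges and sub-references.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CtsUrn:
    """Parts of a CTS-style URN (hypothetical helper, illustration only)."""
    namespace: str            # e.g. "greekLit"
    textgroup: str            # an author or collection
    work: Optional[str]       # a work within the text group
    version: Optional[str]    # a particular edition, translation, or manuscript
    passage: Optional[str]    # hierarchical citation such as "1.1"


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a URN of the assumed form urn:cts:ns:group.work.version:passage."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[:2] != ["urn", "cts"]:
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace = parts[2]
    levels = parts[3].split(".")
    if not namespace or not levels[0]:
        raise ValueError(f"missing namespace or text group: {urn!r}")
    # A trailing colon with nothing after it means "no passage given".
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    return CtsUrn(
        namespace=namespace,
        textgroup=levels[0],
        work=levels[1] if len(levels) > 1 else None,
        version=levels[2] if len(levels) > 2 else None,
        passage=passage,
    )


def same_passage(a: CtsUrn, b: CtsUrn) -> bool:
    """True if two URNs cite the same passage of the same work, in any version."""
    return a.passage is not None and (
        (a.namespace, a.textgroup, a.work, a.passage)
        == (b.namespace, b.textgroup, b.work, b.passage)
    )


# Illustrative identifiers: one line of one work as cited in two manuscripts.
ms_a = parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001.msA:1.1")
ms_b = parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001.msB:1.1")
print(same_passage(ms_a, ms_b))
```

Because the manuscript is only one component of the identifier, lining up witnesses for a synoptic view reduces, in this simplified picture, to comparing everything except the version field.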

Another area of focus in the classics is intertextuality and the tools that can reveal the citation or reuse of one text by another. One important site that shows the utility of this approach is Tesserae, which supports this kind of analysis across several Latin texts. TRACER is also a powerful tool for Latin. While this digital approach to classics and distant reading still has its strong critics (see, e.g., here), there is little question that these and tools like them will yield important, perhaps transformational, scholarship.
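As a rough illustration of what text-reuse detection involves, the following Python sketch finds word sequences shared by two passages. It is a deliberately naive n-gram comparison, not the method used by Tesserae or TRACER; the function names and the sample sentences are invented for the example.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of all runs of n consecutive tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_ngrams(a: str, b: str, n: int = 3) -> List[Tuple[str, ...]]:
    """Return, in sorted order, the n-word sequences found in both texts."""
    return sorted(ngrams(tokenize(a), n) & ngrams(tokenize(b), n))


# Invented sample sentences, not drawn from any real corpus.
source = "the quick brown fox jumps over the lazy dog"
target = "a quick brown fox sleeps near the lazy dog"
for match in shared_ngrams(source, target):
    print(" ".join(match))
```

Exact matching of surface forms like this misses reuse that changes word order or inflection, which is one reason working from lemmas rather than raw word forms matters for highly inflected languages.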

And this finally brings me to rabbinic literature. Where do we stand in relation to the application of the digital to the classics, and where are the opportunities? In some surprising respects, we are very much on or ahead of the curve. Large swathes of texts have already been digitized, and some projects (e.g., Mechon Mamre and Sefaria) are committed to an open-access policy. The Bar-Ilan Responsa project contains a vast number of texts that are also tagged for morphology, allowing, for example, searches by lemma. Two sites in particular, the Lieberman Institute and the Friedberg Jewish Manuscript Society, contain digitized manuscripts, transcriptions, and some kind of CTS architecture that links images and different transcriptions, although neither allows for fully automated open access. One model is the Digital Mishnah, which is open-access and has many of the features noted above.

The question that occurred to me in Potsdam is how we deploy and utilize these resources to move us into the age of “big data” and to make possible the kinds of larger-scale, cross-corpus analyses that our colleagues in classics are beginning to do. We have only just begun to think about visualizing the links between documents (as in this example from Sefaria) and, in general, applying the approach of “distant reading” to the rabbinic corpus (see, for example, the dissertation of Itay Marienberg-Milikowsky). What are the opportunities and how do we get there?

To be transparent, I confess that I have dreams. I am intrigued by the idea of creating digital editions of rabbinic texts. I would like to see links between images, transcriptions, and different translations. I would like to be able to map places and events found in this literature and to create a social network analysis of the rabbis. Using treebanks as a new approach to form analysis could be exciting. Further down the road, perhaps literary and formal structures of talmudic sugyot could be created at the push of a button. What kinds of questions would these analyses allow us to answer? What kinds of new questions would we ask?
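One simple way to picture the social network analysis mentioned above is to treat each passage as a list of the sages named in it and to count how often two names appear together. The Python sketch below uses only the standard library. The passage identifiers and the names "Rabbi A" through "Rabbi C" are placeholders rather than real data, the function names are hypothetical, and co-occurrence is only one of many possible ways to define a link.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List


def cooccurrence_edges(passages: Dict[str, List[str]]) -> Counter:
    """Count, for every pair of names, the passages in which both appear."""
    edges: Counter = Counter()
    for names in passages.values():
        # Deduplicate and sort so each pair is counted once per passage
        # and always stored in the same order.
        for pair in combinations(sorted(set(names)), 2):
            edges[pair] += 1
    return edges


def weighted_degree(edges: Counter) -> Counter:
    """Sum the edge weights attached to each name."""
    degree: Counter = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Placeholder data: passage identifier -> names annotated in that passage.
sample = {
    "passage-1": ["Rabbi A", "Rabbi B"],
    "passage-2": ["Rabbi A", "Rabbi B", "Rabbi C"],
    "passage-3": ["Rabbi C"],
}
edges = cooccurrence_edges(sample)
for name, score in sorted(weighted_degree(edges).items()):
    print(name, score)
```

A sketch like this presupposes exactly the kind of named-entity annotation discussed at the workshop: without reliable identification of who is named in each passage, there is nothing to count.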

But open-access digital editions are the first step. These editions would ideally use a CTS architecture and include multiple manuscript transcriptions and images; morphological, lexical, and syntactic annotation; links of words to such places as Ma’agarim and the Comprehensive Aramaic Lexicon; and annotations of named entities. Given what has already been done, should the various resource owners desire to cooperate, such editions might be easily and quickly produced. Of course, any efforts to do this would have to be accompanied by a viable and sustainable financial model.

In the United States and Israel (and I suspect Europe as well), the traditional practice of “rabbinics” in secular universities, like classical philology, is in a precarious position. It is increasingly important for the survival of the field to make our texts relevant to larger academic concerns. Big data and distant reading are not the only possible approaches to making rabbinic literature more relevant, but they are receiving increased attention (and funding) and offer a largely unexplored set of new research possibilities.

This is a curve we can get ahead of. Any takers?

[Cross-posted on Prof. Satlow’s blog, here]

 

 


3 thoughts on “Digital Humanities and Rabbinic Literature”

  1. “These editions would … include multiple manuscript transcriptions and images; morphological, lexical, and syntactic annotation”: these are part of the developments being implemented in the Lieberman Institute Database.

  2. Hayim Lapin says:

    This reply is on behalf of myself, Daniel Stoekl ben Ezra, and Yael Netzer. Apologies for the length of the reply!
    In reply to this timely and important post by Michael Satlow, we want to draw attention to our paper (in preparation) on a canonical text service to be presented at the World Congress for Jewish Studies this summer. This will outline a “Canonical Text Service” for Bible and classical rabbinic literature and demonstrate a preliminary implementation for a selected subset of this literature.
    An open data canonical text service provides two functions. First, it defines a system of “Uniform Resource Identifiers” (URIs) that identify texts and any parts of texts following a human-readable and consistent pattern. For biblical and rabbinic literature this needs to allow us to refer to a text in general (Genesis, Mishnah Kelim, the Talmud Bavli), but also to any given edition/manuscript/version (Genesis according to the Leningrad Codex; Kelim according to the Naples first edition, the Bavli according to the Munich MS). In other words, the scheme must coordinate the standardized or canonical reference system with the folio, column, line, etc., of a particular edition or manuscript. In addition, our scheme can specify texts down to the level of an individual character.
    The second function is to allow users (humans or machines) to request the texts dynamically. A canonical text service accepts the URI as a URL (e.g., in a browser window or in a request from one computer application to another) and returns the requested text. This means that external projects (say commentary, or linguistic annotation, or geographical or prosopographical data) can use the URIs as citations or as part of an architecture that can request the text on demand.
    To the extent that we embrace linked open data as a standard in our work, these URIs embed our texts in an open-ended network of links: of name, place, and other “realia” identifiers, of grammatical and lexicographical analysis, and of scholarly work. We also make our data interoperable with other projects that use similar standards. Thus, for instance, if we adopt the pattern of citations developed initially by the Homer MultiText Project and adopted by the Perseus project, applications built on Perseus (a much better established and fuller database than most classical Hebrew or Aramaic databases) should work with our own.
