
Index Thomisticus Treebank

Website of the project

Started by Roberto Busa SJ in 1949, the Index Thomisticus is considered a groundbreaking project in computational linguistics. It is a corpus containing the opera omnia of Thomas Aquinas (118 texts) as well as 61 texts by other authors related to Thomas, totalling approximately 11 million words, each of which was morphologically tagged and lemmatized by hand.

In the early 1970s, Busa began planning a project aimed both at the morphosyntactic disambiguation of the Index Thomisticus lemmatization and at the syntactic annotation of its sentences. Today these tasks are carried out by the ongoing ‘Index Thomisticus Treebank’ project, which is part of a broader project, ‘Lessico Tomistico Biculturale’, whose goal is to develop a lexicon from the Index Thomisticus texts.

The Index Thomisticus Treebank is a dependency-based, syntactically annotated corpus. Its annotation style is based on the guidelines developed by ÚFAL for the so-called ‘analytical layer’ of the Prague Dependency Treebank of Czech. In addition, specific guidelines for the syntactic annotation of Latin texts are shared with the Latin Dependency Treebank project developed by the Perseus Digital Library at Tufts University in Boston.

At present, the Index Thomisticus Treebank comprises approximately 450,000 nodes (more than 26,000 syntactically annotated sentences) drawn from the Scriptum super Sententiis Magistri Petri Lombardi, the Summa contra Gentiles (fully annotated, all 4 books) and the Summa Theologiae.

The Index Thomisticus Treebank data and the Index Thomisticus Treebank valency lexicon can be browsed on the project website. The Index Thomisticus data (not treebanked) can be browsed on CD-ROM or on the Corpus Thomisticum website.

On 4 November 2010, Father Busa donated his own copy of the 56 volumes of the Index Thomisticus to IBM. The donation was received by IBM President Nicola Ciniero. Father Busa also donated his personal library to the Università Cattolica of Milan. See also the article in Avvenire (4 November 2010).

On 9 August 2011, Father Busa passed away. He will always remain in the hearts of the members of CIRCSE, of the scientific community and of his many friends.
