LT4HALA-2024

LREC-COLING 2024-Torino, Italy (and online)

Date: 25 May 2024
Time: 9:00
Where: Turin

Third Workshop on Language Technologies for Historical and Ancient LAnguages (#LT4HALA2024)

Website: https://circse.github.io/LT4HALA/
Place: co-located with LREC-COLING 2024, May 20-25, Torino, Italy
Date: Saturday, May 25th

Description

This one-day workshop seeks to bring together scholars who are developing and/or using Language Technologies (LTs) for historically attested languages, so as to foster cross-fertilization between the Computational Linguistics community and the areas of the Humanities dealing with historical linguistic data, e.g. historians, philologists, linguists, archaeologists and literary scholars. Despite the current availability of large collections of digitized texts written in historical languages, such interdisciplinary collaboration is still hampered by the limited availability of annotated linguistic resources for most historical languages. Creating such resources is both a challenge and an obligation for LTs: it supports historical linguistic research with the most up-to-date technologies and preserves the precious linguistic data that have survived from the past.

Relevant topics for the workshop include, but are not limited to:

handling spelling variation,
detection and correction of OCR errors,
creation and annotation of linguistic resources,
deciphering,
morphological/syntactic/semantic analysis of textual data,
adaptation of tools to address diachronic/diatopic/diastratic variation in texts,
teaching ancient languages with LTs,
NLP-driven theoretical studies in historical linguistics,
NLP-driven analysis of literary ancient texts,
evaluation of LTs designed for historical and ancient languages,
Large Language Models for the automatic analysis of ancient texts.

The workshop will also be the venue of the:

third edition of EvaLatin, an evaluation campaign entirely devoted to NLP tools for Latin. The third edition of EvaLatin will focus on two tasks: dependency parsing and emotion polarity detection. Dependency parsing will be based on the Universal Dependencies (UD) framework. No specific training data will be released, but participants will be free to use any kind of resource they consider useful for the task, including the Latin treebanks already available in the UD collection. One of the challenges of this task will therefore be to understand which treebank (or combination of treebanks) is the most suitable for dealing with new test data. Test data will be both prose and poetic texts from different time periods. For the emotion polarity detection task, likewise, no training data will be released, but the organizers will provide an annotation sample, a manually created polarity lexicon and annotation guidelines. Here too, participants will be free to pursue the approach they prefer, including unsupervised and/or cross-language ones (which promise to be the most efficient, given the lack of training data for Latin for this task). Test data will be poetic texts from different time periods.
third edition of EvaHan, the evaluation campaign devoted to NLP tools for Ancient Chinese. EvaHan 2024 will focus on two tasks: Ancient Chinese sentence segmentation and sentence punctuation.

LT4HALA Organizers

Marco Passarotti, Università Cattolica del Sacro Cuore, Milan, Italy
Rachele Sprugnoli, Università Cattolica del Sacro Cuore, Milan, Italy

EvaLatin Organizers

Rachele Sprugnoli, Università Cattolica del Sacro Cuore, Milan, Italy
Federica Iurescia, Università Cattolica del Sacro Cuore, Milan, Italy
Marco Passarotti, Università Cattolica del Sacro Cuore, Milan, Italy

EvaHan Organizers

Li Bin, School of Chinese Language and Literature, Nanjing Normal University, P.R. China
Bolin Chang, Nanjing Normal University, P.R. China
Minxuan Feng, Nanjing Normal University, P.R. China
Chao Xu, Nanjing Normal University, P.R. China
Dongbo Wang, Nanjing Agricultural University, P.R. China

Programme Committee

Adam Anderson, FactGrid Cuneiform Project, USA
Yannis Assael, Google DeepMind
Monica Berti, University of Leipzig, Germany
Luca Brigada Villa, Università di Bergamo, Italy
Flavio Massimiliano Cecchini, Università Cattolica del Sacro Cuore di Milano, Italy
Margherita Fantoli, University of Leuven, Belgium
Federica Gamba, Charles University, Czech Republic
Shai Gordin, Ariel University, Israel
Federica Iurescia, Università Cattolica del Sacro Cuore di Milano, Italy
Bin Li, School of Chinese Language and Literature at Nanjing Normal University, P.R. China
Eleonora Litta, Università Cattolica del Sacro Cuore di Milano, Italy
Yudong Liu, Western Washington University
Barbara McGillivray, Turing Institute, UK
Beáta Megyesi, Uppsala University, Sweden
Chiara Palladino, Furman University, USA
John Pavlopoulos, Athens University of Economics and Business, Greece
Eva Pettersson, Uppsala University, Sweden
Sophie Prévost, Laboratoire Lattice, France
Thea Sommerschield, Ca’ Foscari University of Venice, Italy
James Tauber, Eldarion, USA
Toon Van Hal, Katholieke Universiteit Leuven, Belgium
Tariq Yousef, University of Southern Denmark, Denmark