Sara Papi

I am an AI Researcher at FBK (Fondazione Bruno Kessler), working on speech processing and multimodal LLMs within the MEETWEEN and DVPS Horizon Europe projects. I received my PhD cum laude in Information Engineering and Computer Science from the University of Trento in 2024, with a focus on simultaneous speech translation and subtitling. My research interests span multimodal and crosslingual instruction-following models, speech foundation models, and LLMs. My work has been recognized with awards, including the Best PhD Graduate 2024 Award in Information and Communication Technology from the University of Trento, an Outstanding Paper and SAC Award at ACL 2024, and a Social Impact Paper Award at EMNLP 2024. I actively contribute to the community as an organizer of the IWSLT Evaluation Campaign and as an Area Chair or reviewer for major conferences in speech and NLP, such as *ACL and Interspeech.

(I love elephants ♥️🐘)


News

Jan 23, 2026	🗣️ Talk on Crosslingual Evaluation of Multimodal Instruction-Following Models at Cohere Labs
May 09, 2025	🏆 Best PhD Graduate 2024 Award in Information and Communication Technology from the University of Trento
Mar 28, 2025	🏆 Highly Commended EAMT Best PhD Thesis
Nov 28, 2024	🏆 Honorable Mention for the Best Italian PhD Thesis at AIxIA
Nov 14, 2024	🏆 Social Impact Paper Award at EMNLP 2024!
Aug 14, 2024	🏆 Outstanding Paper and other achievements at ACL 2024!
Aug 07, 2024	I am presenting 5 papers at ACL 2024! 🎉
Apr 18, 2024	I successfully defended my PhD 🎊
Feb 20, 2024	“How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena” accepted at LREC-COLING 2024 🎊
Dec 16, 2023	My second Microsoft internship paper was accepted at ICASSP 2024 🎊
Nov 28, 2023	My first Microsoft internship paper was accepted at ASRU 2023 🎊

Selected Publications

ICLR
Instruction Following

MCIF: Multimodal crosslingual instruction-following benchmark from scientific talks

Sara Papi, Maike Züfle, Marco Gaido, and 5 more authors

In The Fourteenth International Conference on Learning Representations, 2026

TACL
Speech Translation

How “Real” is Your Real-Time Simultaneous Speech-to-Text Translation System?

Sara Papi, Peter Polák, Dominik Macháček, and 1 more author

Transactions of the Association for Computational Linguistics, Apr 2025

EMNLP
Dataset

MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages

Marco Gaido*, Sara Papi*, Luisa Bentivogli, and 6 more authors

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024

EMNLP
Human-Centered AI

What the Harm? Quantifying the Tangible Impact of Gender Bias in Machine Translation with a Human-centered Study

Beatrice Savoldi, Sara Papi, Matteo Negri, and 2 more authors

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (Social Impact Paper Award), Nov 2024


EMNLP 2024 Social Impact Paper Award
ACL
Speech Translation

StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection

Sara Papi, Marco Gaido, Matteo Negri, and 1 more author

In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Aug 2024

ACL
Speech Translation

Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?

Marco Gaido, Sara Papi, Matteo Negri, and 1 more author

In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Outstanding Paper and SAC Award), Aug 2024


ACL 2024 Outstanding Paper and Senior Area Chair Award
ACL
Automatic Subtitling

SBAAM! Eliminating Transcript Dependency in Automatic Subtitling

Marco Gaido, Sara Papi, Matteo Negri, and 2 more authors

In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Aug 2024

Abstract

Subtitling plays a crucial role in enhancing the accessibility of audiovisual content and encompasses three primary subtasks: translating spoken dialogue, segmenting translations into concise textual units, and estimating timestamps that govern their on-screen duration. Past attempts to automate this process rely, to varying degrees, on automatic transcripts, employed diversely for the three subtasks. In response to the acknowledged limitations associated with this reliance on transcripts, recent research has shifted towards transcription-free solutions for translation and segmentation, leaving the direct generation of timestamps as uncharted territory. To fill this gap, we introduce the first direct model capable of producing automatic subtitles, entirely eliminating any dependence on intermediate transcripts also for timestamp prediction. Experimental results, backed by manual evaluation, showcase our solution’s new state-of-the-art performance across multiple language pairs and diverse conditions.
ACL

When Good and Reproducible Results are a Giant with Feet of Clay: The Importance of Software Quality in NLP

Sara Papi*, Marco Gaido*, Andrea Pilzer, and 1 more author

In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Aug 2024

Abstract

Despite its crucial role in research experiments, code correctness is often presumed solely based on the perceived quality of results. This assumption, however, comes with the risk of erroneous outcomes and, in turn, potentially misleading findings. To mitigate this risk, we posit that the current focus on reproducibility should go hand in hand with the emphasis on software quality. We support our arguments with a case study in which we identify and fix three bugs in widely used implementations of the state-of-the-art Conformer architecture. Through experiments on speech recognition and translation in various languages, we demonstrate that the presence of bugs does not prevent the achievement of good and reproducible results, which however can lead to incorrect conclusions that potentially misguide future research. As countermeasures, we release pangoliNN, a library dedicated to testing neural models, and propose a Code-quality Checklist, with the goal of promoting coding best practices and improving software quality within the NLP community.
TACL
Automatic Subtitling

Direct Speech Translation for Automatic Subtitling

Sara Papi, Marco Gaido, Alina Karakanta, and 3 more authors

Transactions of the Association for Computational Linguistics, Nov 2023
