Publications

Natural Language Processing, Large Language Models, Deep Learning, Evaluation, Commonsense Reasoning, Italian

Challenging the Abilities of Large Language Models in Italian: a Community Initiative

The rapid progress of Large Language Models (LLMs) has transformed natural language processing and broadened its impact across research and society. Yet, systematic evaluation of …

arXiv • Malvina Nissim, Danilo Croce, Viviana Patti, Pierpaolo Basile, Giuseppe Attanasio, Elio Musacchio, Matteo Rinaldi, Federico Borazio, Maria Francis, Jacopo Gili, and others • Dec 4, 2025 • 1 min read

arXiv PDF Code

Natural Language Processing, Large Language Models, Deep Learning, Evaluation, Multilinguality, Basque, Linguistic Diversity

BERnaT: Basque Encoders for Representing Natural Textual Diversity

Language models depend on massive text corpora that are often filtered for quality, a process that can unintentionally exclude non-standard linguistic varieties, reduce model …

arXiv • Ekhi Azurmendi, Joseba Fernandez de Landa, Jaione Bengoetxea, Maite Heredia, Julen Etxaniz, Mikel Zubillaga, Ander Soraluze, Aitor Soroa • Dec 3, 2025 • 1 min read

arXiv PDF Code Dataset Model

Natural Language Processing, Large Language Models, Deep Learning, Evaluation, Multilinguality, Basque, Multimodal

Multimodal Large Language Models for Low-Resource Languages: A Case Study for Basque

Current Multimodal Large Language Models exhibit very strong performance for several demanding tasks. While commercial MLLMs deliver acceptable performance in low-resource …

arXiv • Lukas Arana, Julen Etxaniz, Ander Salaberria, Gorka Azkune • Nov 12, 2025 • 1 min read

arXiv PDF Dataset

Natural Language Processing, Large Language Models, Deep Learning, Multilinguality, Basque, Instruction Tuning

Instructing Large Language Models for Low-Resource Languages: A Systematic Study for Basque

Instructing language models with user intent requires large instruction datasets, which are only available for a limited set of languages. In this paper, we explore alternatives to …

EMNLP 2025 • Oscar Sainz, Naiara Perez, Julen Etxaniz, Joseba Fernandez de Landa, Itziar Aldabe, Iker García-Ferrero, Aimar Zabala, Ekhi Azurmendi, German Rigau, Eneko Agirre, Mikel Artetxe, Aitor Soroa • Nov 4, 2025 • 1 min read

URL PDF Code Dataset Model Site

Natural Language Processing, Language Models, Deep Learning, Multilinguality, Cognitive Modeling

BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data

We present BabyBabelLM, a multilingual collection of datasets modeling the language a person observes from birth until they acquire a native language. We curate developmentally …

EACL 2026 • Jaap Jumelet, Abdellah Fourtassi, Akari Haga, Bastian Bunzeck, Bhargav Shandilya, Diana Galvan-Sosa, Faiz Ghifari Haznitrama, Francesca Padovani, Francois Meyer, Hai Hu, Julen Etxaniz, and others • Oct 11, 2025 • 1 min read

arXiv PDF Site Code Dataset Model

Natural Language Processing, Large Language Models, Deep Learning, Multilinguality, Truthfulness, Evaluation

Truth Knows No Language: Evaluating Truthfulness Beyond English

We introduce a professionally translated extension of the TruthfulQA benchmark designed to evaluate truthfulness in Basque, Catalan, Galician, and Spanish. Truthfulness evaluations …

ACL 2025 • Blanca Calvo Figueras, Eneko Sagarzazu, Julen Etxaniz, Jeremy Barnes, Pablo Gamallo, Iria de-Dios-Flores, Rodrigo Agerri • Jul 27, 2025 • 1 min read

URL PDF Code Dataset Model

Natural Language Processing, Large Language Models, Deep Learning, Multilinguality, Dialects, Norwegian

HiTZ at VarDial 2025 NorSID: Overcoming Data Scarcity with Language Transfer and Automatic Data Annotation

In this paper we present our submission for the NorSID Shared Task as part of the 2025 VarDial Workshop (Scherrer et al., 2025), consisting of three tasks: Intent Detection, Slot …

COLING 2025 • Jaione Bengoetxea, Mikel Zubillaga, Ekhi Azurmendi, Maite Heredia, Julen Etxaniz, Markel Ferro, Jeremy Barnes • Dec 13, 2024 • 1 min read

PDF Code arXiv

Natural Language Processing, Large Language Models, Deep Learning, Evaluation, Commonsense Reasoning, Italian

GITA4CALAMITA - Evaluating the Physical Commonsense Understanding of Italian LLMs in a Multi-layered Approach: A CALAMITA Challenge

In the context of the CALAMITA Challenge, we investigate the physical commonsense reasoning capabilities of large language models (LLMs) and introduce a methodology to assess their …

CLiC-it 2024 • Giulia Pensa, Ekhi Azurmendi, Julen Etxaniz, Begoña Altuna, Itziar Gonzalez-Dios • Dec 6, 2024 • 1 min read

PDF Code Dataset

Natural Language Processing, Large Language Models, Deep Learning, Multilinguality, Basque

Latxa Euskarazko Hizkuntza-Eredua (Latxa, a Basque Language Model)

In this article we present the Latxa language models (LMs), the largest LMs developed for Basque to date. The Latxa LMs have between 7 billion and 70 billion parameters, …

EKAIA EHUko Zientzia eta Teknologia aldizkaria • Naiara Perez, Julen Etxaniz, Oscar Sainz, Itziar Aldabe, German Rigau, Eneko Agirre, Ahmed Salem, Aitor Ormazabal, Mikel Artetxe, Aitor Soroa • Sep 24, 2024 • 1 min read

PDF Code Dataset

Natural Language Processing, Large Language Models, Deep Learning, Evaluation, Multilinguality, Culture, Basque

BertaQA: How Much Do Language Models Know About Local Culture?

Large Language Models (LLMs) exhibit extensive knowledge about the world, but most evaluations have been limited to global or anglocentric subjects. This raises the question of how …

NeurIPS Datasets and Benchmarks 2024 • Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez de Lacalle, Mikel Artetxe • Jun 11, 2024 • 1 min read

PDF Code Dataset arXiv
