BERnaT: Basque Encoders for Representing Natural Textual Diversity
Language models depend on massive text corpora that are often filtered for quality, a process that can unintentionally exclude non-standard linguistic varieties, reduce model …
Current Multimodal Large Language Models (MLLMs) exhibit very strong performance on several demanding tasks. While commercial MLLMs deliver acceptable performance in low-resource …
Instructing language models with user intent requires large instruction datasets, which are only available for a limited set of languages. In this paper, we explore alternatives to …
We present BabyBabelLM, a multilingual collection of datasets modeling the language a person observes from birth until they acquire a native language. We curate developmentally …
We introduce a professionally translated extension of the TruthfulQA benchmark designed to evaluate truthfulness in Basque, Catalan, Galician, and Spanish. Truthfulness evaluations …
In this paper we present our submission to the NorSID Shared Task at the 2025 VarDial Workshop (Scherrer et al., 2025), which consists of three tasks: Intent Detection, Slot …
In this article we present the Latxa language models (LMs), the largest LMs developed for Basque to date. The Latxa LMs range from 7 billion to 70 billion parameters, …
Large Language Models (LLMs) exhibit extensive knowledge about the world, but most evaluations have been limited to global or anglocentric subjects. This raises the question of how …
The general objective of the IKER-GAITU project is to conduct research on language technology to increase the presence of Basque in the digital environment. It will be carried out between …
XNLI is a popular Natural Language Inference (NLI) benchmark widely used to evaluate cross-lingual Natural Language Understanding (NLU) capabilities across languages. In this …