
Julen Etxaniz


PhD student in Language Analysis and Processing at the HiTZ Center, IXA Group (EHU), working on improving language models for low-resource languages. Graduate in Informatics Engineering with a specialization in Software Engineering, and Master in Language Analysis and Processing.


Experience

PhD Student

HiTZ Center IXA Group (EHU)

January 2023 – Present

Education

PhD in Language Analysis and Processing

University of the Basque Country (EHU)

January 2023 – Present

Master in Language Analysis and Processing

University of the Basque Country (EHU)

September 2021 – June 2022

Degree in Computer Engineering

University of the Basque Country (EHU)

September 2017 – June 2021

Awards

Best Resource Paper Award

ACL ∙ August 2024

Applied Machine Learning with Python Course

Platzi ∙ October 2020

Neural Networks with Keras and Scikit-Learn Course

Platzi ∙ October 2020

Practical Linear Regression with Python Course

Platzi ∙ October 2020

Mobile App Development Course

Universidad Complutense de Madrid ∙ January 2020

Certificate in Advanced English

Cambridge Assessment English ∙ February 2017

Euskararen Gaitasun Agiria (Basque Language Proficiency Certificate)

Eusko Jaurlaritza (Basque Government) ∙ June 2016

Skills

Programming Languages

Python

R

Java

JavaScript

PHP

C

Web Development

Bootstrap

Tailwind CSS

Hugo

Alpine.js

jQuery

Netlify

Machine Learning

Scikit-Learn

TensorFlow

PyTorch

Keras

Kaggle

Hugging Face

Data

MySQL

SQLite

PostgreSQL

XML

JSON

YAML

Editors & Source Control

VS Code

LaTeX

Jupyter

Git

GitHub

GitHub Actions

Other Tools

Matplotlib

NumPy

Pandas

Docker

PyPI

pnpm

Featured Publications

Natural Language Processing, Large Language Models, Deep Learning, Multilinguality, Basque, Instruction Tuning

Instructing Large Language Models for Low-Resource Languages: A Systematic Study for Basque

Instructing language models with user intent requires large instruction datasets, which are only available for a limited set of languages. In this paper, we explore alternatives to …

EMNLP 2025 • Oscar Sainz, Naiara Perez, Julen Etxaniz, Joseba Fernandez de Landa, Itziar Aldabe, Iker García-Ferrero, Aimar Zabala, Ekhi Azurmendi, German Rigau, Eneko Agirre, Mikel Artetxe, Aitor Soroa • Nov 4, 2025


Natural Language Processing, Large Language Models, Deep Learning, Evaluation, Multilinguality, Culture, Basque

BertaQA: How Much Do Language Models Know About Local Culture?

Large Language Models (LLMs) exhibit extensive knowledge about the world, but most evaluations have been limited to global or anglocentric subjects. This raises the question of how …

NeurIPS Datasets and Benchmarks 2024 • Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez de Lacalle, Mikel Artetxe • Jun 11, 2024


Natural Language Processing, Large Language Models, Deep Learning, Multilinguality, Basque

Latxa: An Open Language Model and Evaluation Suite for Basque

We introduce Latxa, a family of large language models for Basque ranging from 7 to 70 billion parameters. Latxa is based on Llama 2, which we continue pretraining on a new Basque …

ACL 2024 • Julen Etxaniz, Oscar Sainz, Naiara Perez, Itziar Aldabe, German Rigau, Eneko Agirre, Aitor Ormazabal, Mikel Artetxe, Aitor Soroa • Mar 29, 2024


Natural Language Processing, Large Language Models, Deep Learning, Multilinguality

Do Multilingual Language Models Think Better in English?

Translate-test is a popular technique to improve the performance of multilingual language models. This approach works by translating the input into English using an external …

NAACL 2024 • Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez de Lacalle, Mikel Artetxe • Aug 2, 2023


Recent Publications

Natural Language Processing, Large Language Models, Deep Learning, Evaluation, Commonsense Reasoning, Italian

Challenging the Abilities of Large Language Models in Italian: a Community Initiative

The rapid progress of Large Language Models (LLMs) has transformed natural language processing and broadened its impact across research and society. Yet, systematic evaluation of …

arXiv • Malvina Nissim, Danilo Croce, Viviana Patti, Pierpaolo Basile, Giuseppe Attanasio, Elio Musacchio, Matteo Rinaldi, Federico Borazio, Maria Francis, Jacopo Gili, and others • Dec 4, 2025


Natural Language Processing, Large Language Models, Deep Learning, Evaluation, Multilinguality, Basque, Linguistic Diversity

BERnaT: Basque Encoders for Representing Natural Textual Diversity

Language models depend on massive text corpora that are often filtered for quality, a process that can unintentionally exclude non-standard linguistic varieties, reduce model …

arXiv • Ekhi Azurmendi, Joseba Fernandez de Landa, Jaione Bengoetxea, Maite Heredia, Julen Etxaniz, Mikel Zubillaga, Ander Soraluze, Aitor Soroa • Dec 3, 2025


Natural Language Processing, Large Language Models, Deep Learning, Evaluation, Multilinguality, Basque, Multimodal

Multimodal Large Language Models for Low-Resource Languages: A Case Study for Basque

Current Multimodal Large Language Models exhibit very strong performance for several demanding tasks. While commercial MLLMs deliver acceptable performance in low-resource …

arXiv • Lukas Arana, Julen Etxaniz, Ander Salaberria, Gorka Azkune • Nov 12, 2025


Natural Language Processing, Language Models, Deep Learning, Multilinguality, Cognitive Modeling

BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data

We present BabyBabelLM, a multilingual collection of datasets modeling the language a person observes from birth until they acquire a native language. We curate developmentally …

EACL 2026 • Jaap Jumelet, Abdellah Fourtassi, Akari Haga, Bastian Bunzeck, Bhargav Shandilya, Diana Galvan-Sosa, Faiz Ghifari Haznitrama, Francesca Padovani, Francois Meyer, Hai Hu, Julen Etxaniz, and others • Oct 11, 2025


Natural Language Processing, Large Language Models, Deep Learning, Multilinguality, Truthfulness, Evaluation

Truth Knows No Language: Evaluating Truthfulness Beyond English

We introduce a professionally translated extension of the TruthfulQA benchmark designed to evaluate truthfulness in Basque, Catalan, Galician, and Spanish. Truthfulness evaluations …

ACL 2025 • Blanca Calvo Figueras, Eneko Sagarzazu, Julen Etxaniz, Jeremy Barnes, Pablo Gamallo, Iria de-Dios-Flores, Rodrigo Agerri • Jul 27, 2025


Natural Language Processing, Large Language Models, Deep Learning, Multilinguality, Dialects, Norwegian

HiTZ at VarDial 2025 NorSID: Overcoming Data Scarcity with Language Transfer and Automatic Data Annotation

In this paper we present our submission for the NorSID Shared Task as part of the 2025 VarDial Workshop (Scherrer et al., 2025), consisting of three tasks: Intent Detection, Slot …

COLING 2025

•

Jaione Bengoetxea

,

Mikel Zubillaga

,

Ekhi Azurmendi

,

Maite Heredia

,

Julen Etxaniz

,