Julen Etxaniz

PhD student in Language Analysis and Processing at the HiTZ Center, IXA Group (EHU), working on improving language models for low-resource languages. Graduate in Informatics Engineering with a specialty in Software Engineering, and holder of a Master in Language Analysis and Processing.

Experience

PhD Student

HiTZ Center IXA Group (EHU)

Education

PhD in Language Analysis and Processing

University of the Basque Country (EHU)

Master in Language Analysis and Processing

University of the Basque Country (EHU)

Degree in Computer Engineering

University of the Basque Country (EHU)

Awards
Best Resource Paper Award
ACL ∙ August 2024
Mobile App Development Course (Curso de Desarrollo de Apps Móviles)
Universidad Complutense de Madrid ∙ January 2020
Certificate in Advanced English
Cambridge Assessment English ∙ February 2017

Skills

Programming Languages

Python
R
Java
JavaScript
PHP
C

Web Development

Bootstrap
Tailwind CSS
Hugo
Alpine.js
jQuery
Netlify

Machine Learning

Scikit-Learn
TensorFlow
PyTorch
Keras
Kaggle
Hugging Face

Data

MySQL
SQLite
PostgreSQL
XML
JSON
YAML

Editors & Source Control

VS Code
LaTeX
Jupyter
Git
GitHub
GitHub Actions

Other Tools

Matplotlib
NumPy
Pandas
Docker
PyPI
pnpm
Featured Publications

Instructing Large Language Models for Low-Resource Languages: A Systematic Study for Basque

Instructing language models with user intent requires large instruction datasets, which are only available for a limited set of languages. In this paper, we explore alternatives to …

EMNLP 2025
Oscar Sainz, Naiara Perez, Julen Etxaniz, Joseba Fernandez de Landa, Itziar Aldabe, Iker García-Ferrero, Aimar Zabala, Ekhi Azurmendi, German Rigau, Eneko Agirre, Mikel Artetxe, Aitor Soroa

BertaQA: How Much Do Language Models Know About Local Culture?

Large Language Models (LLMs) exhibit extensive knowledge about the world, but most evaluations have been limited to global or anglocentric subjects. This raises the question of how …

NeurIPS Datasets and Benchmarks 2024
Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez de Lacalle, Mikel Artetxe

Latxa: An Open Language Model and Evaluation Suite for Basque

We introduce Latxa, a family of large language models for Basque ranging from 7 to 70 billion parameters. Latxa is based on Llama 2, which we continue pretraining on a new Basque …

ACL 2024
Julen Etxaniz, Oscar Sainz, Naiara Perez, Itziar Aldabe, German Rigau, Eneko Agirre, Aitor Ormazabal, Mikel Artetxe, Aitor Soroa

Do Multilingual Language Models Think Better in English?

Translate-test is a popular technique to improve the performance of multilingual language models. This approach works by translating the input into English using an external …

NAACL 2024
Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez de Lacalle, Mikel Artetxe
Recent Publications

Challenging the Abilities of Large Language Models in Italian: a Community Initiative

The rapid progress of Large Language Models (LLMs) has transformed natural language processing and broadened its impact across research and society. Yet, systematic evaluation of …

arXiv
Malvina Nissim, Danilo Croce, Viviana Patti, Pierpaolo Basile, Giuseppe Attanasio, Elio Musacchio, Matteo Rinaldi, Federico Borazio, Maria Francis, Jacopo Gili, and others

BERnaT: Basque Encoders for Representing Natural Textual Diversity

Language models depend on massive text corpora that are often filtered for quality, a process that can unintentionally exclude non-standard linguistic varieties, reduce model …

arXiv
Ekhi Azurmendi, Joseba Fernandez de Landa, Jaione Bengoetxea, Maite Heredia, Julen Etxaniz, Mikel Zubillaga, Ander Soraluze, Aitor Soroa

Multimodal Large Language Models for Low-Resource Languages: A Case Study for Basque

Current Multimodal Large Language Models exhibit very strong performance for several demanding tasks. While commercial MLLMs deliver acceptable performance in low-resource …

arXiv
Lukas Arana, Julen Etxaniz, Ander Salaberria, Gorka Azkune

BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data

We present BabyBabelLM, a multilingual collection of datasets modeling the language a person observes from birth until they acquire a native language. We curate developmentally …

EACL 2026
Jaap Jumelet, Abdellah Fourtassi, Akari Haga, Bastian Bunzeck, Bhargav Shandilya, Diana Galvan-Sosa, Faiz Ghifari Haznitrama, Francesca Padovani, Francois Meyer, Hai Hu, Julen Etxaniz, and others

Truth Knows No Language: Evaluating Truthfulness Beyond English

We introduce a professionally translated extension of the TruthfulQA benchmark designed to evaluate truthfulness in Basque, Catalan, Galician, and Spanish. Truthfulness evaluations …

ACL 2025
Blanca Calvo Figueras, Eneko Sagarzazu, Julen Etxaniz, Jeremy Barnes, Pablo Gamallo, Iria de-Dios-Flores, Rodrigo Agerri

HiTZ at VarDial 2025 NorSID: Overcoming Data Scarcity with Language Transfer and Automatic Data Annotation

In this paper we present our submission for the NorSID Shared Task as part of the 2025 VarDial Workshop (Scherrer et al., 2025), consisting of three tasks: Intent Detection, Slot …

COLING 2025
Jaione Bengoetxea, Mikel Zubillaga, Ekhi Azurmendi, Maite Heredia, Julen Etxaniz, Markel Ferro, Jeremy Barnes

Projects

Shape Classification

The project compares different classification algorithms on plane and car shape datasets.

Academic Website

Academic personal website that includes a short description, social links, biography, interests, education, skills, experience, accomplishments, projects and contact info.

GitHub Website

GitHub personal website that includes a photo, short description, social links and GitHub repositories and topics.

MFDS

Formal Methods for Software Development (Métodos Formales de Desarrollo de Software).

ProMeta

A system for defining and deploying processes for metamodel-based software development.

Contact

Visit Us

Faculty of Informatics, Manuel Lardizabal Ibilbidea, 1

Office 314 on floor 3

Donostia, Gipuzkoa 20018

Office Hours

Monday - Friday 10:00 - 17:00
