Julen Etxaniz

Estudiante de Doctorado en Análisis y Procesamiento del Lenguaje

IXA Grupo de Investigación

HiTZ Centro Vasco de Tecnología de la Lengua

Universidad del País Vasco (UPV/EHU)

Biografía

Estudiante de Doctorado en Análisis y Procesamiento del Lenguaje en HiTZ Center IXA Group (UPV/EHU). Trabajando en mejorar los modelos de lenguaje para idiomas con pocos recursos. Graduado en Ingeniería Informática con especialidad en Ingeniería del Software. Máster en Análisis y Procesamiento del Lenguaje.

En esta web encontrarás información sobre Habilidades, Certificados, Proyectos, Etiquetas y Contacto.

Intereses

Programación
Desarrollo Web
Ingeniería del Software
Aprendizaje Automático
Aprendizaje Profundo
Procesamiento de Lenguaje Natural

Educación

Grado en Ingeniería Informática, 2017-2021
Universidad del País Vasco (UPV/EHU)
Máster en Análisis y Procesamiento del Lenguaje, 2021-2022
Universidad del País Vasco (UPV/EHU)
Doctorado en Análisis y Procesamiento del Lenguaje, 2023-Presente
Universidad del País Vasco (UPV/EHU)

Experiencia

Doctorado en Análisis y Procesamiento del Lenguaje

UPV/EHU

enero 2023 – Actualmente Donostia

Educación

Grado en Ingeniería Informática

UPV/EHU

septiembre 2017 – septiembre 2021 Donostia

Máster en Análisis y Procesamiento del Lenguaje

UPV/EHU

octubre 2021 – octubre 2021 Donostia

Doctorado en Análisis y Procesamiento del Lenguaje

UPV/EHU

enero 2023 – Actualmente Donostia

Habilidades

Idiomas

Lenguajes de Programación

Desarrollo Web

Ingeniería del Software

Aprendizaje Automático

Herramientas

Idiomas

Euskara

Español

English

Lenguajes de Programación

Python

R

Java

JavaScript

PHP

SQL

Desarrollo Web

HTML5

CSS3

Bootstrap

Hugo

Django

.NET

Ingeniería del Software

Requerimientos

Diseño

Desarrollo

Pruebas

Metodologías

Control de Versiones

Aprendizaje Automático

Clasificación

Regresión

Redes Neuronales

Jupyter Notebook

Scikit-Learn

Tensorflow

Tools

Git

GitHub

Xamarin

Eclipse

Visual Studio Code

Visual Studio

Publicaciones

Jaione Bengoetxea, Mikel Zubillaga, Ekhi Azurmendi, Maite Heredia, Julen Etxaniz, Markel Ferro, Jeremy Barnes

12-13-2024 COLING 2025 Natural Language Processing, Large Language Models, Deep Learning, Multilinguality, Dialects, Norwegian

HiTZ at VarDial 2025 NorSID: Overcoming Data Scarcity with Language Transfer and Automatic Data Annotation

In this paper we present our submission for the NorSID Shared Task as part of the 2025 VarDial Workshop (Scherrer et al., 2025), consisting of three tasks: Intent Detection, Slot Filling and Dialect Identification, evaluated using data in different dialects of the Norwegian language. For Intent Detection and Slot Filling, we have fine-tuned a multitask model in a cross-lingual setting, to leverage the xSID dataset available in 17 languages. In the case of Dialect Identification, our final submission consists of a model fine-tuned on the provided development set, which has obtained the highest scores within our experiments. Our final results on the test set show that our models do not drop in performance compared to the development set, likely due to the domain-specificity of the dataset and the similar distribution of both subsets. Finally, we also report an in-depth analysis of the provided datasets and their artifacts, as well as other sets of experiments that have been carried out but did not yield the best results. Additionally, we present an analysis on the reasons why some methods have been more successful than others; mainly the impact of the combination of languages and domain-specificity of the training data on the results.

Giulia Pensa, Ekhi Azurmendi, Julen Etxaniz, Begoña Altuna, Itziar Gonzalez-Dios

12-06-2024 CLiC-it 2024 Natural Language Processing, Large Language Models, Deep Learning, Evaluation, Commonsense Reasoning, Italian

GITA4CALAMITA - Evaluating the Physical Commonsense Understanding of Italian LLMs in a Multi-layered Approach: A CALAMITA Challenge

In the context of the CALAMITA Challenge, we investigate the physical commonsense reasoning capabilities of large language models (LLMs) and introduce a methodology to assess their understanding of the physical world. To this end, we use a test set designed to evaluate physical commonsense reasoning in LLMs for the Italian language. We present a tiered dataset, named the Graded Italian Annotated dataset (GITA), which is written and annotated by a professional linguist. This dataset enables us to focus on three distinct levels of commonsense understanding. Our benchmark aims to evaluate three specific tasks: identifying plausible and implausible stories within our dataset, identifying the conflict that generates an implausible story, and identifying the physical states that make a story implausible. We perform these tasks using LLAMA3, Gemma2 and Mistral. Our findings reveal that, although the models may excel at high-level classification tasks, their reasoning is inconsistent and unverifiable, as they fail to capture intermediate evidence.

Naiara Perez, Julen Etxaniz, Oscar Sainz, Itziar Aldabe, German Rigau, Eneko Agirre, Ahmed Salem, Aitor Ormazabal, Mikel Artetxe, Aitor Soroa

09-24-2024 EKAIA EHUko Zientzia eta Teknologia aldizkaria Natural Language Processing, Large Language Models, Deep Learning, Multilinguality, Basque

Latxa Euskarazko Hizkuntza-Eredua

Artikulu honetan Latxa hizkuntza-ereduak (HE) aurkeztuko ditugu, egun euskararako garatu diren HE handienak. Latxa HEek 7.000 miloi parametrotik 70.000 milioira bitartean dituzte, eta ingeleseko LLama 2 ereduetatik eratorriak dira. Horretarako, LLama 2 gainean aurreikasketa jarraitua izeneko prozesua gauzatu da, 4.3 milioi dokumentu eta 4.200 milioi token duen euskarazko corpusa erabiliz. Euskararentzat kalitate handiko ebaluazio multzoen urritasunari aurre egiteko, lau ebaluazio multzo berri bildu ditugu: EusProficiency, EGA azterketaren atariko frogako 5.169 galdera biltzen dituena; EusReading, irakurketaren ulermeneko 352 galdera biltzen dituena; EusTrivia, 5 arlotako ezagutza orokorreko 1.715 galdera biltzen dituena; eta EusExams, oposizioetako 16.774 galdera biltzen dituena. Datu-multzo berri hauek erabiliz, Latxa eta beste euskarazko HEak ebaluatu ditugu (elebakar zein eleanitzak), eta esperimentuek erakusten dute Latxak aurreko eredu ireki guztiak gainditzen dituela. Halaber, GPT-4 Turbo HE komertzialarekiko emaitza konpetitiboak lortzen ditu Latxak, hizkuntza-ezagutzan eta ulermenean, testu-irakurmenean zein ezagutza intentsiboa eskatzen duten atazetan atzeratuta egon arren. Bai Latxa ereduen familia, baita gure corpus eta ebaluazio-datu berriak ere lizentzia irekien pean daude publiko https://github. com/hitz-zentroa/latxa helbidean.

Julen Etxaniz, Gorka Azkune, Aitor Soroa, Oier Lopez De Lacalle, Mikel Artetxe

06-11-2024 NeurIPS Datasets and Benchmarks 2024 Natural Language Processing, Large Language Models, Deep Learning, Evaluation, Multilinguality, Culture, Basque

BertaQA: How Much Do Language Models Know About Local Culture?

Large Language Models (LLMs) exhibit extensive knowledge about the world, but most evaluations have been limited to global or anglocentric subjects. This raises the question of how well these models perform on topics relevant to other cultures, whose presence on the web is not that prominent. To address this gap, we introduce BertaQA, a multiple-choice trivia dataset that is parallel in English and Basque. The dataset consists of a local subset with questions pertinent to the Basque culture, and a global subset with questions of broader interest. We find that state-of-the-art LLMs struggle with local cultural knowledge, even as they excel on global topics. However, we show that continued pre-training in Basque significantly improves the models’ performance on Basque culture, even when queried in English. To our knowledge, this is the first solid evidence of knowledge transfer from a low-resource to a high-resource language. Our analysis sheds light on the complex interplay between language and knowledge, and reveals that some prior findings do not fully hold when reassessed on local topics. Our dataset and evaluation code are available under open licenses at https://github.com/juletx/BertaQA.

Stella Biderman, Hailey Schoelkopf, Lintang Sutawika, Leo Gao, Jonathan Tow, Baber Abbasi, Alham Fikri Aji, Pawan Sasanka Ammanamanchi, Sidney Black, Jordan Clive, Anthony DiPofi, Julen Etxaniz, Benjamin Fattori, Jessica Zosa Forde, Charles Foster, Jeffrey Hsu, Mimansa Jaiswal, Wilson Y. Lee, Haonan Li, Charles Lovering, Niklas Muennighoff, Ellie Pavlick, Jason Phang, Aviya Skowron, Samson Tan, Xiangru Tang, Kevin A. Wang, Genta Indra Winata, François Yvon, Andy Zou

05-23-2024 ArXiv Natural Language Processing, Large Language Models, Deep Learning, Evaluation, Reproducibility

Lessons from the Trenches on Reproducible Evaluation of Language Models

Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, difficulty of proper comparisons across methods, and the lack of reproducibility and transparency. In this paper we draw on three years of experience in evaluating large language models to provide guidance and lessons for researchers. First, we provide an overview of common challenges faced in language model evaluation. Second, we delineate best practices for addressing or lessening the impact of these challenges on research. Third, we present the Language Model Evaluation Harness (lm-eval): an open source library for independent, reproducible, and extensible evaluation of language models that seeks to address these issues. We describe the features of the library as well as case studies in which the library has been used to alleviate these methodological concerns.

Más publicaciones

Proyectos

Mejorando la seguridad de mi web

Analizaré mi web con herramientas como Hardenize y Security Headers para detectar los aspectos de seguridad que se pueden mejorar.

Spiking Neural Network

Simulating the Izhikevich spiking neuron model using the Brian2 software

Image Caption Generation

Automatic Image Caption Generation model that uses a CNN to condition a LSTM based language model.

Shape Classification

The goal of the project is to compare different classification algorithms on the solution of plane and car shape datasets.

100Iragarki

Tu escaparate digital Servicios y Aplicaciones de Red 2019-2020

Academic Web

Web personal Academic que incluye una descripción corta, enlaces sociales, biografía, intereses, educación, habilidades, experiencia, logros, proyectos e información de contacto.

Antxieta Arkeologi Taldea Web

Web de Antxieta Arkeologi Taldea, grupo cultural sin ánimo de lucro que desarrolla la investigación arqueológica en Gipuzkoa.

BattleshipFeatureIDE

Java Battleship FeatureIDE Software Product Line.

Community Detection

NIPS kongresuko autoreen komunitateak detektatzen metaheuristikoak erabiliz.

Comparing Writing Systems

Comparing Writing Systems with Multilingual Grapheme-to-Phoneme and Phoneme-to-Grapheme Conversion.

Computational Syntax

Computational Syntax slides and exercises.

Corpus Linguistics

Corpus Linguistics slides, labs, assignments and data.

Deep Learning for Natural Language Processing

Deep Learning for Natural Language Processing slides, labs and assignments.

Dialbot

Ikasketa sakonean oinarritutako muturretik muturrerako solasaldi sistema.

Egunean Behin Visual Question Answering Dataset

This is a Visual Question Answering dataset based on questions from the game Egunean Behin. Egunean Behin is a popular Basque quiz game. The game consists on answering 10 daily multiple choice questions.

GitHub Web

Web personal GitHub que incluye una foto, descripción corta, enlaces sociales y repositorios y temas de GitHub.

Grounding Language Models for Spatial Reasoning

HackerRank Challenge Solutions

Solutions for programming challenges in multiple languages.

Hyperpartisan News Analysis With Scattertext

Machine Learning and Neural Networks labs

Machine Learning and Neural Networks labs.

Machine Learning and Neural Networks lectures

Machine Learning and Neural Networks lectures.

Machine Learning exercises with R

Machine Learning exercises with R.

MFDS

Métodos Formales de Desarrollo de Software.

NLP Applications I - Text Classification, Sequence Labelling, Opinion Mining and Question Answering

NLP Applications I - Text Classification, Sequence Labelling, Opinion Mining and Question Answering slides, labs and project.

NLP Applications II - Information Extraction, Question Answering, Recommender Systems and Conversational Systems

NLP Applications II - Information Extraction, Question Answering, Recommender Systems and Conversational Systems slides, labs and project.

ProMeta

Metaereduetan oinarritutako softwarearen garapenerako prozesuen definizio eta ezarpenerako sistema.

ProMeta IO-System

ProMeta proiektua IO-System.

ProMeta ModelEditor

ProMeta proiektua ModelEditor.

Quiz

Juego de preguntas Sistemas Web 2019-2020

Twitter Sentiment and Emotion Analysis

Twitter Sentiment and Emotion Analysis.

Zero-shot and Translation Experiments on XQuAD, MLQA and TyDiQA

Contacto

juletxara@gmail.com
linkedin.com/in/juletxara
twitter.com/juletxara
Facultad de Informática, Manuel Lardizabal Ibilbidea, 1
20018 San Sebastián, Guipúzcoa, España
Oficina 314 en el piso 3
Lunes - Jueves 10:00 - 17:00