Data Contamination

Natural Language Processing, Large Language Models, Evaluation, Data Contamination, Deep Learning

NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark

In this position paper, we argue that the classical evaluation on Natural Language Processing (NLP) tasks using annotated benchmarks is in trouble. The worst kind of data …

EMNLP 2023 Findings

Oscar Sainz, Jon Ander Campos, Iker García-Ferrero, Julen Etxaniz, Oier Lopez de Lacalle, Eneko Agirre

Oct 27, 2023

