Lessons from the Trenches on Reproducible Evaluation of Language Models
Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation …
Stella Biderman, Hailey Schoelkopf, Lintang Sutawika, Leo Gao, Jonathan Tow, Baber Abbasi, Alham Fikri Aji, Pawan Sasanka Ammanamanchi, Sidney Black, Jordan Clive, Anthony DiPofi, Julen Etxaniz, Benjamin Fattori, Jessica Zosa Forde, Charles Foster, Jeffrey Hsu, Mimansa Jaiswal, Wilson Y. Lee, Haonan Li, Charles Lovering, Niklas Muennighoff, Ellie Pavlick, Jason Phang, Aviya Skowron, Samson Tan, Xiangru Tang, Kevin A. Wang, Genta Indra Winata, François Yvon, Andy Zou
