MMLU benchmark

#evaluation #multi-metric #llm

MMLU (Massive Multitask Language Understanding) is a comprehensive benchmark designed to evaluate text models’ multitask accuracy across a broad range of subjects. It comprises 57 multiple-choice tasks covering diverse fields such as elementary mathematics, US history, computer science, and law, at difficulty levels ranging from elementary to advanced professional. The goal is to assess a model’s world knowledge and problem-solving ability in these varied domains.

Achieving a high score on MMLU requires extensive world knowledge and problem-solving ability across this wide array of subjects.

Smaller models usually struggle to perform well on this challenging benchmark, often scoring close to the 25% chance level of four-option multiple choice.
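
Below is a minimal sketch of how one MMLU subject could be evaluated, assuming the Hugging Face `cais/mmlu` dataset layout (each row has a `question`, a list of four `choices`, and an integer `answer` index) and a hypothetical `model.predict_choice` interface that returns the index of the chosen option; it is an illustration, not a reference implementation.

```python
from datasets import load_dataset

LETTERS = ["A", "B", "C", "D"]

def format_question(example):
    """Turn one MMLU row into a zero-shot multiple-choice prompt."""
    options = "\n".join(f"{l}. {c}" for l, c in zip(LETTERS, example["choices"]))
    return f"{example['question']}\n{options}\nAnswer:"

def evaluate(model, subject="college_computer_science", split="test"):
    """Return accuracy of `model` on a single MMLU subject."""
    ds = load_dataset("cais/mmlu", subject, split=split)
    correct = 0
    for example in ds:
        prompt = format_question(example)
        pred = model.predict_choice(prompt)        # hypothetical: returns 0-3
        correct += int(pred == example["answer"])  # gold label is an index
    return correct / len(ds)
```

Reported MMLU scores are typically the average accuracy over all 57 subjects, often with a few-shot prompt rather than the zero-shot format shown here.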

More details here: