#evaluation #multi-metric #llm

FLASK defines four primary abilities, subdivided into 12 fine-grained skills, so that language model performance can be evaluated skill by skill rather than with a single aggregate score.

FLASK_protocol.png
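
As a rough illustration of the four-abilities / 12-skills hierarchy, here is a minimal Python sketch. The ability and skill names are written as I recall them from the FLASK paper and should be checked against the source. The `ability_scores` helper and the example scores are hypothetical, not part of any official FLASK code; they only show one plausible way to roll per-skill scores up to the ability level.

```python
from statistics import mean

# Assumed taxonomy: 4 primary abilities -> 12 fine-grained skills.
# Names recalled from the FLASK paper; verify against the original.
FLASK_SKILLS = {
    "Logical Thinking": [
        "Logical Correctness", "Logical Robustness", "Logical Efficiency",
    ],
    "Background Knowledge": [
        "Factuality", "Commonsense Understanding",
    ],
    "Problem Handling": [
        "Comprehension", "Insightfulness", "Completeness", "Metacognition",
    ],
    "User Alignment": [
        "Conciseness", "Readability", "Harmlessness",
    ],
}


def ability_scores(skill_scores):
    """Average per-skill scores into per-ability scores.

    Hypothetical helper: skills without a score are skipped, and an
    ability with no scored skills is left out of the result.
    """
    result = {}
    for ability, skills in FLASK_SKILLS.items():
        scored = [skill_scores[s] for s in skills if s in skill_scores]
        if scored:
            result[ability] = mean(scored)
    return result


# Made-up scores for a single response, for illustration only.
example = {"Factuality": 4, "Commonsense Understanding": 5, "Conciseness": 3}
print(ability_scores(example))
```

Only the skills relevant to a given instruction need a score, which is why the helper tolerates missing entries.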

More details here: