Looking for an easy-to-use platform to evaluate LLMs. Need something that can analyze responses, track metrics, and report scores on benchmarks like MT-Bench and MMLU. Any recommendations?
Emmanuel Turlay
02/22/2024, 3:33 AM
Hey Dheemanth – Founder of Airtrain.ai here.
We have a batch eval tool that lets you generate inferences on your entire eval dataset and evaluate those with a number of metrics that you define. We support a number of OSS and proprietary models, as well as fine-tuned ones that you can generate in our no-code fine-tuning tool.
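For readers new to the idea, a batch eval boils down to two steps: run the model on every example in the eval dataset, then score each response with the metrics you chose and aggregate the results. Below is a minimal Python sketch of that pattern. It assumes a "model" is any function from prompt to text and a "metric" is any function scoring one (prompt, response, reference) triple. `batch_eval`, `exact_match`, and `toy_model` are hypothetical illustrations, not Airtrain's API.

```python
from typing import Callable, Dict, List

# Hypothetical types for illustration only (not any vendor's API).
Model = Callable[[str], str]               # prompt -> response
Metric = Callable[[str, str, str], float]  # (prompt, response, reference) -> score


def batch_eval(model: Model,
               dataset: List[Dict[str, str]],
               metrics: Dict[str, Metric]) -> Dict[str, float]:
    """Run the model on every example, then average each metric over the dataset."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    totals = {name: 0.0 for name in metrics}
    for example in dataset:
        response = model(example["prompt"])  # inference step
        for name, metric in metrics.items():  # scoring step
            totals[name] += metric(example["prompt"], response, example["reference"])
    return {name: total / len(dataset) for name, total in totals.items()}


def exact_match(prompt: str, response: str, reference: str) -> float:
    # 1.0 if the response matches the reference, ignoring case and surrounding whitespace.
    return float(response.strip().lower() == reference.strip().lower())


def toy_model(prompt: str) -> str:
    # Stand-in for a real LLM call (hypothetical).
    return "paris" if "France" in prompt else "unknown"


dataset = [
    {"prompt": "Capital of France?", "reference": "Paris"},
    {"prompt": "Capital of Japan?", "reference": "Tokyo"},
]
print(batch_eval(toy_model, dataset, {"exact_match": exact_match}))  # {'exact_match': 0.5}
```

Swapping `toy_model` for a real client call, or adding more entries to the `metrics` dict (for example an LLM-as-judge scorer), leaves the loop unchanged. That separation of inference from scoring is what lets one dataset be compared across several models.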
Here is a demo video of the batch eval tool: