A community of founders and builders creating the next generation of technology.

Cerebral Valley

Looking for an easy-to-use platform to evaluate LLMs. Need something that can analyze responses, track metrics, and provide scores like MT-Bench, MMLU, etc. Any recommendations?

Hey Dheemanth, founder of Airtrain.ai (http://Airtrain.ai) here.
We have a batch eval tool that lets you run inference over your entire eval dataset and score the outputs with metrics that you define. We support a range of OSS and proprietary models, as well as fine-tuned ones that you can create in our no-code fine-tuning tool.
Here is a demo video of the batch eval tool: https://youtu.be/O-Uquvmbt-U
And here is how to reproduce zero-shot MMLU results: <https://docs.airtrain.ai/docs/mmlu-benchmark>
Feel free to reach out if you have any questions or want to book a chat: emmanuel@airtrain.ai
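
For anyone who wants a feel for what a zero-shot multiple-choice eval like MMLU boils down to, here is a minimal, tool-agnostic sketch. It is not Airtrain's API or the official MMLU harness. The names `format_prompt`, `extract_choice`, and `batch_eval`, the dataset field names, and the "first character is the answer letter" parsing rule are all assumptions made for illustration. The `model` argument is any callable that takes a prompt string and returns the model's text response.

```python
from typing import Callable, Dict, List, Sequence

CHOICES = "ABCD"  # assumed four-option format, as in MMLU-style questions


def format_prompt(question: str, options: Sequence[str]) -> str:
    """Build a zero-shot prompt: the question, lettered options, then 'Answer:'."""
    lines: List[str] = [question]
    lines += [f"{letter}. {opt}" for letter, opt in zip(CHOICES, options)]
    lines.append("Answer:")
    return "\n".join(lines)


def extract_choice(response: str) -> str:
    """Naive parser (an assumption): accept only a leading A/B/C/D letter."""
    text = response.strip().upper()
    return text[0] if text and text[0] in CHOICES else ""


def batch_eval(dataset: Sequence[Dict], model: Callable[[str], str]) -> float:
    """Run the model over every example and return plain accuracy."""
    if not dataset:
        return 0.0
    correct = 0
    for example in dataset:
        prompt = format_prompt(example["question"], example["options"])
        prediction = extract_choice(model(prompt))
        correct += int(prediction == example["answer"])
    return correct / len(dataset)
```

To try it, pass a list of dicts with `question`, `options`, and `answer` keys plus a function that wraps your model call. Real harnesses usually parse answers more robustly (for example, by comparing per-option log-probabilities), so treat the accuracy from this sketch as a rough sanity check only.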