# 06-technical-discussion
d
Looking for an easy-to-use platform to evaluate LLMs. Need something that can analyze responses, track metrics, and provide scores like MT-Bench, MMLU, etc. Any recommendations?
e
Hey Dheemanth, founder of Airtrain.ai here. We have a batch eval tool that generates inferences over your entire eval dataset and scores them against metrics that you define. We support a range of OSS and proprietary models, as well as fine-tuned ones that you can create in our no-code fine-tuning tool. Here is a demo video of the batch eval tool:

https://youtu.be/O-Uquvmbt-U

And here is how to reproduce zero-shot MMLU results: https://docs.airtrain.ai/docs/mmlu-benchmark

Feel free to reach out if you have any questions or want to book a chat: emmanuel@airtrain.ai
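For context, a zero-shot MMLU-style eval roughly comes down to: format each multiple-choice question as a prompt with no worked examples, pull the answer letter out of the model's response, and report accuracy. Below is a minimal sketch in plain Python. It is not Airtrain's API or the official MMLU harness; the names `format_prompt`, `extract_letter`, `accuracy`, the example dict keys, and the `model` callable are all hypothetical, and the answer extraction is deliberately naive.

```python
import re
from typing import Callable, Dict, List, Optional, Sequence

CHOICE_LETTERS = "ABCD"  # MMLU questions have four options


def format_prompt(question: str, choices: Sequence[str]) -> str:
    """Build a zero-shot prompt: the question, lettered options, then 'Answer:'."""
    lines = [question]
    lines += [f"{letter}. {choice}" for letter, choice in zip(CHOICE_LETTERS, choices)]
    lines.append("Answer:")
    return "\n".join(lines)


def extract_letter(response: str) -> Optional[str]:
    """Return the first standalone uppercase A-D in the response, or None.

    Naive on purpose: a response starting with the article "A" would be
    misread as choice A, so real harnesses use stricter parsing.
    """
    match = re.search(r"\b([ABCD])\b", response)
    return match.group(1) if match else None


def accuracy(examples: List[Dict], model: Callable[[str], str]) -> float:
    """Fraction of examples where the extracted letter equals the gold answer.

    Each example is assumed (hypothetically) to look like:
    {"question": str, "choices": [str, str, str, str], "answer": "A".."D"}
    """
    if not examples:
        return 0.0
    correct = 0
    for ex in examples:
        prompt = format_prompt(ex["question"], ex["choices"])
        if extract_letter(model(prompt)) == ex["answer"]:
            correct += 1
    return correct / len(examples)
```

Here `model` is any function that takes a prompt string and returns the model's text response, so you can wrap whichever API or local model you are testing and pass it in.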
d
Thanks, will check this out.