hey folks, I'm running a survey for the MLOps community on evaluating LLMs. Like the last survey, all responses will be open-sourced so everyone can dig into the data. Whether or not you're using LLMs, I'd love to hear why, and how you evaluate your projects.
here are some sample questions:
• What aspects of LLM performance do you consider important when evaluating?
• What data are you using to evaluate your LLMs?
• Do you have ground-truth labels for your data? If so, how did you generate the labels?
• Are you using human evaluators in your LLM evaluation process?
The responses so far have already surfaced some interesting signals.
https://docs.google.com/forms/d/e/1FAIpQLSdCqbJUJYdJBcRRbGyjQU6FQFz61ouuQMlX2Zo6kN2V6eQ8qQ/viewform?usp=sf_link