
Cerebral Valley

[Attachments: Screenshot 2023-09-02 at 6.28.37 PM.png, Screenshot 2023-09-02 at 6.28.15 PM.png]

hey folks, I am running a survey for the <https://home.mlops.community/home|MLOps community> on evaluating LLMs.

Like <https://docs.google.com/spreadsheets/d/13wdBwkX8vZrYKuvF4h2egPh0LYSn2GQSwUaLV4GUNaU/edit?usp=sharing|the last survey>, *all responses will be open sourced* for everyone to dig into the data. Whether or not you are using LLMs, I would love to hear why, and how you evaluate your projects.

here are some sample questions:

• What aspects of LLM performance do you consider important when evaluating?
• What data are you using to evaluate your LLMs?
• Do you have ground-truth labels for your data? If so, how did you generate the labels?
• Are you using human evaluators in your LLM evaluation process?
There have been some nice signals thus far.

<https://docs.google.com/forms/d/e/1FAIpQLSdCqbJUJYdJBcRRbGyjQU6FQFz61ouuQMlX2Zo6kN2V6eQ8qQ/viewform?usp=sf_link>

[Attachments: 1.jpg, 4.jpg, 2.jpg, 3.jpg]

here are a few more responses I liked. I think we have all felt this.

for those following along, the survey responses are out. you can see them all here. cool to see how others are attacking the evaluation challenges.

seems like the consensus is to just add a thumbs up/thumbs down and call it a day... not sure that's good enough though.

<https://docs.google.com/spreadsheets/d/1SIsM5UaZoLoze8TiBoIQl350R2lfXr0BfzrBS8jtTJU/edit?usp=sharing>