Cerebral Valley

What is the best way to evaluate the performance of a RAG system? What are the best tools and datasets?

It really depends on the objective, and by nature it’s largely subjective.

I create my own test cases for each specific RAG system and run them all at once. There are also QA sets available online, so those are another option: for example, MedQA for medical question answering.
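
A minimal sketch of the "write your own test cases and run them all at once" approach could look like the following. Everything in it is an assumption for illustration: `toy_rag` is a hypothetical stand-in for a real RAG pipeline, `EvalCase` and `run_suite` are made-up names, and keyword matching is just one simple scoring choice, not a standard metric.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One hand-written test case (hypothetical structure)."""
    question: str
    expected_keywords: list[str]  # terms a good answer should mention


def toy_rag(question: str) -> str:
    """Hypothetical stand-in for a real RAG system (retrieval + generation)."""
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
        "Who wrote Hamlet?": "I am not sure.",
    }
    return canned.get(question, "")


def keyword_score(answer: str, keywords: list[str]) -> float:
    """Fraction of expected keywords found in the answer (case-insensitive)."""
    if not keywords:
        return 0.0
    text = answer.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)


def run_suite(system: Callable[[str], str], cases: list[EvalCase],
              threshold: float = 1.0) -> dict:
    """Run every case through the system and collect per-case results."""
    results = []
    for case in cases:
        answer = system(case.question)
        score = keyword_score(answer, case.expected_keywords)
        results.append({
            "question": case.question,
            "answer": answer,
            "score": score,
            "passed": score >= threshold,
        })
    pass_rate = sum(r["passed"] for r in results) / len(results) if results else 0.0
    return {"pass_rate": pass_rate, "results": results}


CASES = [
    EvalCase("What is the capital of France?", ["Paris"]),
    EvalCase("Who wrote Hamlet?", ["Shakespeare"]),
]

report = run_suite(toy_rag, CASES)
print(f"pass rate: {report['pass_rate']:.0%}")
for r in report["results"]:
    print("PASS" if r["passed"] else "FAIL", "-", r["question"])
```

To use it on a real system, swap `toy_rag` for a function that calls your own pipeline, and load `CASES` from your own questions or from a public QA set. Since the right measure depends on the objective, the scoring function is the part most worth replacing.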

I think Truera offers this kind of agent evaluation as a SaaS product, but honestly there are a lot of similar SaaS tools out there.