@Pasquale Antonante and I just published a deep dive into our findings on RAG evaluation.
What’s inside:
• In-Depth Metric Analysis: pros & cons of various deterministic and LLM-based retrieval metrics
• Comparative Benchmarking: GPT-4, GPT-3.5, and Claude 2.1 in retrieval assessment without ground-truth labels
• Step-by-Step Guide: using metrics for systematic quality enhancement
• Open-Source Tool: continuous-eval for plug-&-play evaluation on your dataset
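To give a flavor of what a deterministic retrieval metric looks like, here is a minimal, self-contained sketch of chunk-level precision/recall/F1 against ground-truth contexts. This is an illustration only, not continuous-eval's actual API; the function name and exact-match comparison are assumptions for the example.

```python
def precision_recall_f1(retrieved: list[str], ground_truth: list[str]) -> dict:
    """Score retrieved chunks against ground-truth chunks by exact match.

    Illustrative sketch of a deterministic retrieval metric;
    continuous-eval's real implementation differs.
    """
    relevant = set(ground_truth)
    hits = sum(1 for chunk in retrieved if chunk in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

scores = precision_recall_f1(
    retrieved=["chunk A", "chunk B", "chunk C"],
    ground_truth=["chunk A", "chunk D"],
)
print(scores)  # precision ≈ 0.33, recall = 0.5, f1 = 0.4
```

Metrics like this are cheap and reproducible, but they require labeled contexts; that limitation is exactly what motivates the LLM-based metrics and label-free benchmarking discussed in the post.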
Whether you are already experienced with RAG evaluation or new to setting it up for your pipeline, we'd love to hear your feedback!