👋 We recently discussed EvalGen, a mixed-initiative approach to aligning LLM-generated evaluation functions with human preferences. We review the workflow, discuss whether more assertions in evals are always a good thing, and draw out some implications from the user study outlined in the research.
Don’t miss the takeaways for app builders at the very end!
Recording, podcast version, and transcript here:
https://arize.com/blog/breaking-down-evalgen-who-validates-the-validators/