Cerebral Valley

We're seeing more and more devs augment their agents with custom function calls and tools. That's powerful, but it can make your pipeline much harder to evaluate.

In this tutorial, I've put together an example of how you can use an LLM-as-judge to evaluate 3 key steps of any function-calling agent using <https://phoenix.arize.com|Phoenix>. Check it out and let me know if you have any questions!

<https://www.youtube.com/watch?v=EfhylWtNb1s>
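
If you want a feel for the idea before watching, here's a rough sketch of LLM-as-judge applied to a single check: did the agent pick a sensible tool call for the question? This is not the code from the video and it doesn't use the Phoenix API. The prompt template, `judge_tool_call`, and `fake_judge` are all names I made up for illustration, and `judge_llm` is any function you supply that sends a prompt to a model and returns its text.

```python
import json
from typing import Callable

# Hypothetical judge prompt; the wording is an assumption, not a Phoenix template.
JUDGE_TEMPLATE = """You are evaluating a function-calling agent.
User question: {question}
Available tools: {tools}
Tool call chosen by the agent: {tool_call}
Answer with exactly one word: "correct" or "incorrect"."""


def build_judge_prompt(question: str, tools: list[str], tool_call: dict) -> str:
    """Fill the template with the question, the tool names, and the agent's call."""
    return JUDGE_TEMPLATE.format(
        question=question,
        tools=json.dumps(tools),
        tool_call=json.dumps(tool_call),
    )


def judge_tool_call(
    question: str,
    tools: list[str],
    tool_call: dict,
    judge_llm: Callable[[str], str],
) -> str:
    """Ask the judge model for a label and normalize its reply."""
    raw = judge_llm(build_judge_prompt(question, tools, tool_call))
    label = raw.strip().lower().strip('."')
    # Anything outside the two expected labels is flagged instead of guessed at.
    return label if label in {"correct", "incorrect"} else "unparseable"


def fake_judge(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return "correct" if '"name": "get_weather"' in prompt else "incorrect"


if __name__ == "__main__":
    verdict = judge_tool_call(
        question="What's the weather in Paris?",
        tools=["get_weather", "search_docs"],
        tool_call={"name": "get_weather", "arguments": {"city": "Paris"}},
        judge_llm=fake_judge,
    )
    print(verdict)
```

Swap `fake_judge` for a real model call and run it over your logged agent traces to get a label per tool call. Constraining the judge to a fixed set of labels keeps the output easy to aggregate.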