We've seen more and more devs augmenting their agents with custom function calls and tools. It's a powerful pattern, but it can make your pipeline much harder to evaluate.
In this tutorial, I've put together an example of how you can use an LLM-as-judge to evaluate 3 key steps of any function-calling agent using Phoenix. Check it out and let me know if you have any questions!