# 07-self-promotion
**Set up datasets and experiments in Arize**

Updating your prompts can feel like guessing. You find a new prompting technique on arXiv or Twitter that works well on a few examples, only to run into issues later. The reality of AI engineering is that prompting is non-deterministic; it's easy to make a small change and cause a performance regression in your product. A better approach is evaluation-driven development: with Arize, you can curate a dataset of the key cases you're trying to test, run your LLM task against those cases, and use code, LLM judges, or user-generated annotations to evaluate the output with aggregate scores. This lets you test as you build and verify experiments before you deploy to customers. In the video below, I run through a quick demo and accompanying notebook building a user research AI and show how I iterate on the prompts!

https://youtu.be/4PNU5mTbGec
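
To make the workflow concrete, here is a minimal sketch of that dataset → task → evaluator loop using the open-source Arize Phoenix library (the Arize platform itself exposes a similar datasets/experiments flow; treating Phoenix as a stand-in here is my assumption). The dataset rows, the `matches_expected` evaluator, the model name, and the dataset/experiment names are illustrative placeholders, not the ones from the video, and exact `upload_dataset` / `run_experiment` signatures can vary slightly across Phoenix versions.

```python
# Sketch of evaluation-driven development with Arize Phoenix.
# Assumes `pip install arize-phoenix openai` and a running Phoenix instance
# (e.g. px.launch_app() or a hosted endpoint). Data, task, and evaluator
# below are hypothetical placeholders.
import pandas as pd
import phoenix as px
from openai import OpenAI
from phoenix.experiments import run_experiment

# 1. Curate a small dataset of key cases the prompt must handle.
examples = pd.DataFrame(
    {
        "question": [
            "Summarize the main pain point in this interview transcript: ...",
            "What feature request does this user mention? ...",
        ],
        "answer": [
            "Onboarding takes too long",
            "Export reports to CSV",
        ],
    }
)

dataset = px.Client().upload_dataset(
    dataset_name="user-research-key-cases",
    dataframe=examples,
    input_keys=["question"],
    output_keys=["answer"],
)

# 2. The LLM task under test: one prompt variant you want to evaluate.
def task(input) -> str:
    response = OpenAI().chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a user-research assistant."},
            {"role": "user", "content": input["question"]},
        ],
    )
    return response.choices[0].message.content

# 3. A simple code-based evaluator; LLM judges or human annotations
#    can be attached the same way.
def matches_expected(output, expected) -> float:
    return float(expected["answer"].lower() in (output or "").lower())

# 4. Run the experiment; scores are aggregated so prompt variants can be
#    compared before anything ships to customers.
experiment = run_experiment(
    dataset,
    task,
    evaluators=[matches_expected],
    experiment_name="prompt-v2",
)
```

Re-running the same experiment after each prompt change gives you an aggregate score per variant instead of a gut feeling from a handful of spot checks.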