The latest ✨ `v0.0.24` ✨ of 🦉 Arize-Phoenix, an open-source library offering ML observability in a notebook, is here!
This release expands Phoenix's cluster-based analysis with more metrics to help you assess model performance and data quality on unstructured data.
Here are the highlights:
• Clusters can now be analyzed for model performance degradation. Phoenix now includes accuracy as a model performance metric, and using it as the base metric on the embedding projection lets you drill into clusters that map to bad predictions more quickly.
• Finding pockets of bad performance is as simple as picking the metric and sorting the clusters from worst-performing to best. If you are using Phoenix to identify production data that should be re-labeled and fed back into your training pipeline, this is the feature for you.
• Clusters can now be analyzed via ad-hoc metrics. You can calculate the average of any numeric feature, tag, prediction, or actual sent to Phoenix, which means you can find "low-quality" clusters via the heuristic of your choosing!
• Identify clusters of chatbot queries that are failing to get a good answer. The neat thing about this feature is that you can use Phoenix to build your own EDA heuristic!
• Care about ROUGE scores or LLM-assisted evaluations? You can now use these to analyze your embeddings and to discover anomalies by simply sorting your clusters!
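
To make the idea behind these highlights concrete, here is a minimal sketch in plain Python of averaging a metric per cluster and sorting the clusters worst-first. It does not use the Phoenix API and is not how Phoenix computes these values internally. The record layout, the `rank_clusters` helper, and the sample values are all hypothetical, and cluster ids are assumed to be assigned already.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: one per inference, each already assigned to a cluster.
records = [
    {"cluster": "A", "prediction": "cat", "actual": "cat", "rouge": 0.82},
    {"cluster": "A", "prediction": "dog", "actual": "dog", "rouge": 0.78},
    {"cluster": "B", "prediction": "cat", "actual": "dog", "rouge": 0.31},
    {"cluster": "B", "prediction": "dog", "actual": "dog", "rouge": 0.40},
    {"cluster": "C", "prediction": "cat", "actual": "dog", "rouge": 0.55},
]


def rank_clusters(records, score):
    """Average score(record) within each cluster, sorted lowest (worst) first."""
    by_cluster = defaultdict(list)
    for record in records:
        by_cluster[record["cluster"]].append(score(record))
    averages = ((cluster, mean(values)) for cluster, values in by_cluster.items())
    return sorted(averages, key=lambda pair: pair[1])


# Accuracy per cluster: 1.0 for a correct prediction, 0.0 otherwise.
accuracy_ranking = rank_clusters(
    records, lambda r: float(r["prediction"] == r["actual"])
)

# Ad-hoc metric: the mean of any numeric column, here a hypothetical ROUGE score.
rouge_ranking = rank_clusters(records, lambda r: r["rouge"])

for cluster, value in accuracy_ranking:
    print(f"cluster {cluster}: accuracy {value:.2f}")
```

Swapping the `score` function is all it takes to rank by a different heuristic, which mirrors the "pick a metric, then sort" workflow described above.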