Hi everyone 👋 Wanted to share some recent learnings on estimating confidence of Large Language Model (LLM) responses:
https://www.refuel.ai/blog-posts/labeling-with-confidence
Confidence estimation is an effective tool to mitigate the impact of LLM hallucinations in applications like data labeling, attribute extraction, and question answering - it lets us automatically reject low-confidence responses, chain or ensemble LLMs, etc. However, how best to estimate confidence remains an open question.
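For concreteness, here's a minimal sketch of one common approach (not necessarily the one from the post): score a response by the mean token log-probability and reject it below a threshold. It assumes the OpenAI Python client with `logprobs=True`; the model name and threshold are illustrative.

```python
import math
from openai import OpenAI  # assumes the openai>=1.x Python client

client = OpenAI()
CONFIDENCE_THRESHOLD = 0.9  # illustrative value; tune per task/dataset

def label_with_confidence(prompt: str) -> tuple[str, float]:
    """Return the model's response and a confidence score derived
    from the mean token log-probability of the generated tokens."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any model that can return logprobs
        messages=[{"role": "user", "content": prompt}],
        logprobs=True,
    )
    choice = response.choices[0]
    token_logprobs = [t.logprob for t in choice.logprobs.content]
    # Geometric mean of token probabilities as a crude confidence proxy
    confidence = math.exp(sum(token_logprobs) / len(token_logprobs))
    return choice.message.content, confidence

label, confidence = label_with_confidence("Classify the sentiment: 'Great product!'")
if confidence < CONFIDENCE_THRESHOLD:
    label = None  # reject the low-confidence response, e.g. for human review
```

Token logprobs are only one signal - verbalized confidence, sampling consistency, and calibrated classifiers are other options, each with different tradeoffs.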
I’m curious to hear about your experience/learnings on this topic!