We built a semantic cache from scratch at Portkey, and we're already seeing 20% cache hit rates for Q&A and RAG use cases (at 10M GPT-4 requests a day, that's $2,700 saved a month)
Wrote down the technical details, latency benchmarks, etc. here: https://blog.portkey.ai/blog/reducing-llm-costs-and-latency-semantic-cache/
Would love to discuss with anyone exploring this space!
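For anyone new to the idea: a semantic cache stores prior responses keyed by prompt embeddings, and serves a cached response when a new prompt is similar enough. Here's a minimal sketch of that core loop — illustrative only, not Portkey's implementation; the `embed` function below is a toy character-count stub standing in for a real embedding model, and the similarity threshold is an assumed parameter:

```python
import math

def embed(text):
    # Hypothetical stand-in for a real embedding model:
    # a crude bag-of-characters vector, for illustration only.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    return vec

def cosine(a, b):
    # Cosine similarity between two vectors; 0.0 if either is all-zero.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached_response)

    def get(self, prompt):
        # Return the cached response most similar to the prompt,
        # but only if it clears the similarity threshold.
        qv = embed(prompt)
        best, best_sim = None, 0.0
        for vec, response in self.entries:
            sim = cosine(qv, vec)
            if sim > best_sim:
                best, best_sim = response, sim
        return best if best_sim >= self.threshold else None

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))
```

A production version would replace the linear scan with a vector index and tune the threshold per use case, since that threshold directly trades hit rate against the risk of serving a stale or mismatched answer.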
Cole G
07/13/2023, 2:20 AM
Awesome idea, I love this. Reading up on it and will sign up for the beta.