We built a semantic cache from scratch at Portkey, and we're already seeing 20% cache hit rates for Q&A and RAG use cases (at 10M GPT-4 requests a day, that's $2,700 saved a month)
Wrote down the technical details, latency benchmarks, etc. here: https://blog.portkey.ai/blog/reducing-llm-costs-and-latency-semantic-cache/
Would love to discuss with anyone exploring this space!
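For anyone new to the idea: a semantic cache stores prior responses keyed by prompt embeddings, and serves a cached response when a new prompt is similar enough. Here's a minimal sketch of that core loop — illustrative only, not Portkey's implementation; the `embed` function below is a toy character-count stub standing in for a real embedding model, and the similarity threshold is an assumed parameter:

```python
import math

def embed(text):
    # Hypothetical stand-in for a real embedding model:
    # a crude bag-of-characters vector, for illustration only.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    return vec

def cosine(a, b):
    # Cosine similarity between two vectors; 0.0 if either is all-zero.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached_response)

    def get(self, prompt):
        # Return the cached response most similar to the prompt,
        # but only if it clears the similarity threshold.
        qv = embed(prompt)
        best, best_sim = None, 0.0
        for vec, response in self.entries:
            sim = cosine(qv, vec)
            if sim > best_sim:
                best, best_sim = response, sim
        return best if best_sim >= self.threshold else None

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))
```

A production version would replace the linear scan with a vector index and tune the threshold per use case, since that threshold directly trades hit rate against the risk of serving a stale or mismatched answer.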
Cole G
07/13/2023, 2:20 AM
Awesome idea, I love this. Reading up on it and will sign up for the beta.