Cerebral Valley

I'm using pgvector as my embeddings database and I'm not getting the best retrievals. Are all vector databases created equal, or is something like Pinecone better?

How much data is going into each of those vectors? If it's too much, the lookups will be pretty generic.

Hmm, I think we're chunking them to ~500 tokens.

Personally, I found the quality of Pinecone's similarity results much better than pgvector's, with the same vectors.

I'm confused as to why, tbh, since my understanding is that you use the same distance function (cosine, or L2...).
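
To make that intuition concrete, here is a minimal sketch in plain Python, assuming the embeddings are unit-normalized and the search is exact (brute force). The random vectors and the helper names (`normalize`, `top_k`) are made up for illustration and are not anything from pgvector or Pinecone. On unit vectors, squared L2 distance equals 2 - 2 * cosine similarity, so both metrics should rank neighbors in the same order.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_distance(a, b):
    """1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def l2_distance(a, b):
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query, vectors, distance, k):
    """Exact (brute-force) nearest neighbors: indices of the k closest vectors."""
    ranked = sorted(range(len(vectors)), key=lambda i: distance(query, vectors[i]))
    return ranked[:k]

# Hypothetical data: 200 random 16-dimensional unit vectors standing in for embeddings.
random.seed(0)
docs = [normalize([random.gauss(0, 1) for _ in range(16)]) for _ in range(200)]
query = normalize([random.gauss(0, 1) for _ in range(16)])

cos_top = top_k(query, docs, cosine_distance, 5)
l2_top = top_k(query, docs, l2_distance, 5)
print("cosine top-5:", cos_top)
print("L2 top-5:    ", l2_top)
```

If two stores return different neighbors for identical vectors and the same metric, the cause is probably somewhere other than the distance function. One possibility, not verified in this thread, is an approximate index or its settings.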

I didn't investigate much, just switched to Pinecone, but I'd like to get to the bottom of it.

That’s interesting… If you have a small sample set, I could also run it on my vector database and see if there’s any difference there.
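
One rough way to run that kind of side-by-side test is to send the same query to both stores and measure how much their top-k results overlap. The sketch below assumes each store returns an ordered list of document IDs; the function name `overlap_at_k` and the sample IDs are hypothetical.

```python
def overlap_at_k(results_a, results_b, k):
    """Fraction of the top-k IDs that two ranked result lists have in common."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_a = set(results_a[:k])
    top_b = set(results_b[:k])
    return len(top_a & top_b) / k

# Hypothetical ranked IDs returned by two different vector stores for one query.
store_a = ["d1", "d2", "d3", "d4"]
store_b = ["d2", "d1", "d5", "d3"]

score = overlap_at_k(store_a, store_b, 3)
print(f"overlap@3 = {score:.2f}")
```

Averaging this over a small set of queries, and comparing each store against an exact brute-force search over the same vectors, would show whether either one is actually dropping true nearest neighbors.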