I'm using PGVector as my embeddings database and I'm not getting the best retrievals. Are all vector databases created the same or is something like Pinecone better?
08/01/2023, 2:45 AM
How much data is going into each of those vectors? If it’s too much, the lookups will be pretty generic.
08/01/2023, 2:17 PM
Hmm, I think we're chunking them to ~500 tokens.
Update: we're actually using 320 tokens.
08/01/2023, 9:33 PM
Personally, I found the quality of Pinecone's similarity results much better than pgvector's, with the same vectors.
I'm confused as to why, tbh, since my understanding is that you use the same distance function (cosine, or L2...).
I didn't investigate much, just switched to Pinecone, but I'd like to get to the bottom of it.
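One thing worth ruling out is whether both systems were really using the same metric on the same kind of vectors. Here's a rough sketch in plain Python (the toy vectors and helper names are made up for illustration, not taken from either database): on raw vectors, cosine and L2 can rank results differently, while on unit-normalized vectors they give the same ordering.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_distance(a, b):
    """1 - cosine similarity; ignores vector magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def l2_distance(a, b):
    """Euclidean distance; sensitive to vector magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank(query, docs, distance):
    """Return document indices sorted from nearest to farthest."""
    return sorted(range(len(docs)), key=lambda i: distance(query, docs[i]))

# Hypothetical toy "embeddings" with very different magnitudes.
docs = [[3.0, 0.1], [0.5, 0.5], [0.1, 2.0], [10.0, 9.0]]
query = [1.0, 0.9]

raw_cos = rank(query, docs, cosine_distance)
raw_l2 = rank(query, docs, l2_distance)

unit_docs = [normalize(d) for d in docs]
unit_query = normalize(query)
unit_cos = rank(unit_query, unit_docs, cosine_distance)
unit_l2 = rank(unit_query, unit_docs, l2_distance)

print("raw cosine:", raw_cos, "raw L2:", raw_l2)
print("unit cosine:", unit_cos, "unit L2:", unit_l2)
```

If one setup normalizes embeddings (or uses cosine) and the other runs L2 on unnormalized ones, the top results can differ even though "the same vectors" went in. That's only one candidate explanation, not a confirmed cause.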
08/01/2023, 10:46 PM
If you find out why, please share 👍
08/02/2023, 1:24 AM
That’s interesting… If you have a small sample set, I could also run it on my vector database and see if there’s any difference there.
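For a comparison like that, a simple way to put a number on the difference would be recall@k against an exact brute-force ranking of the same sample. A minimal sketch, assuming each system returns an ordered list of document IDs (the IDs and the `recall_at_k` helper below are hypothetical):

```python
def recall_at_k(retrieved_ids, exact_ids, k):
    """Fraction of the exact top-k neighbours that a system also returned in its top-k."""
    top_exact = set(exact_ids[:k])
    if not top_exact:
        return 0.0
    hits = len(top_exact & set(retrieved_ids[:k]))
    return hits / len(top_exact)

# Hypothetical results for one query: exact (brute-force) ranking vs. two systems.
exact = [7, 2, 9, 4, 1]
system_a = [7, 2, 9, 4, 1]
system_b = [7, 9, 5, 3, 1]

print("system A recall@5:", recall_at_k(system_a, exact, 5))
print("system B recall@5:", recall_at_k(system_b, exact, 5))
```

If one database scores well below 1.0 here with identical vectors and the same metric, the gap probably comes from an approximate index rather than the distance function itself (pgvector's ivfflat index, for example, is approximate, and its recall depends on how it's tuned).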