I'm using PGVector as my embeddings database and I'm not getting the best retrievals. Are all vector databases created the same, or is something like Pinecone better?
Eduardo Gonzalez
08/01/2023, 2:45 AM
How much data is going into those vectors? If it’s too much, the lookups will be pretty generic.
Chris Johnston
08/01/2023, 2:17 PM
Hmm, I think we're chunking them to ~500 tokens.
Update: we're actually using 320 tokens.
Nicholas Charriere
08/01/2023, 9:33 PM
Personally, I found the quality of Pinecone's similarity results much better than pgvector's, with the same vectors.
I'm confused as to why, tbh, since my understanding is that you use the same distance function (cosine, or L2...).
I didn't investigate much, just switched to Pinecone, but I'd like to get to the bottom of it.
🙌 1
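For anyone who wants to dig into this, here is a minimal sketch of an exact (brute-force) baseline in plain Python. With the same vectors and the same distance function, an exact search should produce the same ranking in any store, and on unit-length vectors cosine and L2 rank identically. One possible factor, not confirmed anywhere in this thread, is approximate indexing, which trades recall for speed; comparing each database's results against a baseline like this is one way to check. The function names and the toy vectors are made up for illustration.

```python
import math

def normalize(v):
    # Scale a vector to unit length.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine_distance(a, b):
    # 1 - cosine similarity; smaller means more similar.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def l2_distance(a, b):
    # Euclidean distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_top_k(query, vectors, k, distance):
    # Brute force: score every vector, return indices of the k closest.
    ranked = sorted(range(len(vectors)), key=lambda i: distance(query, vectors[i]))
    return ranked[:k]

# Toy vectors (hypothetical, not real embeddings).
docs = [normalize(v) for v in [
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]]
query = normalize([0.9, 0.2, 0.1])

cosine_ranking = exact_top_k(query, docs, 3, cosine_distance)
l2_ranking = exact_top_k(query, docs, 3, l2_distance)

# On unit-length vectors the two metrics give the same order.
print(cosine_ranking, l2_ranking)
```

If a database's top results differ from this kind of exact ranking on the same vectors, the gap would come from something other than the distance function itself.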
Chris Johnston
08/01/2023, 10:46 PM
If you find out why, please share 👍
👍 1
Eduardo Gonzalez
08/02/2023, 1:24 AM
That’s interesting… If you have a small sample set, I could also run it on my vector database and see if there’s any difference there.
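A rough sketch of how such a side-by-side comparison could be scored, assuming each database returns an ordered list of document IDs per query. The overlap metric, the function names, and the placeholder IDs below are all hypothetical; they are not measurements from anyone in this thread.

```python
def overlap_at_k(ids_a, ids_b, k):
    # Fraction of the top-k results that both systems returned
    # (order within the top k is ignored).
    top_a, top_b = set(ids_a[:k]), set(ids_b[:k])
    return len(top_a & top_b) / k

def mean_overlap(results_a, results_b, k):
    # Average overlap@k across all queries in the sample set.
    scores = [overlap_at_k(results_a[q], results_b[q], k) for q in results_a]
    return sum(scores) / len(scores)

# Placeholder result lists keyed by query ID (made up for illustration).
store_a_results = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5", "d6"]}
store_b_results = {"q1": ["d1", "d3", "d7"], "q2": ["d4", "d5", "d6"]}

score = mean_overlap(store_a_results, store_b_results, k=3)
print(f"mean overlap@3: {score:.2f}")
```

A mean overlap close to 1.0 would suggest the two stores retrieve nearly the same documents, so any quality difference would have to come from elsewhere in the pipeline.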