#06-technical-discussion

Chris Johnston

08/01/2023, 12:10 AM
I'm using pgvector as my embeddings database and I'm not getting the best retrievals. Are all vector databases created equal, or is something like Pinecone better?

Eduardo Gonzalez

08/01/2023, 2:45 AM
How much data is going into each of those vectors? If it’s too much, the lookups will be pretty generic.

Chris Johnston

08/01/2023, 2:17 PM
Hmm, I think we're chunking them to ~500 tokens.
Update: we're actually using 320 tokens.
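
For reference, fixed-size chunking like the 320-token setup mentioned above can be sketched as follows. This is only an illustration, not the pipeline used in this thread: `chunk_tokens` and its `overlap` parameter are hypothetical names, and whitespace splitting stands in for a real model tokenizer, so counts will differ from true token counts.

```python
def chunk_tokens(tokens, chunk_size=320, overlap=0):
    """Split a token list into windows of at most chunk_size tokens.

    overlap is the number of tokens shared between consecutive chunks
    (an assumption for illustration; the thread does not mention overlap).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once this window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
        start += step
    return chunks


# Whitespace split is a stand-in for a real tokenizer (assumption).
text = "word " * 700
chunks = chunk_tokens(text.split(), chunk_size=320)
print([len(c) for c in chunks])
```

Smaller chunks tend to produce more specific embeddings, which is the trade-off raised in the question above.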

Nicholas Charriere

08/01/2023, 9:33 PM
Personally, I found the quality of Pinecone's similarity results much better than pgvector's, using the same vectors. I'm confused as to why, tbh, since my understanding is that both use the same distance function (cosine, or L2...). I did not investigate much and just switched to Pinecone, but I'd like to get to the bottom of it.
🙌 1
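
One way to start investigating the point above is an exact, brute-force ranking outside any database, used as a baseline. The sketch below is plain Python with made-up toy vectors; the helper names are hypothetical. It shows that cosine and L2 can rank unnormalized vectors differently, while on unit-length vectors they produce the same order. Whether this explains the difference reported here is not known.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_distance(a, b):
    """1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a, b):
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_top_k(query, vectors, k, distance):
    """Brute-force nearest neighbours: indices of the k closest vectors."""
    order = sorted(range(len(vectors)), key=lambda i: distance(query, vectors[i]))
    return order[:k]


# Toy data (made up for illustration).
query = [1.0, 0.0]
vectors = [[10.0, 0.0], [0.9, 0.1]]

# Unnormalized: the two metrics disagree on which vector is closest.
print(exact_top_k(query, vectors, 2, cosine_distance))  # [0, 1]
print(exact_top_k(query, vectors, 2, l2_distance))      # [1, 0]

# Unit-normalized: both metrics give the same ranking.
unit = [l2_normalize(v) for v in vectors]
print(exact_top_k(query, unit, 2, cosine_distance))     # [0, 1]
print(exact_top_k(query, unit, 2, l2_distance))         # [0, 1]
```

If both databases are queried with the same metric and still disagree with this baseline, the difference is more likely to come from indexing or configuration than from the distance function itself.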

Chris Johnston

08/01/2023, 10:46 PM
If you find out why, please share 👍
👍 1

Eduardo Gonzalez

08/02/2023, 1:24 AM
That’s interesting… If you have a small sample set, I could also run it on my vector database and see if there’s any difference there.
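
A side-by-side comparison like the one offered above could be scored with a simple overlap measure: run the same query against each database, collect the returned document IDs, and check how many of the top k are shared. The sketch below assumes each system returns an ordered list of IDs; `overlap_at_k` and the sample IDs are hypothetical.

```python
def overlap_at_k(ids_a, ids_b, k):
    """Fraction of the top-k results shared by two ranked ID lists."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_a = set(ids_a[:k])
    top_b = set(ids_b[:k])
    return len(top_a & top_b) / k


# Hypothetical results for one query from two different vector stores.
results_a = ["doc1", "doc2", "doc3", "doc7", "doc8"]
results_b = ["doc2", "doc1", "doc9", "doc3", "doc4"]

print(overlap_at_k(results_a, results_b, 3))  # 2 of the top 3 are shared
print(overlap_at_k(results_a, results_b, 5))  # 3 of the top 5 are shared
```

Averaging this score over a small sample of queries, and also comparing each system against an exact brute-force ranking, would show whether one store is returning approximate results.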