Folks, a quick question on ConversationalRetrievalChain. I am using this chain to pull additional data from a vector database at query time instead of passing all of the information in the context. Specifically:
qa = ConversationalRetrievalChain.from_llm(
    llm=OpenAI(temperature=0),
    retriever=vectorstore.as_retriever(),
    memory=memory,
)
where vectorstore is a Pinecone database. In my vector database, the text field for each document is fairly large - about 8k words. As a result, at query time I am exceeding the token limit for the LLM. Two questions:
1. At query time, how many documents’ text does the LLM read? Is there an easy way to cap it at, say, 4 or 5 documents and no more?
2. One idea is to chunk each document's text at insert time, but then each retrieved chunk would obviously carry only part of the document's information. I am sure someone has run into this problem. Any suggestions on how to resolve it?
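
To make the chunking idea in question 2 concrete, here is a minimal sketch in plain Python, not LangChain's own splitter. It assumes word-based windows with a small overlap, so neighbouring chunks share some context, and it tags each chunk with its parent document id so the full document can be looked up again after retrieval. The names chunk_words and to_records, the field names doc_id and chunk_index, and the sizes 500 and 50 are all hypothetical choices for illustration.

```python
from typing import Dict, List


def chunk_words(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into windows of at most chunk_size words.

    Consecutive windows share `overlap` words (assumed values, tune as needed).
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def to_records(doc_id: str, text: str, **kwargs: int) -> List[Dict[str, object]]:
    """Attach the parent document id and chunk position to every chunk.

    The metadata keys are hypothetical; they only need to be consistent
    with whatever is stored alongside the vectors.
    """
    return [
        {"doc_id": doc_id, "chunk_index": i, "text": chunk}
        for i, chunk in enumerate(chunk_words(text, **kwargs))
    ]
```

Each record's text would then be embedded and inserted in place of the full 8k-word document, with doc_id kept as metadata. On question 1, LangChain's as_retriever accepts a search_kwargs argument, so something like vectorstore.as_retriever(search_kwargs={"k": 4}) should cap the number of retrieved documents; the default value of k is worth checking against the installed version.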