# 06-technical-discussion
j
Have another question 🙂 Has anyone been able to handle context switching in a chatbot convo that uses embeddings for info retrieval from a set of documents? Essentially this question: https://community.openai.com/t/dealing-with-context-switching-in-a-conversation-that-uses-embeddings-for-information-retrieval/87836/2
v
Yes, I have done this by using different embeddings for the new context. Essentially, build a lightweight classification model (nothing complex) to roughly get the keywords that will be used for embedding search.
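A minimal sketch of what such a lightweight classifier could look like, assuming scikit-learn and a small hand-labelled set of example questions (the context labels and training examples are purely illustrative, not v's actual setup):

```python
# Hypothetical sketch: route a question to a context label with a tiny
# TF-IDF + logistic-regression classifier. Labels and examples are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Curated (question, context label) pairs you would collect yourself.
EXAMPLES = [
    ("How do I update my billing details?", "billing"),
    ("Where is my package right now?", "shipping"),
    ("Can I send this item back for a refund?", "returns"),
    # ... more labelled examples per context ...
]

texts, labels = zip(*EXAMPLES)
classifier = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
classifier.fit(texts, labels)

def route_question(question: str) -> str:
    """Return the context label whose embedding index should be searched."""
    return classifier.predict([question])[0]
```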
h
Embeddings alone in a single index are going to be a big challenge and will need quite a lot of non-scalable band-aids to make them work.
One way to handle this is to use an information retrieval system / search engine and have the LLM do DSL generation, similar to using an LLM to generate SQL queries.
Another alternative is to put an API in front of the search engine, similar to, say, the Google or Bing search APIs.
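A rough sketch of the query-generation idea described above, assuming the v1-style OpenAI Python client and an Elasticsearch-style JSON query as the target DSL (the model choice, field names, and system prompt are all assumptions for illustration):

```python
# Hypothetical sketch: have the LLM translate a user question into a
# search-engine query (JSON DSL). Field names and prompt are illustrative.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM = (
    "Translate the user's question into an Elasticsearch-style JSON query. "
    "Respond with JSON only, querying the 'title' and 'body' fields."
)

def question_to_query(question: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": question},
        ],
    )
    # Validate before sending anything to the search engine.
    return json.loads(resp.choices[0].message.content)

# The returned dict would then be passed to the engine behind an API,
# much like calling a hosted search API, after validation.
```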
v
Yes, I split the index into different chunks for storage based on context. I'd rather avoid the query-generation method: you don't want to execute arbitrary generated code in production, or even in testing, because of the resource constraints. A recent paper also showed that code generation is very unreliable. With embedding-based methods, you can at least inspect the relevant chunks that are being retrieved.
d
Hey Jess, I've tried building a layer that determines the conversation stage and switches the prompts based on it; think of it like a Prompt Manager!
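A minimal sketch of that kind of prompt manager, with a stubbed stage detector and a dict of stage-specific prompts (the stage names, prompts, and heuristics are made up for illustration):

```python
# Hypothetical "Prompt Manager" sketch: detect the conversation stage,
# then pick the system prompt for that stage. All names are illustrative.
STAGE_PROMPTS = {
    "greeting": "You are a friendly assistant. Greet the user and ask what they need.",
    "retrieval": "Answer the question using only the provided document excerpts.",
    "follow_up": "Answer the follow-up using the chat history and the excerpts.",
}

def detect_stage(message: str, history: list[str]) -> str:
    """Stub detector; in practice this could be a small classifier or an LLM call."""
    if not history:
        return "greeting"
    if any(word in message.lower() for word in ("it", "that", "they", "more")):
        return "follow_up"
    return "retrieval"

def system_prompt_for(message: str, history: list[str]) -> str:
    return STAGE_PROMPTS[detect_stage(message, history)]
```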
d
@Vishwanath Seshagiri Curious about your approach. Are you running the keyword-extraction classifier on every user input? And do you run an embedding search every time, or only when you've inferred a context change?
v
We have different texts for different "contexts"; when creating an entry in the embedding database, we store the index under that context keyword. Example contexts, from the perspective of, say, Apple, would be iPad, iPhone, macOS, etc. The small classifier lets us figure out which context the question belongs to, so we can use that specific embedding index along with the LLM call.
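As a sketch of that layout, assuming the v1-style OpenAI embeddings API and a plain in-memory store per context (the model name, context keys, and helper functions are assumptions, not the actual implementation):

```python
# Hypothetical sketch: one embedding store per context keyword; a classifier
# picks which store to search. Everything here is illustrative.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

# Separate stores keyed by context, e.g. "ipad", "iphone", "macos".
stores: dict[str, list[tuple[str, np.ndarray]]] = {"ipad": [], "iphone": [], "macos": []}

def add_chunk(context: str, chunk: str) -> None:
    stores[context].append((chunk, embed(chunk)))

def search(context: str, question: str, k: int = 3) -> list[str]:
    q = embed(question)
    scored = [
        (chunk, float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))))
        for chunk, v in stores[context]
    ]
    return [chunk for chunk, _ in sorted(scored, key=lambda s: -s[1])[:k]]
```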
d
How many different contexts are you supporting simultaneously with that approach?
v
Right now 3, but at the max we’d have 5
j
Thanks for everyone's thoughts. Our main issue is not so much with the embeddings, from what we've seen. We're creating a chat-like interface where you can "talk" to multiple documents. You can ask follow-up questions, or ask something completely unrelated to the previous questions. The main challenge we're facing is knowing whether we need the chat history to create a standalone question. We've been trying out prompt tweaks, and that appears to be the main factor in solving this:
```
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}

Disregard the chat history if the follow up question is not related to it.

Follow Up Input: {question}
Standalone question:
```
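A rough sketch of how that condensing step could be wired up, assuming the v1-style OpenAI chat API; the model choice and the retrieval call at the end are placeholders, not the actual pipeline:

```python
# Hypothetical sketch: condense chat history + follow-up into a standalone
# question, then hand that to the embedding search.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CONDENSE_PROMPT = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}

Disregard the chat history if the follow up question is not related to it.

Follow Up Input: {question}
Standalone question:"""

def condense(chat_history: str, question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[{
            "role": "user",
            "content": CONDENSE_PROMPT.format(chat_history=chat_history, question=question),
        }],
    )
    return resp.choices[0].message.content.strip()

# standalone = condense(history_text, user_question)
# chunks = embedding_search(standalone)  # placeholder for the retrieval step
```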
v
One of the problems that can happen with a long chat history is the loss of context for non-standalone questions. GPT-4 reportedly has a context length of 32,000 tokens, but a lot of users dispute that claim.
j
Yup - the limit for GPT-3.5 Turbo seems to be even lower, around 4k tokens (not 100% sure). We're using GPT-3.5 Turbo currently because of its faster response times. We're considering summarizing the chat history before prompting the LLM, which might help us stay within the token limit.
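A quick sketch of that summarization idea, again assuming the v1-style OpenAI chat API; the length threshold and prompt wording are arbitrary choices for illustration:

```python
# Hypothetical sketch: summarize the chat history once it grows long, so the
# condensing prompt stays under the model's context limit. Threshold is made up.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def maybe_summarize(chat_history: str, max_chars: int = 6000) -> str:
    """At roughly ~4 characters per token, ~6000 chars stays well under 4k tokens."""
    if len(chat_history) <= max_chars:
        return chat_history
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[{
            "role": "user",
            "content": "Summarize this conversation, keeping the documents, "
                       "questions, and answers that were discussed:\n\n" + chat_history,
        }],
    )
    return resp.choices[0].message.content
```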