# 06-technical-discussion

jess

05/04/2023, 4:32 PM
Have another question 🙂 Has anyone been able to context-switch in a chatbot convo that uses embeddings for info retrieval from a set of documents? Essentially this question: https://community.openai.com/t/dealing-with-context-switching-in-a-conversation-that-uses-embeddings-for-information-retrieval/87836/2

Vishwanath Seshagiri

05/04/2023, 4:33 PM
Yes, I have done this by using different embeddings for the new context. Essentially, build a lightweight classification model (nothing complex) to roughly extract the keywords that will be used for the embedding search.
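
A minimal sketch of that keyword-routing idea in Python; the context labels, training queries, and `route` helper are illustrative, not the actual setup described here:

```python
# Rough sketch: route a query to a context with a lightweight classifier.
# Labels and training examples are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_queries = [
    "how do I reset my ipad",
    "iphone battery drains too fast",
    "macos update is stuck",
]
train_labels = ["ipad", "iphone", "macos"]  # one label per context

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_queries, train_labels)

def route(query: str) -> str:
    """Return the context keyword whose embedding index should be searched."""
    return clf.predict([query])[0]
```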

Han

05/04/2023, 4:41 PM
Embeddings alone in a single index are going to be a big challenge and will need a lot of non-scalable band-aids to make it work.
One way to handle this is to use an information retrieval system / search engine and have an LLM do DSL generation, similar to using an LLM to generate SQL queries.
Another alternative is to put an API in front of the search engine, similar to, say, the Google or Bing APIs.
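
A rough sketch of the DSL-generation idea; `call_llm` is a placeholder for whatever chat-completion client you use, and the JSON query shape is invented for illustration, not a real engine's syntax:

```python
# Sketch of the "LLM generates a search DSL" idea, analogous to LLM-to-SQL.
import json

DSL_PROMPT = """Translate the user question into a JSON search query with two
fields: "keywords" (list of strings) and "filters" (object). Return JSON only.

Question: {question}
JSON:"""

def generate_search_query(question: str, call_llm) -> dict:
    raw = call_llm(DSL_PROMPT.format(question=question))
    return json.loads(raw)  # hand the structured query to the search engine / API
```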

Vishwanath Seshagiri

05/04/2023, 4:46 PM
Yes, I split the index into different chunks for storage based on context. I'd avoid the query-generation method because you don't want to execute arbitrary generated code in production, or even in testing, given the resource constraints. A recent paper also showed that code generation is quite unreliable. With embedding-based methods, you can at least inspect the relevant chunks that are being retrieved.

Deepanshu

05/04/2023, 5:44 PM
Hey Jess, I have tried building a layer that determines the conversation stage and switches the prompts based on it; think of it like a Prompt Manager!
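
A toy sketch of what such a Prompt Manager could look like; the stage names, prompts, and heuristic detector are all invented for illustration:

```python
# Sketch of a "Prompt Manager": detect the conversation stage, then pick
# the prompt for that stage. Everything here is illustrative.
STAGE_PROMPTS = {
    "greeting": "You are a helpful assistant. Greet the user and ask what they need.",
    "follow_up": "Continue the prior thread.\n\nChat History:\n{chat_history}\n\nQuestion: {question}",
    "new_topic": "Ignore the earlier conversation and answer fresh.\n\nQuestion: {question}",
}

def detect_stage(question: str, chat_history: list) -> str:
    """Toy heuristic; a small classifier or an LLM call could go here instead."""
    if not chat_history:
        return "greeting"
    if question.lower().startswith(("and ", "also ", "what about")):
        return "follow_up"
    return "new_topic"

def build_prompt(question: str, chat_history: list) -> str:
    stage = detect_stage(question, chat_history)
    return STAGE_PROMPTS[stage].format(
        question=question, chat_history="\n".join(chat_history))
```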

Daniel Hsu

05/04/2023, 11:11 PM
@Vishwanath Seshagiri Curious about your approach. Are you running the classifier to extract keywords on every received user input? And do you decide to run an embedding search every time, or only when you've inferred a context change?

Vishwanath Seshagiri

05/05/2023, 9:19 PM
We have different texts for different "contexts"; when creating an entry in the embedding database, we store the index under that context keyword. Example contexts, from the perspective of, say, Apple, would be ipad, iphone, macos, etc. The small classifier lets us figure out which context the question belongs to, so we can use that specific embedding index along with the LLM call.
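
A sketch of what those per-context indexes might look like, reusing the `route` classifier from the earlier sketch; `embed` and the plain-list stores stand in for a real embedding model and vector database:

```python
# Sketch of per-context embedding indexes: each context keyword gets its own
# store, and the small classifier picks which one to search.
import numpy as np

indexes = {"ipad": [], "iphone": [], "macos": []}  # context -> [(vector, chunk)]

def cosine(a, b) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def add_chunk(context: str, chunk: str, embed) -> None:
    indexes[context].append((embed(chunk), chunk))

def search(question: str, embed, top_k: int = 3) -> list:
    context = route(question)            # classifier from the earlier sketch
    q = embed(question)
    ranked = sorted(indexes[context], key=lambda vc: -cosine(q, vc[0]))
    return [chunk for _, chunk in ranked[:top_k]]
```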

Daniel Hsu

05/05/2023, 10:04 PM
How many different contexts are you supporting simultaneously with that approach?

Vishwanath Seshagiri

05/06/2023, 1:11 AM
Right now 3, but at the max we'd have 5.

jess

05/09/2023, 3:29 PM
Thanks for everyone's thoughts. So our main issue is not so much with the embeddings, from what we've seen. We're creating a chat-like interface where you can "talk" to multiple documents. You can ask follow-up questions, or ask something completely unrelated to the previous questions. The main challenge we're facing is knowing whether we need the chat history to create a standalone question. We've been trying out prompt tweaks, and that appears to be the main factor in solving this:
```
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}

Disregard the chat history if the follow up question is not related to it.

Follow Up Input: {question}
Standalone question:
```
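
A sketch of how that condense prompt might be wired into the retrieval flow; `call_llm` and `search` are placeholders for the chat client and retriever:

```python
# Sketch: rewrite the follow-up into a standalone question first, then run
# retrieval and answering on the rewritten question.
CONDENSE_PROMPT = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}

Disregard the chat history if the follow up question is not related to it.

Follow Up Input: {question}
Standalone question:"""

def answer(question: str, chat_history: list, call_llm, search) -> str:
    standalone = call_llm(CONDENSE_PROMPT.format(
        chat_history="\n".join(chat_history), question=question))
    chunks = search(standalone)
    return call_llm(
        "Answer using only these chunks:\n"
        + "\n---\n".join(chunks)
        + f"\n\nQuestion: {standalone}")
```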

Vishwanath Seshagiri

05/09/2023, 3:34 PM
One of the problems that can come up with a long chat history is loss of context for non-standalone questions. GPT-4 seems to have a context length of 32,000 tokens, but a lot of users dispute that claim.

jess

05/09/2023, 5:34 PM
Yup - the limit for GPT-3.5 Turbo seems to be even lower, around 4k tokens (not 100% sure). We're using GPT-3.5 Turbo currently because of its faster response times. We're considering summarizing the chat history before prompting the LLM, which might help us get around the token limit.
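
A sketch of that summarization idea; the character threshold is a crude stand-in for a real token count, and `call_llm` is again a placeholder:

```python
# Sketch: if the chat history is too long for the model's context window,
# replace it with an LLM-written summary before building the prompt.
def compact_history(chat_history: list, call_llm, max_chars: int = 6000) -> str:
    joined = "\n".join(chat_history)
    if len(joined) <= max_chars:   # crude proxy for counting tokens
        return joined
    return call_llm(
        "Summarize this conversation concisely, keeping names, entities, "
        "and any open questions:\n\n" + joined)
```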