#06-technical-discussion
# 06-technical-discussion

Deepanshu

05/09/2023, 6:33 PM
Are there better ways to weigh GPT-4 responses (e.g., getting confidence scores) based on domain knowledge? I'm trying something along the lines of generating multiple messages and judging which one is better. Would love it if someone could share some resources, examples, or experiences.

Olabode Adedoyin

05/09/2023, 7:43 PM
Do you know what the right answers should be?
GPT-4 doesn't expose logprobs, but I guess you could try to hack around that haha
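
One possible workaround (an assumption here, not necessarily the hack meant above) is to sample the same prompt several times and treat the share of samples that agree as a rough confidence proxy. The sketch below assumes the responses have already been collected into a list of strings; `normalize` and `agreement_confidence` are hypothetical helper names, and the sample answers are made up for illustration.

```python
from collections import Counter
from typing import List, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't split votes."""
    return " ".join(text.lower().split())


def agreement_confidence(samples: List[str]) -> Tuple[str, float]:
    """Return the most common normalized answer and the fraction of samples that match it."""
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(normalize(s) for s in samples)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(samples)


# Made-up responses standing in for repeated model samples of one prompt.
samples = ["Paris", "paris ", "Lyon", "Paris"]
answer, confidence = agreement_confidence(samples)
print(answer, confidence)
```

Exact-match voting like this only makes sense for short, constrained answers. For free-form messages the samples will rarely match exactly, so some similarity measure between samples would be needed instead.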

Deepanshu

05/10/2023, 3:33 AM
Yes @Olabode Adedoyin, we do have human edits for those messages that can serve as ground truth. Curious to learn what you have in mind.

Olabode Adedoyin

05/10/2023, 4:33 PM
One quick fix might be to use vector embeddings to compute a cosine similarity score between your ground truth and what the model generates.
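
A minimal sketch of that scoring step, assuming the human edit and each generated candidate have already been turned into embedding vectors by whatever embedding model you use. The toy vectors and the helper names `cosine_similarity` and `rank_candidates` are hypothetical, for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(
    ground_truth_vec: Sequence[float],
    candidate_vecs: List[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Return (candidate index, similarity) pairs, most similar to the ground truth first."""
    scores = [
        (i, cosine_similarity(ground_truth_vec, vec))
        for i, vec in enumerate(candidate_vecs)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for real embeddings of the human edit and two model generations.
human_edit_vec = [1.0, 0.0, 1.0]
candidate_vecs = [[1.0, 0.0, 0.9], [0.0, 1.0, 0.0]]
ranking = rank_candidates(human_edit_vec, candidate_vecs)
print(ranking)
```

The score measures semantic closeness to the human edit, not correctness, so it is worth spot-checking a few high and low scoring pairs before relying on it.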

Deepanshu

05/10/2023, 6:54 PM
I see, good idea, will try these