Has anyone used any good labeling solutions to create ground... (Cerebral Valley, #06-technical-discussion)


Yi Zhang

01/11/2024, 4:07 PM

Has anyone used any good labeling solutions to create a ground-truth dataset for RAG evaluation? The task is to label the relevant context from a corpus of documents (mostly PDFs) given a question. The existing NLP labeling tools seem to be more focused on text entity extraction than on labeling whole chunks of content.

Jeng Yang Chia

01/11/2024, 4:11 PM

What type of labeling do you need? Have you tried approaches like LlamaIndex's automated document hierarchy construction?

Yi Zhang

01/11/2024, 4:15 PM

Essentially we need to label the ground truth contexts (chunks) that should be retrieved to answer a user query and use it to evaluate / benchmark retrieval systems.
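Once ground-truth chunks are labeled per query, benchmarking a retriever reduces to comparing its ranked output against those labels. Below is a minimal sketch, assuming every chunk has a stable string ID (the `"file.pdf#N"` IDs, the `LabeledQuery` record, and the `evaluate` helper are all hypothetical) and using recall@k and mean reciprocal rank (MRR) as example metrics; neither metric was specified in the thread.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LabeledQuery:
    """One human-labeled example (hypothetical schema): a query and the IDs of chunks that should be retrieved."""
    query: str
    relevant_chunk_ids: set[str] = field(default_factory=set)


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of ground-truth chunks found among the top-k retrieved IDs (duplicates counted once)."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant chunk, or 0.0 if none was retrieved."""
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(dataset: list[LabeledQuery],
             retrieve: Callable[[str], list[str]],
             k: int = 5) -> dict[str, float]:
    """Average recall@k and MRR of a retriever over a labeled dataset."""
    recalls, rrs = [], []
    for example in dataset:
        retrieved = retrieve(example.query)  # ranked chunk IDs, best first
        recalls.append(recall_at_k(retrieved, example.relevant_chunk_ids, k))
        rrs.append(reciprocal_rank(retrieved, example.relevant_chunk_ids))
    n = len(dataset)
    return {f"recall@{k}": sum(recalls) / n, "mrr": sum(rrs) / n}


# Usage with a hypothetical two-query dataset and a stubbed retriever.
dataset = [
    LabeledQuery("What is the refund window?", {"policy.pdf#3", "faq.pdf#1"}),
    LabeledQuery("Who approves travel?", {"handbook.pdf#12"}),
]
fake_results = {
    "What is the refund window?": ["faq.pdf#1", "misc.pdf#7", "policy.pdf#3"],
    "Who approves travel?": ["handbook.pdf#2", "handbook.pdf#12"],
}
scores = evaluate(dataset, lambda q: fake_results[q], k=2)
print(scores)  # {'recall@2': 0.75, 'mrr': 0.75}
```

Because the labels are just sets of chunk IDs, the same dataset can score any retriever, as long as each system chunks the documents the same way the labelers did.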

Jeng Yang Chia

01/11/2024, 4:21 PM

Got it. Yeah, I would say try playing around with document hierarchies, as that would be the solution: https://docs.llamaindex.ai/en/stable/module_guides/loading/documents_and_nodes/usage_documents.html You can also use LLMs to help generate summary context to put into the metadata. We've been exploring knowledge graphs for document hierarchies too, and I'm happy to chat on that front if helpful.

Yi Zhang

01/11/2024, 4:40 PM

I understand the LlamaIndex document structure. I'm looking more for a human labeling UI, a Labelbox / Scale AI type of solution, to get human-labeled ground truth.

Scott Howard

01/12/2024, 8:20 AM

@Greg Schoeninger

Greg Schoeninger

01/12/2024, 4:25 PM

We're working on something along these lines at Oxen.ai if you want to chat!

👍 1

🎉 1

Marius Buleandra

01/19/2024, 7:49 PM

@Sumanyu you might have some ideas here

🔥 1
