# 07-self-promotion
🎉 Been deep into AI research the last few weeks and months, and I have a huge number of papers dropping soon. The first one to land is Tiny QA Benchmark++, a micro QA dataset for evals, plus a synthetic data generator module for producing QA pairs to eval (and potentially train) your models:
• We found that a synthetic dataset of fewer than 50 QA pairs was enough to detect drift in model performance (Gemini, Mistral, Llama)
• A well-rounded 10-20 item QA set works as an initial smoke test before running a bigger eval (saves $$ and time)
• Accuracy drift differs by topic and/or language, BUT this can be used to quickly check whether a model has coverage of a topic (e.g. a specific body of knowledge in your domain) without complex testing or analysis
Paper: https://huggingface.co/papers/2505.12058
Github: https://github.com/vincentkoc/tiny_qa_benchmark_pp
Hugging Face Datasets: https://huggingface.co/datasets/vincentkoc/tiny_qa_benchmark_pp
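If you want to try the smoke-test idea, here's a rough sketch of pulling a handful of items from the Hugging Face dataset and scoring a model on them. The split name, the `question`/`answer` column names, and the `answer_with_your_model` hook are assumptions for illustration, not the benchmark's official API, so check the dataset card and repo for the real layout.

```python
# Rough smoke-test sketch: grab ~20 QA pairs from Tiny QA Benchmark++
# and score your model with crude exact-match before a full eval run.
# ASSUMPTIONS: split="train" and columns "question"/"answer" may differ;
# see the dataset card at huggingface.co/datasets/vincentkoc/tiny_qa_benchmark_pp.
from datasets import load_dataset


def answer_with_your_model(question: str) -> str:
    """Placeholder: call your own model or API here."""
    raise NotImplementedError


ds = load_dataset("vincentkoc/tiny_qa_benchmark_pp", split="train")
sample = ds.select(range(min(20, len(ds))))  # 10-20 items is enough for a first pass

correct = 0
for row in sample:
    prediction = answer_with_your_model(row["question"])
    # crude exact-match scoring; swap in whatever metric you actually use
    correct += int(prediction.strip().lower() == row["answer"].strip().lower())

print(f"smoke-test accuracy: {correct}/{len(sample)}")
```

If the quick pass looks off for a topic or language you care about, that's your signal to dig in with a bigger eval; if it looks fine, you've saved yourself the full run.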