Everyone, here is our new Text-2-SQL Benchmark.
You can test your own model, or one that you’re considering using for your API.
Why Do This?
Shortcomings of current benchmarks (BIRD, Spider, WikiSQL, SParC, etc.):
1. They do not include “complicated” queries
2. Their natural language prompts are typically not very complicated
3. None measure how well a generative model performs on a messy and unorganized database
Enjoy!