Everyone, here is our new Text-2-SQL Benchmark. You can test your own model, or one that you’re considering using for your API.

Why Do This?

Shortcomings of current benchmarks (BIRD, Spider, WikiSQL, SParC, etc.):

1. They contain no “complicated” queries.
2. The natural language prompts they use are typically not very complicated.
3. None measure how well a generative model performs on a messy and unorganized database.

Enjoy!