The Spider evaluation benchmark score of PingCAP's Chat2Query program is 86.3. Here are the CodaLab links for the benchmark:
https://worksheets.codalab.org/worksheets/0xeaa16ad377f14a21aa8edbed90e49233
https://worksheets.codalab.org/bundles/0xe1fe59dd2177413b83b958f108ee9693
Below are the steps to reproduce the score.
1. Log in to TiDB Cloud and create a Chat2Query Data App. Save the Base URL, the public key, and the private key; we'll use them in step 5.
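One convenient way to keep the saved credentials at hand for the later steps is environment variables. The variable names below are hypothetical — check the repository's scripts for the names they actually expect:

```shell
# Hypothetical variable names -- the repo's scripts may expect different ones.
export CHAT2QUERY_BASE_URL="https://<region>.data.tidbcloud.com/api/v1beta/app/<app-id>/endpoint"
export CHAT2QUERY_PUBLIC_KEY="<your-public-key>"
export CHAT2QUERY_PRIVATE_KEY="<your-private-key>"
```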
2. Clone the benchmark repository:

$ git clone https://github.com/tidbcloud/chat2query_bench
$ cd chat2query_bench/benchmark_spider
3. Download the Spider dataset from https://drive.google.com/u/0/uc?id=1iRDVHLr4mX2wQKSgA9J8Pire73Jahh0m&export=download, unzip it into the spider_chat2query folder, and make sure the unzipped folder is named spider.
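If the archive extracts under a different top-level name, a rename keeps the layout the scripts expect. A minimal sketch — `spider_data` below is just a stand-in for whatever name the zip actually extracts to:

```shell
# Sketch: normalize the dataset folder name to "spider".
mkdir -p spider_chat2query
mkdir -p spider_chat2query/spider_data   # stand-in for the unzipped folder
mv spider_chat2query/spider_data spider_chat2query/spider
test -d spider_chat2query/spider && echo "dataset folder in place"
```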
4. Build the containers with the following commands:

$ docker build -f ./Dockerfile.base . -t spider_chat2query:base
$ docker build . -t spider_chat2query
NOTE: By default the benchmark runs against GPT-3.5. To reproduce the best results, please contact us to upgrade your app settings to use GPT-4.
5. Generate the SQL and run the evaluation:

$ ./gensql.sh
$ ./evaluation.sh
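For reference, TiDB Cloud Data Service endpoints are called over HTTPS with Digest authentication using the public/private key pair saved in step 1. A dry-run sketch of such a call — the URL, endpoint path, and key values here are placeholders, not the benchmark's real values, so the command is printed rather than executed:

```shell
# Placeholders only -- substitute the Base URL and keys saved in step 1.
BASE_URL="https://<region>.data.tidbcloud.com/api/v1beta/app/<app-id>/endpoint"
PUBLIC_KEY="<public-key>"
PRIVATE_KEY="<private-key>"
# Print the command instead of running it, since the values above are fake.
echo curl --digest --user "${PUBLIC_KEY}:${PRIVATE_KEY}" "${BASE_URL}"
```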