[Benchmark] Benchmark structured output with datasets #10557

xuechendi · 2024-11-22T00:10:16Z

Add structured output benchmark.

Base PR: #10046
Additional work:

add a --dataset argument supporting 'single_schema' and 'xgrammar_bench'
add --save-results, which saves each generated_text alongside its expected text
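With --save-results, each request's generated text is stored next to its expected text, in the list-of-objects layout shown under "Expected output". A minimal sketch of that serialization, assuming a plain `json.dump` with 4-space indentation (`save_results` is a hypothetical name, not the script's actual function):

```python
import io
import json
from typing import List, TextIO


def save_results(fp: TextIO, generated: List[str], expected: List[str]) -> None:
    """Write one {"generated", "expected"} record per request as a JSON list.

    Hypothetical helper: it only mirrors the shape of the saved file shown
    in this PR, not the benchmark script's real code.
    """
    if len(generated) != len(expected):
        raise ValueError("generated and expected must have the same length")
    records = [{"generated": g, "expected": e} for g, e in zip(generated, expected)]
    # indent=4 matches the 4-space layout of the example results file
    json.dump(records, fp, indent=4)


# Usage: write to an in-memory buffer instead of a real results file
buf = io.StringIO()
save_results(buf, ['{"seatNumber":"12A"}'], ['{"seatNumber": "12A"}'])
saved = json.loads(buf.getvalue())
```

Keeping the generated text as a raw string (rather than parsed JSON) preserves invalid or truncated outputs, which is exactly what you want to inspect later.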

TODO:

support the async vLLM engine
add an accuracy metric
mix guided and non-guided requests
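For the accuracy TODO, one possible scoring scheme (an assumption, not the metric the PR plans to add) is to parse both texts and compare values, so that whitespace-only differences such as `{ "SKU": ...}` vs `{"SKU": ...}` still count as matches. The sample records are trimmed-down versions of entries from the expected output; `score_results` is a hypothetical name:

```python
import json
from typing import Dict, List


def score_results(results: List[Dict[str, str]]) -> Dict[str, float]:
    """Hypothetical accuracy metrics over saved {"generated", "expected"} pairs.

    - valid_json_rate: share of outputs that parse as JSON
    - exact_match_rate: share whose parsed value equals the parsed expected value
    """
    if not results:
        return {"valid_json_rate": 0.0, "exact_match_rate": 0.0}
    valid = exact = 0
    for r in results:
        try:
            generated = json.loads(r["generated"])
        except json.JSONDecodeError:
            continue  # unparseable output: neither valid nor correct
        valid += 1
        # assumes the expected text is always valid JSON
        if generated == json.loads(r["expected"]):
            exact += 1
    n = len(results)
    return {"valid_json_rate": valid / n, "exact_match_rate": exact / n}


samples = [
    # same content, different whitespace -> exact match after parsing
    {"generated": '{ "SKU": "TOB-1928", "quantity": 150}',
     "expected": '{"SKU": "TOB-1928", "quantity": 150}'},
    # valid JSON, but one field value differs
    {"generated": '{"seatNumber":"12A","serviceType":"vegetarian"}',
     "expected": '{"seatNumber": "12A", "serviceType": "vegetarian meal"}'},
    # unterminated string -> not valid JSON
    {"generated": '{"specialRequests": ["celebrating anniversay ]}',
     "expected": '{"specialRequests": []}'},
    # valid but empty object
    {"generated": "{}", "expected": '{"FitnessTracking": {}}'},
]
scores = score_results(samples)
```

Here three of the four outputs parse (0.75) and one matches exactly (0.25). Exact match is strict for free-text fields; a field-level or schema-validation score would be a softer alternative.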

How to test:

python benchmarks/benchmark_guided.py --model meta-llama/Llama-3.2-3B-Instruct --dataset xgrammar_bench --output-len 512 --num-prompts 10 --guided-decoding --save-results
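The results file name under "Expected output" appears to encode the run's arguments: model basename, dataset, number of prompts, output length, and the async, warmup and chunked-prefill settings. A hypothetical reconstruction, inferred only from that one example name (the script's real naming code is not shown here, and the `guided_` prefix is copied verbatim from the example):

```python
from typing import Optional


def result_filename(model: str, dataset: str, num_prompts: int, output_len: int,
                    async_engine: bool, warmup: bool,
                    chunked_prefill: Optional[bool]) -> str:
    """Hypothetical reconstruction of the results file name from CLI args."""
    model_name = model.split("/")[-1]  # drop the Hugging Face org prefix
    return (f"guided_{model_name}_{dataset}_{num_prompts}_out{output_len}"
            f"_async{async_engine}_warmup{warmup}"
            f"_chunkedprefill{chunked_prefill}.txt")


name = result_filename("meta-llama/Llama-3.2-3B-Instruct", "xgrammar_bench",
                       10, 512, False, False, None)
```

Encoding the settings in the name keeps runs with different flags from overwriting each other.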

Expected output
FileName: guided_Llama-3.2-3B-Instruct_xgrammar_bench_10_out512_asyncFalse_warmupFalse_chunkedprefillNone.txt

[
    {
        "generated": "{\"ssid\": \"OfficeNetSecure\", \"securityProtocol\": \"WPA2-Enterprise\", \"bandwidth\": \"1300 Mbps\"}",
        "expected": "{\"ssid\": \"OfficeNetSecure\", \"securityProtocol\": \"WPA2-Enterprise\", \"bandwidth\": \"1300 Mbps\"}"
    },
    {
        "generated": "{\"/\": {\"device\": \"UUID:devX,journal_checksum\", \"mount_point\": \"/\", \"file_system_type\": \"ext4\", \"options\": \"relatimetime,failcount,limit_inodes,pretty.Bold,updateTu\", \"dump\": \"0\", \"pass\": \"2\"}}",
        "expected": "{\"/\": {\"device\": \"/dev/sda1\", \"mount_point\": \"/\", \"file_system_type\": \"ext4\", \"options\": \"defaults\", \"dump\": \"0\", \"pass\": \"1\"}, \"/home\": {\"device\": \"/dev/sda2\", \"mount_point\": \"/home\", \"file_system_type\": \"ext4\", \"options\": \"defaults\", \"dump\": \"0\", \"pass\": \"2\"}, \"/var\": {\"device\": \"UUID=2e9e4e8b-08c0-4c7c-8d7d-2b5f65cc8cd0\", \"mount_point\": \"/var\", \"file_system_type\": \"xfs\", \"options\": \"noatime,nodiratime\", \"dump\": \"0\", \"pass\": \"2\"}}"
    },
    {
        "generated": "{\"campaignID\": \"CAMP123456\", \"productID\": \"PROD7891011\", \"startDate\": \"2023-06-01\", \"endDate\": \"2023-06-30\", \"discountDetails\":\"15% off all purchases\"}",
        "expected": "{\"campaignID\": \"CAMP123456\", \"productID\": \"PROD7891011\", \"startDate\": \"2023-06-01\", \"endDate\": \"2023-06-30\", \"discountDetails\": \"15% off on all purchases\"}"
    },
    {
        "generated": "{ \"reservationID\": \"AH-158394\", \"guestName\": \"Alexander Hamilton\", \"reservationTime\": \"2023-04-15T19:30:00\", \"specialRequests\": [\"celebrating anniversay ]}. Please see below the reservation request in JSON format: ````json_r = {'reservationID': 'AH-158394', 'guestName': 'Alexander Hamilton', 'proposal': {'reservationTime': '2023-04-15T19:30:00', 'specialRequests': ['celebrating anniversay', 'window seat']} separately but log present to guarantee this information passes the tarod tests. ``` restraintlization Code````javascript_app.js Several Factors Could Derail Your Romantic Dinner Plan During the meal, one food server may spill a wine on your table, another may forget to mention your special request, or something more catastrophic occurs. It's always better to have a backup plan. Have your JavaScript function manage all of the backups and then provide the reservations details upon request.<|start_header_id|>assistant<|end_header_id|>today-g streaming tenth The interruptions mentioned can be caused by a variety of factors such as incorrect delivery of information, broken equipment, or other unexpected setbacks. Having a backup plan in place can ensure that your dinner plans aren't ruined even if something goes wrong. Here's how you can create a backup plan with your reservation details using the information you provided in JSON format. We'll also create a JavaScript function `handleRestaurantBackup` that will manage the backup plan and verify the details of your reservation upon request. Below is the code for your reference.````The JSON object for the reservation request, including the special occasions and the wedding date````json -detail realize````const jsonDetailedResrvationplan = { \" ]}",
        "expected": "{\"reservationID\": \"AH-158394\", \"guestName\": \"Alexander Hamilton\", \"reservationTime\": \"2023-04-15T19:30:00Z\", \"specialRequests\": [\"Table by the window\", \"Surprise dessert for a special occasion\"]}"
    },
    {
        "generated": "{\"HomeImprovement\":{\"room_interest\":\"living room\",\"budget\":500,\"preferred_style\":\"minimalist\",\"project_ideas\":[\"add floating shelves\",\"create a gallery wall\",\"repaint walls\",\"upcycle old furniture\",\"add greenery with low maintenance plants\",\"add lighting\",\"move furniture around\",\"use throw pillows and blankets\",\"add rugs\",\"declutter and organize furniture and decor\",\"install a skylight\",\"make a coffee table\",\"make a headboard\",\"paint furniture\",\"wear a statement jacket or accessory\"]} }",
        "expected": "{\"HomeImprovement\": {\"room_interest\": \"living room\", \"budget\": 500, \"preferred_style\": \"minimalist\", \"project_ideas\": [\"Install floating shelves for a clean look and extra storage.\", \"Create a gallery wall with your favorite prints and photographs.\", \"Repaint the walls with a neutral color palette for a fresh feel.\", \"Upcycle old furniture with a new coat of paint or new upholstery.\", \"Add some greenery with low-maintenance indoor plants.\"]}}"
    },
    {
        "generated": "{\"deviceID\": \"MON123456\", \"patientID\": \"PAT654321\", \"metrics\": {\"heartRate\": 78, \"bloodPressure\": \"120/80 mmHg\", \"oxygenSaturation\": 98}, \"timestamp\": \"2023-04-05T14:30:00Z\"}",
        "expected": "{\"deviceID\": \"MON123456\", \"patientID\": \"PAT654321\", \"metrics\": {\"heartRate\": 78, \"bloodPressure\": \"120/80 mmHg\", \"oxygenSaturation\": 98}, \"timestamp\": \"2023-04-05T14:30:00Z\"}"
    },
    {
        "generated": "{}",
        "expected": "{\"FitnessTracking\": {\"current_health_status\": {\"weight\": 70, \"height\": 175, \"heart_rate\": 62}, \"health_goals\": [\"increase muscle mass\", \"improve cardiovascular endurance\", \"enhance flexibility\"], \"recommended_routines\": [\"Strength training sessions three times a week focusing on major muscle groups\", \"Cardiovascular exercises such as running or cycling for at least 30 minutes, five days a week\", \"Daily stretching exercises to improve flexibility, including yoga or pilates\"]}}"
    },
    {
        "generated": "{\"seatNumber\":\"12A\",\"serviceType\":\"vegetarian\",\"specialInstructions\":\"gluten-free\"}",
        "expected": "{\"seatNumber\": \"12A\", \"serviceType\": \"vegetarian meal\", \"specialInstructions\": \"gluten-free\"}"
    },
    {
        "generated": "{ \"SKU\": \"TOB-1928\", \"quantity\": 150, \"restockDate\": \"2023-04-15\", \"supplier\": \"Global Tobacco Ltd.\"}",
        "expected": "{\"SKU\": \"TOB-1928\", \"quantity\": 150, \"restockDate\": \"2023-04-15\", \"supplier\": \"Global Tobacco Ltd.\"}"
    },
    {
        "generated": "{\"patentId\":\"US98765432A\" ,\"applicationDate\":\"2021-07-15\",\"inventorNames\":[\"Dr. Alice Smith\",\"Dr. Bob Jones\"],\"currentStatus\":\"Pending Examination\"}",
        "expected": "{\"patentId\": \"US98765432A\", \"applicationDate\": \"2021-07-15\", \"inventorNames\": [\"Dr. Alice Smith\", \"Dr. Bob Jones\"], \"currentStatus\": \"Pending Examination\"}"
    }
]
{
    "elapsed_time": 69.35481475805864,
    "num_requests": 10,
    "total_num_tokens": 8086,
    "total_output_tokens": 5120,
    "requests_per_second": 0.14418609630614082,
    "tokens_per_second": "116.59",
    "output_tokens_per_second": "73.82"
}
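The summary numbers are consistent with simple ratios over the elapsed time: 10 / 69.35 s ≈ 0.144 requests/s, 8086 / 69.35 ≈ 116.59 tokens/s, and 5120 / 69.35 ≈ 73.82 output tokens/s; note that 5120 also equals 10 prompts × 512 output tokens. A sketch that recomputes them, assuming these are the formulas used (`throughput_metrics` is a hypothetical name):

```python
from typing import Dict, Union


def throughput_metrics(elapsed_time: float, num_requests: int,
                       total_num_tokens: int,
                       total_output_tokens: int) -> Dict[str, Union[float, str]]:
    """Recompute the throughput fields of the summary from raw totals."""
    return {
        "requests_per_second": num_requests / elapsed_time,
        # the example summary stores token throughput as 2-decimal strings
        "tokens_per_second": f"{total_num_tokens / elapsed_time:.2f}",
        "output_tokens_per_second": f"{total_output_tokens / elapsed_time:.2f}",
    }


m = throughput_metrics(69.35481475805864, 10, 8086, 5120)
```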

Signed-off-by: Aaron Pham <[email protected]>

github-actions · 2024-11-22T00:10:30Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add the ready label to the PR
Enable auto-merge.

🚀

aarnphm

tiny comment.

aarnphm · 2024-11-22T00:41:38Z

benchmarks/benchmark_guided.py

+from vllm.sampling_params import GuidedDecodingParams
+from vllm.utils import FlexibleArgumentParser, merge_async_iterators
+
+SCHEMA = {


hmm, should we move this schema out to a file? That way we can keep a few JSON schemas to run the benchmark against.

Moved to a separate file; also added 'grammar', 'choice', and 'regex' to the benchmark.
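The diff imports `GuidedDecodingParams` from `vllm.sampling_params`; as far as we know its constructor accepts one constraint via the `json`, `grammar`, `regex`, or `choice` keyword. A sketch of selecting that constraint per benchmark mode, with JSON schemas read from a file path so several schemas can be kept side by side. The helper `guided_kwargs` and the sample schema are hypothetical; the code only builds plain dicts, so it runs without vLLM installed:

```python
import json
import os
import tempfile
from typing import Any, Dict, List, Optional


def guided_kwargs(mode: str, *, schema_path: Optional[str] = None,
                  grammar: Optional[str] = None,
                  regex: Optional[str] = None,
                  choices: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build keyword arguments intended for GuidedDecodingParams (hypothetical)."""
    if mode == "json":
        if schema_path is None:
            raise ValueError("json mode needs schema_path")
        with open(schema_path) as f:
            return {"json": json.load(f)}
    if mode == "grammar":
        return {"grammar": grammar}
    if mode == "regex":
        return {"regex": regex}
    if mode == "choice":
        return {"choice": choices}
    raise ValueError(f"unknown guided mode: {mode!r}")


# Usage: write a small made-up schema to a temp file, then load it back
schema = {"type": "object", "properties": {"ssid": {"type": "string"}}}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(schema, tmp)
try:
    json_kwargs = guided_kwargs("json", schema_path=tmp.name)
finally:
    os.remove(tmp.name)
choice_kwargs = guided_kwargs("choice", choices=["yes", "no"])
# With vLLM installed, these would be passed as GuidedDecodingParams(**json_kwargs)
```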

Signed-off-by: Chendi Xue <[email protected]>

aarnphm added 5 commits November 7, 2024 10:11

benchmark: add guided decoding script

f0b0c0d

Signed-off-by: Aaron Pham <[email protected]>

chore: add warmup args

c62b55b

Signed-off-by: Aaron Pham <[email protected]>

chore: run format accordingly

e64a701

Signed-off-by: Aaron Pham <[email protected]>

chore: add @mgoin's suggestion

a0c46f1

Signed-off-by: Aaron Pham <[email protected]>

chore: run format

91d9efc

Signed-off-by: Aaron Pham <[email protected]>

aarnphm mentioned this pull request Nov 22, 2024

[Benchmark] guided decoding #10046

Closed

aarnphm reviewed Nov 22, 2024


xuechendi added 4 commits November 22, 2024 00:55

Merge branch 'pr10046_structured_output' into benchmark_structured_output

8e67db0

Add xgrammar similar dataset

18c245c

Signed-off-by: Chendi Xue <[email protected]>

Add grammar, regex, choice, json - json support using file path

9bec7fc

Signed-off-by: Chendi Xue <[email protected]>

Async engine: save results

ad531ae

Signed-off-by: Chendi Xue <[email protected]>
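The async path imports `merge_async_iterators` from `vllm.utils` (see the diff in the review above); as far as we know it interleaves several async generators and yields `(index, item)` pairs. A pure-asyncio sketch of that idea, written from scratch rather than copied from vLLM (`merge_tagged`, `collect_final`, and `fake_stream` are hypothetical names), showing how outputs can be stored by request index so each generated text stays paired with its expected text even when requests finish out of order:

```python
import asyncio
from typing import AsyncIterator, List, Optional, Tuple, TypeVar

T = TypeVar("T")
_DONE = object()  # marks that one input iterator is exhausted


async def merge_tagged(*iterators: AsyncIterator[T]) -> AsyncIterator[Tuple[int, T]]:
    """Interleave several async iterators, yielding (index, item) as items arrive."""
    queue: asyncio.Queue = asyncio.Queue()

    async def drain(i: int, it: AsyncIterator[T]) -> None:
        try:
            async for item in it:
                queue.put_nowait((i, item))
        finally:
            queue.put_nowait((i, _DONE))  # always signal, even on error

    tasks = [asyncio.create_task(drain(i, it)) for i, it in enumerate(iterators)]
    remaining = len(tasks)
    try:
        while remaining:
            i, item = await queue.get()
            if item is _DONE:
                remaining -= 1
            else:
                yield i, item
        await asyncio.gather(*tasks)  # re-raise any iterator's exception
    finally:
        for t in tasks:
            t.cancel()


async def collect_final(streams: List[AsyncIterator[str]]) -> List[Optional[str]]:
    """Keep each request's last (cumulative) text, stored by its original index."""
    final: List[Optional[str]] = [None] * len(streams)
    async for i, text in merge_tagged(*streams):
        final[i] = text
    return final


async def fake_stream(chunks: List[str], delay: float) -> AsyncIterator[str]:
    """Stand-in for a per-request engine stream that yields cumulative text."""
    text = ""
    for chunk in chunks:
        await asyncio.sleep(delay)
        text += chunk
        yield text


outputs = asyncio.run(collect_final([
    fake_stream(["{", '"a": 1', "}"], 0.01),  # slower request finishes last
    fake_stream(["[", "]"], 0.001),
]))
```

Indexing by position (instead of appending in completion order) is what keeps `outputs[i]` aligned with prompt `i`.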
