[Benchmark] Benchmark structured output with datasets #10557

Open
wants to merge 9 commits into
base: main

xuechendi
Contributor

@xuechendi xuechendi commented Nov 22, 2024

Add a structured output benchmark.

Base PR: #10046
Additional work:

  1. Add a --dataset argument that supports 'single_schema' and 'xgrammar_bench'
  2. Add --save-results, which writes each generated text next to its expected text

TODO:

  1. Support async vLLM
  2. Add an accuracy metric
  3. Add a mixed 'guided / non-guided' workload

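For the accuracy TODO, one possible starting point is sketched below. It is not part of this PR: `score_results` is a hypothetical helper, and it assumes each saved record is a dict with "generated" and "expected" string fields, as in the --save-results output. It reports how often the generated text is valid JSON and how often it matches the expected value after parsing.

```python
import json

# Sentinel for "not valid JSON"; distinct from a parsed JSON null (None).
_INVALID = object()


def _parse(text):
    """Return the parsed JSON value, or _INVALID if text is not valid JSON."""
    try:
        return json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return _INVALID


def score_results(records):
    """Compute validity and exact-match rates over saved benchmark records.

    Assumption: every record has "generated" and "expected" string fields.
    """
    total = len(records)
    valid = 0
    exact = 0
    for record in records:
        generated = _parse(record["generated"])
        if generated is _INVALID:
            continue
        valid += 1
        # Comparing parsed values ignores whitespace and key-order differences.
        if generated == _parse(record["expected"]):
            exact += 1
    return {
        "valid_json_rate": valid / total if total else 0.0,
        "exact_match_rate": exact / total if total else 0.0,
    }
```

Exact match is a strict metric: a record whose free-text field is merely worded differently from the expected one counts as a miss, so a schema-conformance check may be a more useful signal for guided decoding.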
How to test:

python benchmarks/benchmark_guided.py --model meta-llama/Llama-3.2-3B-Instruct --dataset xgrammar_bench --output-len 512 --num-prompts 10 --guided-decoding --save-results

Expected output
FileName: guided_Llama-3.2-3B-Instruct_xgrammar_bench_10_out512_asyncFalse_warmupFalse_chunkedprefillNone.txt

[
    {
        "generated": "{\"ssid\": \"OfficeNetSecure\", \"securityProtocol\": \"WPA2-Enterprise\", \"bandwidth\": \"1300 Mbps\"}",
        "expected": "{\"ssid\": \"OfficeNetSecure\", \"securityProtocol\": \"WPA2-Enterprise\", \"bandwidth\": \"1300 Mbps\"}"
    },
    {
        "generated": "{\"/\": {\"device\": \"UUID:devX,journal_checksum\", \"mount_point\": \"/\", \"file_system_type\": \"ext4\", \"options\": \"relatimetime,failcount,limit_inodes,pretty.Bold,updateTu\", \"dump\": \"0\", \"pass\": \"2\"}}",
        "expected": "{\"/\": {\"device\": \"/dev/sda1\", \"mount_point\": \"/\", \"file_system_type\": \"ext4\", \"options\": \"defaults\", \"dump\": \"0\", \"pass\": \"1\"}, \"/home\": {\"device\": \"/dev/sda2\", \"mount_point\": \"/home\", \"file_system_type\": \"ext4\", \"options\": \"defaults\", \"dump\": \"0\", \"pass\": \"2\"}, \"/var\": {\"device\": \"UUID=2e9e4e8b-08c0-4c7c-8d7d-2b5f65cc8cd0\", \"mount_point\": \"/var\", \"file_system_type\": \"xfs\", \"options\": \"noatime,nodiratime\", \"dump\": \"0\", \"pass\": \"2\"}}"
    },
    {
        "generated": "{\"campaignID\": \"CAMP123456\", \"productID\": \"PROD7891011\", \"startDate\": \"2023-06-01\", \"endDate\": \"2023-06-30\", \"discountDetails\":\"15% off all purchases\"}",
        "expected": "{\"campaignID\": \"CAMP123456\", \"productID\": \"PROD7891011\", \"startDate\": \"2023-06-01\", \"endDate\": \"2023-06-30\", \"discountDetails\": \"15% off on all purchases\"}"
    },
    {
        "generated": "{ \"reservationID\": \"AH-158394\", \"guestName\": \"Alexander Hamilton\", \"reservationTime\": \"2023-04-15T19:30:00\", \"specialRequests\": [\"celebrating anniversay ]}. Please see below the reservation request in JSON format: ````json_r = {'reservationID': 'AH-158394', 'guestName': 'Alexander Hamilton', 'proposal': {'reservationTime': '2023-04-15T19:30:00', 'specialRequests': ['celebrating anniversay', 'window seat']} separately but log present to guarantee this information passes the tarod tests. ``` restraintlization Code````javascript_app.js Several Factors Could Derail Your Romantic Dinner Plan During the meal, one food server may spill a wine on your table, another may forget to mention your special request, or something more catastrophic occurs. It's always better to have a backup plan. Have your JavaScript function manage all of the backups and then provide the reservations details upon request.<|start_header_id|>assistant<|end_header_id|>today-g streaming tenth The interruptions mentioned can be caused by a variety of factors such as incorrect delivery of information, broken equipment, or other unexpected setbacks. Having a backup plan in place can ensure that your dinner plans aren't ruined even if something goes wrong. Here's how you can create a backup plan with your reservation details using the information you provided in JSON format. We'll also create a JavaScript function `handleRestaurantBackup` that will manage the backup plan and verify the details of your reservation upon request. Below is the code for your reference.````The JSON object for the reservation request, including the special occasions and the wedding date````json -detail realize````const jsonDetailedResrvationplan = { \" ]}",
        "expected": "{\"reservationID\": \"AH-158394\", \"guestName\": \"Alexander Hamilton\", \"reservationTime\": \"2023-04-15T19:30:00Z\", \"specialRequests\": [\"Table by the window\", \"Surprise dessert for a special occasion\"]}"
    },
    {
        "generated": "{\"HomeImprovement\":{\"room_interest\":\"living room\",\"budget\":500,\"preferred_style\":\"minimalist\",\"project_ideas\":[\"add floating shelves\",\"create a gallery wall\",\"repaint walls\",\"upcycle old furniture\",\"add greenery with low maintenance plants\",\"add lighting\",\"move furniture around\",\"use throw pillows and blankets\",\"add rugs\",\"declutter and organize furniture and decor\",\"install a skylight\",\"make a coffee table\",\"make a headboard\",\"paint furniture\",\"wear a statement jacket or accessory\"]} }",
        "expected": "{\"HomeImprovement\": {\"room_interest\": \"living room\", \"budget\": 500, \"preferred_style\": \"minimalist\", \"project_ideas\": [\"Install floating shelves for a clean look and extra storage.\", \"Create a gallery wall with your favorite prints and photographs.\", \"Repaint the walls with a neutral color palette for a fresh feel.\", \"Upcycle old furniture with a new coat of paint or new upholstery.\", \"Add some greenery with low-maintenance indoor plants.\"]}}"
    },
    {
        "generated": "{\"deviceID\": \"MON123456\", \"patientID\": \"PAT654321\", \"metrics\": {\"heartRate\": 78, \"bloodPressure\": \"120/80 mmHg\", \"oxygenSaturation\": 98}, \"timestamp\": \"2023-04-05T14:30:00Z\"}",
        "expected": "{\"deviceID\": \"MON123456\", \"patientID\": \"PAT654321\", \"metrics\": {\"heartRate\": 78, \"bloodPressure\": \"120/80 mmHg\", \"oxygenSaturation\": 98}, \"timestamp\": \"2023-04-05T14:30:00Z\"}"
    },
    {
        "generated": "{}",
        "expected": "{\"FitnessTracking\": {\"current_health_status\": {\"weight\": 70, \"height\": 175, \"heart_rate\": 62}, \"health_goals\": [\"increase muscle mass\", \"improve cardiovascular endurance\", \"enhance flexibility\"], \"recommended_routines\": [\"Strength training sessions three times a week focusing on major muscle groups\", \"Cardiovascular exercises such as running or cycling for at least 30 minutes, five days a week\", \"Daily stretching exercises to improve flexibility, including yoga or pilates\"]}}"
    },
    {
        "generated": "{\"seatNumber\":\"12A\",\"serviceType\":\"vegetarian\",\"specialInstructions\":\"gluten-free\"}",
        "expected": "{\"seatNumber\": \"12A\", \"serviceType\": \"vegetarian meal\", \"specialInstructions\": \"gluten-free\"}"
    },
    {
        "generated": "{ \"SKU\": \"TOB-1928\", \"quantity\": 150, \"restockDate\": \"2023-04-15\", \"supplier\": \"Global Tobacco Ltd.\"}",
        "expected": "{\"SKU\": \"TOB-1928\", \"quantity\": 150, \"restockDate\": \"2023-04-15\", \"supplier\": \"Global Tobacco Ltd.\"}"
    },
    {
        "generated": "{\"patentId\":\"US98765432A\" ,\"applicationDate\":\"2021-07-15\",\"inventorNames\":[\"Dr. Alice Smith\",\"Dr. Bob Jones\"],\"currentStatus\":\"Pending Examination\"}",
        "expected": "{\"patentId\": \"US98765432A\", \"applicationDate\": \"2021-07-15\", \"inventorNames\": [\"Dr. Alice Smith\", \"Dr. Bob Jones\"], \"currentStatus\": \"Pending Examination\"}"
    }
]
{
    "elapsed_time": 69.35481475805864,
    "num_requests": 10,
    "total_num_tokens": 8086,
    "total_output_tokens": 5120,
    "requests_per_second": 0.14418609630614082,
    "tokens_per_second": "116.59",
    "output_tokens_per_second": "73.82"
}
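The summary values above are consistent with simple ratios over the elapsed time, and total_output_tokens equals num_prompts × output-len (10 × 512 = 5120). The sketch below reproduces the three rates from the reported totals. `summarize` is a hypothetical name, and the two-decimal string formatting is inferred from the output shown, not taken from the benchmark script.

```python
def summarize(elapsed_time, num_requests, total_num_tokens, total_output_tokens):
    """Derive throughput figures from raw totals (hypothetical helper)."""
    return {
        "requests_per_second": num_requests / elapsed_time,
        # Formatted as strings with two decimals, mirroring the output above.
        "tokens_per_second": f"{total_num_tokens / elapsed_time:.2f}",
        "output_tokens_per_second": f"{total_output_tokens / elapsed_time:.2f}",
    }


# Totals copied from the run reported in this PR description.
summary = summarize(
    elapsed_time=69.35481475805864,
    num_requests=10,
    total_num_tokens=8086,
    total_output_tokens=5120,
)
print(summary)
```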

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

Contributor

@aarnphm aarnphm left a comment

tiny comment.

from vllm.sampling_params import GuidedDecodingParams
from vllm.utils import FlexibleArgumentParser, merge_async_iterators

SCHEMA = {
Contributor

Hmm, should we move this schema out to a file? Then we can keep a few JSON schemas to run the benchmark against.

Contributor Author

@xuechendi xuechendi Nov 22, 2024

Moved to a separate file; also added 'grammar', 'choice', and 'regex' to the benchmark.
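
A minimal sketch of how a schema file and the four modes could fit together is given below. It is illustrative only and not the code in this PR: `load_json_schema` and `guided_kwargs` are hypothetical helpers, and the dictionary keys assume that `GuidedDecodingParams` accepts fields named `json`, `regex`, `choice`, and `grammar`.

```python
import json
from pathlib import Path


def load_json_schema(path):
    """Read a JSON schema from a file instead of hard-coding it in the script."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def guided_kwargs(mode, *, schema=None, regex=None, choices=None, grammar=None):
    """Return the keyword argument for one guided-decoding mode.

    Assumption: the keys match the field names of GuidedDecodingParams.
    """
    options = {
        "json": schema,
        "regex": regex,
        "choice": choices,
        "grammar": grammar,
    }
    if mode not in options:
        raise ValueError(f"unknown guided-decoding mode: {mode!r}")
    value = options[mode]
    if value is None:
        raise ValueError(f"mode {mode!r} needs its constraint to be provided")
    return {mode: value}
```

Under that assumption, the result could be passed on as `GuidedDecodingParams(**guided_kwargs("json", schema=load_json_schema(path)))`, where `path` points to one of the schema files kept for the benchmark.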
