Fix multi-turn generation #54

Re-run triggered June 15, 2024 13:19
Status Cancelled
Total duration 2m 16s
Artifacts 1

pytest-check.yml

on: pull_request
Matrix: subtest
Coverage
0s

Annotations

34 errors and 4 warnings
subtest (4)
The job was canceled because "_3" failed.
subtest (4)
The operation was canceled.
subtest (4)
No JUnit XML file was found. Set `fail-on-empty: false` if that is a valid use case
subtest (6)
The job was canceled because "_3" failed.
subtest (6)
No JUnit XML file was found. Set `fail-on-empty: false` if that is a valid use case
subtest (6)
The operation was canceled.
subtest (2)
The job was canceled because "_3" failed.
subtest (2)
No JUnit XML file was found. Set `fail-on-empty: false` if that is a valid use case
subtest (2)
The operation was canceled.
subtest (5)
The job was canceled because "_3" failed.
subtest (5)
The operation was canceled.
subtest (5)
No JUnit XML file was found. Set `fail-on-empty: false` if that is a valid use case
subtest (8)
The job was canceled because "_3" failed.
subtest (8)
The operation was canceled.
subtest (8)
No JUnit XML file was found. Set `fail-on-empty: false` if that is a valid use case
subtest (7)
The job was canceled because "_3" failed.
subtest (7)
The operation was canceled.
subtest (7)
No JUnit XML file was found. Set `fail-on-empty: false` if that is a valid use case
subtest (9)
The job was canceled because "_3" failed.
subtest (9): tests/dry_test/test_datasets.py#L105
test_datasets_dry_run[drop-extra_args15] ValueError: Invalid dataset: drop. No module named 'scipy'
subtest (9): tests/dry_test/test_datasets.py#L105
test_datasets_dry_run[triviaqa-extra_args44] RuntimeError: ModuleNotFoundError: No module named 'scipy'
subtest (9): tests/utilization/dataset/test_formatting.py#L387
test_formatting[gsm8k-2-least_to_most-None]
assert 'Answer the f...ket?\nAnswer:' == 'Answer the f...ket?\nAnswer:'
Skipping 131 identical leading characters in diff, use -v to show
  r?
- Answer: To answer the question "How many apples do they have together?", we need to know: "How many apples does Anna have?".
? -
+ Answer:To answer the question "How many apples do they have together?", we need to know: "How many apples does Anna have?".
  1. Anna has 2 more apples than Elsa. So Anna has 2 + 5 = 7 apples.
  2. Elsa and Anna have 5 + 7 = 12 apples together. So the answer is 12.
  Answer the following question.
  Question: If Pam is currently twice as young as Rena is, and in 10 years Rena will be 5 years older than her, how old is Pam now?
- Answer: To answer the question "How old is Pam now?", we need to know: "How much older is Rena than Pam currently?".
? -
+ Answer:To answer the question "How old is Pam now?", we need to know: "How much older is Rena than Pam currently?".
  1. Since Rena will be 5 years older than Pam in 10 years, she must be 5 years older than Pam now as well.
  2. If Pam is currently twice as young as Rena, that means that Rena is currently twice as old as Pam is. So if P stands for Pam's age now and R stands for Rena's age now, then we know that R = 2 * P And since Rena is 5 years older than Pam now, we know that R = P + 5. By substitution, we have P + 5 = 2 * P, which means that P = 5. So the answer is 5.
  Answer the following question.
  Question: Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?
  Answer:
subtest (9): tests/utilization/dataset/test_formatting.py#L387
test_formatting[mmlu:abstract_algebra-5-None-prob]
AssertionError: assert 'The followin...D. 6\nAnswer:' == 'The followin...D. 6\nAnswer:'
Skipping 168 identical leading characters in diff, use -v to show
  3
- Answer: B
? -
+ Answer:B
  The following are multiple choice questions (with answers) about abstract algebra.
  Question: Statement 1 | If aH is an element of a factor group, then |aH| divides |a|. Statement 2 | If H and K are subgroups of G then HK is a subgroup of G.
  A. True, True B. False, False C. True, False D. False, True
- Answer: B
? -
+ Answer:B
  The following are multiple choice questions (with answers) about abstract algebra.
  Question: Statement 1 | Every element of a group generates a cyclic subgroup of the group. Statement 2 | The symmetric group S_10 has 10 elements.
  A. True, True B. False, False C. True, False D. False, True
- Answer: C
? -
+ Answer:C
  The following are multiple choice questions (with answers) about abstract algebra.
  Question: Statement 1| Every function from a finite set onto itself must be one to one. Statement 2 | Every subgroup of an abelian group is abelian.
  A. True, True B. False, False C. True, False D. False, True
- Answer: A
? -
+ Answer:A
  The following are multiple choice questions (with answers) about abstract algebra.
  Question: Find the characteristic of the ring 2Z.
  A. 0 B. 3 C. 12 D. 30
- Answer: A
? -
+ Answer:A
  The following are multiple choice questions (with answers) about abstract algebra.
  Question: Find the degree for the given field extension Q(sqrt(2), sqrt(3), sqrt(18)) over Q.
  A. 0 B. 4 C. 2 D. 6
  Answer:
subtest (9): tests/utilization/model/test_apply_prompt_template.py#L67
test_smart_space
AssertionError: assert 'This is a sy...tant message.' == 'This is a sy...tant message.'
- This is a system message. This is a user message. This is an assistant message. This is the second user message. This is the second assistant message.
? - - -
+ This is a system message.This is a user message. This is an assistant message.This is the second user message.This is the second assistant message.
subtest (9): tests/utilization/model/test_to_model_prompt.py#L61
test_to_model_prompt[generation-False]
AssertionError: assert 'This is a sy...user message.' == 'This is a sy...user message.'
Skipping 40 identical leading characters in diff, use -v to show
- r message. This is an assistant message.
? -
+ r message.This is an assistant message.
  This is the second user message.
subtest (9): tests/utilization/model/test_to_model_prompt.py#L61
test_to_model_prompt[get_ppl-False]
AssertionError: assert ('This is a s...ant message.') == ('This is a s...ant message.')
At index 0 diff: 'This is a system message.\n\nThis is a user message.This is an assistant message.\n\nThis is the second user message.' != 'This is a system message.\n\nThis is a user message. This is an assistant message.\n\nThis is the second user message.'
Full diff:
  (
  'This is a system message.\n'
  '\n'
- 'This is a user message. This is an assistant message.\n'
? -
+ 'This is a user message.This is an assistant message.\n'
  '\n'
  'This is the second user message.',
  ' This is the second assistant message.',
  )
subtest (9)
The operation was canceled.
subtest (10)
The job was canceled because "_3" failed.
subtest (10)
The operation was canceled.
subtest (10)
No JUnit XML file was found. Set `fail-on-empty: false` if that is a valid use case
subtest (3): tests/dry_test/test_datasets.py#L105
test_datasets_dry_run[squad-extra_args40] RuntimeError: ModuleNotFoundError: No module named 'scipy'
subtest (3): tests/utilization/model/test_apply_prompt_template.py#L11
test_base
AssertionError: assert 'This is a sy...tant message.' == 'This is a sy...tant message.'
Skipping 40 identical leading characters in diff, use -v to show
- r message. This is an assistant message.
? -
+ r message.This is an assistant message.
- This is the second user message. This is the second assistant message.
? -
+ This is the second user message.This is the second assistant message.
subtest (3): tests/utilization/model/test_to_model_prompt.py#L61
test_to_model_prompt[get_prob-True]
AssertionError: assert ('This is a s... message.', 1) == ('This is a s... message.', 1)
At index 1 diff: 'This is a user message.This is an assistant message.\n\n' != 'This is a user message. This is an assistant message.\n\n'
Full diff:
  (
  'This is a system message.\n'
  '\n',
- 'This is a user message. This is an assistant message.\n'
? -
+ 'This is a user message.This is an assistant message.\n'
  '\n',
  'This is the second user message.',
  ' This is the second assistant message.',
  1,
  )
subtest (3)
Process completed with exit code 1.
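The prompt-template failures above all differ by a single space at the boundary between two message turns. The project's actual joining logic is not shown in this log, so the following is only a minimal sketch of one way such "smart space" behaviour could work. The function name `join_turns` and the rule it applies are assumptions made for illustration, not the repository's code.

```python
def join_turns(segments):
    """Concatenate message segments (hypothetical helper, not project code).

    A single space is inserted at a boundary only when neither side
    already supplies whitespace there, so "\n\n" separators are kept as-is.
    """
    out = ""
    for seg in segments:
        needs_space = (
            bool(out)
            and bool(seg)
            and not out[-1].isspace()
            and not seg[0].isspace()
        )
        if needs_space:
            out += " "
        out += seg
    return out


# Message texts taken from the failing assertions in the log.
turns = [
    "This is a system message.",
    "This is a user message.",
    "This is an assistant message.",
    "This is the second user message.",
    "This is the second assistant message.",
]

naive = "".join(turns)     # turns run together, as in the space-less side of the diffs
smart = join_turns(turns)  # one space between every pair of turns

print(naive)
print(smart)
```

Comparing `naive` with `smart` reproduces the kind of one-character mismatch reported by `test_smart_space`, `test_base`, and `test_to_model_prompt`, which may help when deciding whether the tests or the template code should change.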
subtest (1)
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3, actions/setup-python@v4, actions/upload-artifact@v2. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
subtest (1)
The following actions uses node12 which is deprecated and will be forced to run on node16: actions/upload-artifact@v2. For more info: https://github.blog/changelog/2023-06-13-github-actions-all-actions-will-run-on-node16-instead-of-node12-by-default/
subtest (3)
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3, actions/setup-python@v4. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
Deprecation notice: v1, v2, and v3 of the artifact actions
The following artifacts were uploaded using a version of actions/upload-artifact that is scheduled for deprecation: "coverage1". Please update your workflow to use v4 of the artifact actions. Learn more: https://github.blog/changelog/2024-04-16-deprecation-notice-v3-of-the-artifact-actions/

Artifacts

Produced during runtime
Name Size
coverage1 (expired) 68 KB