Polish4 temp #11

Open
wants to merge 257 commits into
base: polish3
This pull request is big! We’re only showing the most recent 250 commits.

Commits on Feb 27, 2024

  1. fixes

    djstrong committed Feb 27, 2024
    1cb35cc
  2. fix: change the cbd_mc to be CATEGORIES-based

    Restored default case for cbd_regex
    Fixed typo in klej_ner_mc
    kacpermilan authored and djstrong committed Feb 27, 2024
    55f274b
  3. fix: typo in cbd_mc.yaml

    kacpermilan authored and djstrong committed Feb 27, 2024
    35af374
  4. fix: typo in cbd_mc.yaml

    kacpermilan authored and djstrong committed Feb 27, 2024
    9540f16
  5. update polish groups

    djstrong committed Feb 27, 2024
    d3d7d01

Commits on Mar 3, 2024

  1. c4679ce

Commits on Mar 5, 2024

  1. fix stderr aggregation

    djstrong committed Mar 5, 2024
    7fc327d

Commits on Mar 10, 2024

  1. add perplexity task

    djstrong committed Mar 10, 2024
    e14e593
  2. belebele mc

    djstrong committed Mar 10, 2024
    bc61568

Commits on Aug 2, 2024

  1. 85eb77f
  2. 0632a05
  3. bb879de
  4. Fix Issue regarding stderr (EleutherAI#1327)

    * add fix for deciding if stderr is N/A or not
    
    * process N/A
    lintangsutawika authored and djstrong committed Aug 2, 2024
    6414edd
  5. Add local-completions support using OpenAI interface (EleutherAI#1277)

    * Add `local-completions` support using OpenAI interface
    
    * Refactor oa_completion
    
    * Address tokenizer comments and change request chunks to batch size
    
    * Add warning message for tiktoken backend
    
    * fix formatting
    
    * fix whitespace
    
    * Update README.md
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    66783f6
  6. f0ba560
  7. 9dd448b
  8. 4f263af
  9. Update migrated HF dataset paths (EleutherAI#1332)

    * Update arc_easy.yaml
    
    * Update flan_cot.yaml
    
    * update HF dataset path
    
    * Update freeform.yaml
    
    * Update flan_cot.yaml
    
    ---------
    
    Co-authored-by: Lintang Sutawika <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    0d8d549
  10. Don't use get_task_dict() in task registration / initialization (EleutherAI#1331)
    
    * don't use get_task_dict() as a helper, it will download the dataset!
    
    * pre-commit
    
    * Update README.md
    
    ---------
    
    Co-authored-by: lintangsutawika <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    268d252
  11. manage default (greedy) gen_kwargs in vllm (EleutherAI#1341)

    * manage default (greedy) gen_kwargs in vllm better
    
    * mirror HF `do_sample`
    
    * just need to set temp=0 for greedy
    baberabb authored and djstrong committed Aug 2, 2024
    82e319d
  12. 0938c13
  13. 97361ed
  14. Filter docs not offset by doc_id (EleutherAI#1349)

    * get `doc` from instance
    
    * accelerate bugfix: get ground doc from instance
    
    * convert filter to `process_result`
    
    * get docs from instances in `FilterEnsemble`
    
    * rename
    
    * nit
    
    * better looping
    
    * fix typehint
    baberabb authored and djstrong committed Aug 2, 2024
    ca3a895
  15. Add FAQ on lm_eval.tasks.initialize_tasks() to README (EleutherAI#1330)
    
    * Update README.md
    
    * [!Tip]
    haileyschoelkopf authored and djstrong committed Aug 2, 2024
    2eeaf15
  16. d467d2f
  17. Add causalLM OpenVino models (EleutherAI#1290)

    * added intel optimum
    
    * added intel optimum in readme
    
    * modified intel optimum
    
    * modified intel optimum
    
    * modified intel optimum
    
    * modified install optimum
    
    * modified path of IR file
    
    * added openvino_device
    
    * added openvino_device2
    
    * changed optimum-causal to openvino-causal
    
    * Update README.md
    
    * Update README.md
    
    * remove `lm_eval.base` import
    
    * update openvino-causal -> openvino ; pass device through super().__init__()
    
    * Update README.md
    
    * Add optimum to tests dependencies
    
    * apply pre-commit
    
    * fix so tests pass
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    Co-authored-by: haileyschoelkopf <[email protected]>
    3 people authored and djstrong committed Aug 2, 2024
    f41ac12
  18. Apply some best practices and guideline recommendations to code (EleutherAI#1363)
    
    * raise Exception, not a string
    
    Additional info https://peps.python.org/pep-0352/#exception-hierarchy-changes
    https://docs.python.org/3.8/tutorial/errors.html#raising-exceptions
    
    * Apply PEP8 recommendation to prefer isinstance
    
    "Object type comparisons should always use isinstance() instead of comparing types directly"
    https://peps.python.org/pep-0008/
    
    * Remove dangerous default mutable values in arguments
    
    https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/dangerous-default-value.html
    
    * Format logging messages with fstring (not with format)
    
    Additional info
    https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/logging-format-interpolation.html
    There are also discussions about the speed of formatting while logging or some unintended code executions
    pylint-dev/pylint#2395
    https://stackoverflow.com/a/54368109
    but at least one format (fstring one) will be used throughout the project
    
    * Specify utf-8 encoding for `open` explicitly
    
    If not specified, it may be supposed differently in different environments, OSes, and Python versions. See
    https://peps.python.org/pep-0597/
    https://docs.python.org/3.11/library/locale.html#locale.getencoding
    https://docs.python.org/3.10/library/os.html#utf8-mode
    https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/unspecified-encoding.html
    
    Helps also if some code from English language tasks is taken as inspiration for tasks in non-English languages.
    
    * Use inline-ignoring comments to pass pre-commit instead of identity process
    
    https://flake8.pycqa.org/en/3.0.1/user/ignoring-errors.html#in-line-ignoring-errors
    https://www.flake8rules.com/rules/F841.html
    
    flake8 comments are supported by ruff: https://docs.astral.sh/ruff/linter/#error-suppression
    LSinev authored and djstrong committed Aug 2, 2024
    154f5fa
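
The recommendations in the commit message above are general Python guidance. A minimal sketch of two of them (avoiding mutable default arguments and preferring `isinstance`) follows; the function names are hypothetical and are not taken from the lm-eval codebase.

```python
from typing import List, Optional


def append_bad(item: str, bucket: List[str] = []) -> List[str]:
    # The default list is created once, at definition time, and is
    # shared by every call that does not pass its own `bucket`.
    bucket.append(item)
    return bucket


def append_good(item: str, bucket: Optional[List[str]] = None) -> List[str]:
    # Use None as the sentinel and build a fresh list on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


def describe(value: object) -> str:
    # isinstance() also matches subclasses, unlike `type(value) == int`.
    # bool is checked first because bool is a subclass of int.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    return "other"
```

Calling `append_bad` twice without a second argument shows the shared-state problem that the "dangerous default value" warning is about, while `append_good` returns an independent list each time.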
  19. b43d9d9
  20. delay filter init; remove *args (EleutherAI#1369)

    * delay filter init; remove `*args`
    
    * bugfix
    
    * optimize
    
    * type hint
    baberabb authored and djstrong committed Aug 2, 2024
    2b31cfb
  21. Fix unintuitive --gen_kwargs behavior (EleutherAI#1329)

    * don't override do_sample if no value for it is passed
    
    * Update gen_kwargs override condition
    
    * Update huggingface.py
    
    * Update huggingface.py
    
    * run linters
    
    * silence an erroneous warning
    haileyschoelkopf authored and djstrong committed Aug 2, 2024
    cdc41c4
  22. Publish to pypi (EleutherAI#1194)

    * publish to pypi
    
    * lint
    
    * Update publish.yml
    
    * minor
    anjor authored and djstrong committed Aug 2, 2024
    b39e8da
  23. Make dependencies compatible with PyPI (EleutherAI#1378)

    * make deps not point to github urls
    
    * formatting
    
    * try making PyPI only run on tag pushes
    haileyschoelkopf authored and djstrong committed Aug 2, 2024
    0a39c84
  24. Add support for RWKV models with World tokenizer (EleutherAI#1374)

    * Add support for RWKV models with World tokenizer
    
    The RWKV line of models with the World tokenizer does not allow the padding token to be configured; its value is preset to 0.
    
    This, however, fails all the "if set" checks and would cause the tokenizer to crash.
    
    A tokenizer class name check was added in addition to a model type check, as there exist RWKV models which use the neox tokenizers.
    
    * Update huggingface.py
    
    Genericized so that this supports any RWKVWorld tokenizer, and added a fall-back for if the HF implementation name changes.
    
    * Comply with formatting guidelines
    
    * fix format
    
    ---------
    
    Co-authored-by: Stella Biderman <[email protected]>
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    3 people authored and djstrong committed Aug 2, 2024
    b7513d3
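
The commit message above says a padding token preset to 0 fails the "if set" checks. Assuming those checks are plain truthiness tests (an assumption; the actual code in `huggingface.py` is not shown here), the pitfall can be sketched with two hypothetical helpers:

```python
from typing import Optional


def has_pad_token_truthy(pad_token_id: Optional[int]) -> bool:
    # Pitfall: a valid token id of 0 is falsy, so it looks "unset".
    return bool(pad_token_id)


def has_pad_token(pad_token_id: Optional[int]) -> bool:
    # Safer: only None means "not configured"; 0 is a legitimate id.
    return pad_token_id is not None
```

With `pad_token_id = 0`, the first helper reports the token as missing while the second accepts it.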
  25. add bypass metric (EleutherAI#1156)

    * add bypass metric
    
    * fixed `bypass` metric.
    
    * add task attributes if predict_only
    
    * add `predict_only` checks
    
    * add docs
    
    * added `overide_metric`, `override_config` to `Task`
    
    * nits
    
    * nit
    
    * changed --predict_only to generations; nits
    
    * nits
    
    * nits
    
    * change gen_kwargs warning
    
    * add note about `--predict_only` in README.md
    
    * added `predict_only`
    
    * move table to bottom
    
    * nit
    
    * change null aggregation to bypass (conflict)
    
    * bugfix; default `temp=0.0`
    
    * typo
    baberabb authored and djstrong committed Aug 2, 2024
    7d068d2
  26. Expand docs, update CITATION.bib (EleutherAI#1227)

    * Update CITATION.bib
    
    * Create CONTRIBUTING.md
    
    * add disclaimer re: multi node
    
    * flesh out some sections more
    
    * Flesh out contributor guide
    
    * revert CITATION.bib
    
    * appease pre-commit
    
    ---------
    
    Co-authored-by: lintangsutawika <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    b284735
  27. Hf: minor edge cases (EleutherAI#1380)

    * edge cases where variable might not be assigned.
    
    * type hint
    baberabb authored and djstrong committed Aug 2, 2024
    80c158c
  28. Enable override of printed n-shot in table (EleutherAI#1379)

    * allow tasks to specify printed fewshot val
    
    * fix to belebele
    
    * update metadata field's documentation
    haileyschoelkopf authored and djstrong committed Aug 2, 2024
    d55e918
  29. Faster Task and Group Loading, Allow Recursive Groups (EleutherAI#1321)

    * add trust_remote_code as default
    
    * task for testing recursive
    
    * changed source of ALL_TASKS
    
    * tasks should only accept TaskObjects
    
    * initialize_tasks returns list of tasks and groups
    
    * remove trust_remote_code for now
    
    * moved constructor process to inside load_yaml_config
    
    * more comprehensive way to index tasks and groups
    
    * pre-commit format
    
    * add exit after error
    
    * adjust how task objects are called
    
    * no need to use get_task_dict
    
    * load_task_or_group works but only for tasks
    
    * pre-commit format
    
    * half working for nested groups
    
    * changed variable names
    
    * allow groups and tasks to work
    
    * temp save
    
    * indexing and loading are part of a task_manager object
    
    * adapted initialize_tasks
    
    * iron out bugs
    
    * fixed typo
    
    * fixed typo
    
    * simplified code
    
    * further tidy up
    
    * remove lines for testing
    
    * removed test lines
    
    * removed unused code
    
    * remove unused import
    
    * fixed bug
    
    * removed comments
    
    * group in a list of group can accept parameter changes like `num_fewshot`
    
    * check if config is task update
    
    * add GroupConfig object
    
    * edit test yaml
    
    * remove args
    
    * testing returning to python task list
    
    * add weight_by_size config
    
    * describe weight_by_size in docs
    
    * fix weight by size potential error
    
    * can load individual custom python class task
    
    * moved import_function into the config loading file
    
    * remove print lines
    
    * add squadv2 yaml
    
    * temporary scroll implementation
    
    * revert back to use load_yaml_config but with modes
    
    * fix group being loaded with a None
    
    * reformat
    
    * can load unregistered tasks from a group
    
    * update scrolls
    
    * edit scrolls multiplechoice task
    
    * adjust class initialization
    
    * fix initialization
    
    * changed how to identify group and python tasks, fix logger
    
    * allow loading "include" that is nested in a group config
    
    * reworked flan benchmark
    
    * allow duplicate task in the same group to co-exist
    
    * process group_alias
    
    * removed group_alias
    
    * allow parameters set in group_config to apply to all tasks in tasklist
    
    * add function, but comment for now
    
    * reworked processing dict-base config
    
    * fixed how configs in group are processed
    
    * update to allow root group to have its alias used
    
    * remove unused classes
    
    * remove unused classes
    
    * revert some parts to original
    
    * forgot to change one variable
    
    * adapt the new process to use get_task_dict
    
    * fix for singular group call
    
    * fix variable names
    
    * add TaskManager into the evaluator
    
    * format
    
    * changed how dict tasks are loaded
    
    * add docs
    
    * Update docs/new_task_guide.md
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * Update evaluator.py
    
    * Update evaluator.py
    
    * remove groupconfig for now
    
    * changed _config to config
    
    * update interface.md to explain TaskManager
    
    * added property functions
    
    * adjusted logger
    
    * update write_out.py
    
    * updated tests
    
    * added documentation and some modifications
    
    * added docstring documentation
    
    * precommit format
    
    * updated task loading for tests
    
    * updates tests
    
    * changed arg order for load_yaml_config
    
    * update to handle scrolls and edit log message
    
    * remove unused lines
    
    * return a list of task classes and not a dict
    
    * Update __init__.py
    
    * Delete lm_eval/tasks/benchmarks/test.yaml
    
    * Update task.py
    
    * Update lm_eval/utils.py
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * Update lm_eval/utils.py
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * Update utils.py
    
    * re-added old functions with new log message
    
    * Update docs/new_task_guide.md
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * Update new_task_guide.md
    
    * added info regarding `get_task_dict` and documentation
    
    * add get_config for Task
    
    * pre-commit formatting
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    d6b65f1
  30. Fix for EleutherAI#1383 (EleutherAI#1384)

    Fixes EleutherAI#1383
    
    If this is okay, it will need to be propagated to SCROLLS
    pminervini authored and djstrong committed Aug 2, 2024
    5810eac
  31. bad70e7
  32. Support for Inf2 optimum class [WIP] (EleutherAI#1364)

    * initial commit
    
    * remove overwrite bs
    
    * adding neuronx dependencies
    
    * Update README.md
    
    * update neuronx
    michaelfeil authored and djstrong committed Aug 2, 2024
    09ca8ff
  33. Update README.md (EleutherAI#1398)

    Add instructions for non-MacOS users on how to compile janitor_util.cpp so that janitor.py can use it.
    mycoalchen authored and djstrong committed Aug 2, 2024
    590bcc7
  34. 4ed48ca
  35. Use Pooled rather than Combined Variance for calculating stderr of task groupings (EleutherAI#1390)
    
    * update formula for stderr aggregation
    
    * hack: see what happens when using stderr_for_metric bootstrapping on a group
    
    * undo bootstrap_for_stderr test
    
    * factor out variance-aggregation formulas into api.metrics
    
    * fix failing tests
    
    * remove stray print
    
    * update comment
    
    * further detail in comment
    
    * add back initialize_tasks() call
    
    * fix format
    haileyschoelkopf authored and djstrong committed Aug 2, 2024
    77b79a0
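
For readers unfamiliar with the term in the commit title above, the textbook pooled variance of k samples is sum((n_i - 1) * s_i^2) / (sum(n_i) - k). The sketch below applies it to per-task standard errors; the function names, and the choice to divide the pooled variance by the total sample count, are assumptions made for illustration and may differ from the formula actually merged in the PR.

```python
import math
from typing import Sequence


def pooled_variance(variances: Sequence[float], sizes: Sequence[int]) -> float:
    # Textbook pooled variance: sum((n_i - 1) * s_i^2) / (sum(n_i) - k)
    numerator = sum((n - 1) * v for v, n in zip(variances, sizes))
    return numerator / (sum(sizes) - len(sizes))


def pooled_stderr(stderrs: Sequence[float], sizes: Sequence[int]) -> float:
    # Recover each sample variance from its standard error: s_i^2 = n_i * se_i^2
    variances = [n * se**2 for se, n in zip(stderrs, sizes)]
    # Assumption: the group stderr treats all documents as one sample of size sum(n_i).
    return math.sqrt(pooled_variance(variances, sizes) / sum(sizes))
```

For two subtasks of 100 documents each with a standard error of 0.05, the pooled variance is 0.25 and the group standard error is sqrt(0.25 / 200).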
  36. adding hf_transfer (EleutherAI#1400)

    * add hf_transfer
    
    * update dependencies
    
    * Delete stale `[linting]` extra
    
    * Update README.md with extras table
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    ca8c608
  37. 79378a8
  38. a04bf2b
  39. Fixes EleutherAI#1416 (EleutherAI#1418)

    * Fixes EleutherAI#1416
    
    Sets `do_sample = False` if `temperature == 0.0` and `do_sample = None`
    
    * Update huggingface.py
    
    * Update huggingface.py
    
    making linter happy
    pminervini authored and djstrong committed Aug 2, 2024
    f2220c7
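
The rule stated in the commit message above (set `do_sample = False` when `temperature == 0.0` and `do_sample` is `None`) can be sketched as a small function. `resolve_do_sample` is a hypothetical helper written for illustration, not the code in `huggingface.py`.

```python
from typing import Any, Dict


def resolve_do_sample(gen_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy with do_sample filled in for greedy decoding."""
    resolved = dict(gen_kwargs)  # do not mutate the caller's dict
    if resolved.get("temperature") == 0.0 and resolved.get("do_sample") is None:
        # Temperature 0 with no explicit choice is treated as greedy decoding.
        resolved["do_sample"] = False
    return resolved
```

An explicit `do_sample` value passed by the user is left untouched, and nothing is added when the temperature is non-zero.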
  40. Fix watchdog timeout (EleutherAI#1404)

    * Fix watchdog timeout
    
    * Pre-commit fix
    
    * Timedelta
    JeevanBhoot authored and djstrong committed Aug 2, 2024
    5b0db7a
  41. Evaluate (EleutherAI#1385)

    * un-exclude `evaluate.py` from linting
    
    * readability
    
    * readability
    
    * add task name to build info message
    
    * fix link
    
    * nit
    
    * add functions for var and mean pooling
    
    * add functions for var and mean pooling
    
    * metadata compatibility with task
    
    * rename `override_config` to `set_config` and move to `Task`
    
    * add unit test
    
    * nit
    
    * nit
    
    * bugfix
    
    * nit
    
    * nit
    
    * nit
    
    * add docstrings
    
    * fix metadata-fewshot
    
    * revert metric refactor
    
    * nit
    
    * type checking
    
    * type hints
    
    * type hints
    
    * move `override_metric` to `Task`
    
    * change metadata
    
    * change name
    
    * pre-commit
    
    * rename
    
    * remove
    
    * remove
    
    * `override_metric` backwards compatible with `Task`
    
    * type hints
    
    * use generic
    
    * type hint
    baberabb authored and djstrong committed Aug 2, 2024
    8d82b49
  42. 80e0a4f
  43. 5c1b249
  44. 66e9620
  45. Added seeds to evaluator.simple_evaluate signature (EleutherAI#1412)

    * Added seeds to `evaluator.simple_evaluate` signature
    
    * Added  CLI argument
    
    * Updated  to add  arg.
    Am1n3e authored and djstrong committed Aug 2, 2024
    c2c361c
  46. Fix: task weighting by subtask size ; update Pooled Stderr formula slightly (EleutherAI#1427)
    
    * fix weight_by_size condition
    
    * add tests, update stderr formula slightly
    
    * apply pre-commit
    haileyschoelkopf authored and djstrong committed Aug 2, 2024
    af3ca77
  47. 205c870
  48. 71bbba4
  49. d027702
  50. 8315c1f
  51. update bbh, gsm8k, mmlu parsing logic and prompts (Orca2 bbh_cot_zeroshot 0% -> 42%) (EleutherAI#1356)
    
    * update bbh, gsm8k, mmlu parsing logic and prompts
    
    * remove the formatting prompt (bbh) + minor update (mmlu)
    
    * update bbh, gsm8k, mmlu zeroshot, revert fewshots
    
    * update bbh, gsm8k, mmlu version, forward changes to gsm8k-cot
    
    * remove take_last, update to use docs parameters
    
    * add newline
    
    * ruff formatting
    
    * Update pyproject.toml
    
    * fix format
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    ba89cd6
  52. Add a new task HaeRae-Bench (EleutherAI#1445)

    * haerae_reimplementation
    
    * edited Readme and add few_shot settings
    
    * edited readme
    
    * newlines at end of each files
    
    * Modifying the README file
    
    * applied pre-commit
    h-albert-lee authored and djstrong committed Aug 2, 2024
    f3e993d
  53. Group reqs by context (EleutherAI#1425)

    * add key lookup for same contexts
    
    * nit
    
    * appease pre-commit
    
    * nit
    
    * use `expand` (in-place view) rather than `repeat`
    
    * try mixed grouping
    
    * add docs.
    
    * nit
    
    * nit
    
    * nits
    
    * fix tests
    
    * Move greedy_tokens calculation out of cache loop
    
    * nit
    
    * nits
    
    * add test
    
    * nits
    
    * fix name conflict
    
    * fix name conflict
    
    * chunk tensor
    
    * move Collator
    
    * nits/docstring
    
    * fixup
    
    * fixup
    
    * group contexts only for decoders
    
    * pre-commit
    
    * fix `generate_until` test
    
    * fix `generate_until` test
    
    * Update lm_eval/models/huggingface.py
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * add docs
    
    * nit
    
    * add docs
    
    * add docs
    
    * add 'logits_cache' arg
    
    * bugfix
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    44254b3
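
The bullets above ("add key lookup for same contexts", "group contexts only for decoders") suggest that requests sharing a context are collected under one key so the context only needs to be processed once. A minimal sketch of that idea in plain Python follows; `group_by_context` and the request shape are hypothetical and do not reproduce the PR's Collator.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_context(requests: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    # Each request is a (context, continuation) pair.
    # Collect the continuations that share a context, preserving input order.
    grouped: Dict[str, List[str]] = defaultdict(list)
    for context, continuation in requests:
        grouped[context].append(continuation)
    return dict(grouped)
```

A multiple-choice question with several candidate answers then yields one entry keyed by the shared prompt, with every answer listed under it.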
  54. Add a new task GPQA (the part without CoT) (EleutherAI#1434)

    * add new task GPQA_n_shot
    
    * add new task GPQA_zeroshot
    
    * correct GPQA_zeroshot filename
    
    * Add randomly shuffle choices
    
    * Correct missing parentheses
    
    * delete wrong tasks
    
    * Add README
    
    * Update lm_eval/tasks/gpqa/zeroshot/_gpqa_zeroshot_yaml
    
    * Update lm_eval/tasks/gpqa/n_shot/utils.py
    
    * Update lm_eval/tasks/gpqa/n_shot/utils.py
    
    * Update lm_eval/tasks/gpqa/README.md
    
    * placate linter
    
    * linter
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    c51d0ce
  55. Added KMMLU evaluation method and changed ReadMe (EleutherAI#1447)

    * update kmmlu default formatting
    
    * Update _default_kmmlu_yaml
    
    * Delete lm_eval/tasks/kmmlu/utils.py
    
    * new tasks implemented
    
    * add direct tasks
    
    * update direct evaluate
    
    * update direct eval
    
    * add cot sample
    
    * update cot
    
    * add cot
    
    * Update _cot_kmmlu_yaml
    
    * add kmmlu90
    
    * Update and rename _cot_kmmlu.yaml to _cot_kmmlu_yaml
    
    * Create kmmlu90.yaml
    
    * Update _cot_kmmlu_yaml
    
    * add direct
    
    * Update _cot_kmmlu_yaml
    
    * Update and rename kmmlu90.yaml to kmmlu90_cot.yaml
    
    * Update kmmlu90_direct.yaml
    
    * add kmmlu hard
    
    * Update _cot_kmmlu_yaml
    
    * Update _cot_kmmlu_yaml
    
    * update cot
    
    * update cot
    
    * erase typo
    
    * Update _cot_kmmlu_yaml
    
    * update cot
    
    * Rename dataset to match k-mmlu-hard
    
    * removed kmmlu90
    
    * fixed name 'kmmlu_cot' to 'kmmlu_hard_cot' and revised README
    
    * applied pre-commit before pull requests
    
    * rename datasets and add notes
    
    * Remove DS_Store cache
    
    * Update lm_eval/tasks/kmmlu/README.md
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * Change citations and reflect reviews on version
    
    * Added kmmlu_hard and fixed other errors
    
    * fixing minor errors
    
    * remove duplicated
    
    * Rename files
    
    * try ".index"
    
    * minor fix
    
    * minor fix again
    
    * fix revert.
    
    * minor fix. thanks to hailey
    
    ---------
    
    Co-authored-by: GUIJIN SON <[email protected]>
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    3 people authored and djstrong committed Aug 2, 2024
    7dc04ed
  56. Add TemplateLM boilerplate LM class (EleutherAI#1279)

    * loglikelihood refactor using template lm
    
    * linter
    
    * fix whitespace in target + prompt for CoT gsm8k (EleutherAI#1275)
    
    * Make `parallelize=True` vs. `accelerate launch` distinction clearer in docs (EleutherAI#1261)
    
    * Make parallelize=True distinction clearer in documentation.
    
    * run linter
    
    * Allow parameter edits for registered tasks when listed in a benchmark (EleutherAI#1273)
    
    * benchmark yamls allow minor edits of already registered tasks
    
    * add documentation
    
    * removed print
    
    * Fix data-parallel evaluation with quantized models (EleutherAI#1270)
    
    * add WIP device_map overrides
    
    * update handling outside of accelerate launcher
    
    * change .to(device) log to debug level
    
    * run linter
    
    * Rework documentation for explaining local dataset (EleutherAI#1284)
    
    * rework documentation for explaining local dataset
    
    * fix typo
    
    * Update new_task_guide.md
    
    * Re-add citation
    
    It looks like Google Scholar has [already noticed](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C9&authuser=2&q=%22A+framework+for+few-shot+language+model+evaluation%2C+12+2023%22&btnG=) the updated citation block so let's add it back in.
    
    * Update CITATION.bib (EleutherAI#1285)
    
    Bumping CITATION.bib to match re-adding the citation in readme. 
    
    cc @StellaAthena
    
    * Update nq_open.yaml (EleutherAI#1289)
    
    * Update README.md with custom integration doc (EleutherAI#1298)
    
    * Update README.md
    
    * punctuation
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * Update nq_open.yaml (EleutherAI#1305)
    
    * Update nq_open.yaml
    
    change regex
    
    * Bump NQ version
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * Update task_guide.md (EleutherAI#1306)
    
    * Update pyproject.toml (EleutherAI#1312)
    
    * Fix polemo2_in.yaml config name (EleutherAI#1313)
    
    * Update pyproject.toml (EleutherAI#1314)
    
    * Fix group register (EleutherAI#1315)
    
    * tuple should be considered as well
    
    * set option to keep callable as callable
    
    * Update task_guide.md (EleutherAI#1316)
    
    * Update polemo2_in.yaml (EleutherAI#1318)
    
    * don't pass extra kwargs to mamba any more (EleutherAI#1328)
    
    * Fix Issue regarding stderr (EleutherAI#1327)
    
    * add fix for deciding if stderr is N/A or not
    
    * process N/A
    
    * Add `local-completions` support using OpenAI interface (EleutherAI#1277)
    
    * Add `local-completions` support using OpenAI interface
    
    * Refactor oa_completion
    
    * Address tokenizer comments and change request chunks to batch size
    
    * Add warning message for tiktoken backend
    
    * fix formatting
    
    * fix whitespace
    
    * Update README.md
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * fallback to classname when LM doesn't have config (EleutherAI#1334)
    
    * fix a trailing whitespace that breaks a lint job (EleutherAI#1335)
    
    * skip "benchmarks" in changed_tasks (EleutherAI#1336)
    
    * Update migrated HF dataset paths (EleutherAI#1332)
    
    * Update arc_easy.yaml
    
    * Update flan_cot.yaml
    
    * update HF dataset path
    
    * Update freeform.yaml
    
    * Update flan_cot.yaml
    
    ---------
    
    Co-authored-by: Lintang Sutawika <[email protected]>
    
    * Don't use `get_task_dict()` in task registration / initialization (EleutherAI#1331)
    
    * don't use get_task_dict() as a helper, it will download the dataset!
    
    * pre-commit
    
    * Update README.md
    
    ---------
    
    Co-authored-by: lintangsutawika <[email protected]>
    
    * manage default (greedy) gen_kwargs in vllm (EleutherAI#1341)
    
    * manage default (greedy) gen_kwargs in vllm better
    
    * mirror HF `do_sample`
    
    * just need to set temp=0 for greedy
    
    * modified default gen_kwargs to work better with CLI; changed prompt_logprobs=1 (EleutherAI#1345)
    
    * update links to task_guide.md (EleutherAI#1348)
    
    * `Filter` docs not offset by `doc_id` (EleutherAI#1349)
    
    * get `doc` from instance
    
    * accelerate bugfix: get ground doc from instance
    
    * convert filter to `process_result`
    
    * get docs from instances in `FilterEnsemble`
    
    * rename
    
    * nit
    
    * better looping
    
    * fix typehint
    
    * Add FAQ on `lm_eval.tasks.initialize_tasks()` to README (EleutherAI#1330)
    
    * Update README.md
    
    * [!Tip]
    
    * Refix issue regarding stderr (EleutherAI#1357)
    
    * Add causalLM OpenVino models (EleutherAI#1290)
    
    * added intel optimum
    
    * added intel optimum in readme
    
    * modified intel optimum
    
    * modified intel optimum
    
    * modified intel optimum
    
    * modified install optimum
    
    * modified path of IR file
    
    * added openvino_device
    
    * added openvino_device2
    
    * changed optimum-causal to openvino-causal
    
    * Update README.md
    
    * Update README.md
    
    * remove `lm_eval.base` import
    
    * update openvino-causal -> openvino ; pass device through super().__init__()
    
    * Update README.md
    
    * Add optimum to tests dependencies
    
    * apply pre-commit
    
    * fix so tests pass
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    Co-authored-by: haileyschoelkopf <[email protected]>
    
    * Apply some best practices and guideline recommendations to code (EleutherAI#1363)
    
    * raise Exception, not a string
    
    Additional info https://peps.python.org/pep-0352/#exception-hierarchy-changes
    https://docs.python.org/3.8/tutorial/errors.html#raising-exceptions
    
    * Apply PEP8 recommendation to prefer isinstance
    
    "Object type comparisons should always use isinstance() instead of comparing types directly"
    https://peps.python.org/pep-0008/
    
    * Remove dangerous default mutable values in arguments
    
    https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/dangerous-default-value.html
    
    * Format logging messages with fstring (not with format)
    
    Additional info
    https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/logging-format-interpolation.html
    There are also discussions about the speed of formatting while logging or some unintended code executions
    pylint-dev/pylint#2395
    https://stackoverflow.com/a/54368109
    but at least one format (fstring one) will be used throughout the project
    
    * Specify utf-8 encoding for `open` explicitly
    
    If not specified, it may be supposed differently in different environments, OSes, and Python versions. See
    https://peps.python.org/pep-0597/
    https://docs.python.org/3.11/library/locale.html#locale.getencoding
    https://docs.python.org/3.10/library/os.html#utf8-mode
    https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/unspecified-encoding.html
    
    Helps also if some code from English language tasks is taken as inspiration for tasks in non-English languages.
    
    * Use inline-ignoring comments to pass pre-commit instead of identity process
    
    https://flake8.pycqa.org/en/3.0.1/user/ignoring-errors.html#in-line-ignoring-errors
    https://www.flake8rules.com/rules/F841.html
    
    flake8 comments are supported by ruff: https://docs.astral.sh/ruff/linter/#error-suppression
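
The "dangerous default mutable values" item in the list above refers to a general Python pitfall. A minimal sketch of the pattern and its usual fix follows; the function names are hypothetical and do not come from the harness code.

```python
def append_bad(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` shares the same list object.
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # Use None as a sentinel and build a fresh list on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

With `append_bad`, items from earlier calls leak into later ones; `append_good` starts from an empty list whenever no list is passed.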
    
    * serialize callable functions in config (EleutherAI#1367)
    
    * delay filter init; remove `*args` (EleutherAI#1369)
    
    * delay filter init; remove `*args`
    
    * bugfix
    
    * optimize
    
    * type hint
    
    * Fix unintuitive `--gen_kwargs` behavior (EleutherAI#1329)
    
    * don't override do_sample if no value for it is passed
    
    * Update gen_kwargs override condition
    
    * Update huggingface.py
    
    * Update huggingface.py
    
    * run linters
    
    * silence an erroneous warning
    
    * Publish to pypi (EleutherAI#1194)
    
    * publish to pypi
    
    * lint
    
    * Update publish.yml
    
    * minor
    
    * Make dependencies compatible with PyPI (EleutherAI#1378)
    
    * make deps not point to github urls
    
    * formatting
    
    * try making PyPI only run on tag pushes
    
    * Add support for RWKV models with World tokenizer (EleutherAI#1374)
    
    * Add support for RWKV models with World tokenizer
    
    The RWKV line of models with the World tokenizer does not allow the padding token to be configured, and has its value preset as 0.
    
    This, however, fails all the "if set" checks, and would cause the tokenizer to crash.
    
    A tokenizer class name check was added, in addition to a model type check, as there exist RWKV models which use the neox tokenizers.
    
    * Update huggingface.py
    
    Genericized so that this supports any RWKVWorld tokenizer, and added a fall-back for if the HF implementation name changes.
    
    * Comply with formatting guidelines
    
    * fix format
    
    ---------
    
    Co-authored-by: Stella Biderman <[email protected]>
    Co-authored-by: Hailey Schoelkopf <[email protected]>
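
The RWKV message above notes that a padding token preset to 0 fails "if set" checks. The sketch below only illustrates why a truthiness test misreads 0; the function names are hypothetical, and the commit itself describes adding a tokenizer class name check rather than this exact change.

```python
from typing import Optional


def pad_is_set_truthy(pad_token_id: Optional[int]) -> bool:
    # Truthiness check: treats the valid id 0 as "not set".
    return bool(pad_token_id)


def pad_is_set_explicit(pad_token_id: Optional[int]) -> bool:
    # Explicit None check: 0 counts as a configured id.
    return pad_token_id is not None
```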
    
    * add bypass metric (EleutherAI#1156)
    
    * add bypass metric
    
    * fixed `bypass` metric.
    
    * add task attributes if predict_only
    
    * add `predict_only` checks
    
    * add docs
    
    * added `overide_metric`, `override_config` to `Task`
    
    * nits
    
    * nit
    
    * changed --predict_only to generations; nits
    
    * nits
    
    * nits
    
    * change gen_kwargs warning
    
    * add note about `--predict_only` in README.md
    
    * added `predict_only`
    
    * move table to bottom
    
    * nit
    
    * change null aggregation to bypass (conflict)
    
    * bugfix; default `temp=0.0`
    
    * typo
    
    * loglikelihood refactor using template lm
    
    * lint
    
    * code review
    
    * neuron optimum
    
    * Mention TemplateLM in model_guide.md
    
    * Update lm_eval/api/model.py
    
    * fix linter
    
    * fix format
    
    * fix format
    
    * fix format
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    Co-authored-by: Lintang Sutawika <[email protected]>
    Co-authored-by: Stella Biderman <[email protected]>
    Co-authored-by: Mark Saroufim <[email protected]>
    Co-authored-by: Hannibal046 <[email protected]>
    Co-authored-by: Danielle Pintz <[email protected]>
    Co-authored-by: Quentin Lhoest <[email protected]>
    Co-authored-by: kwrobel.eth <[email protected]>
    Co-authored-by: Michael Goin <[email protected]>
    Co-authored-by: Brian Vaughan <[email protected]>
    Co-authored-by: Baber Abbasi <[email protected]>
    Co-authored-by: thnkinbtfly <[email protected]>
    Co-authored-by: NoushNabi <[email protected]>
    Co-authored-by: haileyschoelkopf <[email protected]>
    Co-authored-by: LSinev <[email protected]>
    Co-authored-by: Eugene Cheah <[email protected]>
    17 people committed Aug 2, 2024
    fbd9bf6
  57. Log which subtasks were called with which groups (EleutherAI#1456)

    * log group membership
    
    * no stray prints
    
    * Update evaluator.py
    haileyschoelkopf authored and djstrong committed Aug 2, 2024
    0d1af67
  58. PR fixing the issue EleutherAI#1391 (wrong contexts in the mgsm task) (EleutherAI#1440)
    
    * fix the issue EleutherAI#1391, wrong contexts in mgsm tasks
    
    * fix yaml issue for having two target_delimiter lines. For COT tasks, keep the one with a space (default)
    
    * regenerate all task yaml files
    - change naming so that file name will match with task name
    - task|file follows a consistent naming way, mgsm_(mode)_(lang) for three modes, i.e., direct, en_cot, and native_cot
    
    * English CoTs should have a space as target_delimiter
    
    * Update utils.py
    
    * Apply suggestions from code review
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    b8bee2c
  59. feat: Add Weights and Biases support (EleutherAI#1339)

    * add wandb as extra dependency
    
    * wandb metrics logging
    
    * refactor
    
    * log samples as tables
    
    * fix linter
    
    * refactor: put in a class
    
    * change dir
    
    * add panels
    
    * log eval as table
    
    * improve tables logging
    
    * improve reports logging
    
    * precommit run
    
    * ruff check
    
    * handle importing reports api gracefully
    
    * ruff
    
    * compare results
    
    * minor pre-commit fixes
    
    * build comparison report
    
    * ruff check
    
    * log results as artifacts
    
    * remove comparison script
    
    * update dependency
    
    * type annotate and docstring
    
    * add example
    
    * update readme
    
    * fix typo
    
    * teardown
    
    * handle outside wandb run
    
    * gracefully fail reports creation
    
    * precommit checks
    
    * add report url to summary
    
    * use wandb printer for better url stdout
    
    * fix ruff
    
    * handle N/A and groups
    
    * fix eval table
    
    * remove unused var
    
    * update wandb version req + disable reports stdout
    
    * remove reports feature to TODO
    
    * add label to multi-choice question data
    
    * log model predictions
    
    * lints
    
    * loglikelihood_rolling
    
    * log eval result for groups
    
    * log tables by group for better handling
    
    * precommit
    
    * choices column for multi-choice
    
    * gracefully fail wandb
    
    * remove reports feature
    
    * track system metrics + total eval time + stdout
    
    ---------
    
    Co-authored-by: Lintang Sutawika <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    cf1577a
  60. Fixed generation args issue affecting OpenAI completion model (EleutherAI#1458)
    
    * Fixed generation args issue affecting openai completion model
    
    * Fixed hf unit test; removed pop attributes in OpenAi completion.
    
    * fix format
    
    * fix format
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    dd5bee9
  61. be5a419
  62. Adding documentation for Weights and Biases CLI interface (EleutherAI#1466)
    
    * interface docs
    
    * fix link
    veekaybee authored and djstrong committed Aug 2, 2024
    4024ebb
  63. Add environment and transformers version logging in results dump (EleutherAI#1464)
    
    * Save git_hash to results even if git is not available to call as subprocess
    
    * Store more info about environment and transformers version in results to help researchers track inconsistencies
    
    * moved added logging to logging_utils
    
    * moved get_git_commit_hash to logging_utils.py
    
    * moved add_env_info inside evaluator
    LSinev authored and djstrong committed Aug 2, 2024
    8a4827a
  64. 72d40c9
  65. 053cf56
  66. add arabic mmlu (EleutherAI#1402)

    * add arabic mmlu
    
    * update the description
    
    * add readme file
    khalil-Hennara authored and djstrong committed Aug 2, 2024
    e112b37
  67. Add Gemma support (Add flag to control BOS token usage) (EleutherAI#1465)
    
    * add add_bos_token to HFLM
    
    * add BOS token flag to other local model classes
    
    ---------
    
    Co-authored-by: Lintang Sutawika <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    420556e
  68. 06a4347
  69. Create a means for caching task registration and request building. Ad… (EleutherAI#1372)
    
    * Create a means for caching task registration and request building. Add the ability to specify an args dict for simple_evaluate().
    
    * Remove extra S in cache path in caching module
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * Rename requests cache args, make model_args polymorphic so that a dict can also be accepted.
    
    * Update docs to reflect new caching behavior, add CLI args for requests caching. Create a function for deleting items in the cache.
    
    * Update documentation, fix minor bug with arg parsing for requests caching where an undefined variable was used.
    
    * Remove line from gitignore, add to cli for caching datasets.
    
    * Add hashing suffix to .pickles. Update test script typo.
    
    * Favor isinstance() over type() in evaluator.py
    
    * Add tests for caching, gets tests working, remove unneeded arg from build_all_requests().
    
    * Update arg description to simple_evaluate.
    
    * Update pyproject.toml
    
    * Fix typehint
    
    * Remove the use of random() for creating default cache pickle hash.
    
    * Check that cache dir exists before clearing it in request cache tests.
    
    * Fix linting problems.
    
    * Fix additional formatting errors.
    
    * Remove trailing whitespace.
    
    * Add new line to the end of .gitignore.
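
Two items above mention a hashing suffix on the cache pickles and dropping `random()` for the default hash. The sketch below shows one way a deterministic suffix could be derived; `cache_file_name` and its naming scheme are assumptions for illustration, not the harness's actual implementation.

```python
import hashlib


def cache_file_name(key: str, suffix_len: int = 8) -> str:
    # Derive the suffix from the key itself, so the same key always
    # maps to the same file (unlike a suffix drawn from random()).
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{key}-{digest[:suffix_len]}.pickle"
```

A deterministic name lets a later run find the file written by an earlier run, which a random suffix would prevent.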
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    af2d9f6
  70. Cont metrics (EleutherAI#1475)

    * add brier_score
    
    * process brier_score
    
    * brier score is working for N-sized class
    
    * fixed brier score
    
    * add TED to BigBench and Brier score to MMLU
    
    * format
    
    * Update metrics.py
    
    * Update task.py
    
    * Update generate_until_template_yaml
    
    * Delete lm_eval/tasks/bigbench/aux_metric.py
    
    * Update generate_until_template_yaml
    
    * Update _default_template_yaml
    
    * Update _generate_configs.py
    
    * Update _generate_configs.py
    
    * Update _generate_configs.py
    
    * fix (format?)
    
    * format?
    
    * format, once more
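
For readers unfamiliar with the metric named in this commit, the following is a minimal sketch of the standard multi-class Brier score. It assumes the textbook definition (mean squared distance between the predicted probability vector and the one-hot label); the harness's own `brier_score` may differ in details not shown in this log.

```python
from typing import Sequence


def brier_score(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    # Mean over samples of the squared distance between the predicted
    # probability vector and the one-hot encoding of the true label.
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum(
            (p_k - (1.0 if k == y else 0.0)) ** 2 for k, p_k in enumerate(p)
        )
    return total / len(labels)
```

Lower is better: a perfect, fully confident prediction scores 0, and the definition works for any number of classes.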
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    9600d59
  71. Refactor evaluator.evaluate (EleutherAI#1441)

    * change `all_gather` to `gather`
    
    * add TaskOutput utility class
    
    * Add FilterResults class and refactor task handling.
    
    * Rename `key` to `filter_key` for clarity
    
    * Add `print_writeout` function in utils.py
    
    * Add function to calculate limit size.
    
    * Add doc_iterator method to Task class
    
    * Refactor `doc_iterator` and cleanup in Task class
    
    * remove superfluous bits
    
    * change `all_gather` to `gather`
    
    * bugfix
    
    * bugfix
    
    * fix `gather`
    
    * Refactor `gather` loop
    
    * Refactor aggregate metrics calculation
    
    * Refactor and simplify aggregate metrics calculation
    Removed unused code
    
    * Simplify metrics calculation and remove unused code.
    
    * simplify the metrics calculation in `utils.py` and `evaluator.py`.
    
    * Fix group metric
    
    * change evaluate to hf_evaluate
    
    * change evaluate to hf_evaluate
    
    * add docs
    
    * add docs
    
    * nits
    
    * make isslice keyword only
    
    * nit
    
    * add todo
    
    * nit
    
    * nit
    
    * nit: swap order samples_metrics tuple
    
    * move instance sorting outside loop
    
    * nit
    
    * nit
    
    * Add __repr__ for ConfigurableTask
    
    * nit
    
    * nit
    
    * Revert "nit"
    
    This reverts commit dab8d99.
    
    * fix some logging
    
    * nit
    
    * fix `predict_only` bug. thanks to `@LSinev`!
    
    * change `print_tasks` to `prepare_print_tasks`
    
    * nits
    
    * move eval utils
    
    * move eval utils
    
    * nit
    
    * add comment
    
    * added tqdm descriptions
    
    * Update lm_eval/evaluator_utils.py
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * fix mgsm bug
    
    * nit
    
    * fix `build_all_requests`
    
    * pre-commit
    
    * add ceil to limit
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    7fe8dcb
  72. 77ffeef
  73. 6093c0c
  74. Fix AttributeError in huggingface.py When 'model_type' is Missing (EleutherAI#1489)
    
    * model_type attribute error
    
    Getting attribute error when using a model without a 'model_type'
    
    * fix w/ and w/out the 'model_type' specification
    
    * use getattr(), also fix other config.model_type reference
    
    * Update huggingface.py
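
The `getattr()` approach mentioned above can be sketched as follows. The config classes and the `read_model_type` helper are hypothetical stand-ins, not the actual code in huggingface.py.

```python
class ConfigWithoutModelType:
    # Hypothetical stand-in for a model config lacking `model_type`.
    pass


class ConfigWithModelType:
    # Hypothetical stand-in for a config that defines `model_type`.
    model_type = "llama"


def read_model_type(config, default=None):
    # getattr with a default returns `default` instead of raising
    # AttributeError when the attribute is absent.
    return getattr(config, "model_type", default)
```

Accessing `config.model_type` directly would raise `AttributeError` for the first class, which matches the failure the commit title describes.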
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    814f36e
  75. c463825
  76. 47d0899
  77. 0413dee
  78. Improve data-parallel request partitioning for VLLM (EleutherAI#1477)

    * add undistribute + use more_itertools
    
    * remove divide() util fn
    
    * add more_itertools as dependency
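
The commit above relies on more_itertools and adds an "undistribute" step. As a rough illustration of the idea only, here is a standard-library sketch of round-robin partitioning and its inverse; both function bodies are assumptions and are not taken from the harness or from more_itertools.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking padding added by zip_longest


def distribute(n, items):
    # Round-robin split: shard i receives items i, i+n, i+2n, ...
    items = list(items)
    return [items[i::n] for i in range(n)]


def undistribute(shards):
    # Interleave the shards back into the original order, dropping
    # the padding that zip_longest adds for shorter shards.
    return [
        x
        for group in zip_longest(*shards, fillvalue=_MISSING)
        for x in group
        if x is not _MISSING
    ]
```

Round-robin sharding keeps shard sizes within one item of each other, and the inverse restores request order after each data-parallel worker returns its results.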
    haileyschoelkopf authored and djstrong committed Aug 2, 2024
    d579c8b
  79. modify WandbLogger to accept arbitrary kwargs (EleutherAI#1491)

    * make `WandbLogger` init args optional
    
    * nit
    
    * nit
    
    * nit
    
    * move import warning to `WandbLogger`
    
    * nit
    
    * update docs
    
    * nit
    baberabb authored and djstrong committed Aug 2, 2024
    8146103
  80. Vllm update DP+TP (EleutherAI#1508)

    * use `@ray.remote` with distributed vLLM
    
    * update versions
    
    * bugfix
    
    * unpin vllm
    
    * fix pre-commit
    
    * added version assertion error
    
    * Revert "added version assertion error"
    
    This reverts commit 8041e9b.
    
    * added version assertion for DP
    
    * expand DP note
    
    * add warning
    
    * nit
    
    * pin vllm
    
    * fix typos
    baberabb authored and djstrong committed Aug 2, 2024
    30141ce
  81. Setting trust_remote_code to True for HuggingFace datasets compatibility (EleutherAI#1487)
    
    * setting trust_remote_code
    
    * dataset list no notebooks
    
    * respect trust remote code
    
    * Address changes, move cli options and change datasets
    
    * fix task for tests
    
    * headqa
    
    * remove kobest
    
    * pin datasets and address comments
    
    * clean up space
    veekaybee authored and djstrong committed Aug 2, 2024
    706e10b
  82. 40b0917
  83. French Bench (EleutherAI#1500)

    * add french-bench
    
    * rename arc easy
    
    * linting
    
    * update datasets for no remote code exec
    
    * fix string delimiter
    
    * add info to readme
    
    * trim trailing whitespace
    
    * add detailed groups
    
    * add info to readme
    
    * remove orangesum title from fbench main
    
    * Force PPL tasks to be 0-shot
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    4f19431
  84. 512de72
  85. Fix minor edge cases (EleutherAI#951 EleutherAI#1503) (EleutherAI#1520)

    * Fix padding
    
    * Fix elif in model loading
    
    * format
    haileyschoelkopf authored and djstrong committed Aug 2, 2024
    b915040
  86. Openllm benchmark (EleutherAI#1526)

    baberabb authored and djstrong committed Aug 2, 2024
    2c652b5
  87. Add a new task GPQA (the part CoT and generative) (EleutherAI#1482)

    * Add new tasks of GPQA
    
    * Add README
    
    * Remove unused functions
    
    * Remove unused functions
    
    * Linters
    
    * Add flexible match
    
    * update
    
    * Remove duplicate function
    
    * Linter
    
    * update
    
    * Update lm_eval/filters/extraction.py
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * register multi_choice_regex
    
    * Update
    
    * run precommit
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    Co-authored-by: haileyschoelkopf <[email protected]>
    3 people authored and djstrong committed Aug 2, 2024
    175bc29
  88. Add EQ-Bench as per EleutherAI#1459 (EleutherAI#1511)

    * Start adding eq-bench
    
    * Start adding to yaml and utils
    
    * Get metric working
    
    * Add README
    
    * Handle cases where answer is not parseable
    
    * Deal with unparseable answers and add percent_parseable metric
    
    * Update README
    pbevan1 authored and djstrong committed Aug 2, 2024
    5c8105c
  89. Add WMDP Multiple-choice (EleutherAI#1534)

    * init wmdp yaml file
    
    * Add WMDP Multiple-choice
    
    * fix linter issues
    
    * Delete lm_eval/tasks/wmdp/_wmdp.yaml
    
    ---------
    
    Co-authored-by: Lintang Sutawika <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    44f9421
  90. c9f39fa
  91. 7aedaf9
  92. update printed num-fewshot; prevent fewshots from erroneously being used by cot which hardcodes fewshot prompt (EleutherAI#1502)
    haileyschoelkopf authored and djstrong committed Aug 2, 2024
    8c1c093
  93. Cleanup and fixes (Task, Instance, and a little bit of *evaluate) (EleutherAI#1533)
    
    * Remove unused `decontamination_ngrams_path` and all mentions (still no alternative path provided)
    
    * Fix improper import of LM and usage of evaluator in one of scripts
    
    * update type hints in instance and task api
    
    * raising errors in task.py instead of asserts
    
    * Fix warnings from ruff
    
    * raising errors in __main__.py instead of asserts
    
    * raising errors in tasks/__init__.py instead of asserts
    
    * raising errors in evaluator.py instead of asserts
    
    * evaluator: update type hints and remove unused variables in code
    
    * Update lm_eval/__main__.py
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * Update lm_eval/__main__.py
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * Update lm_eval/api/task.py
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * Update lm_eval/api/task.py
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * Update lm_eval/api/task.py
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * Update lm_eval/evaluator.py
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * pre-commit induced fixes
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    f238713
  94. Update installation commands in openai_completions.py and contributing document, and update wandb_args description (EleutherAI#1536)
    
    * Update openai completions and docs/CONTRIBUTING.md
    
    * Update wandb args description
    
    * Update docs/interface.md
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    3b419af
  95. Add compatibility for vLLM's new Logprob object (EleutherAI#1549)

    * Add compatibility for vLLM's new Logprob object
    
    * Fix
    
    * Update lm_eval/models/vllm_causallms.py
    
    * fix format?
    
    * trailing whitespace
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    6997af7
  96. Fix incorrect max_gen_toks generation kwarg default in code2_text. (EleutherAI#1551)
    
    * update gen_kwargs in code2-text-go.yaml
    
    * update gen_kwargs in rest code2-text
    cosmo3769 authored and djstrong committed Aug 2, 2024
    74d9a95
  97. Support jinja templating for task descriptions (EleutherAI#1553)

    * Support jinja templating for "description"
    
    * Update task_guide.md
    
    * Update lm_eval/api/task.py
    
    * fix format?
    
    * whitespace errors
    
    * fix whitespace
    
    * fix bad variable reference
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    Co-authored-by: haileyschoelkopf <[email protected]>
    3 people authored and djstrong committed Aug 2, 2024
    8d5e277
  98. 7ffd0d1
  99. 58cda52
  100. add Arabic EXAMS benchmark (EleutherAI#1498)

    * add Arabic EXAMS benchmark
    
    * fixed the linter issue, and add more information on the readme
    
    * Update README.md
    
    ---------
    
    Co-authored-by: Lintang Sutawika <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    1858b54
  101. AGIEval (EleutherAI#1359)

    * add agieval
    
    * fix typo
    
    * add cloze / math exactmatch agieval tasks, rename
    
    * update exact-match agieval tasks, allow for multiple-correct answers
    
    * add more detail to readme
    
    * don't parse_math_answer twice
    
    ---------
    
    Co-authored-by: Alex Bäuerle <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    5298fc0
  102. 94f7159
  103. add manual tqdm disabling management (EleutherAI#1569)

    * add manual tqdm disabling management
    
    * add typing to all new args
    
    * apply precommit changes
    
    ---------
    
    Co-authored-by: haileyschoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    ee0e166
  104. Fix README section on vllm integration (EleutherAI#1579)

    * Link to vllm integration
    
    * add pip install .[vllm] cmd
    eitanturok authored and djstrong committed Aug 2, 2024
    28e568d
  105. df6ee7a
  106. Proposed approach for testing CLI arg parsing (EleutherAI#1566)

    * New tests for CLI args
    
    * fix spacing
    
    * change tests for parsing
    
    * add tests, fix parser
    
    * remove defaults for store_true
    veekaybee authored and djstrong committed Aug 2, 2024
    c6edcdb
  107. Patch for Seq2Seq Model predictions (EleutherAI#1584)

    * Differentiate _encode_pair setting for decoder and enc-dec models
    
    * tok_decode to not skip special token so that eos doesn't become empty string
    
    * Update model.py
    
    * Update model.py
    
    * Update huggingface.py
    
    * Update lm_eval/models/huggingface.py
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * Update model.py
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    0dc609d
  108. baa917f
  109. Cleanup for v0.4.2 release (EleutherAI#1573)

    * Update interface.md
    
    * fix: make caching reqs always work with accelerate launch
    
    * remove stale task migration checklist
    
    * remove deprecation warnings
    
    * make informative TypeErrors for get_task_dict
    
    * bump version metadata
    
    * fix num_fewshot printing bug
    
    * add fewshot value to cache key
    haileyschoelkopf authored and djstrong committed Aug 2, 2024
    53c11f7
  110. Fix eval_logger import for mmlu/_generate_configs.py (EleutherAI#1593)

    * Fix eval_logger import for mmlu/_generate_configs.py
    
    * linter
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    6e52d16
  111. use BOS token in loglikelihood (EleutherAI#1588)

    * use BOS token in loglikelihood
    
    * improve comments
    
    * add model arg
    
    * log prefix token id
    
    * log prefix token id
    
    * Update lm_eval/api/model.py
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * change name to prefix_token_id
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    djstrong and haileyschoelkopf committed Aug 2, 2024
    8cd155f
  112. 1ea55eb
  113. 5a304c9
  114. 39a0b3a
  115. Fixes to Loglikelihood prefix token / VLLM (EleutherAI#1611)

    * make vllm use prefix_token_id ; have prefix_token_id be optional method to define
    
    * custom_prefix_token_id wasn't set if not passed
    haileyschoelkopf authored and djstrong committed Aug 2, 2024
    a513931
  116. Add ACLUE task (EleutherAI#1614)

    * Add task ACLUE
    
    * fix minor bug
    
    * fix code style
    
    * fix code style
    haonan-li authored and djstrong committed Aug 2, 2024
    7d8eeba
  117. 45ed815
  118. add logging of model args (EleutherAI#1619)

    * add logging of model args
    
    * nit
    
    * Add warnings.
    
    * nit
    
    * add warning
    
    * nit
    baberabb authored and djstrong committed Aug 2, 2024
    9064d35
  119. 7c7e4fd
  120. peft Version Assertion (EleutherAI#1635)

    * peft Version Assertion
    
    * fix the linter issue
    LameloBally authored and djstrong committed Aug 2, 2024
    f970123
  121. Seq2seq fix (EleutherAI#1604)

    * fix on --task list
    
    * add fixes to tokenization
    
    * differentiate encoding for seq2seq and decoder
    
    * return token setting
    
    * format for pre-commit
    
    * Seq2seq fix, pt2 (EleutherAI#1630)
    
    * getting model class only when defined
    
    * encode_pair handles None, add_special_tokens turned into dict with default value
    
    ---------
    
    Co-authored-by: achervyakov <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    048c0d3
  122. Integration of NeMo models into LM Evaluation Harness library (EleutherAI#1598)
    
    * Integration of NeMo models into LM Evaluation Harness library
    
    * rename nemo model as nemo_lm
    
    * move nemo section in readme after hf section
    
    * use self.eot_token_id in get_until()
    
    * improve progress bar showing loglikelihood requests
    
    * data replication or tensor/pipeline replication working fine within one node
    
    * run pre-commit on modified files
    
    * check whether dependencies are installed
    
    * clarify usage of torchrun in README
    sergiopperez authored and djstrong committed Aug 2, 2024
    9f50796
  123. f0b04a0
  124. fa2acde
  125. Add Latxa paper evaluation tasks for Basque (EleutherAI#1654)

    * add basqueglue
    
    * add eus_exams
    
    * add eus_proficiency
    
    * add eus_reading
    
    * add eus_trivia
    
    * run pre-commit
    juletx authored and djstrong committed Aug 2, 2024
    b948d14
  126. Fix CLI --batch_size arg for openai-completions/local-completions (EleutherAI#1656)
    
    The OpenAI interface supports batch size as an argument to the completions API, but does not seem to support specification of this on the CLI i.e. `lm_eval --model openai-completions --batch_size 16 ...` because of a simple lack of str->int conversion.
    
    This is confirmed by my usage and stacktrace from running `OPENAI_API_KEY=dummy lm_eval --model local-completions --tasks gsm8k --batch_size 16 --model_args model=nm-testing/zephyr-beta-7b-gptq-g128,tokenizer_backend=huggingface,base_url=http://localhost:8000/v1`:
    ```
    Traceback (most recent call last):
      File "/home/michael/venv/bin/lm_eval", line 8, in <module>
        sys.exit(cli_evaluate())
      File "/home/michael/code/lm-evaluation-harness/lm_eval/__main__.py", line 341, in cli_evaluate
        results = evaluator.simple_evaluate(
      File "/home/michael/code/lm-evaluation-harness/lm_eval/utils.py", line 288, in _wrapper
        return fn(*args, **kwargs)
      File "/home/michael/code/lm-evaluation-harness/lm_eval/evaluator.py", line 251, in simple_evaluate
        results = evaluate(
      File "/home/michael/code/lm-evaluation-harness/lm_eval/utils.py", line 288, in _wrapper
        return fn(*args, **kwargs)
      File "/home/michael/code/lm-evaluation-harness/lm_eval/evaluator.py", line 390, in evaluate
        resps = getattr(lm, reqtype)(cloned_reqs)
      File "/home/michael/code/lm-evaluation-harness/lm_eval/models/openai_completions.py", line 263, in generate_until
        list(sameuntil_chunks(re_ord.get_reordered(), self.batch_size)),
      File "/home/michael/code/lm-evaluation-harness/lm_eval/models/openai_completions.py", line 251, in sameuntil_chunks
        if len(ret) >= size or x[1] != lastuntil:
    TypeError: '>=' not supported between instances of 'int' and 'str'
    ```
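
The failure in the traceback above can be reproduced in isolation. The sketch below is not the harness code: `chunks` is a simplified stand-in for `sameuntil_chunks`, and `coerce_batch_size` is a hypothetical helper name. It assumes the value arrives from the CLI as a plain numeric string; a non-numeric value such as `auto` would need separate handling.

```python
def chunks(items, size):
    """Yield consecutive lists holding at most `size` items each."""
    batch = []
    for item in items:
        batch.append(item)
        # Comparing an int with a str raises TypeError, as in the traceback.
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch


def coerce_batch_size(raw):
    """Convert a CLI-style batch size string to int (hypothetical helper)."""
    return int(raw)


requests = list(range(5))

# Passing the raw string reproduces the error.
try:
    list(chunks(requests, "2"))
    failed = False
except TypeError:
    failed = True

# Converting first gives the expected batches.
batches = list(chunks(requests, coerce_batch_size("2")))
```

Converting once, where the argument is read, keeps every later comparison on integers.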
    mgoin authored and djstrong committed Aug 2, 2024
    da93b8a
  127. cf10ee7
  128. TMMLU+ implementation (EleutherAI#1394)

    * implementation of TMMLU+
    
    * implemented: TMMLU+
    
    **TMMLU+: large-scale Traditional Chinese Massive Multitask Language Understanding**
    
    - 4 categories
        - STEM
        - Social Science
        - Humanities
        - Other
    
    The TMMLU+ dataset, encompassing over 67 subjects and 20160 tasks, is six times larger and more balanced than its predecessor, TMMLU, and includes benchmark results from both closed-source and 20 open-weight Chinese large language models with 1.8B to 72B parameters. However, Traditional Chinese variants continue to underperform compared to major Simplified Chinese models.
    
    ```markdown
    Total number of tasks in the 'test' sets: 20160
    Total number of tasks in the 'validation' sets: 2247
    Total number of tasks in the 'train' sets: 335
    ```
    
    * Remove print from __init__.py
    
    It was my mistake to forget to remove the debug print from the code.
    
    * update: move TMMLU+ config generation program into default
    
    * fix: we should use the training set for few-shot examples
    
    * update: README for TMMLU+
    
    * update: small changes to the TMMLU+ README file
    
    * pre-commit run through
    
    * Add README for TMMLU+ dataset
    
    * run precommit
    
    * trigger precommit again
    
    * trigger precommit again
    
    * isort is fussy
    
    * isort is fussy
    
    * format, again
    
    * oops
    
    * oops
    
    ---------
    
    Co-authored-by: lintang <[email protected]>
    Co-authored-by: haileyschoelkopf <[email protected]>
    3 people authored and djstrong committed Aug 2, 2024
    76a7c23
  129. Anthropic Chat API (EleutherAI#1594)

    * claude3
    
    * supply for anthropic claude3
    
    * supply for anthropic claude3
    
    * anthropic config changes
    
    * add callback options on anthropic
    
    * line passed
    
    * claude3 tiny change
    
    * help anthropic installation
    
    * mention sysprompt / being careful with format in readme
    
    ---------
    
    Co-authored-by: haileyschoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    6786e82
  130. correction bug EleutherAI#1664 (EleutherAI#1670)

    * correction bug EleutherAI#1664
    
    * add any invalid characters for Windows filenames and Unix-like systems
    
    see:
    https://gist.github.com/doctaphred/d01d05291546186941e1b7ddc02034d3?permalink_comment_id=3958715
    
    * Update lm_eval/__main__.py
    
    * Update scripts/zeno_visualize.py
    
    * fix format
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    Co-authored-by: haileyschoelkopf <[email protected]>
    3 people authored and djstrong committed Aug 2, 2024
    98693bf
  131. c374e6f
  132. Add delta weights model loading (EleutherAI#1712)

    * added delta weights
    
    * removed debug
    
    * readme update
    
    * better error handling
    
    * autogptq warn
    
    * warn update
    
    * peft and delta error, explicitly deleting _model_delta
    
    * linter fix
    KonradSzafer authored and djstrong committed Aug 2, 2024
    8518800
  133. Add neuralmagic models for sparseml and deepsparse (EleutherAI#1674)
    
    * Add neuralmagic models for SparseML and DeepSparse
    
    * Update to latest and add test
    
    * Format
    
    * Fix list to List
    
    * Format
    
    * Add deepsparse/sparseml to automated testing
    
    * Update pyproject.toml
    
    * Update pyproject.toml
    
    * Update README
    
    * Fixes for dtype and device
    
    * Format
    
    * Fix test
    
    * Apply suggestions from code review
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * Address review comments!
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    8103925
  134. a56bf85
  135. a09b018
  136. 6687de7
  137. Add XNLIeu: a dataset for cross-lingual NLI in Basque (EleutherAI#1694)

    * add xnli_eu tasks
    
    * update tasks readme
    
    * update readme
    juletx authored and djstrong committed Aug 2, 2024
    fe92e5a
  138. Fix Parameter Propagation for Tasks that have include (EleutherAI#1749)
    
    * Update task.py
    
    * Update __init__.py
    lintangsutawika authored and djstrong committed Aug 2, 2024
    d69d54d
  139. Support individual scrolls datasets (EleutherAI#1740)

    * Support individual scrolls datasets
    
    * Add qmsum context
    
    * Fix formatting
    giorgossideris authored and djstrong committed Aug 2, 2024
    f38e8a1
  140. Add filter registry decorator (EleutherAI#1750)

    * Add register_filter decorator
    
    * Add register_filter docs
    lozhn authored and djstrong committed Aug 2, 2024
    7cd59dd
  141. dabce43
  142. Pile 10k new task (EleutherAI#1758)

    * Add Pile-10k readme
    
    * Add Pile-10k task configuration file
    mukobi authored and djstrong committed Aug 2, 2024
    f4281a4
  143. Fix m_arc choices (EleutherAI#1760)

    * Update utils.py
    
    This is a 4-choice task, option_e is null for all but 3 samples
    
    * Fix options
    
    Adaptive choices
    
    * add option e
    
    * bump multilingual arc version
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    c51925d
  144. upload new tasks (EleutherAI#1728)

    * upload new tasks
    
    * add readmes
    
    * run linters
    
    ---------
    
    Co-authored-by: haileyschoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    e2bc623
  145. vllm lora support (EleutherAI#1756)

    * vllm lora support
    
    * remove print
    
    * version check, rename lora kwarg
    bcicc authored and djstrong committed Aug 2, 2024
    df05e78
  146. Add option to set OpenVINO config (EleutherAI#1730)

    * Add option to set OpenVINO config
    
    * Use utils.eval_logger for logging
    helena-intel authored and djstrong committed Aug 2, 2024
    af14500
  147. evaluation tracker implementation (EleutherAI#1766)

    * evaluation tracker implementation
    
    * OVModelForCausalLM test fix
    
    * typo fix
    
    * moved methods args
    
    * multiple args in one flag
    
    * loggers moved to dedicated dir
    
    * improved filename sanitization
    KonradSzafer authored and djstrong committed Aug 2, 2024
    ba53c71
  148. da3067f
  149. limit fix (EleutherAI#1785)

    KonradSzafer authored and djstrong committed Aug 2, 2024
    ffc6594
  150. remove echo parameter in OpenAI completions API (EleutherAI#1779)

    * remove echo parameter in OpenAI completions API
    
    * remove context length parameter doc string
    djstrong committed Aug 2, 2024
    d261c2f
  151. Fix README: change ----hf_hub_log_args to --hf_hub_log_args (EleutherAI#1776)
    
    fix `----hf_hub_log_args` to `--hf_hub_log_args`
    MuhammadBinUsman03 authored and djstrong committed Aug 2, 2024
    29812e7
  152. 45c5f41
  153. Provide ability for custom sampler for ConfigurableTask (EleutherAI#1616)
    
    * Added fewshot sampling seeds to evaluator.simple_evaluate signature
    
    Way to control seed of fewshot sampling
    may help with EleutherAI#1591
    
    * Added ability for custom sampler for ConfigurableTask
    
    May be set in config like
    ```
    fewshot_config:
      sampler: !function utils.MyFewshotSampler
    ```
    
    * explicitly set fewshot random generator seed for HFLM generate_until_task test
    
    * add backward compatibility for three args seed setup
    
    * save seeds info to logs/reports
    LSinev authored and djstrong committed Aug 2, 2024
    615b2dd
  154. 59c553a
  155. 4e63a32
  156. ea773e4
  157. Re-add Hendrycks MATH (no sympy checking, no Minerva hardcoded prompt) variant (EleutherAI#1793)
    
    * add Hendrycks MATH (no sympy checking) variant
    
    * add readmes for MATH tasks
    haileyschoelkopf authored and djstrong committed Aug 2, 2024
    aa4e118
  158. Logging Updates (Alphabetize table printouts, fix eval tracker bug) (EleutherAI#1774) (EleutherAI#1791)
    
    * fix auto-batch size bug for seq2seq models
    
    * alphabetize task + group tables ; fix eval tracker bug
    
    * fix eval tracker bug
    haileyschoelkopf authored and djstrong committed Aug 2, 2024
    b3e8661
  159. Initial integration of the Unitxt to LM eval harness (EleutherAI#1615)

    * Initial support for Unitxt datasets in LM Eval Harness
    
    See  https://github.com/IBM/unitxt
    
    The script 'generate_yamls.py' creates LM Eval Harness yaml files corresponding to Unitxt datasets specified in the 'unitxt_datasets' file.
    
    The glue code required to register Unitxt metrics is in 'unitxt_wrapper.py'.
    
    * Added dataset loading check to generate_yaml
    
    Improved error messages.
    
    * Speed up generate_yaml
    
    Added printouts and improved error message
    
    * Added output printout
    
    * Simplified integration of unitxt datasets
    
    Store all the common yaml configuration in a yaml include shared by all datasets of the same task.
    
    * Post code review comments - part 1
    
    1. Made sure include files don't end with 'yaml' so they won't be marked as tasks
    2. Added more datasets and tasks (NER, GEC)
    3. Added README
    
    * Post code review comments - part 2
    
    1. Added a unitxt install option in pyproject.toml:
    pip install 'lm_eval[unitxt]'
    2. Added a check that unitxt is installed and print a clear error message if not
    
    * Committed missing pyproject change
    
    * Added documentation on adding datasets
    
    * More doc changes
    
    * add unitxt extra to readme
    
    * run precommit
    
    ---------
    
    Co-authored-by: haileyschoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    c864ea2
  160. add task for mmlu evaluation in arc multiple choice format (EleutherAI#1745)
    
    * add mmlu arc style evaluation
    
    * rename arc_style to continuation
    
    ---------
    
    Co-authored-by: Jonathan Burdge <[email protected]>
    Co-authored-by: Jonathan Burdge <[email protected]>
    3 people authored and djstrong committed Aug 2, 2024
    bba2bf6
  161. Update flag --hf_hub_log_args in interface documentation (EleutherAI#1806)
    
    * update interface documentation with flag --hf_hub_log_args
    
    * update interface documentation with flag --hf_hub_log_args 2
    sepiatone authored and djstrong committed Aug 2, 2024
    a137c3e
  162. Copal task (EleutherAI#1803)

    * add copal
    
    * change name to copal id for clarity and the task name
    
    * remove `copal_id...` to yaml to make it work
    
    * checkmark on README
    
    * change group name to `copal_id`
    Erland366 authored and djstrong committed Aug 2, 2024
    cd0b2ba
  163. Adding tinyBenchmarks datasets (EleutherAI#1545)

    * Add tinyBenchmarks
    
    * Add acknowledgements
    
    * Add ordering of outputs for data-parallel
    
    * Run pre-commit
    
    * Add few_shot specifications
    
    * Add tinyBenchmarks post-processing
    
    * add conditional import ; fix task names
    
    ---------
    
    Co-authored-by: haileyschoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    6bcb05e
  164. e888fb6
  165. ab46906
  166. Fix: support PEFT/LoRA with added tokens (EleutherAI#1828)

    * resize model embeddings
    
    * resize only
    
    * tokenizer help
    
    * load tokenizer before model
    
    * add comment and run precommit lint
    
    * Add log message
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    5759d86
  167. b542fd9
  168. fixed docs typos (EleutherAI#1863)

    zafstojano authored and djstrong committed Aug 2, 2024
    d02eb34
  169. 21f36dd
  170. 5ca629a
  171. 2b93289
  172. Fix batch_size=auto for HF Seq2Seq models (EleutherAI#1765) (EleutherAI#1790)
    
    * fix auto-batch size bug for seq2seq models
    
    * run linter
    haileyschoelkopf authored and djstrong committed Aug 2, 2024
    e6223c0
  173. Fix Brier Score (EleutherAI#1847)

    `gold_one_hot` needs to follow the dimension of the predictions so that it still works when `--limit` is used and the indexes in gold do not cover all gold indexes.
    lintangsutawika authored and djstrong committed Aug 2, 2024
    8329adb
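The Brier Score fix described in this commit can be illustrated with a minimal sketch. This is not the harness's own code: `brier_score` and its arguments are hypothetical, pure-Python stand-ins. The point it shows is that the one-hot target takes its width from the predictions rather than from the largest gold index seen, so a limited sample that misses some classes still lines up.

```python
def brier_score(golds, predictions):
    """Mean multi-class Brier score (illustrative helper, not the harness API).

    golds: list of integer class indexes.
    predictions: list of per-class probability lists, all of the same length.
    """
    # The one-hot width follows the predictions, not max(golds) + 1.
    num_classes = len(predictions[0])
    total = 0.0
    for gold, probs in zip(golds, predictions):
        one_hot = [1.0 if i == gold else 0.0 for i in range(num_classes)]
        total += sum((p - t) ** 2 for p, t in zip(probs, one_hot))
    return total / len(golds)


# A limited sample: only classes 0 and 1 occur in gold, yet predictions cover 3 classes.
golds = [0, 1]
predictions = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]]
score = brier_score(golds, predictions)
```

Had the width been derived from the gold labels alone, the targets here would have two columns and could not be compared with the three-column predictions.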
  174. Fix for bootstrap_iters = 0 case (EleutherAI#1715) (EleutherAI#1789)

    * add handling for bootstrap_iters=0 case
    
    * add more detail to docstring
    
    * run precommit
    haileyschoelkopf authored and djstrong committed Aug 2, 2024
    e3ec75f
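One way to picture the `bootstrap_iters = 0` handling is the sketch below. It assumes that zero iterations means "skip the standard-error estimate"; the function name, the `None` return value, and the seed are illustrative choices, not the harness's actual implementation.

```python
import random
import statistics


def bootstrap_stderr(values, bootstrap_iters, seed=1234):
    """Bootstrap estimate of the standard error of the mean (hypothetical helper).

    Returns None when bootstrap_iters is 0 so that callers can skip the
    stderr column instead of failing on an empty set of resamples.
    """
    if bootstrap_iters <= 0:
        return None
    rng = random.Random(seed)
    means = []
    for _ in range(bootstrap_iters):
        # Resample with replacement, keeping the original sample size.
        resample = rng.choices(values, k=len(values))
        means.append(statistics.mean(resample))
    # Population standard deviation of the resampled means.
    return statistics.pstdev(means)


skipped = bootstrap_stderr([0, 1, 1, 0], bootstrap_iters=0)
estimated = bootstrap_stderr([0, 1, 1, 0], bootstrap_iters=200)
```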
  175. add mmlu tasks from pile-t5 (EleutherAI#1710)

    * add mmlu tasks from pile-t5
    
    * Update _mmlu_flan_cot_fewshot_template_yaml
    
    * Update _mmlu_flan_cot_zeroshot_template_yaml
    
    * Update _mmlu_flan_generative_template_yaml
    
    * Update _mmlu_flan_loglikelihood_template_yaml
    
    * Update _default_template_yaml
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    ee44bf2
  176. Bigbench fix (EleutherAI#1686)

    * edit process multiple-choice
    
    * split template yaml
    
    * remove
    
    * modified multiple_choice tasks
    
    * update
    
    * Update multiple_choice_template_b_yaml
    
    * Update multiple_choice_template_a_yaml
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    83f9d66
  177. Rename lm_eval.logging -> lm_eval.loggers (EleutherAI#1858)

    * rename lm_eval.logging module
    
    * fix evaluation tracker args
    haileyschoelkopf authored and djstrong committed Aug 2, 2024
    fe6fb1a
  178. Updated vllm imports in vllm_causallms.py (EleutherAI#1890)

    * Reorder vllm imports in vllm_causallms.py
    
    * Update vllm_causallms.py
    mgoin authored and djstrong committed Aug 2, 2024
    b69aecc
  179. [HFLM]Add support for Ascend NPU (EleutherAI#1886)

    * [HFLM]Add support for Ascend NPU
    
    Co-authored-by: jiaqiw09 <[email protected]>
    Co-authored-by: zhabuye <[email protected]>
    
    * bump accelerate dependency version to 0.26.0 for NPU compat.
    
    ---------
    
    Co-authored-by: jiaqiw09 <[email protected]>
    Co-authored-by: zhabuye <[email protected]>
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    4 people authored and djstrong committed Aug 2, 2024
    d177975
  180. higher_is_better tickers in output table (EleutherAI#1893)

    * Higher is better tickers in output table
    
    * add extra check for `higher_is_better` not being None already
    
    * Update lm_eval/evaluator.py
    
    * fixup format I messed up
    
    * add comment (and retrigger tests)
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    Co-authored-by: haileyschoelkopf <[email protected]>
    3 people authored and djstrong committed Aug 2, 2024
    bbc1216
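As a rough illustration of the `higher_is_better` tickers, the sketch below maps the flag to a direction marker and leaves the cell empty when the flag is `None` (the extra check mentioned in the commit). The helper names, the arrow symbols, and the row layout are assumptions made for this example and are not taken from `lm_eval/evaluator.py`.

```python
def direction_marker(higher_is_better):
    """Return a marker for the metric direction; empty when it is unknown."""
    if higher_is_better is None:
        return ""
    return "↑" if higher_is_better else "↓"


def format_row(task, metric, value, higher_is_better=None):
    """Format one pipe-delimited table row (hypothetical layout)."""
    return f"|{task}|{metric}|{direction_marker(higher_is_better)}|{value:.4f}|"


row_up = format_row("demo_task", "acc", 0.5, higher_is_better=True)
row_down = format_row("demo_task", "perplexity", 12.25, higher_is_better=False)
row_unknown = format_row("demo_task", "custom", 1.0)
```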
  181. Add dataset card when pushing to HF hub (EleutherAI#1898)

    * dataset card initial
    
    * few fixes
    
    * adds groups for math, mmlu, gpqa
    
    * added summary args
    
    * moved sanitize_list to utils
    
    * readme update
    
    * recreate metadata moved
    
    * multiple model support
    
    * results latest split fix
    
    * readme update and small refactor
    
    * fix grouping
    
    * add comments
    
    * added pathlib
    
    * corrected pathlib approach
    
    * check whether to create a metadata card
    
    * convert posix paths to str
    
    * default hf org from token
    
    * hf token value error
    
    * Add logs after successful upload
    
    * logging updates
    
    * dataset card example in the readme
    
    ---------
    
    Co-authored-by: Nathan Habib <[email protected]>
    Co-authored-by: Alina Lozovskaia <[email protected]>
    3 people authored and djstrong committed Aug 2, 2024
    ebc3807
  182. Making hardcoded few shots compatible with the chat template mechanism (EleutherAI#1895)
    
    * init test 1
    
    * fix
    
    * this format seems to be working - need to update all other tasks with the new format
    
    * bbh with few shot format
    
    * fix fewshot bbh
    
    * add mmlu flan cot
    
    * samples of cot
    
    * kmmlu
    
    * fix gsm8k
    
    * update keys for mmlu
    
    * minerva math
    
    * bbh
    
    * fix
    
    * fix samples
    
    * small fixes to templates
    
    * last prompt format change
    
    * fixing prompt
    
    * fixed minerva math format
    
    * rm accidentally committed file
    
    * added doc for few shot samples
    
    * Update lm_eval/loggers/evaluation_tracker.py
    
    * Update lm_eval/loggers/evaluation_tracker.py
    
    * Update docs/new_task_guide.md
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * added check in sampler per code review
    
    * added the system from a function, plus an example in minerva math
    
    * style
    
    * Apply suggestions from code review
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * fix unit tests 1
    
    * forcing use of test split
    
    ---------
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    2 people authored and djstrong committed Aug 2, 2024
    105b516
  183. acc4029
  184. e53f271
  185. Complete task list from pr 1727 (EleutherAI#1901)

    * added tasks and task family descriptors
    
    * continue work on task list w/ links; slightly reorganize README
    
    * Apply suggestions from code review
    
    * Rename file so that it'll preview in Github when viewing lm_eval/tasks folder
    
    * Update new_task_guide.md
    
    * Update README.md
    
    * run linter
    
    * Add language column to task table; Add missing tasks to task table; fix nq_open and storycloze READMEs
    
    * fix typo
    
    * Apply suggestions from code review
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    * apply format
    
    ---------
    
    Co-authored-by: Harish Vadaparty <[email protected]>
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    Co-authored-by: haileyschoelkopf <[email protected]>
    4 people authored and djstrong committed Aug 2, 2024
    85550b3
  186. Add chat template (EleutherAI#1873)

    * initial chat template
    
    * tokenizer attribute check
    
    * variable rename
    
    * interface update
    
    * system instruction
    
    * system inst default update
    
    * fewshot as multiturn
    
    * typing update
    
    * indent update
    
    * added comments
    
    * Adding a fewshot in a more readable way
    
    * linting
    
    * Moved apply chat template to LM
    
    * multiturn alternation fix
    
    * cache key update
    
    * apply chat template method fix
    
    * add system prompt hash to cache_key
    
    * tokenizer name property for cache_key
    
    * property name fix
    
    * linting backward compatibility fix
    
    * docs and errors update
    
    * add documentation on adding chat template compatibility to model_guide
    
    * fewshot as multiturn check fix
    
    * saving system inst and chat template in results
    
    * eval tracker update
    
    * docs update
    
    * Apply suggestions from code review
    
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    
    ---------
    
    Co-authored-by: haileyschoelkopf <[email protected]>
    Co-authored-by: Clémentine Fourrier <[email protected]>
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    4 people authored and djstrong committed Aug 2, 2024
    0f995d9
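The "fewshot as multiturn" and "add system prompt hash to cache_key" items in this commit can be sketched as follows. This is a simplified illustration under stated assumptions, not the harness's code: `build_messages`, `cache_key_suffix`, the single-turn separator, and the key format are all hypothetical. The resulting message list has the role/content shape that chat-template tokenizers typically consume.

```python
import hashlib


def build_messages(fewshot_pairs, question, system_instruction=None,
                   fewshot_as_multiturn=True):
    """Build a chat-style message list from few-shot (question, answer) pairs."""
    messages = []
    if system_instruction:
        messages.append({"role": "system", "content": system_instruction})
    if fewshot_as_multiturn:
        # Each few-shot example becomes its own user/assistant exchange.
        for shot_question, shot_answer in fewshot_pairs:
            messages.append({"role": "user", "content": shot_question})
            messages.append({"role": "assistant", "content": shot_answer})
        messages.append({"role": "user", "content": question})
    else:
        # All few-shot examples are packed into a single user turn.
        shots = "".join(f"{q}\n{a}\n\n" for q, a in fewshot_pairs)
        messages.append({"role": "user", "content": shots + question})
    return messages


def cache_key_suffix(system_instruction, tokenizer_name):
    """Make cached requests depend on the system prompt and the tokenizer."""
    digest = hashlib.sha256((system_instruction or "").encode("utf-8")).hexdigest()[:8]
    return f"-system_prompt_hash{digest}-tokenizer{tokenizer_name}"


messages = build_messages([("2 + 2 = ?", "4")], "3 + 5 = ?",
                          system_instruction="Answer briefly.")
roles = [m["role"] for m in messages]
single_turn = build_messages([("2 + 2 = ?", "4")], "3 + 5 = ?",
                             fewshot_as_multiturn=False)
```

Including the system prompt hash and the tokenizer name in the key keeps cached requests built under one prompt or template from being reused under another.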
  187. Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data (EleutherAI#1867)
    
    * glianorex tasks
    
    * Create README.md
    
    * Update README.md
    
    * Update README.md
    
    * fix formatting
    
    * fix internal formatting
    maximegmd authored and djstrong committed Aug 2, 2024
    aceb0ce
  188. Modify pre-commit hook to check merge conflicts accidentally committed not at current merge commit (EleutherAI#1927)

    LSinev authored and djstrong committed Aug 2, 2024
    55c36de
  189. 2d1ffb9
  190. Add new Lambada translations (EleutherAI#1897)

    * added tasks and task family descriptors
    
    * configs for the new lambada translations
    
    * continue work on task list w/ links; slightly reorganize README
    
    * Apply suggestions from code review
    
    * Rename file so that it'll preview in Github when viewing lm_eval/tasks folder
    
    * Update new_task_guide.md
    
    * Update README.md
    
    * run linter
    
    * Add language column to task table; Add missing tasks to task table; fix nq_open and storycloze READMEs
    
    * fix typo
    
    * update `lm_eval/tasks/README.md` with task description
    
    ---------
    
    Co-authored-by: Harish Vadaparty <[email protected]>
    Co-authored-by: anthony <[email protected]>
    Co-authored-by: Hailey Schoelkopf <[email protected]>
    Co-authored-by: haileyschoelkopf <[email protected]>
    5 people authored and djstrong committed Aug 2, 2024
    c63d56a
  191. Implement NoticIA (EleutherAI#1912)

    * Noticia
    
    * test
    
    * Final tested implementation
    
    * Fixes
    
    * Fix linters
    ikergarcia1996 authored and djstrong committed Aug 2, 2024
    17fcd25
  192. 58264ac
  193. 66e2c9d
  194. Update basque-glue (EleutherAI#1913)

    * Update README.md
    
    * Update bec.yaml
    
    * Update bhtc.yaml
    
    * Update coref.yaml
    
    * Update qnli.yaml
    
    * Update vaxx.yaml
    
    * Update wic.yaml
    zhabuye authored and djstrong committed Aug 2, 2024
    1865671
  195. Test output table layout consistency (EleutherAI#1916)

    * sort metrics in output table
    
    * update docstring in `consolidate_results`
    
    * add tests for verifying consistency of table output
    
    * update tests to account for floating point inconsistencies
    
    * updated tests based on `pythia-14m`
    zafstojano authored and djstrong committed Aug 2, 2024
    eaf6696
  196. polqa

    djstrong committed Aug 2, 2024
    a0c1aeb
  197. update polish benchmarks

    chrisociepa authored and djstrong committed Aug 2, 2024
    4320c18
  198. update polish benchmarks

    djstrong committed Aug 2, 2024
    ff41506
  199. Add task definitions: 8tags, dyk, ppc, psc, belebele PL (regex), polemo2 (multiple_choice)

    chrisociepa authored and djstrong committed Aug 2, 2024
    15950dd
  200. task definitions fixes

    djstrong committed Aug 2, 2024
    a107ca9
  201. Polish benchmark

    djstrong committed Aug 2, 2024
    6b8e7b3
  202. 8568c6e
  203. ca605fd
  204. 18e618e
  205. update polish benchmarks

    chrisociepa authored and djstrong committed Aug 2, 2024
    76a4f36
  206. update polish benchmarks

    djstrong committed Aug 2, 2024
    8f0d25c
  207. feat: add the PoQuAD dataset

    kacpermilan authored and djstrong committed Aug 2, 2024
    4972634
  208. fix: tune the open-book prompt

    kacpermilan authored and djstrong committed Aug 2, 2024
    6c4b0a1
  209. fix psc regex

    djstrong committed Aug 2, 2024
    d880314
  210. fix poquad

    djstrong committed Aug 2, 2024
    f876552
  211. polish eq-bench

    djstrong committed Aug 2, 2024
    02dd644
  212. polish eq-bench

    djstrong committed Aug 2, 2024
    637afd1
  213. polish eq-bench

    djstrong committed Aug 2, 2024
    53039c2
  214. polish eq-bench

    djstrong committed Aug 2, 2024
    6d5e657
  215. polish eq-bench

    djstrong committed Aug 2, 2024
    38e954a
  216. polish eq-bench

    djstrong committed Aug 2, 2024
    88e0034
  217. polish eq-bench

    djstrong committed Aug 2, 2024
    458cdc7
  218. polish eq-bench

    djstrong committed Aug 2, 2024
    fea7b68
  219. polish eq-bench

    djstrong committed Aug 2, 2024
    2693979
  220. polish eq-bench

    djstrong committed Aug 2, 2024
    776a4d3
  221. polish eq-bench

    djstrong committed Aug 2, 2024
    53f1dfc
  222. polish eq-bench

    djstrong committed Aug 2, 2024
    87b2160
  223. polish eq-bench

    djstrong committed Aug 2, 2024
    8998552
  224. polish eq-bench

    djstrong committed Aug 2, 2024
    b6c4ac3
  225. polish eq-bench

    djstrong committed Aug 2, 2024
    f9ca054
  226. polish eq-bench

    djstrong committed Aug 2, 2024
    29959a3
  227. polish eq-bench

    djstrong committed Aug 2, 2024
    aaaac9d
  228. polish eq-bench

    djstrong committed Aug 2, 2024
    db195a2
  229. polish eq-bench

    djstrong committed Aug 2, 2024
    d96cd84
  230. polish eq-bench

    djstrong committed Aug 2, 2024
    2df424f
  231. polish eq-bench

    djstrong committed Aug 2, 2024
    badffa9
  232. polish eq-bench

    djstrong committed Aug 2, 2024
    c3a1bec
  233. fgd

    djstrong committed Aug 2, 2024
    1ca2260
  234. fgd

    djstrong committed Aug 2, 2024
    ee8f8be
  235. generate until <|im_end|>

    djstrong committed Aug 2, 2024
    5d61f54
  236. poquad; pes; hash fix

    djstrong committed Aug 2, 2024
    10a79a2
  237. fix multiple choice openai

    djstrong committed Aug 2, 2024
    8819b64
  238. fix multiple choice openai

    djstrong committed Aug 2, 2024
    45f6010
  239. fix multiple choice openai

    djstrong committed Aug 2, 2024
    8974fc2

Commits on Aug 13, 2024

  1. fix belebele

    djstrong committed Aug 13, 2024
    0bea423

Commits on Aug 22, 2024

  1. polish pes split

    djstrong committed Aug 22, 2024
    21d0ea9