Generalize encode/decode for datasets #415

Open
wants to merge 2 commits into
base: master

Commits on Jan 5, 2024

  1. gitignore further common output directories

    Probably most controversial here is the addition of `data/*/samples/*`.
    I was using this to save sample prompts for datasets. Happy to drop it
    if its inclusion is not desired.
    
    The remaining entries are all things you'd commonly want to gitignore:
    venv directories, VS Code workspaces, output directories (using the
    directory names suggested by this codebase), and the default wandb
    output directory.
    GMNGeoffrey committed Jan 5, 2024
    e209273
  2. Generalize encode/decode for datasets

    This fixes a TODO to allow arbitrary encoding/decoding schemes for
    different datasets. To do so, I switched from pickle to dill, which
    extends pickle to enable things like pickling functions, including
    their referenced globals. dill is already a dependency of datasets,
    so this doesn't add any new dependencies.
    GMNGeoffrey committed Jan 5, 2024
    fba53b8
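
To illustrate why the second commit moves from pickle to dill, here is a minimal sketch of storing per-dataset encode/decode functions in a metadata dict. It is not the code from this PR: `make_codec`, the `meta` layout, and the choice of `recurse=True` are assumptions made for illustration, and the dill part only runs if dill happens to be installed.

```python
import pickle

try:
    import dill  # third-party; the commit notes that `datasets` already depends on it
except ImportError:
    dill = None


def make_codec(chars):
    """Build a character-level encode/decode pair (hypothetical helper)."""
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}

    def encode(text):
        return [stoi[ch] for ch in text]

    def decode(ids):
        return "".join(itos[i] for i in ids)

    return encode, decode


encode, decode = make_codec("abc")

# Hypothetical metadata layout: the functions travel with the dataset.
meta = {"vocab_size": 3, "encode": encode, "decode": decode}


def stdlib_pickle_fails(obj):
    """Return True if the standard pickle module cannot serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        return True
    return False


# Standard pickle stores functions by reference (module + name), so it
# rejects locally defined functions such as encode/decode above.
pickle_failed = stdlib_pickle_fails(meta)

if dill is not None:
    # dill serializes the function bodies and their closures by value.
    # recurse=True (an assumption here, not taken from the PR) limits the
    # pickled globals to the ones the functions actually reference.
    restored = dill.loads(dill.dumps(meta, recurse=True))
    round_trip = restored["decode"](restored["encode"]("abcab"))
else:
    round_trip = None
```

With dill available, `round_trip` is the original string recovered through the restored functions. As with pickle, loading a dill file executes code chosen by whoever wrote it, so such metadata files should only be loaded from trusted sources.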