Generalize encode/decode for datasets #415

Open
wants to merge 2 commits into
base: master


GMNGeoffrey

This fixes a TODO to allow arbitrary encoding/decoding schemes for
different datasets. To do so, I switched from pickle to dill, which
extends pickle to enable things like pickling functions, including
their referenced globals. dill is already a dependency of datasets,
so this doesn't add any new dependencies.
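
For readers unfamiliar with the difference, here is a minimal sketch of why dill helps. It is not the code from this PR: the `make_char_codec` helper and the `meta` field names are hypothetical, chosen only to illustrate storing encode/decode functions alongside dataset metadata. Plain pickle stores functions by qualified name, so it cannot handle nested functions, while dill serializes the function itself along with the variables it closes over.

```python
import pickle

import dill  # third-party package; extends pickle to cover functions and closures


def make_char_codec(text):
    """Build character-level encode/decode closures (hypothetical helper)."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}

    def encode(s):
        return [stoi[c] for c in s]

    def decode(ids):
        return "".join(itos[i] for i in ids)

    return encode, decode


text = "hello world"
encode, decode = make_char_codec(text)

# Hypothetical metadata layout; the key names are illustrative only.
meta = {"vocab_size": len(set(text)), "encode": encode, "decode": decode}

# Plain pickle looks functions up by name, which fails for nested functions.
try:
    pickle.dumps(meta)
    pickle_ok = True
except (pickle.PicklingError, AttributeError, TypeError):
    pickle_ok = False

# dill serializes the functions by value, including their closure variables.
payload = dill.dumps(meta)
restored = dill.loads(payload)

round_trip = restored["decode"](restored["encode"]("hello"))
print(pickle_ok, round_trip)
```

With this approach each dataset can ship its own scheme in its metadata file, and the consumer only needs to call `meta["encode"]` and `meta["decode"]`. As with pickle, loading a dill file executes arbitrary code, so it should only be done with files from a trusted source.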

This PR also includes some .gitignore additions that I found
necessary for my usage. I can alter the entries, remove them from this
PR, or break them out into a separate PR, as you prefer. Probably the
most controversial addition is `data/*/samples/*`, since that's not
a format that is currently referenced in this codebase. I was using
directories like that to save sample prompts for datasets. Happy to
drop it if its inclusion is not desired.

The other entries are all common things you'd want to gitignore: venv
directories, VS Code workspaces, output directories (using the directory
names suggested by this codebase), and the default wandb output
directory.
@AutomaticHourglass

I did something similar in my own work. Recommended.
