Reproducible research and data cleaning #17

Open
tracykteal opened this issue Dec 3, 2014 · 3 comments

Comments

@tracykteal

A step in many researchers' workflows is data cleaning: taking data from public repositories or their own lab output and cleaning it for use in an analysis. Being able to track how that data was cleaned is an important part of making the research reproducible, but there aren't currently many how-tos on this process or on the importance of this step. It would be interesting to discuss including a module on data cleaning in a reproducible research workshop, or developing one that we can point to online.

One example would be a module for using OpenRefine reproducibly.

@jennybc
Member

jennybc commented Dec 3, 2014

I have previously tweeted about this.

Why so few tutorials on data cleaning or Windows scientific S/W installs? Once you're done, only a masochist would sit down and write it up.

— Jennifer Bryan (@jennybryan) November 11, 2014

@kbroman
Contributor

kbroman commented Dec 4, 2014

One of my first papers was about data cleaning, a topic close to my heart.

I completely agree that it's important to make this part reproducible. And for data cleaning it's particularly important to capture motivation (the why and not just the what). For example, the results may be completely reproducible, but why did you remove subject A and not subject B?
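
To illustrate the point about recording the why alongside the what, here is a minimal sketch of a scripted cleaning step that logs its own justification. Everything in it is hypothetical and made up for illustration: the `drop_rows` helper, the `subject` and `weight_g` columns, and the sample values are assumptions, not something taken from the paper or from any particular tool.

```python
import csv
import io
import json


def drop_rows(rows, predicate, reason, log):
    """Remove rows matching `predicate`; record which rows went and why."""
    removed = [r for r in rows if predicate(r)]
    kept = [r for r in rows if not predicate(r)]
    log.append({
        "step": "drop_rows",
        "reason": reason,  # the motivation, stated in plain language
        "removed_ids": [r["subject"] for r in removed],
    })
    return kept


# Hypothetical raw data: subject A has an impossible measurement.
raw = io.StringIO("subject,weight_g\nA,-1\nB,23.5\nC,24.1\n")
rows = list(csv.DictReader(raw))

log = []
cleaned = drop_rows(
    rows,
    lambda r: float(r["weight_g"]) < 0,
    "negative weight is physically impossible; likely a data-entry error",
    log,
)

# The log can be saved next to the cleaned data as a record of decisions.
print(json.dumps(log, indent=2))
```

Keeping the reason in the same call that performs the removal means the explanation cannot drift away from the code: anyone rerunning the script sees both that subject A was dropped and the stated justification for it.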

@tracykteal
Author

Great, and thanks for the link to the paper @kbroman!
