Unconference tutorial: Escaping dependency hell -- Docker for reproducible research? #11
Comments
I think this is a great idea for a session, and it completely hits on all the advantages of containers. We might want to talk a bit about creating small, modular application components as well. I can certainly see the power of the RStudio and IPython approaches to reproducible analysis, but there will be users who want to write their analysis pipelines in a Unix environment using simple command-line tools. There will also be researchers who have written a complex application (C, Perl, Python, Java, etc.) that they want others to use. We could show these users how to create a very thin wrapper image containing just the base OS, required packages, interpreter/compiler, libraries, and the application itself, which they can share with anyone so that others can run the application without all the fuss that you mention. I have some very simple Docker images in both GitHub and the Docker registry that demonstrate a reproducible, reusable, and extensible next-gen sequencing analysis based on bwa and samtools. We could then talk about how to extend one of these into an RDesktop or IPython extension image that allows these applications to be used inside them if you want.
@dmlond Yup, I completely agree that we'd want to show a custom/complex application. After all, the use case is most compelling when working across different libraries where just using a language's built-in package manager won't cut it. That said, the nice thing about the web-based consoles like RStudio or ipython-notebook is that the audience may already be familiar with one of those interfaces but not know their way around a Unix command line. Might be good to do both? I was thinking that it would be easiest to start off the tutorial just showing interactive use, and build up to writing a Dockerfile.
I was definitely thinking 'do both'. I think containerized RDesktop and/or IPython Notebook systems will be how many researchers access and analyze their data. My examples would definitely be in the latter parts of the tutorial, as we move more into DevOps using the Dockerfile.
I am very interested in having Docker explained to me. Sign me up!
👍
Academic research depends on a software ecosystem of ever-increasing complexity. Moreover, each researcher's software environment is unique -- each makes use of different tools, different libraries, and different versions. These details are rarely fully described, even for the researchers themselves. This poses a substantial barrier to reproducibility.
Docker provides a 'shipping container' to easily share your software environment with others. Unlike existing solutions, Docker isn't monolithic -- use the parts you like. This has made it very successful in the world of professional software developers because they, like researchers, have developed their own favorite tools and ways of doing things and don't want to change, but still need an easy way for others to run their software.
This tutorial would introduce Docker by illustrating 4 key concepts desirable in any approach to reproducible software environments:
This would be a hands-on demo of running a 'Dockerized' environment, extending it, and committing and sharing those changes. (We would probably do this using RStudio, though I could also demonstrate it with ipython-notebooks or other computational environments.)