
Commit

improved README
JohannesWiesner committed Aug 30, 2024
1 parent 2a553df commit 49be7ff
Showing 1 changed file with 27 additions and 21 deletions.
48 changes: 27 additions & 21 deletions README.md
@@ -2,14 +2,14 @@
**T**svto**C**onda**Y**ml. A package for easy creation of conda `.yml` files using a `.tsv` file as input.

## Aims
Using `.yml` files as recipes to create conda environments is already a good step towards reproducible scientific computing environments. However, sometimes we want to know **why** a particular package was included (or not), **what** it does (improving transparency), and whether it runs without errors on all common operating systems (Linux, macOS, Windows). Spreadsheet files offer many more possibilities to document this. The goal of this repository is to combine the documentation capabilities of a `.tsv` file with the ability to export the packages described in it to a `.yml` file that conda can interpret.
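The following is a minimal, hypothetical sketch of the export idea: rows from such a `.tsv` file are turned into the text of a conda `environment.yml`. It is not tcy's implementation. The function name `to_environment_yml` is made up for illustration. Rows are assumed to be dicts keyed by the `packages.tsv` column names (`package_name`, `version`, `package_manager`, `conda_channel`). The sketch also assumes that `pip` version strings start with an operator such as `>=`. `cran` packages are simply skipped.

```python
def to_environment_yml(env_name, rows):
    """Render package rows as the text of a conda environment.yml file.

    Hypothetical helper: rows are dicts keyed by the packages.tsv column names.
    """
    channels, conda_deps, pip_deps = [], [], []
    for row in rows:
        name = row["package_name"]
        version = (row.get("version") or "").strip()
        manager = row["package_manager"]
        if manager == "conda":
            # conda match specs accept "name version", e.g. "numpy >=1.20"
            conda_deps.append(f"{name} {version}".strip())
            channel = row.get("conda_channel") or ""
            if channel and channel not in channels:
                channels.append(channel)
        elif manager == "pip":
            # Assumption: pip versions start with an operator, e.g. ">=1.0"
            pip_deps.append(f"{name}{version}")
        # 'cran' rows are ignored in this sketch

    lines = [f"name: {env_name}"]
    if channels:
        lines.append("channels:")
        lines.extend(f"  - {channel}" for channel in channels)
    lines.append("dependencies:")
    lines.extend(f"  - {dep}" for dep in conda_deps)
    if pip_deps:
        # Listing pip explicitly is recommended when a pip section is used
        lines.append("  - pip")
        lines.append("  - pip:")
        lines.extend(f"    - {dep}" for dep in pip_deps)
    return "\n".join(lines) + "\n"


rows = [
    {"package_name": "numpy", "version": ">=1.20",
     "package_manager": "conda", "conda_channel": "conda-forge"},
    {"package_name": "nilearn", "version": "",
     "package_manager": "pip", "conda_channel": ""},
]
print(to_environment_yml("demo", rows), end="")
```

Writing plain text keeps the sketch dependency-free. A real tool would more likely build a dict and serialize it with a YAML library.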

## Use this repository for your own work

The easiest way to use tcy is to create your own repository using this repository as a template. This has two advantages over using tcy locally on your machine:

1. Your spreadsheet files will be stored in a GitHub repository and are therefore available from any machine with an internet connection.
2. Creating environments can take a lot of time, depending on the number of packages that need to be included. With the following approach, the computationally heavy solving process is outsourced to a GitHub runner, so your personal machine can be used for other things. The GitHub runner outputs solved environment specification files that you can use to quickly create your environment.

If you want to use this approach, then follow these steps:

@@ -19,9 +19,9 @@ If you want to use this approach, then follow these steps:

3. Clone your repository to your local machine

4. Make local changes to `environments/packages.tsv`. See the next section, [What goes into the packages.tsv file?](#what-goes-into-the-packagestsv-file), on how to fill out this file properly.

5. Push your changes. This will start a GitHub Actions workflow (which uses tcy and micromamba) to create `.yml` files with solved package specifications. This workflow will also check whether you filled out the file correctly (see [Automatic testing of the packages.tsv file](#automatic-testing-of-the-packagestsv-file) for more information). The workflow automatically pushes the files to your repository, so wait until it has finished.

6. After the workflow has finished, pull the latest changes to your local repository.

@@ -31,15 +31,31 @@ If you want to use this approach, then follow these steps:
* After that, execute the following command to create your environment: `conda env create -f ubuntu_latest_solved.yml` (or `conda env create -f windows_latest_solved.yml`).
(Note: There is no need to specify `-n environment_name` in this command because the name of the environment was already specified in the first step. More information can be found [here](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file))

### What goes into the packages.tsv file?
The input spreadsheet file must have the following columns:

- `package_name` : The official name of the package.
- `version` : Can be empty, or specify the version of the package following the [package match specification syntax](https://docs.conda.io/projects/conda-build/en/latest/resources/package-spec.html#package-match-specifications).
- `package_manager` : Must be `pip`, `conda`, or `cran`.
- `conda_channel` : Can be empty if the package manager is `pip` or `cran`, but must contain the name of the conda channel to install from if the package manager is `conda`.
- `necessity` : Must be `required` or `optional`.
- `language` : Must be `python`, `r`, or `julia`.
- `bug_flag` : Can be empty, or `linux`, `windows`, or `cross_platform` to flag the operating system(s) on which the package does not run bug-free.
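These column rules are simple enough to check mechanically. Below is a hypothetical sketch that uses only the Python standard library. The function `validate_tsv` and its messages are made up. It is not the test pipeline that ships with this repository. The sketch reads the file with `csv.DictReader` and tab delimiters, then checks every row against the allowed values listed above.

```python
import csv
import io

REQUIRED_COLUMNS = ["package_name", "version", "package_manager",
                    "conda_channel", "necessity", "language", "bug_flag"]
# Allowed values, taken from the column descriptions above
ALLOWED = {
    "package_manager": {"pip", "conda", "cran"},
    "necessity": {"required", "optional"},
    "language": {"python", "r", "julia"},
    "bug_flag": {"", "linux", "windows", "cross_platform"},
}


def validate_tsv(text):
    """Return a list of problems found in the contents of a packages.tsv file."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    for raw in reader:
        lineno = reader.line_num  # line in the input that ended this row
        # Missing cells come back as None; surplus cells (key None) are ignored
        row = {k: (v or "").strip() for k, v in raw.items() if k is not None}
        if not row["package_name"]:
            problems.append(f"line {lineno}: package_name is empty")
        for column, allowed in ALLOWED.items():
            if row[column] not in allowed:
                problems.append(f"line {lineno}: invalid {column} {row[column]!r}")
        if row["package_manager"] == "conda" and not row["conda_channel"]:
            problems.append(f"line {lineno}: conda packages need a conda_channel")
    return problems


example = "\n".join([
    "\t".join(REQUIRED_COLUMNS),
    "\t".join(["numpy", "", "conda", "conda-forge", "required", "python", ""]),
    "\t".join(["ggplot2", "", "conda", "", "maybe", "r", ""]),
])
for problem in validate_tsv(example):
    print(problem)
```

The sketch collects every problem instead of stopping at the first one, so a single run can report all faulty rows.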

## For developers
### How to generate a custom .yml file using tcy

There are two ways to use tcy, both of which require the same installation using `pip install tcy`. After installing it, you can use tcy either as a Python library or as a command-line application:

#### Using tcy as a Python library

You can import the `run` function into your own code base using `from tcy import run`.

#### Using tcy as a command-line application

tcy can also be used as a command-line application by simply running `tcy` in the terminal.

The following **positional arguments** have to be specified:

- `{linux,windows}` (Operating system under which the `.yml` file will be used to create a conda environment. Can be `linux` or `windows`. Depending on the input, only packages that run bug-free under the specified OS are selected. Packages that are flagged with `cross_platform` in the `bug_flag` column of the input `.tsv` file are never included.)

@@ -56,17 +72,7 @@ The following **optional arguments** can be set for further customization:
- `--languages` (Filter for certain languages. Valid inputs are 'python', 'r', 'julia' or 'all'. The default is 'all')
- `--necessity` (Filter for necessity. Valid inputs are 'optional' and 'required').
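Taken together, the positional and optional arguments describe a row filter. Below is a hypothetical sketch of that selection logic in plain Python. `select_packages` and its signature are made up. Only two rules come from this README: rows flagged with the target OS or `cross_platform` are dropped, and `--languages` defaults to `all`. Two further points are assumptions. `--languages` is treated as accepting one or several languages. `--necessity` is treated as an exact-match filter; tcy may interpret `optional` differently, for example by also keeping required packages.

```python
VALID_OS = {"linux", "windows"}
VALID_LANGUAGES = {"python", "r", "julia"}
VALID_NECESSITIES = {"required", "optional"}


def select_packages(rows, target_os, languages="all", necessity=None):
    """Return the package rows that should end up in the .yml file (hypothetical).

    rows: list of dicts with 'bug_flag', 'language' and 'necessity' keys.
    target_os: 'linux' or 'windows'.
    languages: 'all', a single language, or a list of languages.
    necessity: None (no filtering), 'required' or 'optional'.
    """
    if target_os not in VALID_OS:
        raise ValueError(f"unsupported OS: {target_os!r}")
    if isinstance(languages, str):
        languages = [languages]
    wanted = VALID_LANGUAGES if "all" in languages else set(languages)
    if not wanted <= VALID_LANGUAGES:
        raise ValueError(f"unknown language(s): {sorted(wanted - VALID_LANGUAGES)}")
    if necessity is not None and necessity not in VALID_NECESSITIES:
        raise ValueError(f"invalid necessity: {necessity!r}")

    # Packages flagged as buggy on the target OS, or on all platforms, are dropped
    excluded_flags = {target_os, "cross_platform"}
    selected = []
    for row in rows:
        if row.get("bug_flag", "") in excluded_flags:
            continue
        if row["language"] not in wanted:
            continue
        if necessity is not None and row["necessity"] != necessity:
            continue
        selected.append(row)
    return selected


rows = [
    {"name": "a", "bug_flag": "", "language": "python", "necessity": "required"},
    {"name": "b", "bug_flag": "linux", "language": "python", "necessity": "required"},
    {"name": "c", "bug_flag": "windows", "language": "r", "necessity": "optional"},
]
print([row["name"] for row in select_packages(rows, "linux")])
```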

### Automatic testing of the packages.tsv file

This repository includes a testing pipeline that checks the integrity of `packages.tsv`, i.e. whether all of its entries are valid. Which tests are run is controlled by the `test_configs.json` file: each test corresponds to a key in this file, and if the corresponding value is `null`, the test is not executed. Here's an explanation of each test and the rules for how its value should be provided if the test should be executed.
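The following hypothetical sketch illustrates the `null` convention. It shows how a test runner could pick the enabled tests from `test_configs.json`. The key names in the example config are invented for illustration. Only the rule that a `null` value disables a test comes from this README.

```python
import json


def enabled_tests(config_text):
    """Map each test name to its configured value, skipping tests set to null."""
    config = json.loads(config_text)
    # JSON null is decoded as Python None
    return {name: value for name, value in config.items() if value is not None}


# Hypothetical config: these key names are made up for illustration
example_config = """
{
    "check_package_manager": ["pip", "conda", "cran"],
    "check_bug_flag": null
}
"""

print(enabled_tests(example_config))
```

Keeping each configured value alongside its test name lets a runner pass the value, such as a list of allowed entries, straight to the corresponding check.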

@@ -87,7 +93,7 @@ Some R-packages are not (yet) available as conda-packages. In order to semi-auto

### What about dependencies?

It's not necessary to specify dependencies in the `.tsv` file! Conda will take care of that. For example, there's no need to put `numpy` in the `.tsv` file, because `numpy` is a common dependency of most scientific Python packages (e.g. `scikit-learn`, `pytorch`, etc.). There might, however, be cases where there are optional dependencies that can, but do not have to, be installed. Example: the plotting package `plotly` works completely fine if we install it as it is, but if we want to export figures as static images, we also have to install the optional dependency `orca`.

### Why not create the environment and share the exported .yml file?
Theoretically, there would be an even better option than everyone creating the same environment over and over: the environment would be created only once (which can take a long time, because conda has to resolve a dependency graph in which each package is *‘happy’* with the versions of all other packages). This environment could then be exported via `conda env export > environment.yml`. Other users could then take this `.yml` file to create the environment without resolving the dependency graph again, because the file already contains the ‘solution’. More information on that can be found [here](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#exporting-the-environment-yml-file).
