DevNotes
For a general schematic of the open tree architecture, see this architectural diagram. The basic goal of this repo is to provide web-service interfaces to the corpus of phylogenetic studies. The corpus is referred to as phylesystem.
The phylesystem-api provides basic read/write access and format conversion. Search functionality is supplied by oti.
Most of the business logic of dealing with the phylesystem corpus is coded in peyotl. This is nice because it facilitates easier code reuse and testing that does not require running the full web stack. This is a pain because devs will need to coordinate merges of peyotl branches and the phylesystem-api branches that depend on them.
A typical series of study edit operations, as choreographed by the open tree curator app (which runs the code in the curator subdir of the opentree repo), is shown below. We are in the process of moving from v1 of the API to v2, so some of the URLs used could be stale. The template configuration file holds the patterns used to construct the actual URLs used by the curator app, so you should use that if you need the exact URLs.
- request brief list of studies and metadata from oti's findAllStudies service
- user selects a study, and curator app fetches a "NexSON with extra info" using a GET to phylesystem-api's v1/study/{STUDY_ID}.
- the user corrects various deficiencies of the study, and the curator app saves these changes using a PUT to phylesystem-api's v1/study/{STUDY_ID}
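The fetch-then-save cycle above can be sketched from the client's side. This is only an illustration under stated assumptions, not the curator app's code: the host name, the helper names, the example study ID and the choice to pass starting_commit_SHA as a query parameter are all hypothetical. The template configuration file remains the authority on the real URL patterns.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://api.example.org"  # hypothetical host, not a real deployment


def study_url(study_id, api_base=API_BASE):
    """Build the v1/study/{STUDY_ID} URL that serves both the GET and the PUT."""
    return "{}/v1/study/{}".format(api_base.rstrip("/"), quote(str(study_id), safe=""))


def plan_edit_cycle(study_id, parent_sha):
    """Return the (method, url) pairs for fetching and then saving one study."""
    url = study_url(study_id)
    # Assumption: the parent commit SHA travels as a query parameter.
    put_url = url + "?" + urlencode({"starting_commit_SHA": parent_sha})
    return [("GET", url), ("PUT", put_url)]


if __name__ == "__main__":
    for method, url in plan_edit_cycle("pg_10", "abc123"):
        print(method, url)
```

The same URL is used for reading and writing; only the HTTP method and the parent SHA differ.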
- the curator app prompts the user to enter a new study from scratch or upload a file.
- No studies "in the wild" will be in NexSON. If the user uploads data to be imported, the curator app uses its own controllers to convert the inputs to NexSON. These calls are documented in the opentree/curator README. Briefly, they are:
  - to_nexson with the blob of input, to use NCL to convert to NeXML and peyotl to convert the NeXML to open tree NexSON.
  - If there is a previous NexSON blob associated with this study (e.g. if the user is uploading trees as separate newick trees in a series of operations), then merge_otus is called because the conversion of "external" sources to NexSON is not aware of previously created IDs
- Alternatively, the user can create a new OT study using a TreeBASE ID; in this case the curator app just prompts the user for that ID.
- Alternatively, the user could just supply a DOI
- A POST to phylesystem-api's v1/study will validate the input, create a new study ID, and return a receipt with the ID and git SHAs for the new study.
On the server side, a GET of a study triggers several calls to peyotl's Phylesystem wrapper. The key ones are:
- phylesystem.return_study
- phylesystem.add_validation_annotation
- phylesystem.get_version_history_for_study_id
In terms of the actions performed on the server, these steps entail:
- the phylesystem-api waits for a lock on the phylesystem git repo
- the master branch is checked out
- the requested study is read.
- If no cached validation annotation for the study is available, then one is generated.
- The annotation is injected into the NexSON
- the version history of the study is constructed (this is where the call will be after https://github.com/OpenTreeOfLife/phylesystem-api/issues/107 is fixed).
- the phylesystem git repo is unlocked.
- the "extra info" is added to the response JSON which will also hold the NexSON
- perform any format conversion based on the user's request and the phylesystem's native version of NexSON.
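The ordering of the read steps above (lock, read, annotate, collect history, unlock, then wrap the result) can be modelled with a small in-memory stand-in. Everything here is hypothetical: the class, the placeholder validation rule and the response keys ("data", "sha", "versionHistory") are assumptions made for illustration and are not peyotl's API.

```python
import threading


class ToyPhylesystem:
    """In-memory stand-in for the git-backed study store (hypothetical)."""

    def __init__(self, studies):
        self._lock = threading.Lock()
        self._studies = dict(studies)   # study_id -> NexSON-like dict
        self._annotations = {}          # cache of validation annotations
        self._history = {sid: ["sha-0"] for sid in self._studies}

    def _validate(self, nexson):
        # Placeholder check; real NexSON validation is far richer.
        return {"valid": "nexml" in nexson}

    def get_study(self, study_id):
        with self._lock:                # lock is held only while touching the store
            nexson = dict(self._studies[study_id])
            if study_id not in self._annotations:
                self._annotations[study_id] = self._validate(nexson)
            nexson["annotation"] = self._annotations[study_id]
            history = list(self._history[study_id])
        # The "extra info" is attached after the lock has been released.
        return {"data": nexson, "sha": history[-1], "versionHistory": history}


if __name__ == "__main__":
    store = ToyPhylesystem({"s1": {"nexml": {}}})
    print(store.get_study("s1"))
```

The annotation is injected into a copy, so the stored study is left unchanged by a read.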
When handling a PUT of a study, the server will:
- make sure that the client sent in a valid starting_commit_SHA arg that will identify the parent commit for this edit. This should be the commit SHA of the version of the study that was shown to the user, so that the history correctly reflects the lineage of the file being edited.
- call peyotl.phylesystem.git_workflows.validate_and_convert_nexson to validate the NexSON and convert it to the version of NexSON syntax that is being used by the phylesystem-api.
- call peyotl.phylesystem.annotate_and_write to make the new commit.
- If the commit can be merged to master, then a deferred "push to github" call will be spawned. Merging should succeed almost all the time; the only exception should be 2 users editing the same study at the same time. In that case the first PUT should be merge-able, but the second will not be.
- return the info about the commit.
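The role of starting_commit_SHA in the write steps above can be illustrated with a toy commit store. This sketch deliberately simplifies the merge rule to "mergeable only if the parent is the current head of master"; the real merge logic in peyotl may accept more cases. The class name and the receipt keys are hypothetical.

```python
import hashlib
import json


class ToyStudyRepo:
    """Toy commit graph for one study (hypothetical, not peyotl's data model)."""

    def __init__(self, nexson):
        self.commits = {}               # sha -> (parent_sha, nexson)
        self.master = self._commit(None, nexson)

    def _commit(self, parent, nexson):
        payload = json.dumps([parent, nexson], sort_keys=True).encode("utf-8")
        sha = hashlib.sha1(payload).hexdigest()
        self.commits[sha] = (parent, nexson)
        return sha

    def put(self, nexson, starting_commit_sha):
        # The parent must be a commit this repo already knows about.
        if starting_commit_sha not in self.commits:
            raise ValueError("unknown starting_commit_SHA")
        new_sha = self._commit(starting_commit_sha, nexson)
        # Simplified rule: merge only when the edit started from master's head.
        merged = starting_commit_sha == self.master
        if merged:
            self.master = new_sha
        return {"sha": new_sha, "merged": merged, "push_scheduled": merged}


if __name__ == "__main__":
    repo = ToyStudyRepo({"title": "v0"})
    shown_to_both_users = repo.master
    print(repo.put({"title": "edit by user A"}, shown_to_both_users))
    print(repo.put({"title": "edit by user B"}, shown_to_both_users))
```

With two users starting from the same SHA, the first PUT advances master and the second is committed but left unmerged, which mirrors the concurrent-edit case described in the list.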
When handling a POST of a new study:
- If a NexSON blob and ID are posted (typically just used for importing from phylografter), this will be validated using peyotl.phylesystem.git_workflows.validate_and_convert_nexson
- If import_method == "import-method-TREEBASE_ID", peyotl.external.import_nexson_from_treebase is used to convert to NexSON
- If import_method == "import-method-PUBLICATION_DOI" || import_method == "import-method-PUBLICATION_REFERENCE" then a shell of a NexSON is created and the publication fields are filled in based on calls to Crossref.
- If none of the previous conditions hold, an empty shell of a NexSON is created.
- peyotl.phylesystem.ingest_new_study is called to write the new study and commit it on the master branch of phylesystem.
- a deferred "push to github" call will be spawned
- return the info about the commit.
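The branching on import_method described above amounts to a small dispatch. The sketch below only returns a label naming the branch taken; the function name, its parameters and the labels are hypothetical, and it calls none of the real peyotl functions.

```python
TREEBASE = "import-method-TREEBASE_ID"
PUBLICATION_METHODS = (
    "import-method-PUBLICATION_DOI",
    "import-method-PUBLICATION_REFERENCE",
)


def choose_import_path(import_method=None, nexson=None, study_id=None):
    """Return a label for the branch a new-study POST would follow."""
    if nexson is not None and study_id is not None:
        # A full NexSON blob plus ID was posted (e.g. a phylografter import).
        return "validate_posted_nexson"
    if import_method == TREEBASE:
        return "import_from_treebase"
    if import_method in PUBLICATION_METHODS:
        return "shell_with_publication_fields"
    # No recognised input: start from an empty NexSON shell.
    return "empty_shell"


if __name__ == "__main__":
    print(choose_import_path(import_method=TREEBASE))
    print(choose_import_path())
```

Whichever branch is chosen, the list above says the result is then ingested, committed on master and followed by a deferred push.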
Calls to push the master branch to GitHub can be sluggish, and we don't want the working phylesystem repo to be locked for the duration of such calls. If the system is working correctly, we don't even want the client to have to worry about that: they should just get a response when their changes are safely committed.
We do this by having the write operations spawn a deferred task using a celery.task, with redis used to transmit the message.
The deferred task is just a call to the push service. The code for the deferred call is call_http_json in phylesystem-api/ot-celery/open_tree_tasks.py. The service that is called is a PUT to phylesystem-api's v1/push, which delegates the call to peyotl.phylesystem.push_study_to_remote('GitHubRemote', study_id).
That call:
- fetches changes from the working repo to the push-mirror
- merges the working/master with push-mirror/master
- tries to push the push-mirror/master branch to GitHub's master (a push-mirror is used because only step 1 in this cascade needs to lock the working repo)
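The point of the push-mirror, that the working repo is locked only for the fetch step while the slow push happens afterwards, can be shown with a standard-library stand-in. Note that this sketch uses a ThreadPoolExecutor in place of celery and redis, performs no git operations, and uses hypothetical class and method names throughout.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyPusher:
    """Models a commit followed by a deferred push (hypothetical stand-in)."""

    def __init__(self):
        self.working_repo_lock = threading.Lock()
        self.log = []                   # (step, was the working repo locked?)
        self._executor = ThreadPoolExecutor(max_workers=1)

    def _push(self, study_id):
        with self.working_repo_lock:
            # Step 1: fetch from the working repo into the push-mirror.
            self.log.append(("fetch", self.working_repo_lock.locked()))
        # Steps 2 and 3 run on the mirror, so the working repo stays free.
        self.log.append(("merge", self.working_repo_lock.locked()))
        self.log.append(("push", self.working_repo_lock.locked()))

    def commit(self, study_id):
        with self.working_repo_lock:
            self.log.append(("commit", True))
        # Return right after scheduling; the client does not wait for the push.
        return self._executor.submit(self._push, study_id)

    def close(self):
        self._executor.shutdown(wait=True)


if __name__ == "__main__":
    pusher = ToyPusher()
    pusher.commit("s1").result(timeout=5)
    pusher.close()
    print(pusher.log)
```

In this model a second commit could take the lock while an earlier push is still in its merge or push step, which is the behaviour the push-mirror is meant to allow.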
If you cd to the ws-tests directory, you can configure the tests.conf file to have the api_host set to the server that you want to test. The file also has SHAs that can act as parent SHAs for commits in the testing framework. You need to make sure that the one that is uncommented is the one that corresponds to the phylesystem shard repo that you are using (the one that is on the server you are testing).
If all of that is working then:
$ bash run_tests.sh
will run all of the tests that are expected to pass. You can run any single test with:
$ python test_[rest-of-test-name-here].py
To double check which repo your local phylesystem-api is using, run this from the ws-tests dir:
cat $(cat ../private/config | grep repo_parent | awk '{print $3}')/phylesystem-1/.git/config | grep url
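For anyone who prefers not to parse the shell pipeline by eye, the same lookup can be written in Python. This is a rough equivalent under assumptions: it takes the third whitespace-separated field of the first line mentioning repo_parent (as awk's $3 does, so it expects a line shaped like "repo_parent = /some/path"), and the function names are hypothetical.

```python
from pathlib import Path


def repo_parent_from_config(config_path):
    """Return the third field of the first line that mentions repo_parent."""
    for line in Path(config_path).read_text().splitlines():
        if "repo_parent" in line:
            fields = line.split()
            if len(fields) >= 3:
                return fields[2]
    raise LookupError("no repo_parent setting found")


def remote_urls(config_path, shard="phylesystem-1"):
    """List the lines containing 'url' in the shard's .git/config."""
    git_config = Path(repo_parent_from_config(config_path)) / shard / ".git" / "config"
    return [ln.strip() for ln in git_config.read_text().splitlines() if "url" in ln]


if __name__ == "__main__":
    # Same relative path as the shell command, so run this from ws-tests.
    for url_line in remote_urls("../private/config"):
        print(url_line)
```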