JupyterHub: Developing a GA4GH TES Service Plugin for JupyterHub – all cells #6

Open
viktoriaas opened this issue Nov 3, 2024 · 0 comments

viktoriaas commented Nov 3, 2024

Why?

Jupyter Notebook is an application for creating and sharing computational documents. JupyterHub is a way of serving notebooks to multiple users. The benefit is that users gain easy interactive access to computational resources without needing to install anything.

GA4GH TES (Task Execution Service) is a standardized schema and API for describing and executing batch tasks on any underlying computational backend. The full TES specification defines its capabilities.
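To make "standardized schema" a bit more concrete, here is a minimal sketch of what a task description could look like when built in Python. The field names (`name`, `executors`, `image`, `command`, `resources`, `cpu_cores`, `ram_gb`) follow the TES task schema as I understand it; the helper `build_tes_task`, the task name, and the container image are illustrative assumptions, not part of any existing plugin.

```python
import json


def build_tes_task(name, image, command, cpu_cores=1, ram_gb=1.0):
    """Return a dict shaped like a TES task (hypothetical helper)."""
    return {
        "name": name,
        # Each executor is one container invocation: an image plus a command.
        "executors": [
            {
                "image": image,
                "command": command,
            }
        ],
        # Requested resources; the backend decides how to satisfy them.
        "resources": {"cpu_cores": cpu_cores, "ram_gb": ram_gb},
    }


# Example values only: any image with a Python interpreter would do.
task = build_tes_task(
    "hello-tes",
    "python:3.12-slim",
    ["python", "-c", "print('hello from TES')"],
)

# This JSON document is what a client would send when creating a task.
payload = json.dumps(task, indent=2)
print(payload)
```

Submitting the payload and polling for its state is what a client library such as py-tes would take care of; that part is left out here on purpose.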

The goal of this issue is to develop, or lay the foundations for, a GA4GH TES service plugin for JupyterHub that executes all cells of a notebook on a TES instance.

Objective: Build a plugin or extension for JupyterHub that gives seamless access to GA4GH TES and streamlines federated task submission. The plugin will focus on executing all notebook cells (the whole .ipynb file) through TES.
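One possible starting point for "the whole .ipynb through TES" is to flatten the code cells into a single script and ship it as one task. The sketch below uses only the standard library and assumes an nbformat 4 notebook (a top-level `cells` list whose entries have `cell_type` and `source`). The function names, the `/work/notebook.py` path, and the default image are hypothetical; the inline `content` field on an input follows the TES schema as I understand it.

```python
import json


def notebook_to_script(notebook):
    """Concatenate the code cells of an nbformat 4 notebook dict."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat allows source to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source)
    return "\n\n".join(chunks) + "\n"


def notebook_to_tes_task(notebook, image="python:3.12-slim"):
    """Build one TES-style task that runs every code cell in order."""
    script_path = "/work/notebook.py"  # assumed path inside the container
    return {
        "name": "run-notebook",
        # The script travels inline with the task instead of via a URL.
        "inputs": [
            {"path": script_path, "content": notebook_to_script(notebook)}
        ],
        "executors": [{"image": image, "command": ["python", script_path]}],
    }


# A tiny in-memory notebook; a real one would come from json.load(open(...)).
sample = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["x = 1\n", "y = x + 1"]},
        {"cell_type": "code", "source": "print(y)"},
    ]
}
task = notebook_to_tes_task(sample)
print(json.dumps(task, indent=2))
```

This ignores IPython magics and `!shell` lines, which are not valid plain Python, and it returns no rich outputs to the notebook. Both would need a decision as part of the discussion.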

Scope: Focus on plugin development, installation instructions, and usage documentation so administrators can easily deploy it across ELIXIR nodes.

More useful information and link: document online

How?

This is a larger meta issue that will likely require discussion. Here are some points to help get started:

Considerations:

  • Core Components
    • TES Client Library: You'll need a client library in Python (the language most Jupyter notebooks use) to interact with the TES instance. This library will handle:
      • Constructing TES task requests based on notebook cell content.
      • Submitting these tasks to the TES server.
      • Monitoring task execution status.
      • Retrieving results and outputs.
    • Notebook Integration: Develop a mechanism within the Jupyter notebook environment to:
      • Identify code cells to be executed on TES (perhaps with a magic command like %%tes or a dedicated cell tag).
      • Extract code and dependencies from these cells.
      • Package them into a format suitable for TES (e.g., a Docker image).
      • Display task status and results within the notebook.
    • Workflow Creation: Treat the entire notebook as a workflow with dependencies between cells.
    • Cell Ordering: Determine the execution order of cells based on their dependencies (e.g., using cell tags, code analysis).
    • Task Chaining: The client library will create a series of TES tasks, where the output of one task becomes the input of the next.
    • State Management: Track the execution state of each cell and the overall workflow.
    • Error Handling: Implement mechanisms to handle errors in individual cells or the workflow as a whole.
  • Implementation Considerations
    • Security: Securely handle authentication and authorization to the TES instance.
    • Scalability: Design for efficient execution of large notebooks with many cells and complex dependencies.
    • Usability: Provide a user-friendly interface within the notebook for TES interaction.
    • Flexibility: Support the option to choose from multiple TES instances and allow customization of task parameters.
  • Tools and Technologies
    • TES Implementations: Funnel, TESK, TES on Azure
    • Python TES Client: py-tes
    • Docker: For containerization
    • Jupyter Extensions: To enhance the notebook interface
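
As a discussion starter for the Cell Ordering and Task Chaining points above, here is a rough sketch of the "code analysis" option: it parses each cell with `ast`, records which names a cell defines and reads, and lets a cell depend on the most recent earlier cell that defined a name it reads. All function names are hypothetical, and the analysis is deliberately coarse (it over-approximates dependencies and knows nothing about side effects such as files written to disk). It needs Python 3.9+ for `graphlib`.

```python
import ast
from graphlib import TopologicalSorter


def names_defined_and_used(source):
    """Return (defined, used) name sets for one cell's source code."""
    defined, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.alias):
            # "import os.path" binds "os"; "import numpy as np" binds "np".
            defined.add((node.asname or node.name).split(".")[0])
    return defined, used


def cell_dependencies(cells):
    """Map each cell index to the set of earlier cell indices it depends on."""
    last_definer = {}  # name -> index of the latest cell that defined it
    graph = {}
    for index, source in enumerate(cells):
        defined, used = names_defined_and_used(source)
        graph[index] = {last_definer[n] for n in used if n in last_definer}
        for name in defined:
            last_definer[name] = index
    return graph


def execution_order(cells):
    """Return one valid execution order (dependencies always come first)."""
    return list(TopologicalSorter(cell_dependencies(cells)).static_order())


cells = [
    "import math",
    "r = 2.0",
    "area = math.pi * r ** 2",
    "print(area)",
]
graph = cell_dependencies(cells)
order = execution_order(cells)
print(graph)
print(order)
```

In a chained setup, each index in `order` could become one TES task whose inputs are the outputs of the tasks in `graph[index]`. Cells with no path between them in the graph (here, cells 0 and 1) could run as parallel tasks. How state is passed between tasks (pickled variables, files, something else) is an open design question.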

If you want to work on this issue:

  • Assign yourself to the issue (if someone else is already assigned, first ask them if they would mind help on the issue - or pick another one)
  • Once assigned, move your issue to the "In progress" column on the project board
  • Start working 🚀