Add data monitoring feature #150

Open
momegas opened this issue Feb 28, 2023 · 1 comment
Labels: blocked, enhancement

Comments

@momegas
Member

momegas commented Feb 28, 2023

Is your feature request related to a problem? Please describe.
The problem this issue is trying to solve is that some users need to check and validate their data as part of their MLOps lifecycle. Since Whitebox already does this for the training and inference datasets, we should be able to extend this functionality into a complete data monitoring solution.

Describe the solution you'd like
A possible solution is to create a data monitoring project, just as we do for model monitoring. The user should be able to specify where the data is located (S3, SQL, and other integrations in the future), and Whitebox will run the data monitoring pipelines just as it does with model monitoring.

A possible flow is the following:

  1. Create a data monitoring project (through the SDK, UI, or API)
  2. Choose the data to be monitored by specifying the data source and credentials.
  3. Run the data monitoring pipelines and display the findings on the dashboard (like model monitoring)
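
To make the proposed flow concrete, here is a minimal Python sketch of the three steps. It is only an illustration under stated assumptions: `DataSource`, `DataMonitoringProject`, `run_data_monitoring`, and the missing-value check are hypothetical names invented for this example and are not part of the existing Whitebox SDK. The loader is an in-memory stand-in for a real S3 or SQL integration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A row maps column names to values; None marks a missing value.
Row = Dict[str, Optional[float]]


@dataclass
class DataSource:
    """Hypothetical description of where the monitored data lives."""
    kind: str                      # e.g. "s3" or "sql"
    location: str                  # bucket path or connection string
    credentials: Dict[str, str] = field(default_factory=dict)


@dataclass
class DataMonitoringProject:
    """Hypothetical project object (step 1); not a Whitebox SDK class."""
    name: str
    source: DataSource             # step 2: the data to be monitored


def missing_value_report(rows: List[Row]) -> Dict[str, int]:
    """One example check: count missing (None) values per column."""
    report: Dict[str, int] = {}
    for row in rows:
        for column, value in row.items():
            report.setdefault(column, 0)
            if value is None:
                report[column] += 1
    return report


def run_data_monitoring(
    project: DataMonitoringProject,
    loader: Callable[[DataSource], List[Row]],
) -> Dict[str, int]:
    """Step 3: load the data from the source and run the checks on it."""
    rows = loader(project.source)
    return missing_value_report(rows)


def in_memory_loader(source: DataSource) -> List[Row]:
    """Stand-in for a real S3/SQL loader; returns fixed sample rows."""
    return [
        {"age": 31.0, "income": None},
        {"age": None, "income": 52000.0},
        {"age": 45.0, "income": None},
    ]


project = DataMonitoringProject(
    name="demo",
    source=DataSource(kind="s3", location="s3://example-bucket/data.csv"),
)
findings = run_data_monitoring(project, in_memory_loader)
print(findings)
```

Passing the loader in as a function keeps the checks independent of the storage backend, so additional integrations could be added without touching the pipeline logic. The findings dictionary is the kind of result a dashboard could then display.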
@momegas momegas converted this from a draft issue Feb 28, 2023
@momegas momegas added the enhancement New feature or request label Feb 28, 2023
@momegas momegas moved this from Backlog to Planned in 🐻‍❄️ Whitebox - Issue tracking Feb 28, 2023
@NickNtamp
Contributor

NickNtamp commented Mar 3, 2023

@momegas what do you mean by data monitoring pipelines? Are any of the existing Whitebox pipelines data monitoring ones? Do we have to create some data monitoring pipelines? If so, a discussion is needed.

@NickNtamp NickNtamp added the blocked This issue is blocked label Mar 6, 2023
@NickNtamp NickNtamp self-assigned this Mar 10, 2023