
Enhance Logging for predict_real_time API in Cloud Predictors #106

Open
tonyhoo opened this issue Apr 3, 2024 · 0 comments

Description

The predict_real_time API currently provides minimal logging information, especially in failure scenarios when using SageMaker endpoints. This lack of detailed logging makes it difficult to diagnose issues or understand why an inference request failed. The logs available from the cloud endpoint (e.g., SageMaker) also offer little information beyond HTTP error codes. Improved logging is needed to give users better visibility into, and debugging capabilities for, their cloud predictors.

Expected Behavior

When an inference request fails, detailed error messages or logs should be provided to the user, including but not limited to:

  • The specific reason for the failure (e.g., model loading issues, data serialization/deserialization problems, etc.).
  • Relevant HTTP error codes along with their descriptions.
  • Suggestions or references for troubleshooting common issues.
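As a concrete illustration of the fields listed above, here is a hypothetical enriched error type. All names here (the class, its parameters) are illustrative only and are not part of the current AutoGluon-Cloud API:

```python
class RealTimePredictionError(RuntimeError):
    """Hypothetical enriched error for failed real-time inference requests.

    Carries the HTTP status code, a human-readable failure reason, and a
    troubleshooting hint alongside the original error message, so the user
    sees more than a bare status code.
    """

    def __init__(self, message, status_code=None, reason=None, hint=None):
        self.status_code = status_code
        self.reason = reason
        self.hint = hint
        details = [message]
        if status_code is not None:
            details.append(f"HTTP {status_code}")
        if reason is not None:
            details.append(f"reason: {reason}")
        if hint is not None:
            details.append(f"hint: {hint}")
        super().__init__(" | ".join(details))


# Example: wrap a bare HTTP failure into an actionable message.
err = RealTimePredictionError(
    "predict_real_time failed",
    status_code=415,
    reason="endpoint rejected the request payload",
    hint="check that the input matches the content type the endpoint expects",
)
print(err)
```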

Actual Behavior

Currently, the predict_real_time API outputs minimal information in both stdout/stderr and the cloud endpoint logs, mainly limited to HTTP status codes without detailed explanations or context. This minimal feedback loop hinders effective troubleshooting and root cause analysis.

Steps to Reproduce

  1. Set up a cloud predictor using AutoGluon-Cloud with a SageMaker endpoint.
  2. Attempt to make an inference request using the predict_real_time API with a setup that is known to fail (e.g., incorrect input format).
  3. Observe the lack of detailed logging information in the event of a failure.

Possible Solution

Implement enhanced logging within the predict_real_time API to capture and relay detailed error messages and diagnostic information from the underlying cloud service (e.g., SageMaker). This could include:

  • Catching exceptions at the API level and enriching them with additional context before re-throwing or logging.
  • Enabling configurable log levels for the API, allowing users to opt-in for more verbose logging based on their debugging needs.
  • Working closely with cloud service providers to ensure that more detailed error information is made available and propagated through the AutoGluon-Cloud interface.
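The first two points above (catch-and-enrich, plus opt-in verbosity) can be sketched as follows. This is a minimal sketch, not AutoGluon-Cloud code: `invoke` stands in for the actual endpoint call (e.g. a SageMaker invocation), and the logger name is illustrative:

```python
import logging

logger = logging.getLogger("cloud_predictor")  # illustrative logger name


def predict_real_time_with_logging(invoke, payload, verbose=False):
    """Catch-and-enrich pattern for a real-time inference call.

    `invoke` is a placeholder for the underlying endpoint invocation;
    `verbose=True` opts the user in to debug-level logging.
    """
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    try:
        logger.debug("Sending payload of %d bytes", len(payload))
        return invoke(payload)
    except Exception as exc:
        # Enrich the bare exception with context before re-raising,
        # preserving the original error as the cause.
        logger.error("Inference request failed: %s", exc)
        raise RuntimeError(
            f"predict_real_time failed while invoking the endpoint: {exc}. "
            "Verify the input format and check the endpoint logs for details."
        ) from exc


# Usage: a simulated failing invocation surfaces an actionable message
# instead of a bare status code.
def failing_invoke(payload):
    raise ConnectionError("415 Unsupported Media Type")


try:
    predict_real_time_with_logging(failing_invoke, b"bad,csv", verbose=True)
except RuntimeError as e:
    print(e)
```

Chaining with `raise ... from exc` keeps the original low-level error visible in the traceback while the top-level message carries the added context.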

Additional Context

Enhancing the logging detail for cloud predictors not only improves the user experience by providing clear insights into the operational aspects but also significantly reduces the time spent on troubleshooting and support.
