Getting too many failures in processingJob #32
Comments
@grossamit What are you running when you see this error? Where is the error coming from? Do you see this error all the time or just intermittently? |
@tomfaulhaber It happens intermittently. I'm also running the notebook with parameters and an instance type specified. |
@grossamit My guess is that this is an issue with the way you're routing connections from your SageMaker Processing node to your VPC. I would check that your subnet definitions are right, that your security groups don't have fixed IPs, and whether anything else could break depending on which IP address the SageMaker Processing instance is given. |
@tomfaulhaber Thanks for your reply! |
@grossamit I would expect exactly this behavior if, for example, you had a VPC with multiple subnets but only enabled the S3 endpoint for a single subnet. |
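One way to check for the situation described above (an S3 gateway endpoint that is routed for only some subnets) is to compare each subnet's effective route table against the endpoint ID. The sketch below is only an illustration: the function name and the sample IDs are hypothetical, and it assumes the input has the shape of the `RouteTables` list returned by boto3's `describe_route_tables`.

```python
def subnets_missing_endpoint_route(route_tables, subnet_ids, endpoint_id):
    """Return the subnet IDs whose effective route table has no route
    to the given gateway endpoint (for example an S3 endpoint, 'vpce-...').

    `route_tables` is assumed to look like the "RouteTables" list from
    ec2.describe_route_tables(); this helper itself makes no AWS calls.
    """
    def has_endpoint_route(table):
        # Gateway endpoint routes carry the endpoint ID in "GatewayId".
        return any(
            route.get("GatewayId") == endpoint_id
            for route in table.get("Routes", [])
        )

    explicit = {}      # subnet ID -> explicitly associated route table
    main_table = None  # fallback for subnets with no explicit association
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("Main"):
                main_table = table
            elif "SubnetId" in assoc:
                explicit[assoc["SubnetId"]] = table

    missing = []
    for subnet_id in subnet_ids:
        table = explicit.get(subnet_id, main_table)
        if table is None or not has_endpoint_route(table):
            missing.append(subnet_id)
    return missing
```

With real data, the route tables could be fetched with `boto3.client("ec2").describe_route_tables(Filters=[{"Name": "vpc-id", "Values": [vpc_id]}])["RouteTables"]`. Any subnet the function returns would send S3 traffic somewhere other than the endpoint, which would match failures that depend on the subnet a job happens to land in.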
Playing with this today, we realized there's an interaction between processing jobs and VPCs that's working differently than I understood. I think we can come up with a workaround. |
Hi @tomfaulhaber
But it can't access the bucket with the input data:
|
Hi @tomfaulhaber , |
Any updates on this? I have the exact same issue.
I need to run my SageMaker processing job within a VPC and a subnet. I'm specifying the subnet and VPC like so:
|
So I think you need to create a VPC endpoint. For some reason, processing jobs don't have access to AWS internal services despite being inside your VPC/subnet and having an ARN and a role. A VPC endpoint is kind of like a pipe that gives SageMaker processing jobs direct access to specific internal services. It would probably be a good thing to add to the script, hah. |
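As a rough sketch of the suggestion above, an S3 gateway endpoint can be created with boto3's EC2 `create_vpc_endpoint` call. The helper names, and the VPC, region, and route table values you would pass in, are hypothetical placeholders; this is not taken from the project's scripts.

```python
def s3_gateway_endpoint_request(vpc_id, region, route_table_ids):
    """Build the keyword arguments for ec2.create_vpc_endpoint().

    The route tables listed here are the ones that receive a route to S3,
    so every subnet a job can run in should use one of them.
    """
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }


def create_s3_gateway_endpoint(ec2_client, vpc_id, region, route_table_ids):
    """Create the endpoint and return its ID.

    `ec2_client` is expected to be a boto3 EC2 client,
    for example boto3.client("ec2", region_name=region).
    """
    request = s3_gateway_endpoint_request(vpc_id, region, route_table_ids)
    response = ec2_client.create_vpc_endpoint(**request)
    return response["VpcEndpoint"]["VpcEndpointId"]
```

Passing the client in as an argument keeps the request-building step easy to inspect before anything is created in the account.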
Also experiencing this. Any updates? |
I ended up switching back to no VPC after a few tries and realized that my IAM roles were slightly off. I only had the bucket ARN with ** after it, when I needed to add just the bucket name with no ** after it. Like as follows:
I'll update this if I get it working with the VPC. |
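The commenter's actual policy is not shown in the thread, so the following is only a generic sketch of the distinction being described. In IAM, `s3:ListBucket` (the permission behind `ListObjectsV2`) applies to the bucket ARN itself, while object-level actions such as `s3:GetObject` apply to the ARN with a `/*` suffix. The function name and bucket name are hypothetical.

```python
import json


def s3_read_policy(bucket_name):
    """Build a minimal read-only policy document for one bucket.

    Listing is granted on the bucket ARN itself; reading objects is
    granted on the objects under it ("<bucket ARN>/*").
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "my-input-bucket" is a placeholder name.
    print(json.dumps(s3_read_policy("my-input-bucket"), indent=2))
```

A role that only has the `/*` resource can read objects by key but cannot list the bucket, which would be consistent with a `ListObjectsV2` failure.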
I got the same error, but then I removed all the network config from my processing job, and it works! |
Any update here? I created an S3 VPC endpoint, but I'm still getting that error. I'm running training jobs in an isolated subnet. |
@gabriel-loka |
Facing this problem too. Any update? |
I had the same error. Thanks @papierGaylard ;) |
ClientError: Failed to download data. ListObjectsV2 failed for s3://.... nextToken:[null]: Unable to execute request to S3
The thing is that sometimes it succeeds and sometimes it doesn't.
I've also added code to wait 10 seconds after the notebook upload and then verify, with ListObjectsV2, that the file exists.
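The wait-and-verify step described above might look roughly like the sketch below. The function name and its parameters are hypothetical, and `s3_client` is assumed to be a boto3 S3 client. Note that this only confirms the object is visible from wherever the check runs; it says nothing about whether the processing job's own network path can reach S3.

```python
import time


def wait_for_object(s3_client, bucket, key, timeout=10.0, interval=1.0,
                    sleep=time.sleep):
    """Poll ListObjectsV2 until `key` is listed or `timeout` seconds pass.

    Returns True if the exact key was listed, False otherwise.
    `sleep` is injectable so the polling loop can be exercised quickly.
    """
    deadline = time.monotonic() + timeout
    while True:
        # The exact key sorts first among all keys that share it as a
        # prefix, so a single result is enough to check for it.
        response = s3_client.list_objects_v2(
            Bucket=bucket, Prefix=key, MaxKeys=1
        )
        if any(obj["Key"] == key for obj in response.get("Contents", [])):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

For example, `wait_for_object(boto3.client("s3"), bucket, key)` could be called right after the upload and before the processing job is started.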