
Deployment went through even though image wasn't built/pushed properly #1015

Open
elderapo opened this issue Mar 1, 2024 · 3 comments
Labels: impact/reliability (Something that feels unreliable or flaky), kind/bug (Some behavior is incorrect or out of spec)

Comments


elderapo commented Mar 1, 2024

What happened?

I recently upgraded @pulumi/docker from v3 to v4. Everything seemed okay until a couple of hours ago.

It appears that pulumi-docker either failed to build the image (got stuck at some step? I make extensive use of Docker multi-stage builds) or failed to push it to the Docker registry, yet it still went through preview successfully (buildOnPreview was set to true) and then deployed the changes to the k8s cluster. What ended up happening was a broken deployment where some pods (not all) had their images set to <none>@<none>.

2024-02-29T23:24:02.0491260Z  ~  kubernetes:apps/v1:Deployment microservice-***** updating (73s) warning: [Pod *****--staging/microservice-*****-6c9485c5c6-hg84z]: containers with unready status: [microservice-*****] -- [InvalidImageName] Failed to apply default image tag "<none>@<none>": couldn't parse image reference "<none>@<none>": invalid reference format
...
2024-02-29T23:32:56.1344667Z  ~  kubernetes:apps/v1:Deployment microservice-***** updating (603s) error: 3 errors occurred:
2024-02-29T23:32:56.1346837Z  ~  kubernetes:apps/v1:Deployment microservice-***** **updating failed** error: 3 errors occurred:
2024-02-29T23:32:56.1348312Z @ Updating.......
2024-02-29T23:32:56.1349667Z     pulumi:pulumi:Stack *****-*****--staging running error: update failed
2024-02-29T23:32:56.1351425Z     pulumi:pulumi:Stack *****-*****--staging **failed** 1 error
2024-02-29T23:32:56.1352644Z Diagnostics:
2024-02-29T23:32:56.1353688Z   kubernetes:apps/v1:Deployment (microservice-*****):
2024-02-29T23:32:56.1354806Z     error: 3 errors occurred:
2024-02-29T23:32:56.1357230Z     	* the Kubernetes API server reported that "*****--staging/microservice-*****" failed to fully initialize or become live: 'microservice-*****' timed out waiting to be Ready
2024-02-29T23:32:56.1360619Z     	* Minimum number of Pods to consider the application live was not attained
2024-02-29T23:32:56.1364341Z     	* [Pod *****--staging/microservice-*****-6c9485c5c6-hg84z]: containers with unready status: [microservice-*****] -- [InvalidImageName] Failed to apply default image tag "<none>@<none>": couldn't parse image reference "<none>@<none>": invalid reference format

Example

Unfortunately, I am unable to provide an example or reproduce the bug; however, here are snippets showing how the DockerProvider/Image instances were constructed:

import { join } from "node:path";
import * as docker from "@pulumi/docker";
// Assumed imports (the original snippet omitted them): `DockerProvider` is
// docker.Provider, and `env` is the env-var package.
import { Provider as DockerProvider } from "@pulumi/docker";
import * as env from "env-var";

const imageName = new docker.Image(
  options.name,
  {
    build: {
      context: MONOREPO_ROOT_DIRECTORY,
      dockerfile: join(MONOREPO_ROOT_DIRECTORY, "docker", `Dockerfile.production-${options.name}`),
      target: `${options.name}-release`,
      args: options.args,
      platform: "linux/amd64",
      builderVersion: "BuilderBuildKit",
    },
    imageName: `${dockerRegistryConfig.host}/${dockerRegistryConfig.username}/${options.name}`,
    buildOnPreview: true,
    registry: {
      server: dockerRegistryConfig.host,
      username: dockerRegistryConfig.username,
      password: dockerRegistryConfig.password,
    },
  },
  {
    provider: options.appEnvironment.dockerProvider,
  },
).repoDigest;

const dockerProvider = new DockerProvider("docker-provider", {
  host: env.get("DOCKER_HOST").asString(),
  // prettier-ignore
  sshOpts: [
    "-o", "StrictHostKeyChecking=no",
    "-o", "UserKnownHostsFile=/dev/null",

    "-o", "ControlMaster=auto",
    "-o", "ControlPath=~/.ssh/control-%C",
    "-o", "ControlPersist=yes",
  ],
  registryAuth: [
    {
      address: dockerRegistryConfig.host,
      username: dockerRegistryConfig.username,
      password: dockerRegistryConfig.password,
    },
  ],
});
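
For completeness, that digest is what ends up as the container image in the Kubernetes Deployment, roughly like this (a simplified sketch, not my exact code):

import * as k8s from "@pulumi/kubernetes";

const deployment = new k8s.apps.v1.Deployment(`microservice-${options.name}`, {
  spec: {
    selector: { matchLabels: { app: options.name } },
    template: {
      metadata: { labels: { app: options.name } },
      spec: {
        containers: [
          {
            name: options.name,
            // `imageName` is the repoDigest Output from above. When the build or
            // push silently fails, it resolves to "<none>@<none>", which the API
            // server only rejects (InvalidImageName) after the update has started.
            image: imageName,
          },
        ],
      },
    },
  },
});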

Output of pulumi about

I am unable to get the output of pulumi about because the issue in question occurred in a GitHub Actions run (and I no longer have access to the VM instance).

Pulumi version was 3.107.0, installed through pulumi/actions@v5 (SHA: 76683de37aa44910871ba6cef36557780f2e41d1)
OS: Ubuntu 22.04

Additional context

I suspect the issue might have been caused by temporary network issues between the dedicated Docker image builder server and the GitHub Actions VM (where Pulumi is executed).

@elderapo elderapo added kind/bug Some behavior is incorrect or out of spec needs-triage Needs attention from the triage team labels Mar 1, 2024
mjeffryes (Member) commented

Thanks for the bug report @elderapo. We'll keep an eye out for this to see if we can track it down. In the meantime, if you do find a consistent reproduction, please let us know!

@mjeffryes mjeffryes added impact/reliability Something that feels unreliable or flaky and removed needs-triage Needs attention from the triage team labels Mar 1, 2024
alfred-stokespace commented

This is hitting me as well, in a GHE workflow using actions/pulumi-actions@v4.

    "dependencies": {
        "@pulumi/aws": "^6.0.0",
        "@pulumi/awsx": "^2.0.2",
        "@pulumi/pulumi": "^3.113.0",
        "typescript": "^5.0.0"
    }

For me, the v4 tag points at the Jun 5, 2023 commit 4204b4e8a7e703da96ba5dd4c3a667adeee35812, which looks to be v4.4.

In my case I have two new docker.Image(...) instances in the same stack, each building a different Dockerfile.

I need the .repoDigest from both so I can do a follow-up deploy.

But this fails intermittently, as the .repoDigest is <none>@<none>, which is not an acceptable input to my FargateTaskDefinition.
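
For context, the wiring is roughly this (a simplified sketch with illustrative names, not my exact code):

import * as awsx from "@pulumi/awsx";
import * as docker from "@pulumi/docker";

const image = new docker.Image("gh-api-exporter", {
  build: { context: "./gh-api-exporter" },
  imageName: "registry.example.com/org/gh-api-exporter",
});

const taskDef = new awsx.ecs.FargateTaskDefinition("gh-api-exporter-task", {
  container: {
    name: "gh-api-exporter",
    // When the bug hits, this Output resolves to "<none>@<none>" and the
    // task definition is rejected as an invalid image reference.
    image: image.repoDigest,
    cpu: 256,
    memory: 512,
  },
});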

What's particularly odd is that one of the two new docker.Image(...) instances produces its repoDigest correctly.

I had this happen earlier this week as well, but that time the two images flipped which one was <none>@<none>.

The first time it happened, I resorted to commenting out the declaration of the offending image, building, then uncommenting it and building again.

Just now, I tried...

  1. local pulumi up
       Resources:
             4 unchanged

     So that didn't help; pulumi stack output still has one of the two images as none@none.
  2. Next, I'll try removing the assignment of the repoDigest to the exported output:
       Outputs:
          - ghApiExporterImage           : "<none>@<none>"

     Yes to that.
  3. Now, if I add it back, will I get the goods? Nope:
      Outputs:
      + ghApiExporterImage           : "<none>@<none>"

  4. So now I guess the issue is not with the output but with the resource? I'll do what I did last time and delete the object (which requires temporarily refactoring my code to allow return types to be undefined through a layer of contract/interface code, i.e. it's a hassle).
  5. OK, an up has now deleted the resource and output.
  6. Now I revert all the interface changes and revert the delete of the docker resource.
  7. I'm going to try rerunning this in Actions now rather than locally... see what that does...
  8. It worked... I now have two proper digest URLs and Fargate is happy again.

A couple of thoughts... Obviously it'd be great if this just didn't break, but when it does break, the fact that I have to change the code, run pulumi, revert the code, and run pulumi again makes for a real headache.

I wonder, would a pinpoint state delete command, followed by an up, be the way to go here (until you fix it, that is)?
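
Concretely, I'm imagining something like this (the URN placeholder would be the stale image's actual URN):

pulumi stack --show-urns                        # find the URN of the stale docker.Image
pulumi state delete '<urn-of-the-stale-image>'  # drop just that resource from state
pulumi up                                       # recreate it, hopefully with a real digest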

elderapo (Author) commented Apr 25, 2024

@alfred-stokespace If you can reproduce it on demand, please create a simple repro repo. It should help the Pulumi team to get to the bottom of this issue. It only happened to me a couple of times around the time I opened this issue.

Meanwhile, I am using this trick to prevent accidental deployments when images don't build successfully:

import * as docker from "@pulumi/docker";
import { ResourceError } from "@pulumi/pulumi";

const image = new docker.Image(...);

const validatedImage = image.repoDigest.apply(digest => {
  // The known failure mode: the build/push silently failed.
  if (digest === "<none>@<none>") {
    throw new ResourceError(
      `Digest(${digest}) is "<none>@<none>"! Image either failed to build or push...`,
      image,
      true,
    );
  }

  /**
   * Valid digests look like one of:
   *   sha256:xxx
   *   docker.io/user/repo@sha256:xxx
   */
  if (!digest.includes("sha256:")) {
    throw new ResourceError(
      `Digest(${digest}) does not include sha256 prefix! Image either failed to build or push...`,
      image,
      true,
    );
  }

  return digest;
});
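
Everything downstream then consumes validatedImage instead of image.repoDigest directly, so a bad digest fails the pulumi update before a broken image reference ever reaches the cluster.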
