2019-05-27-auth.md

created: 2019-05-27
last updated: 2019-05-27
status: implemented
title: Authentication in downloads
authors: aehlig

Abstract

This document describes a proposal to add authentication to downloads by bazel during the creation of external repositories. It is the outcome of a discussion in the bazel team, based on an earlier proposal on http downloads using .netrc and the email discussion on said proposal.

Background

Bazel uses external repositories to import dependencies that are not part of the main repository. Those external dependencies often are standard open-source libraries, but not in all cases; sometimes they are repositories private to a company or some other organisation. In the latter case, usually some form of authentication (on top of being in the right network) is required when accessing those sources.

When fetching external resources, bazel supports two fundamental mechanisms:

  • bazel can be asked to download a file, possibly with a specified hash, from one of a specified list of URLs (taking whichever is best reachable), and
  • bazel can be told to call an external program like git.

In the latter case, authentication can happen by whatever mechanism the external program supports. The former case, however, is often preferred, because it lets bazel use its caching mechanism, which is shared between all workspaces. The API function to tell bazel to download a file is download from the repository context. Currently, it does not support authentication.

Proposal

The proposed API change is to extend the repository-context functions download and download_and_extract by an additional, optional parameter auth (with default value {}). This parameter has to be a dict. For every URL specified to download from, if it is a key of that dict, then the value has to be a dict as well, with one entry "type" and other entries depending on the value of the specified type. For the type "basic", the additional entries are "login" and "password" with the obvious semantics.

For example, a call telling bazel to download a file could look as follows.

ctx.download(
  urls = ["https://company-mirror.example.com/files/data.txt",
          "http://public.example.org/files/data.txt",
          "ftp://public.example.org/files/data.txt",
         ],
  sha256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
  auth = {"https://company-mirror.example.com/files/data.txt" :
             {"type" : "basic",
              "login" : "alice",
              "password" : "thisismysecret",
              },
           "ftp://public.example.org/files/data.txt" :
             {"type" : "basic",
              "login" : "anonymous",
              "password" : "[email protected]",
              },
           })

In this example, the auth parameter specifies that the same file can be obtained from company-mirror.example.com via https using a proper user name and password, but also from public.example.org, either via http without authentication or via standard anonymous ftp.
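
To make the per-URL lookup concrete, here is a minimal Python sketch of how an implementation might interpret the auth dict. It is not bazel's actual code: the function names credentials_for_url and basic_auth_header are hypothetical, and the mapping of type "basic" to a standard HTTP Basic Authorization header is an assumption that only applies to http and https URLs.

```python
import base64


def credentials_for_url(url, auth):
    """Return (login, password) for url, or None if auth has no entry for it.

    Hypothetical helper; mirrors the dict layout described in the proposal.
    """
    entry = auth.get(url)
    if entry is None:
        # The URL is not a key of the dict: download without credentials.
        return None
    if entry.get("type") != "basic":
        raise ValueError("unsupported auth type for %s: %r" % (url, entry.get("type")))
    missing = [key for key in ("login", "password") if key not in entry]
    if missing:
        raise ValueError("auth entry for %s lacks %s" % (url, ", ".join(missing)))
    return (entry["login"], entry["password"])


def basic_auth_header(login, password):
    """Encode credentials as the value of an HTTP Basic Authorization header."""
    raw = (login + ":" + password).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")
```

Because the lookup is keyed by the full URL rather than by host, a URL that is absent from the dict never receives credentials, even if another URL on the same host does.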

Of course, the authentication dictionary would not be committed into the workspace; it would only be constructed by the rule in memory from some form of credential store, e.g., a .netrc file. To simplify that use case, a Starlark function will be provided to compute that dictionary from a list of URLs and a path to a .netrc file. Moreover, the version of http_archive (and related rules) that ships embedded in bazel will use such functions to honor a .netrc file in the user's home directory, unless explicitly told to use other (including no) credentials by an explicit auth dict passed as argument.
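
As an illustration of what such a helper might do, the following Python sketch builds an auth dict from a list of URLs and the text of a .netrc file. It is an assumption-laden sketch, not the function that ships with bazel: the names parse_netrc and auth_from_netrc are hypothetical, the helper takes the file content rather than a path to stay self-contained, and the parser handles only a small subset of the format (machine, login, and password tokens; no macdef, no quoted values, and the default entry is ignored).

```python
from urllib.parse import urlsplit


def parse_netrc(content):
    """Parse a minimal .netrc subset into {machine: {"login": ..., "password": ...}}."""
    tokens = []
    for line in content.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # skip whole-line comments
        tokens.extend(line.split())

    machines = {}
    current = None
    i = 0
    while i < len(tokens):
        key = tokens[i]
        if key == "default":
            # "default" takes no value; this sketch ignores the catch-all entry.
            current = None
            i += 1
            continue
        if i + 1 >= len(tokens):
            break  # dangling keyword without a value
        value = tokens[i + 1]
        if key == "machine":
            current = machines.setdefault(value, {})
        elif key in ("login", "password") and current is not None:
            current[key] = value
        i += 2
    return machines


def auth_from_netrc(urls, netrc_content):
    """Build an auth dict with one "basic" entry per URL whose host is in the .netrc."""
    machines = parse_netrc(netrc_content)
    auth = {}
    for url in urls:
        entry = machines.get(urlsplit(url).hostname)
        if entry and "login" in entry and "password" in entry:
            auth[url] = {
                "type": "basic",
                "login": entry["login"],
                "password": entry["password"],
            }
    return auth
```

A rule could pass the result directly as the auth argument of download; URLs whose host does not appear in the file simply get no entry and are fetched without credentials.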

Considerations

The proposed change to the API is deliberately minimal, explicit, and agnostic to the actual mechanism by which the credentials are stored. The reasons for this design choice are the following.

  • The .netrc file format is not the only way to store credentials on disk. By separating off the parsing of the credential file and moving it to Starlark, adaptations and support for other file formats become easier and stay in the control of the user (without requiring additional changes to the build API).

  • The approach of being very explicit about which URLs need authentication, instead of having a global default through a side channel (like an option specifying a .netrc file), ensures that authentication is only used where desired. While a .netrc file specifies which credentials to use for which domain, and, in general, a download providing basic auth credentials will also be successful if none are needed, eagerly providing credentials might still be a problem. Consider a domain, say a git-hosting site, that contains both private and public repositories. Accessing the private repositories will require credentials, so the domain will be in the .netrc file. The same hosting site, on the other hand, might also provide public repositories added to the workspace by rules (via the *_deps() pattern). For those, the rule author might decide to use plain http URLs (instead of https) to avoid the SSL-handshake overhead, relying on the fact that the SHA256 hash of the file is enough to ensure integrity. Now, eagerly providing the available credentials will not make the download fail, but will still cause an unintended clear-text transmission.

  • By moving the logic to Starlark as far as possible, extensions become easier. In particular, related feature requests, like URL-rewriting to go through a local mirror, can be implemented in pure Starlark and are hence out of scope of this proposal.
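
The clear-text concern from the second point can be addressed by the rule itself, precisely because the auth dict is explicit. The Python sketch below shows one possible policy a rule author might choose: keep credentials only for https URLs. The function name drop_cleartext_credentials is hypothetical, and the policy is an illustrative assumption rather than something the proposal mandates (for instance, it would also drop the anonymous ftp entry from the earlier example).

```python
def drop_cleartext_credentials(auth):
    """Return a copy of auth that keeps entries only for https URLs.

    Entries for http, ftp, or other unencrypted schemes are removed, so their
    credentials are never sent in clear text.
    """
    return {
        url: entry
        for url, entry in auth.items()
        if url.startswith("https://")
    }


# Candidate credentials, e.g. as derived from a credential store by host name.
candidate_auth = {
    "https://git.example.com/private/repo.tar.gz": {
        "type": "basic",
        "login": "alice",
        "password": "thisismysecret",
    },
    "http://git.example.com/public/lib.tar.gz": {
        "type": "basic",
        "login": "alice",
        "password": "thisismysecret",
    },
}

safe_auth = drop_cleartext_credentials(candidate_auth)
```

Here safe_auth retains only the entry for the https URL, so the public http download proceeds without credentials while the private one stays authenticated.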

Backward-compatibility

If the additional parameter auth is not given, the functions download and download_and_extract behave the same way as before. So this is a fully backwards-compatible extension.