Improve indexing of Zephyr #279

Open
tpetazzoni opened this issue Apr 5, 2024 · 7 comments

@tpetazzoni

We currently index the Zephyr project by indexing the Git repo at https://github.com/zephyrproject-rtos/zephyr in Elixir. However, this repo is in fact not standalone: it has references to several other projects. The correct way to get Zephyr is to use "west", which processes https://github.com/zephyrproject-rtos/zephyr/blob/main/west.yml to retrieve all the right repos.

To give an example, the driver drivers/gpio/gpio_nrfx.c in Zephyr itself calls the function nrfx_gpiote_channel_get(), which is nowhere to be found in the Zephyr repo itself. It is in fact implemented in the hal_nordic repo, which gets pulled as a module by west (see https://github.com/zephyrproject-rtos/hal_nordic).

@fstachura
Collaborator

fstachura commented Aug 21, 2024

I did some research on this, and the main problem is that Elixir currently only understands Git repositories. Meanwhile, from my understanding, the Zephyr repository with all its modules is not a single Git repository (or at least I don't see a good way to reliably turn it into one), but multiple repositories in a directory. West also aims to be more than just Git submodules (see this): modules are supposed to live outside of the main zephyr repo, and that's a deliberate design choice.

What I meant by "Elixir only understands git" is:

  • git ls-tree -r is used to discover files to index
  • git cat-file is used to retrieve files before formatting
  • Files are sometimes addressed not using paths, but using git blob ids
  • Repository is never checked out, working tree is not used, update script just iterates over blobs from a particular tag
  • utils/update-elixir-data always just runs git fetch, whereas west has its own subcommand for updates
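
To make the first bullets concrete, here is a minimal Python sketch of blob-based file discovery and retrieval. It is not Elixir's actual code: the function names `list_blobs` and `read_blob` are hypothetical, and only the two Git commands named in the list are taken from the thread. It assumes a `git` binary is on the PATH.

```python
import subprocess
from typing import Iterator, Tuple


def list_blobs(repo: str, tag: str) -> Iterator[Tuple[str, str]]:
    """Yield (blob_id, path) for every file reachable from `tag`.

    Works on a bare or non-checked-out repository: nothing is read
    from the working tree.
    """
    out = subprocess.run(
        ["git", "-C", repo, "ls-tree", "-r", "-z", tag],
        check=True, capture_output=True,
    ).stdout
    # With -z, entries are NUL-terminated: "<mode> <type> <id>\t<path>"
    for entry in out.split(b"\0"):
        if not entry:
            continue
        meta, path = entry.split(b"\t", 1)
        _mode, obj_type, obj_id = meta.split(b" ")
        if obj_type == b"blob":  # skip submodule entries (type "commit")
            yield obj_id.decode(), path.decode(errors="replace")


def read_blob(repo: str, blob_id: str) -> bytes:
    """Return the raw content of a blob, addressed by id rather than path."""
    return subprocess.run(
        ["git", "-C", repo, "cat-file", "blob", blob_id],
        check=True, capture_output=True,
    ).stdout
```

Note that both helpers take a single repository path, which is exactly what a west workspace (several repositories side by side) does not provide.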

I will try to see how much of that can be modified using per-project scripts (see script.sh and projects directory).
Some of that functionality can maybe be replaced using that, but the last two points could remain a problem.
Maybe it would be a good idea to refactor the repository handling part a bit and unify the project-related configs (HTML filters, tag filters and now the repository backend).

Another problem is that all existing Elixir Zephyr URLs would apparently break: by default west puts all modules outside of the Zephyr directory, so old Zephyr paths would have to be prefixed with the Zephyr directory name.
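
As a rough illustration of the prefixing this would require, the sketch below maps an old repo-relative path to a workspace-relative one. The directory name `zephyr` and the helper `legacy_to_workspace_path` are assumptions made for the example; nothing like this exists in Elixir today.

```python
from pathlib import PurePosixPath

# Assumption: the Zephyr checkout is named "zephyr" inside the west workspace.
ZEPHYR_DIR = "zephyr"


def legacy_to_workspace_path(old_path: str, zephyr_dir: str = ZEPHYR_DIR) -> str:
    """Prefix a path from the old single-repo index with the Zephyr directory."""
    return str(PurePosixPath(zephyr_dir) / old_path.lstrip("/"))


print(legacy_to_workspace_path("drivers/gpio/gpio_nrfx.c"))
```

A redirect layer built on such a mapping could keep old links alive, at the cost of one more per-project special case.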

@tleb
Member

tleb commented Aug 22, 2024

Old kernel versions (pre-Git) are handled using a specific Git repo crafted by Bootlin. Could we craft a Git repo with all the code from all Zephyr "submodules"? We would craft it automatically each time a new release is generated. Then Elixir would pick up that repo's new Git tag and index that.

It could be the prod server doing the work, with a local Git repo not being pushed anywhere (the goal is to avoid potentially having a separate script running that could fail independently). We would only have to let each project optionally deal with its "fetching" step. Then the following Zephyr logic would all be implemented in there.

The first run would be something like:

git clone https://github.com/zephyrproject-rtos/zephyr zephyr-upstream
cd zephyr-upstream
# Version sort instead of alphabetical order; probably still wrong for pre-release tags
latest_tag="$(git tag --sort=version:refname | tail -n1)"
git checkout "$latest_tag"
west ...
cd ..
cp -r zephyr-upstream zephyr-dup # TODO: ignore .git directory
cd zephyr-dup
find . -name .gitignore -print0 | xargs -0 rm
git init
git remote add ...
git add -A  # stage everything, otherwise the commit below is empty
git commit -m "Release $latest_tag"
git tag "$latest_tag"
git push ...

Then to refresh:

git -C zephyr-upstream fetch --tags
latest_tag="$(git -C zephyr-upstream tag --sort=version:refname | tail -n1)"
latest_indexed_tag="$(git -C zephyr-dup tag --sort=version:refname | tail -n1)"
if test "$latest_tag" = "$latest_indexed_tag"; then
    exit 0
fi
cd zephyr-upstream
git checkout "$latest_tag"  # move the working tree to the new release
west ...
cd ..
# Copy the contents, since zephyr-dup already exists at this point
cp -r zephyr-upstream/. zephyr-dup/ # TODO: ignore .git directory
cd zephyr-dup
find . -name .gitignore -print0 | xargs -0 rm
git add -A
git commit -m "Release $latest_tag"
git tag "$latest_tag"
git push ...

I know this looks crappy, but it must be compared to adding west support to Elixir itself. Adding support for Zephyr/west stuff looks really error-prone, as all over the code we use Git blobs as atoms of computation.


Edit: or it could be done in the same repo, on a different branch.
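
The "TODO: ignore .git directory" step in the scripts above could be handled along these lines. This is only a sketch in Python using the standard library; `flatten_workspace` and `_skip_git_metadata` are hypothetical names, and it covers the first run only (the destination must not exist yet).

```python
import shutil
from typing import List


def _skip_git_metadata(_directory: str, names: List[str]) -> List[str]:
    """Return the entries to leave out of each directory being copied."""
    return [n for n in names if n in (".git", ".gitignore")]


def flatten_workspace(src: str, dst: str) -> None:
    """Copy `src` into a new directory `dst` without any Git metadata.

    Nested .git entries (one per module fetched by west) are skipped as
    well, so the result can be committed as one plain source tree.
    """
    shutil.copytree(src, dst, ignore=_skip_git_metadata, symlinks=True)
```

For the refresh case, the caller would additionally have to remove files that disappeared upstream while preserving the `.git` directory of the destination repository; that part is not shown here.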

@fstachura
Collaborator

> Could we craft a Git repo with all the code from all Zephyr "submodules"? We would craft it automatically each time a new release is generated. Then Elixir would pick up that repo's new Git tag and index that.

That's a very interesting idea and I think it would totally work.

> Adding support for Zephyr/west stuff looks really error prone as all over the code we use Git blobs as atoms of computation

I think it shouldn't be that bad in the end: all Git operations could be abstracted away, allowing replaceable repository backends. Elixir only really needs a list of tags, a list of files, and a way to get a file by version+path or blob id.
So, for the west backend, Git blob ids in the database would become any identifier that reliably identifies a file for that particular backend (for west, say, the repo path relative to the "workspace" directory plus the Git blob id).
And abstracting this away might make it easier to add support for Mercurial/SVN/submodules in the future.
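
To show how small that interface could be, here is a hedged Python sketch. `RepoBackend` and `InMemoryBackend` are hypothetical names invented for this example, not existing Elixir classes; the in-memory backend only exists so the interface can be exercised without any repository.

```python
import hashlib
from typing import Dict, Iterable, List, Protocol, Tuple


class RepoBackend(Protocol):
    """The four operations the comment above says Elixir really needs."""

    def list_tags(self) -> List[str]: ...
    def list_files(self, tag: str) -> Iterable[Tuple[str, str]]: ...  # (file_id, path)
    def get_file_by_path(self, tag: str, path: str) -> bytes: ...
    def get_file_by_id(self, file_id: str) -> bytes: ...


def git_blob_id(data: bytes) -> str:
    """Compute the SHA-1 id Git assigns to a blob with this content."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


class InMemoryBackend:
    """Toy backend: {tag: {path: content}}. File ids are Git blob ids.

    A west backend could instead use "<repo path>:<blob id>" as the id.
    """

    def __init__(self, tags: Dict[str, Dict[str, bytes]]) -> None:
        self._tags = tags
        self._by_id = {
            git_blob_id(data): data
            for files in tags.values()
            for data in files.values()
        }

    def list_tags(self) -> List[str]:
        return sorted(self._tags)

    def list_files(self, tag: str) -> Iterable[Tuple[str, str]]:
        return [(git_blob_id(d), p) for p, d in sorted(self._tags[tag].items())]

    def get_file_by_path(self, tag: str, path: str) -> bytes:
        return self._tags[tag][path]

    def get_file_by_id(self, file_id: str) -> bytes:
        return self._by_id[file_id]


def count_files(backend: RepoBackend, tag: str) -> int:
    """Example consumer that depends only on the interface."""
    return sum(1 for _ in backend.list_files(tag))
```

Whether the real code base can be made to depend on only these four calls is the open question debated in the following comments.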

So yes, I guess we can go the fast route, or the slow route.

@tleb
Member

tleb commented Aug 22, 2024

> all git operations could be abstracted away allowing replaceable repository backends

That is one big leaky abstraction layer that touches at the core of Elixir's concept. I am not (yet) convinced.

@fstachura
Collaborator

Would it be that leaky? In a sense we already have such an abstraction and it's script.sh. Git is always invoked through the script.

@tleb
Member

tleb commented Nov 8, 2024

Any further opinion? I stick with my opinion: let's keep Elixir as-is and build a repo that contains all the source code. I really don't see an issue with that approach; at least much smaller issues than reworking all of Elixir's core to support one project.

@fstachura
Collaborator

I agree, and I would go with the repository converter path now.
