Container fixes #826

Merged
merged 22 commits into from
Nov 20, 2024

Conversation

troglobit
Contributor

@troglobit troglobit commented Nov 16, 2024

Description

Note

Everyone, some helpful highlighting (like this one!) has been added to container.md and networking.md, please review. See the available highlights here: https://github.com/orgs/community/discussions/16925

Checklist

Tick relevant boxes, this PR is-a or has-a:

  • Bugfix
    • Regression tests
    • ChangeLog updates (for next release)
  • Feature
    • YANG model change => revision updated?
    • Regression tests added?
    • ChangeLog updates (for next release)
    • Documentation added?
  • Test changes
    • Checked in changed Readme.adoc (make test-spec)
    • Added new test to group Readme.adoc and yaml file
  • Code style update (formatting, renaming)
  • Refactoring (please detail in commit messages)
  • Build related changes
  • Documentation content changes
    • ChangeLog updated (for major changes)
  • Other (please describe):

@troglobit troglobit added the ci:main Build default defconfig, not minimal label Nov 16, 2024
@troglobit troglobit requested review from mattiaswal, jovatn and wkz and removed request for jovatn November 16, 2024 21:35
@troglobit troglobit self-assigned this Nov 16, 2024
@troglobit troglobit added this to the Infix v24.11 milestone Nov 16, 2024
@troglobit troglobit force-pushed the container-fixes branch 2 times, most recently from b19064e to 5afa473 Compare November 18, 2024 04:20
doc/container.md Outdated
doc/networking.md Outdated
@jovatn
Contributor

jovatn commented Nov 18, 2024

I have only looked at documentation changes (container.md and networking.md). Looks great! Found two typos as commented above.

Contributor

@mattiaswal mattiaswal left a comment

HUGE refactor, great work! 🚀
I just have some small comments.

doc/ChangeLog.md Outdated
package/execd/execd.conf Outdated
Contributor

@wkz wkz left a comment

Mightily impressed! Really great work!

🎉 🎉 🎉

src/confd/yang/infix-interfaces.yang Outdated
src/confd/yang/infix-containers.yang Outdated
src/execd/execd.c Outdated
board/common/rootfs/usr/sbin/container
board/common/rootfs/usr/bin/text-editor Outdated
test/infamy/container.py
doc/container.md Outdated
doc/container.md Outdated
doc/container.md Outdated
doc/container.md Outdated
A sane interface name is at least two characters long, and in Linux the
interface name (using ip link) is at most 15 characters long.
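
As an illustration of the length rule above, a minimal Python sketch could look like this. The function name and the allowed character set are assumptions made for the example; the actual YANG pattern in the PR may differ.

```python
import re

# Length bounds come from the commit text: 2..15 characters.
# The character class is an illustrative assumption, not the YANG pattern.
IFNAME_RE = re.compile(r"[A-Za-z0-9_.-]{2,15}")


def is_sane_ifname(name: str) -> bool:
    """Return True if name is 2 to 15 characters from the assumed set."""
    return IFNAME_RE.fullmatch(name) is not None
```

With these assumptions, `is_sane_ifname("eth0")` is accepted, while a one-character or 16-character name is rejected.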

Signed-off-by: Joachim Wiberg <[email protected]>
Container support in Infix was released with v24.02, so this change may
unfortunately break a few use-cases out there.  Regrettable as this is,
the previous default behavior, including how containers were started
after boot, broke other use-cases that were considered more important.

As of this commit:

 - all containers in Infix run in read-only mode; use volumes and
   mounts for persistence across reboot/stop/start/upgrade
 - all containers are now "recreated" at boot or on related config
   changes; this ensures an OCI image embedded in the Infix image,
   /lib/oci/, is always used as the base for a running container

Fixes #823

Signed-off-by: Joachim Wiberg <[email protected]>
On unclean shutdowns FRR leaves a lot of per-thread message buffers in
/var/tmp/frr/<daemon>[-<instance>].<pid>/*

See https://docs.frrouting.org/en/latest/setup.html

Signed-off-by: Joachim Wiberg <[email protected]>
Running 'shred' on files stored on eMMC is pointless since the writes
are spread out over other sectors rather than overwriting the content
of the files as it was supposed to on old rotating media.

Signed-off-by: Joachim Wiberg <[email protected]>
To be able to handle container restarts, incl. restart policy, at
runtime, most of the container data lives in /var/lib/containers,
which on most systems is backed by a persistent store.

As of issue #823 we no longer keep a writable layer for containers,
nor should we cache container state across reboots, all containers
are recreated at boot.  This task cleans up any lingering state.

Signed-off-by: Joachim Wiberg <[email protected]>
 - Reduce the number of queues: 3 -> 1
 - Simplify post hook
 - Refine execd

The resulting simplification of infix_containers_post_hook(), and
touching execd, also ensure container environment variable changes
are propagated.

Fixes #822

Signed-off-by: Joachim Wiberg <[email protected]>
 - Anonymous FTP, or URL encoded ftp://user:hostname@addr/oci.tar.gz
 - HTTP/HTTPS fetched with curl, optional credentials support
 - Verify download against an optional sha256 checksum
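
The optional checksum step can be sketched in Python as follows. This is not the container script's code, only an illustration; `verify_sha256` and its parameters are hypothetical names.

```python
import hashlib
from typing import Optional


def verify_sha256(path: str, expected: Optional[str]) -> bool:
    """Check a downloaded file against an optional hex sha256 digest."""
    if expected is None:
        # No checksum configured: nothing to verify.
        return True
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large OCI archives are not loaded into memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected.strip().lower()
```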

Ensure the unpacked directory name does not contain a ':', it is a
restricted character and cannot be part of the file name.  If this
syntax is used we retain it as the name and retag it after load.

Fix #801

Signed-off-by: Joachim Wiberg <[email protected]>
Issue #815 details issues found running the Clixon Controller and
Cisco Yangsuite.  The errors and warnings listed are very similar to
those reported by pyang, which the undersigned has access to.  The
following changes fix the pyang errors:

 - relocate 'feature containers' to submodule
 - drop already deviated ospf:database deviations
 - drop unused imports

Signed-off-by: Joachim Wiberg <[email protected]>
Should be inverted to a --verbose or --debug flag instead.  After this
change we still see the full 'podman create ...' command, with all the
options and arguments.

Signed-off-by: Joachim Wiberg <[email protected]>
When an Infix device is connected to a LAN where the gateway has yet to
connect to the Internet, the container script will fail pulling images
from any remote server.

    Nov 16 12:48:13 infix container[3490]: Error: initializing source docker://ghcr.io/kernelkit/curios:edge: pinging container registry ghcr.io: Get "https://ghcr.io/v2/": dial tcp: lookup ghcr.io on 127.0.0.1:53: read udp 127.0.0.1:55422->127.0.0.1:53: i/o timeout
    Nov 16 12:48:13 infix container[3641]: Error: failed creating container fw, please check the configuration.
    Nov 16 12:48:13 infix execd[3490]: /run/containers/queue/S01-fw.sh failed, exit code: 1

Since execd until now only retries on netlink/inotify events, or manual
SIGUSR1, jobs would get stuck even though Internet connectivity had been
established.  This patch fixes that with the addition of a retry timer
which runs while there are pending jobs in the queue.
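
The retry idea can be illustrated with a small Python sketch. execd itself is written in C, so this is not its implementation; `run_queue`, `retry_interval` and `max_rounds` are hypothetical names, and a job is assumed to be a callable returning True on success.

```python
import time
from collections import deque
from typing import Callable, Deque


def run_queue(jobs: Deque[Callable[[], bool]],
              retry_interval: float = 1.0,
              max_rounds: int = 10,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Run all jobs; failed jobs stay queued and are retried on a timer.

    Returns the number of rounds that were needed.
    """
    rounds = 0
    while jobs and rounds < max_rounds:
        rounds += 1
        for _ in range(len(jobs)):
            job = jobs.popleft()
            if not job():
                jobs.append(job)  # failed: keep it pending for the next round
        if jobs:
            # The timer only runs while there are pending jobs.
            sleep(retry_interval)
    return rounds
```

The point of the design is that a job failing for an external reason, such as missing Internet connectivity, is attempted again without waiting for a netlink/inotify event or a manual signal.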

Signed-off-by: Joachim Wiberg <[email protected]>
Disable the default "podman pull" retry value.  We use execd to retry
"podman create" on failure.  Without this change, a single container
can block the start of other containers by 3 * 20 seconds.  Now we only
block for at most 20 seconds before we try starting the next container.

Modern versions of podman (>= 5.0) have this --retry option, but they
do not support CNI, so this is a temporary workaround.

Signed-off-by: Joachim Wiberg <[email protected]>
Instead of using $HOME, which may be a ramdisk, use /var/tmp which
podman also uses by default.  Also, make sure to clean up after
ourselves.
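
A minimal Python sketch of the same pattern, scratch space under /var/tmp that is removed afterwards, might look like this. The helper name and the `container-` prefix are assumptions for the example; the actual change is in the container shell script.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def with_scratch_dir(work: Callable[[str], T], base: str = "/var/tmp") -> T:
    """Call work(path) with a scratch directory under base, then remove it.

    The directory is removed even if work() raises an exception.
    """
    with tempfile.TemporaryDirectory(prefix="container-", dir=base) as scratch:
        return work(scratch)
```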

Signed-off-by: Joachim Wiberg <[email protected]>
This patch allows running the configure script manually to create and
delete containers.  The normal flow via confd has additional handling
to ensure containers are started/stopped on initctl reload.

Signed-off-by: Joachim Wiberg <[email protected]>
 - Refactor logging to simplify code and get proper log level
 - Clean up lingering directories from any extracted tarball on error

Signed-off-by: Joachim Wiberg <[email protected]>
This commit replaces the 'cfg' alias for 'sysrepocfg -f json' with a
small shell script.  Currently it only provides an 'edit' command,
similar to the CLI 'text-editor' command, for modifying base64 encoded
YANG nodes.
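
Editing a base64 encoded node amounts to a decode, edit, encode round trip. The Python sketch below only illustrates that idea; it is not the shell script from this commit, and `edit_base64_node` is a hypothetical helper that assumes UTF-8 text content.

```python
import base64
from typing import Callable


def edit_base64_node(encoded: str, edit: Callable[[str], str]) -> str:
    """Decode a base64 value, apply edit() to the text, and re-encode it."""
    plain = base64.b64decode(encoded).decode("utf-8")
    return base64.b64encode(edit(plain).encode("utf-8")).decode("ascii")
```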

Signed-off-by: Joachim Wiberg <[email protected]>
This patch adds latest symlinks to the curiOS containers to make system
upgrades easier.  I.e., a user can now reference the bundled image with:

    set image oci-archive:/lib/oci/curios-httpd-latest.tar.gz

So that when they upgrade to the latest Infix, which might include an
update of curiOS httpd, they will get a seamless upgrade also of the
container(s) running.
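
How a "latest" symlink can be repointed at a newer archive is sketched below in Python. In Infix the symlinks are presumably created at build time, so this is only an illustration; the function name and any versioned file names used with it are hypothetical.

```python
import os


def update_latest_link(directory: str, target: str, link: str) -> None:
    """Point directory/link at target, replacing any existing link."""
    link_path = os.path.join(directory, link)
    tmp_path = link_path + ".tmp"
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    # Create the new link under a temporary name, then rename it into
    # place so readers never observe a missing "latest" link.
    os.symlink(target, tmp_path)
    os.replace(tmp_path, link_path)
```

A configuration that references the `-latest` name then follows whichever archive the link points to after an upgrade.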

Signed-off-by: Joachim Wiberg <[email protected]>
 - No more default writable layer
 - Don't mention read-only (deprecated, and always on now)
 - Use ghfm note highlights

Signed-off-by: Joachim Wiberg <[email protected]>
Signed-off-by: Joachim Wiberg <[email protected]>
Signed-off-by: Joachim Wiberg <[email protected]>
Contributor

@wkz wkz left a comment

Let's GO! 🔥

@wkz wkz merged commit 131d9e9 into main Nov 20, 2024
5 checks passed
@wkz wkz deleted the container-fixes branch November 20, 2024 10:36
Labels
ci:main Build default defconfig, not minimal
Projects
Status: Done
4 participants