Add boot/00-check-rtc-and-wait-ntp.sh to cidata #2894

Open
wants to merge 5 commits into
base: master

norio-nomura
Contributor

In vz, the VM lacks an RTC when booting with a kernel image (see: https://developer.apple.com/forums/thread/760344). This causes incorrect system time until NTP synchronizes it, leading to TLS errors. To avoid TLS errors, this script waits for NTP synchronization if RTC is unavailable.

This script does the following:

  • Exits with 0 if /dev/rtc0 exists.
  • Exits with 0 if systemctl is not available.
  • Enables systemd-time-wait-sync.service to wait for NTP synchronization at an earlier stage on subsequent boots.
  • Waits for NTP synchronization within the script for the first boot.
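The first three steps above can be sketched roughly as follows. This is a minimal illustration, not the script from this PR: the function names `needs_ntp_wait` and `prepare_ntp_wait` and the overridable `RTC_DEVICE` and `SYSTEMCTL` variables are hypothetical, introduced only so the logic can be exercised outside a guest.

```shell
#!/bin/sh
# Hypothetical sketch of the decision flow; not the actual script in this PR.
# RTC_DEVICE and SYSTEMCTL are assumptions that make the checks overridable.
RTC_DEVICE="${RTC_DEVICE:-/dev/rtc0}"
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Returns 0 when the guest should wait for NTP, 1 otherwise.
needs_ntp_wait() {
	# An RTC is present, so the boot-time clock can be trusted.
	if [ -e "$RTC_DEVICE" ]; then
		return 1
	fi
	# Without systemctl there is no systemd service to enable.
	if ! command -v "$SYSTEMCTL" >/dev/null 2>&1; then
		return 1
	fi
	return 0
}

# Enables the wait-sync unit so later boots block earlier, at sysinit.target.
prepare_ntp_wait() {
	needs_ntp_wait || return 0
	"$SYSTEMCTL" enable systemd-time-wait-sync.service
}
```

In a real boot script, `prepare_ntp_wait` would be called once near the top, and the script would exit with 0 whenever `needs_ntp_wait` reports that no wait is required.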

Log output during execution:

LIMA 2024-08-08T23:51:15+09:00| Executing /mnt/lima-cidata/boot/00-check-rtc-and-wait-ntp.sh
Created symlink /etc/systemd/system/sysinit.target.wants/systemd-time-wait-sync.service → /usr/lib/systemd/system/systemd-time-wait-sync.service.
TimeUSec=Thu 2024-08-08 23:51:15 JST, Waiting for NTP synchronization...
TimeUSec=Thu 2024-08-08 23:51:16 JST, Waiting for NTP synchronization...
...
TimeUSec=Thu 2024-08-08 23:51:41 JST, Waiting for NTP synchronization...
TimeUSec=Thu 2024-08-08 23:51:42 JST, Waiting for NTP synchronization...
TimeUSec=Tue 2024-11-12 11:43:37 JST, NTP synchronization complete.
NTPMessage={ Leap=0, Version=4, Mode=4, Stratum=2, Precision=-25, RootDelay=991us, RootDispersion=259us, Reference=11FD1CFB, OriginateTimestamp=Thu 2024-08-08 23:51:43 JST, ReceiveTimestamp=Tue 2024-11-12 11:43:36 JST, TransmitTimestamp=Tue 2024-11-12 11:43:36 JST, DestinationTimestamp=Thu 2024-08-08 23:51:43 JST, Ignored=no, PacketCount=1, Jitter=0 }
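A polling loop that would produce log lines of this shape might look like the sketch below. It assumes the `TimeUSec=` and `NTPMessage=` lines come from `timedatectl show` and `timedatectl show-timesync`; the function name `wait_for_ntp`, the `TIMEDATECTL` override, and the retry limit are hypothetical and not taken from the PR.

```shell
#!/bin/sh
# Hypothetical sketch; the PR's actual loop may differ.
# TIMEDATECTL is overridable so the loop can be tried against a stub.
TIMEDATECTL="${TIMEDATECTL:-timedatectl}"

# Polls once per second until systemd reports NTP synchronization.
# $1: maximum number of polls (an assumption; defaults to 60).
wait_for_ntp() {
	max_tries="${1:-60}"
	tries=0
	while [ "$tries" -lt "$max_tries" ]; do
		# Prints a line such as "TimeUSec=Thu 2024-08-08 23:51:15 JST".
		now="$("$TIMEDATECTL" show --property=TimeUSec)"
		if [ "$("$TIMEDATECTL" show --property=NTPSynchronized --value)" = "yes" ]; then
			echo "$now, NTP synchronization complete."
			# Details of the last NTP exchange, as in the final log line.
			"$TIMEDATECTL" show-timesync --property=NTPMessage
			return 0
		fi
		echo "$now, Waiting for NTP synchronization..."
		tries=$((tries + 1))
		sleep 1
	done
	# Gave up without synchronizing.
	return 1
}
```

Bounding the loop is a design choice of this sketch: an unbounded wait would hang the boot indefinitely if the guest has no network access.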

@norio-nomura
Contributor Author

> Log output during execution:

It seems there is no pattern to the time spent waiting for NTP synchronization. Sometimes it waits for about 30 seconds, while other times it doesn't wait at all because it is already synchronized.

@norio-nomura
Contributor Author

Opened issue #2905 regarding the problem that occurs when using a kernel image with vz.

@nirs
Member

nirs commented Nov 14, 2024

> In vz, the VM lacks an RTC when booting with a kernel image (see: https://developer.apple.com/forums/thread/760344). This causes incorrect system time until NTP synchronizes it, leading to TLS errors.

Why do we get TLS errors? Can we avoid them without waiting for an NTP update?

@norio-nomura
Contributor Author

> Why do we get TLS errors? Can we avoid them without waiting for an NTP update?

The curl error message posted in #2905 is as follows:

[    6.489211] cloud-init[770]: + curl -fsSL https://get.docker.com
[    6.558633] cloud-init[770]: curl: (60) SSL certificate problem: certificate is not yet valid
[    6.558703] cloud-init[770]: More details here: https://curl.se/docs/sslcerts.html
[    6.558761] cloud-init[770]: curl failed to verify the legitimacy of the server and therefore could not
[    6.558834] cloud-init[770]: establish a secure connection to it. To learn more about this situation and
[    6.558926] cloud-init[770]: how to fix it, please visit the web page mentioned above.
[    6.559955] cloud-init[770]: LIMA 2024-08-08T23:51:16+09:00| WARNING: Failed to execute /mnt/lima-cidata/provision.system/00000002

The system time at that point was approximately 2024-08-08T23:51:16+09:00. Checking the certificate for https://get.docker.com shows:

$ curl -w '%{certs}' https://get.docker.com -o /dev/null | grep date
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 22115  100 22115    0     0  74788      0 --:--:-- --:--:-- --:--:-- 74966
Start date:Sep  2 00:00:00 2024 GMT
Expire date:Oct  1 23:59:59 2025 GMT
Start date:Aug 23 22:25:30 2022 GMT
Expire date:Aug 23 22:25:30 2030 GMT
Start date:May 25 12:00:00 2015 GMT
Expire date:Dec 31 01:00:00 2037 GMT

The certificate appears to be valid from Sep 2 00:00:00 2024 GMT, so the system judged it as "not yet valid," causing the connection to fail.
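The comparison behind that "not yet valid" verdict can be worked through with Unix timestamps. The sketch below is illustrative only: the function name `cert_not_yet_valid` is hypothetical, the epoch values were converted by hand from the times quoted above, and it assumes the first `Start date` in the output belongs to the server certificate.

```shell
#!/bin/sh
# Hypothetical sketch of the validity check a TLS client performs.

# Returns 0 (true) when the clock reads earlier than the certificate's start date.
# $1: current time, $2: certificate start date, both as seconds since the epoch.
cert_not_yet_valid() {
	[ "$1" -lt "$2" ]
}

# Guest clock at the time of the failure:
# 2024-08-08T23:51:16+09:00, which is 2024-08-08T14:51:16Z.
guest_clock=1723128676
# First "Start date" in the curl output: Sep  2 00:00:00 2024 GMT.
cert_start=1725235200

if cert_not_yet_valid "$guest_clock" "$cert_start"; then
	echo "clock is $(( (cert_start - guest_clock) / 86400 )) full days before the start date"
fi
```

Once NTP corrects the clock to November 2024, the same comparison is false and the certificate is accepted, provided it has not expired.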

#2905 also includes logs from apt-get, though the details there are unclear.

@nirs
Member

nirs commented Nov 16, 2024

> curl -fsSL https://get.docker.com

Why not use -k to skip verification of the certificate? We don't really need to verify the server which we fully control.

> Start date:Sep  2 00:00:00 2024 GMT
> Expire date:Oct  1 23:59:59 2025 GMT
> Start date:Aug 23 22:25:30 2022 GMT
> Expire date:Aug 23 22:25:30 2030 GMT
> Start date:May 25 12:00:00 2015 GMT
> Expire date:Dec 31 01:00:00 2037 GMT

Who creates the certificate? Can we create it with a start date in the distant past and an expiry date in the distant future, so that it is always considered valid, even when the guest clock is broken?

@nirs
Member

nirs commented Nov 16, 2024

OK, I see: we fail to validate the remote certificate when accessing https://get.docker.com, which we do not control, so we cannot skip the verification.

@nirs
Member

nirs commented Nov 16, 2024

> Log output during execution:
>
> It seems there is no pattern to the time spent waiting for NTP synchronization. Sometimes it waits for about 30 seconds, while other times it doesn't wait at all because it is already synchronized.

Adding an unpredictable 30-second wait at startup is not great, but if this happens only when starting with a kernel image, I think we are fine.

@AkihiroSuda
Member

> Sometimes it waits for about 30 seconds

Not specific to NTP, but we need a way to show progress information in the `limactl start` CLI.

@norio-nomura
Contributor Author

> but if this happens only when starting with kernel image I think we are fine.

As far as I know, /dev/rtc0 is only unavailable when using a kernel image with vz.

In vz, the VM lacks an RTC when booting with a kernel image (see: https://developer.apple.com/forums/thread/760344).
This causes incorrect system time until NTP synchronizes it, leading to TLS errors.
To avoid TLS errors, this script waits for NTP synchronization if RTC is unavailable.

This script does the following:
- Exits with 0 if `/dev/rtc0` exists.
- Exits with 0 if `systemctl` is not available.
- Enables `systemd-time-wait-sync.service` to wait for NTP synchronization at an earlier stage on subsequent boots.
- Waits for NTP synchronization within the script for the first boot.

Log output during execution:
```console
LIMA 2024-08-08T23:51:15+09:00| Executing /mnt/lima-cidata/boot/00-check-rtc-and-wait-ntp.sh
Created symlink /etc/systemd/system/sysinit.target.wants/systemd-time-wait-sync.service → /usr/lib/systemd/system/systemd-time-wait-sync.service.
TimeUSec=Thu 2024-08-08 23:51:15 JST, Waiting for NTP synchronization...
TimeUSec=Thu 2024-08-08 23:51:16 JST, Waiting for NTP synchronization...
...
TimeUSec=Thu 2024-08-08 23:51:41 JST, Waiting for NTP synchronization...
TimeUSec=Thu 2024-08-08 23:51:42 JST, Waiting for NTP synchronization...
TimeUSec=Tue 2024-11-12 11:43:37 JST, NTP synchronization complete.
NTPMessage={ Leap=0, Version=4, Mode=4, Stratum=2, Precision=-25, RootDelay=991us, RootDispersion=259us, Reference=11FD1CFB, OriginateTimestamp=Thu 2024-08-08 23:51:43 JST, ReceiveTimestamp=Tue 2024-11-12 11:43:36 JST, TransmitTimestamp=Tue 2024-11-12 11:43:36 JST, DestinationTimestamp=Thu 2024-08-08 23:51:43 JST, Ignored=no, PacketCount=1, Jitter=0 }
```

Signed-off-by: Norio Nomura <[email protected]>
…time of the script

The larger the difference between the system time and the NTP server time, the longer the NTP synchronization will take.
By setting the system time to the modification time of this script, which is likely to be closer to the actual time,
the NTP synchronization time can be shortened.

Signed-off-by: Norio Nomura <[email protected]>
…service

Because it is slower than setting the system time to the modification time of this script.

Signed-off-by: Norio Nomura <[email protected]>
@norio-nomura
Contributor Author

By adding a process to set the script's modification time as the system time on the first iteration of the loop waiting for NTP synchronization, the loop now completes in about 1 second each time. It no longer takes up to 30 seconds.
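The idea of seeding the clock from a file's modification time can be sketched as below. This is not the code from the PR: the helper names are hypothetical, `stat -c %Y` and `date -s "@…"` are the GNU coreutils and BusyBox forms (BSD tools use different flags), and the forward-only guard is an assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical sketch; helper names are not taken from the PR.

# Prints a file's modification time as seconds since the epoch.
mtime_epoch() {
	# GNU coreutils / BusyBox syntax; BSD stat uses different flags.
	stat -c %Y "$1"
}

# Returns 0 (true) when the system clock is earlier than the file's mtime.
clock_is_behind() {
	[ "$(date +%s)" -lt "$(mtime_epoch "$1")" ]
}

# Moves the clock forward to the file's mtime. Requires root.
# Only stepping forward is an assumption made here so that a correct
# clock is never moved backwards.
seed_clock_from_mtime() {
	if clock_is_behind "$1"; then
		date -s "@$(mtime_epoch "$1")"
	fi
}
```

A recently written file gives a lower bound on the real time, so starting from its mtime leaves the NTP client a much smaller offset to correct than a clock that is months behind.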

…cation time from the script itself to `user-data`

Signed-off-by: Norio Nomura <[email protected]>