iHP DRC causes segfaults randomly #1907
Hmm ... thanks for this report. However, I can't see the crashes on Ubuntu 24 with the official package from the download page. I also don't see anything suspicious in valgrind. From the trace, the issue can be anything. A rough guess is a problem with the Ruby integration. Is it safe to assume the build is correctly linked against the same Ruby library that is used at runtime? I am not familiar with nix. I am a plain old distro user. I have not tried builds with nix and honestly that is not on my top priority list. Is anyone else able to reproduce the issue? Matthias |
TBH I'm not all that familiar with nix either :/ But it's what's used for all OpenLane2 runs so it's everywhere ... The issue showed up both in the github actions and on my test VM. They both use the same "nix derivation" (sort of "source package") for klayout but they were built independently on different hardware and showed the same issue. I also can't reproduce it on my laptop (using the exact same git hash) that runs klayout natively. |
On the first attempt, it core-dumped several times out of 10 runs with a release build of klayout on an M1 Mac (macOS Sequoia) using Homebrew.
|
when it occurs, the coredump is triggered by "Rule pSD.d" |
Yes, here when it doesn't crash but reports wrong DRC errors, they are also always in the |
I will try to reproduce the issue with nix. Do you have some basic build instructions for me? Thanks, Matthias |
Hello folks, The table below shows another test result set on an Intel Mac using different DMGs (#1871).
Kazzz-S |
So the easiest is to follow the instruction to install OpenLane 2 : except instead of using the official repo, use the Then when nix is installed and you're in the OL2 directory just typing |
@Kazzz-S 10 runs might not be enough, I had 2 fails in the first 10, then none in the next 30 before a couple in the next 10. |
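Since the failure is intermittent, a small harness that repeats the run and tallies how each one ended makes the failure rate easier to compare across builds. The following is only a sketch: `tally_runs` is a hypothetical helper, and the klayout command line in the usage comment is a placeholder that has to be replaced by the real invocation for the test case. Note that the exit status only catches crashes; spurious DRC errors have to be checked separately in the report.

```ruby
# Hypothetical helper: run a command `runs` times and count the outcomes.
# `cmd` is an argument array, e.g. ["klayout", "-b", "-r", "deck.lydrc"]
# (placeholder file name, not the actual command used in this thread).
def tally_runs(cmd, runs)
  tally = Hash.new(0)
  runs.times do
    # Discard the output; only the way the process ended is of interest here.
    system(*cmd, out: File::NULL, err: File::NULL)
    status = $?
    key = if status.signaled?
            "signal #{status.termsig}"   # e.g. killed by SIGSEGV
          elsif status.success?
            "ok"
          else
            "exit #{status.exitstatus}"
          end
    tally[key] += 1
  end
  tally
end

# Usage (placeholder command):
#   p tally_runs(["klayout", "-b", "-r", "deck.lydrc"], 50)
```

With a failure that shows up in 2 of the first 10 runs and then not at all in the next 30, several dozen runs per build are probably the minimum for a meaningful comparison.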
Also note that ATM in my latest build I can't get it to segfault anymore ... but it reports random non-existing DRC errors half the time. So this is really weird. I also tried removing the pSD.d rule and then it works fine. |
As another data point, we're also observing erroneous DRC errors of the pSD.d rule in a github action that doesn't use nix and just uses the ubuntu 24.04 package from the website : https://github.com/TinyTapeout/tt-gds-action/blob/main/orfs/action.yml#L72 No crashes, just wrong DRC results. |
Valgrind log that shows access to a free'd block when doing |
Thanks for the valgrind log. Most likely it is a modify-while-iterating issue. But it is hard to pinpoint as the code is somewhat complex. Modification is not direct and the problem is a recomputation of the instance quad tree. Basically the iterators involved should lock the layout to prevent this effect. |
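To illustrate the idea of locking during iteration, here is a toy sketch in plain Ruby. It is not KLayout code: `LazyIndex` is a made-up class, and a simple sort stands in for the quad tree recomputation. The point is only that a lazily rebuilt structure can hold back its rebuild while an iteration is in progress and catch up afterwards.

```ruby
# Toy container (hypothetical): items are kept in an index that is rebuilt
# lazily. While at least one lock is held, the rebuild is deferred.
class LazyIndex
  attr_reader :rebuilds

  def initialize
    @items = []
    @dirty = false
    @locks = 0
    @rebuilds = 0
  end

  def add(item)
    @items << item
    @dirty = true
  end

  # Rebuild the index if needed, unless an iteration currently holds a lock.
  def update
    return if @locks > 0 || !@dirty
    @items.sort!
    @dirty = false
    @rebuilds += 1
  end

  # Iterate with rebuilds held back; pending changes are applied afterwards.
  def each_locked
    update
    @locks += 1
    begin
      @items.each { |item| yield item }
    ensure
      @locks -= 1
    end
    update
  end

  def to_a
    update
    @items.dup
  end
end
```

Without the lock counter, a call to `update` from inside the loop would reorder the array under the running iteration, which is the Ruby-level analogue of the invalidated iterator seen in the valgrind log.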
I haven't confirmed yet, but I'm starting to wonder if this is not clang vs gcc related. |
Thanks to kazzz build scripts, running on M1 Mac with asan is as easy as adding -d to the build command. Apple clang version 16.0.0 (clang-1600.0.26.3) |
@stefanottili the log is consistent with the valgrind log. I am using gcc as of now. I will try a build with clang and see if I can reproduce the issue. Matthias |
Nevermind, I built using clang natively and couldn't reproduce the issue. |
here's the asan output
|
wsl ubuntu: Linux version 5.15.153.1-microsoft-standard-WSL2 Neither gcc nor clang shows any valgrind issues with rule pSD.d gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04) |
@stefanottili, you are right! Kazzz-S
|
@Kazzz-S You should also check the number of DRC errors reported. |
Good news :) I built with the clang++ 18.1.3 that comes with my Ubuntu 24 instead of gcc, and although I don't see a crash or wrong DRC output, I can see the issue in the valgrind logs! It could also be an issue of the STL used (clang is using their own STL implementation as far as I understand). I am able to debug the issue now and will let you know my findings. Thanks, Matthias |
@klayoutmatthias Awesome ! Thanks a lot for looking into this. |
Of course I do! :) Just allow me a little while ... |
@smunaut, I modified the batch script
Then I again ran the DRC 500 (= 5 DMGs x 100) times, but this time, I only had one crash at the beginning of program execution. No DRC errors were observed. See @klayoutmatthias, thanks for your effort 😄 [Added on 2024-10-25]
|
Here is an update: the issue seems to be caused by some interaction of Ruby's Garbage Collector and the explicit error filtering loop used in the DRC deck. I assume that some temporary layers are cleaned up during the iteration of the error shapes. Although that is technically unrelated, there are interactions as both the error shapes and the temporary layer shapes are stored in the same hierarchical structure. That is not related to gcc vs. clang, but the compiler may make subtle differences. A first patch is to disable the garbage collector during these loops. I changed this function
(lines 135++ in the .lydrc file) to
so that during the runtime of the function the GC is disabled. With this patch, I do not see issues in valgrind any longer. A better solution would be to disable the layout updates that cause the interference during the iteration. This is a C++ patch then and should prevent similar issues in the future. Matthias |
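For readers who want to apply the same workaround to other loops, the general pattern of switching off Ruby's garbage collector for the duration of a block can be sketched as below. This is not the function from the iHP deck: `without_gc` is a hypothetical helper name, and only the standard `GC.disable` / `GC.enable` calls are used.

```ruby
# Hypothetical helper: run a block with Ruby's garbage collector disabled
# and restore the previous state afterwards, even if the block raises.
def without_gc
  # GC.disable returns true if the GC was already disabled before the call.
  was_disabled = GC.disable
  yield
ensure
  # Only re-enable if this call was the one that disabled the GC.
  GC.enable unless was_disabled
end

# Usage: wrap the loop that iterates over shapes.
doubled = without_gc do
  [1, 2, 3].map { |x| x * 2 }
end
```

The `ensure` clause matters here: if the loop raises, the GC would otherwise stay disabled for the rest of the run and memory use could grow without bound.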
I just tested the lydrc workaround and it worked fine. I got through a couple hundred runs without spurious DRC errors or crashes :) Thanks! I'll use that for this tapeout while waiting for a release with the C++ patch. Sylvain |
The bugfix/issue-1907 branch fixes the asan issues on Mac M1. |
Hello @klayoutmatthias, Just for your information.
Then, I observed
Kazzz-S |
@Kazzz-S and @stefanottili Many thanks for this feedback! The patch is not perfect yet - I am observing some thread collision issues in one unit test I still need to solve. But I am confident that is not a big issue. Best regards, Matthias |
When running DRC using the iHP deck, I'm getting random segfaults. This is using 0.29.7 built from source in a nix environment.
I've attached a test case. However, this is random. One run might work 100% fine. The next could crash. And then sometimes it reports non-existing DRC errors.
The package includes both:
drc_tst.zip