Tracing function runtime patch #1060

Pavel-Durov · 2024-03-27T16:27:41Z

Move tracing function from ykcapi to ykrt.
Add runtime patching of the tracing function with a single ret instruction when the JIT transitions between tracing and non-tracing states (including side traces).
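To make the idea concrete, here is a minimal sketch of the patch/restore cycle on a plain byte slice that stands in for the tracing function's machine code. The names `patch`, `restore` and `RET` are hypothetical and not taken from the PR; the real change operates on the function's code in memory and also has to adjust page permissions.

```rust
/// Opcode of the x86_64 near-return instruction.
const RET: u8 = 0xC3;

/// Overwrite the first byte of `code` with `ret` and hand back the byte that
/// was there, so that the caller can undo the patch later.
pub fn patch(code: &mut [u8]) -> u8 {
    let original = code[0];
    code[0] = RET;
    original
}

/// Put the saved byte back, re-enabling the original function body.
pub fn restore(code: &mut [u8], original: u8) {
    code[0] = original;
}
```

While the first byte is `ret`, a call to the function returns immediately without running its body; restoring the saved byte brings the original behaviour back.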

ykrt/src/trace/swt/patch.rs

ptersilie · 2024-03-28T11:42:34Z

ykrt/src/trace/swt/patch.rs

+    let result = mprotect(
+        func_address,
+        page_size_aligned,
+        PROT_READ | PROT_WRITE | PROT_EXEC,


You don't need the PROT_EXEC here since you are changing it back to executable again below. In fact, on some systems you are not allowed to mark a region writeable and executable at the same time.

Sounds good.
But when I remove PROT_EXEC, some of the tests fail non-deterministically 🤔
I guess it's because test execution somehow overlaps, and this runtime patching takes effect while other tests try to execute these instructions.
I get the same result when I run the tests sequentially:

YKB_TRACER=swt cargo test -- --test-threads=1

> In fact, on some systems you are not allowed to mark a region writeable and executable at the same time.

I think Lukas's point is that you can mprotect(PROT_READ | PROT_WRITE) or mprotect(PROT_READ | PROT_EXEC) but you can't combine PROT_EXEC and PROT_WRITE: you have to write the page and only then mark it as executable.

Updated 👉 40c0696
I get that :)
My issue was with failing the test, but I can't reproduce it anymore 😶

I understand that I can't combine PROT_WRITE and PROT_EXEC, but...

I'm again seeing tests fail if I set PROT_READ | PROT_WRITE before calling copy_nonoverlapping (patching the function):

Exited due to signal: 11

But when I set it as PROT_READ | PROT_WRITE | PROT_EXEC it works fine.

The normal trick here is: set it to PROT_READ | PROT_WRITE; do the writes; then set it to PROT_READ | PROT_EXEC. That way you never have PROT_WRITE and PROT_EXEC set at once.
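A minimal sketch of that sequence follows. It is not the PR's code: it uses a fresh anonymous mapping instead of the tracing function's page, declares the libc functions by hand so that no external crate is needed, and hard-codes the constant values from 64-bit Linux. The function name `write_then_execute_ret` is hypothetical.

```rust
use std::ffi::c_void;
use std::os::raw::c_int;
use std::{mem, ptr};

// Hand-written declarations of the libc functions used below (the PR uses the
// `libc` crate instead). On edition 2024 this block must be spelled
// `unsafe extern "C"`.
extern "C" {
    fn mmap(addr: *mut c_void, len: usize, prot: c_int, flags: c_int, fd: c_int, off: i64)
        -> *mut c_void;
    fn mprotect(addr: *mut c_void, len: usize, prot: c_int) -> c_int;
    fn munmap(addr: *mut c_void, len: usize) -> c_int;
}

// Assumption: values as defined on 64-bit Linux; other platforms may differ.
const PROT_READ: c_int = 1;
const PROT_WRITE: c_int = 2;
const PROT_EXEC: c_int = 4;
const MAP_PRIVATE: c_int = 0x02;
const MAP_ANONYMOUS: c_int = 0x20;

/// Map a region as read+write, write a single `ret` into it, flip it to
/// read+exec and (on x86_64 only) call it. Returns `true` if every step
/// succeeded. The region is never writable and executable at the same time.
pub fn write_then_execute_ret() -> bool {
    const LEN: usize = 4096;
    unsafe {
        let page = mmap(
            ptr::null_mut(),
            LEN,
            PROT_READ | PROT_WRITE,
            MAP_PRIVATE | MAP_ANONYMOUS,
            -1,
            0,
        );
        if page as usize == usize::MAX {
            return false; // mmap returned MAP_FAILED
        }

        // Step 1: the region is writable but not executable, so do the writes.
        let code = [0xC3u8]; // x86_64 `ret`
        ptr::copy_nonoverlapping(code.as_ptr(), page as *mut u8, code.len());

        // Step 2: drop PROT_WRITE and add PROT_EXEC in a single call.
        if mprotect(page, LEN, PROT_READ | PROT_EXEC) != 0 {
            munmap(page, LEN);
            return false;
        }

        // Step 3: the region is executable; calling it returns immediately.
        if cfg!(target_arch = "x86_64") {
            let f: extern "C" fn() = mem::transmute(page);
            f();
        }

        munmap(page, LEN) == 0
    }
}
```

When patching an existing function rather than a fresh mapping, step 1 would be preceded by an `mprotect(..., PROT_READ | PROT_WRITE)` call on the page(s) that contain the function.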

I understand, and I wish it worked for me, but I get a segfault when I do it.
Will dig deeper.

I think I'm going mad 🙃
It's working fine with PROT_READ | PROT_WRITE on patch and PROT_READ | PROT_EXEC on restore.
Updated 👉 6f4bdc3

ykrt/src/trace/swt/patch.rs

This reverts commit 3725c71.

Pavel-Durov · 2024-03-29T12:58:08Z

I still need to perform benchmarking to compare hwt / swt / swt + noop / swt + runtime patch runtimes.
I would hold off on the PR review until it's done as it might introduce more changes.

Pavel-Durov · 2024-03-30T10:38:00Z

Benchmarks completed.

Benchmarks - JIT (threshold=5)

YK	Range (min … max)	Time (mean ± σ)	Runs
swt + runtime-patch	3.140 s … 5.266 s	3.895 s ± 0.654 s	50
swt	3.221 s … 5.163 s	4.035 s ± 0.638 s	50
swt + nop	288.6 ms … 520.1 ms	420.4 ms ± 92.6 ms	50
hwt	3.444 s … 5.503 s	4.453 s ± 0.775 s	50

Benchmarks - No JIT (threshold=9999999)

YK	Range (min … max)	Time (mean ± σ)	Runs
swt + runtime-patch	248.4 ms … 257.6 ms	252.8 ms ± 2.1 ms	50
swt + nop	44.3 ms … 57.0 ms	54.3 ms ± 3.0 ms	50
swt	244.9 ms … 268.4 ms	259.2 ms ± 4.5 ms	50
hwt	35.8 ms … 44.5 ms	39.4 ms ± 1.0 ms	50

Summary

It looks like runtime patching doesn't give the same performance gain that we saw with the nop benchmarks (x4 boost); there's probably another overhead involved.
Or maybe we need to find the right threshold value for yk_mt_hot_threshold_set to see the impact?
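As a rough check of the summary, the mean times from the "No JIT" table can be turned into speed-up ratios. The sketch below only redoes that arithmetic; the constant and function names are made up for illustration. With these numbers, swt + nop is about 4.8x faster than swt, while swt + runtime-patch is only about 1.03x faster.

```rust
// Mean wall-clock times in milliseconds, copied from the "No JIT" table.
pub const SWT_MS: f64 = 259.2;
pub const SWT_NOP_MS: f64 = 54.3;
pub const SWT_RUNTIME_PATCH_MS: f64 = 252.8;

/// Speed-up of `candidate` over `baseline`, as a ratio of mean run times.
/// Values above 1.0 mean the candidate is faster.
pub fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}
```

For example, `speedup(SWT_MS, SWT_NOP_MS)` gives the ratio for the nop variant and `speedup(SWT_MS, SWT_RUNTIME_PATCH_MS)` the ratio for runtime patching.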

Command used:

YKD_SERIALISE_COMPILATION=1 hyperfine -w 2 -m 50 './src/lua ./tests/closure.lua'

ltratt · 2024-03-30T11:22:02Z

Let's try this on a few other benchmarks, including those that make longer traces (with short traces we wouldn't necessarily expect to see much difference).

Pavel-Durov · 2024-04-09T15:26:19Z

Disabling trace compilation:

diff --git ykrt/src/mt.rs ykrt/src/mt.rs
index 8e9f77c4..c02b6420 100644
--- ykrt/src/mt.rs
+++ ykrt/src/mt.rs
@@ -630,11 +630,11 @@ impl MT {
             // spin up a new thread for each compilation. This is only acceptable because a)
             // `SERIALISE_COMPILATION` is an internal yk testing feature b) when we use it we're
             // checking correctness, not performance.
-            thread::spawn(do_compile).join().unwrap();
+            // thread::spawn(do_compile).join().unwrap();
             return;
         }
 
-        self.queue_job(Box::new(do_compile));
+        // self.queue_job(Box::new(do_compile));
     }

This does show the same difference as before, roughly x4:
yk+swt master:

Benchmark 1: ./src/lua ./stats/matrix/matrix.lua 50
  Time (mean ± σ):     330.8 ms ±   7.3 ms    [User: 424.2 ms, System: 36.4 ms]
  Range (min … max):   321.6 ms … 337.4 ms    10 runs

yk-swt/tracing-function-runtime-patch:

Benchmark 1: ./src/lua ./stats/matrix/matrix.lua 50
  Time (mean ± σ):      1.444 s ±  0.344 s    [User: 1.540 s, System: 0.031 s]
  Range (min … max):    1.102 s …  1.784 s    10 runs

ykrt/src/trace/swt/patch.rs

ykrt/src/trace/swt/mod.rs

ltratt · 2024-04-13T12:11:57Z

ykrt/src/trace/swt/patch.rs

+    let page_size = sysconf(libc::_SC_PAGESIZE) as usize;
+
+    let func_address = ((function_ptr as usize) & !(page_size - 1)) as *mut c_void;
+    let page_size_aligned = (((function_ptr as usize) + mem::size_of_val(&function_ptr))


We can use std::alloc::Layout::align and friends to do this calculation for us.

I'm also not sure what the difference between function_ptr and func_address is.

> I'm also not sure what the difference between function_ptr and func_address is.

My understanding/intention here is that function_ptr is just the pointer to the function, while func_address is the address of the page that contains it (maybe it should be renamed).
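The distinction between the two addresses, and one way to let `Layout` do the rounding, can be sketched as below. This is an illustration under assumptions rather than the code that ended up in the PR: `page_start` and `page_span` are hypothetical helpers, and the page size is passed in instead of being queried with `sysconf`.

```rust
use std::alloc::Layout;

/// Address of the page that contains `addr`. `page_size` must be a power of
/// two, which holds for the value reported by `sysconf(_SC_PAGESIZE)`.
pub fn page_start(addr: usize, page_size: usize) -> usize {
    assert!(page_size.is_power_of_two());
    addr & !(page_size - 1)
}

/// Number of bytes, counted from `page_start(addr, page_size)` and rounded up
/// to a whole number of pages, needed to cover `len` bytes starting at `addr`.
pub fn page_span(addr: usize, len: usize, page_size: usize) -> usize {
    let offset = addr - page_start(addr, page_size);
    Layout::from_size_align(offset + len, page_size)
        .unwrap()
        .pad_to_align()
        .size()
}
```

`page_start` corresponds to what the comment above calls the page address, and `page_span` gives a length that could be passed to `mprotect` together with it. A patch that straddles a page boundary yields a span of two pages.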

updated 👉 8e00241

This might get lost but there are some comments on 8e00241.

Oh, I did miss that! Sorry.

ltratt · 2024-04-19T08:06:37Z

ykrt/src/trace/swt/patch.rs

+    // This unwrap should be safe since we are using a page that is
+    // based on function_ptr with a known location.
+    let layout = Layout::from_size_align(start_offset, page_size)
+        .expect("Failed to create layout for function memory page");


Let's just use unwrap as all these expects really bloat the code, especially as they shouldn't ever trigger. [As this suggests we almost never use expect: it does have a place, but it's really rare for us.]

Got it.
updated 👉 0428dc1

Can we get rid of the other expects introduced in this PR too please?

I don't see any other expects, only a single unwrap.
Am I missing something?

ltratt · 2024-04-19T08:07:28Z

ykrt/src/trace/swt/patch.rs

-    }
+    // Set function memory page as writable.
+    // Ignoring mprotect call failure.
+    mprotect(page_address, layout.size(), PROT_READ | PROT_WRITE);


We don't want to ignore the return code: we just want to abort changing the machine code. So we need something like if mprotect(...) { return ... } and then recover from there.

Added 👉 6be7bef8f5968aec6e44f792422d446f306c4c7a

Just to check: is just return enough for the system to keep running? If so, let's add a comment to that effect.

Alternatively, if we think "if mprotect has failed, we've got something wrong", then we might want to panic if mprotect fails.
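Both options can be sketched as follows. This is a hedged illustration, not the PR's code: `set_protection` and `make_writable_or_panic` are hypothetical names, `mprotect` is declared by hand so that no external crate is needed, and the constant values are those of Linux.

```rust
use std::ffi::c_void;
use std::io;
use std::os::raw::c_int;

// Hand-written declaration (the PR uses the `libc` crate instead). On edition
// 2024 this block must be spelled `unsafe extern "C"`.
extern "C" {
    fn mprotect(addr: *mut c_void, len: usize, prot: c_int) -> c_int;
}

// Assumption: values as defined on Linux.
pub const PROT_READ: c_int = 1;
pub const PROT_WRITE: c_int = 2;

/// Option 1: report the failure to the caller, who can then skip the patch.
///
/// # Safety
/// Changing the protection of memory that other code relies on can break it.
pub unsafe fn set_protection(addr: *mut c_void, len: usize, prot: c_int) -> io::Result<()> {
    if unsafe { mprotect(addr, len, prot) } == 0 {
        Ok(())
    } else {
        // mprotect returned -1; errno says why (e.g. EINVAL, ENOMEM, EACCES).
        Err(io::Error::last_os_error())
    }
}

/// Option 2: treat a failure as a bug and stop immediately.
///
/// # Safety
/// Same requirements as `set_protection`.
pub unsafe fn make_writable_or_panic(page: *mut c_void, len: usize) {
    if let Err(e) = unsafe { set_protection(page, len, PROT_READ | PROT_WRITE) } {
        panic!("mprotect failed: {e}");
    }
}
```

Passing an address that is not page-aligned is an easy way to exercise the error path, since `mprotect` rejects it with `EINVAL`.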

I think it's OK to return if mprotect fails, as it should be transactional and not leave the memory in a corrupted state on error. Unless we consider "runtime patching" a must-have for SWT, in which case we should definitely panic.

I tested it with different invalid memory addresses that resulted in mprotect returning a non-0 result, and the test suite passed.
We still risk changing memory in a way that will not necessarily cause mprotect to return an error but will leave some other part of the process with invalid memory.

Actually, I think this runtime patching is an integral part of SWT.
Added panic 👉 6f0abb3

ltratt · 2024-04-25T09:56:18Z

Hmm, I've just realised a possible blocker with this: multi-threading. That's going to require some very careful handling.

Pavel-Durov · 2024-04-25T17:46:36Z

> Hmm, I've just realised a possible blocker with this: multi-threading. That's going to require some very careful handling.

Yes, that would be a problem with instruction patching but I thought we only support serialised compilation in YK?

ltratt · 2024-04-25T18:33:41Z

yk is definitely multi-threaded but we tend to temporarily turn that off because of problems with llvm. When the new JIT codegen is ready, we'll be back to fully multi-threaded.

Let's have a chat about this PR in a day or two. It's still very useful!

Pavel-Durov added 6 commits March 27, 2024 15:53

Add runtime patch and restore calls on trace transition.

8d706a5

Add docstring in patch module

e1fd774

Add message to panic on empty trace in swt.

ee56310

Add patch calls to side-traces.

461448f

Add panic if runtime patch is called on non-x86 platform.

e237278

Add test to swt patch.

0699dbf

Pavel-Durov assigned ltratt and ptersilie Mar 27, 2024

Pavel-Durov added 2 commits March 27, 2024 16:30

Add restore assertion to patch test.

0db3e4f

Add assert to function restore.

6994b59

ltratt reviewed Mar 27, 2024

ykrt/src/trace/swt/patch.rs

ltratt reviewed Mar 27, 2024

ykrt/src/trace/swt/patch.rs

ptersilie reviewed Mar 28, 2024

ykrt/src/trace/swt/patch.rs

Pavel-Durov added 5 commits March 29, 2024 12:37

Add x86 compile-time guard

0e88a29

Add compile-time guard.

86e404b

Remove PROT_EXEC on mprotect.

3725c71

Revert "Remove PROT_EXEC on mprotect."

799644e

This reverts commit 3725c71.

Add documentation.

d572cab

Pavel-Durov added 3 commits March 29, 2024 13:08

Add module docstring.

0bd8f36

Merge branch 'master' into tracing-function-runtime-patch

08029de

Encapsulate x86 instructions.

b2aee3d

Pavel-Durov added 2 commits April 7, 2024 12:02

Merge branch 'master' into tracing-function-runtime-patch

f1794a2

Merge branch 'master' into tracing-function-runtime-patch

c46bb5f

Pavel-Durov added 2 commits April 9, 2024 16:32

Remove PROT_EXEC from mprotect.

40c0696

Format.

4401153

ltratt reviewed Apr 9, 2024

ykrt/src/trace/swt/patch.rs

ltratt reviewed Apr 9, 2024

ykrt/src/trace/swt/mod.rs

Pavel-Durov added 4 commits April 11, 2024 18:55

Remove not needed cfgs.

5ead893

Add x86_64 guard.

b6fcbd4

Merge branch 'master' into tracing-function-runtime-patch

b334bfa

Set mprotect as read+write on patch.

6f4bdc3

ltratt reviewed Apr 13, 2024

Pavel-Durov added 5 commits April 14, 2024 16:13

Use Layout::from_size_align for aligned page calculation.

8e00241

Change match to unwrap().

ada68bb

Recover and panic on mprotect.

3e7d023

Remove +size from start_offset calculation.

18b7fee

Add comments.

e1be48c

ltratt reviewed Apr 19, 2024

Pavel-Durov added 3 commits April 20, 2024 13:23

Use unwrap over expect.

0428dc1

Early return if mprotect failes to set memmory page as writable.

6be7bef

Add panic on mprotect failure.

6f0abb3

Merge branch 'master' into tracing-function-runtime-patch

94597ef
