-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tracing function runtime patch #1060
base: master
Are you sure you want to change the base?
Tracing function runtime patch #1060
Conversation
ykrt/src/trace/swt/patch.rs
Outdated
let result = mprotect( | ||
func_address, | ||
page_size_aligned, | ||
PROT_READ | PROT_WRITE | PROT_EXEC, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need the PROT_EXEC
here since you are changing it back to executable again below. In fact, on some systems you are not allowed to mark a region writeable and executable at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
But when I remove PROT_EXEC
I get some of the tests failing in a non-deterministic way 🤔
I guess its because somehow test execution overlaps and this runtime patching takes effect while other tests try to execute these instructions.
I get the same result when I run the tests sequentially as:
YKB_TRACER=swt cargo test -- --test-threads=1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, on some systems you are not allowed to mark a region writeable and executable at the same time.
I think Lukas's point is that you can mprotect(PROT_READ | PROT_WRITE)
or mprotect(PROT_READ | PROT_EXEC)
but you can't combine PROT_EXEC
and PROT_WRITE
: you have to write the page and only then mark it as executable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated 👉 40c0696
I get that :)
My issue was with failing the test, but I can't reproduce it anymore 😶
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that I can't combine PROT_WRITE
and PROT_EXEC
but..
I again get this issue with failing tests if I set PROT_READ | PROT_WRITE
before callingcopy_nonoverlapping
(patching the function):
Exited due to signal: 11
But when I set it as PROT_READ | PROT_WRITE | PROT_EXEC
it works fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The normal trick here is: set to PROT_READ|PROT_WRITE
do the writes; then set it to PROT_READ | PROT_EXEC
. That way you never have PROT_WRITE
and PROT_EXEC
set at once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand and I wish it would work for me but I get segfault when I do it.
Will dig deeper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm going mad 🙃
Its working fine with PROT_READ | PROT_WRITE
on patch and PROT_READ | PROT_EXEC
restore.
updated 👉 6f4bdc3
I still need to perform benchmarking to compare |
Benchmarks completed. Benchmarks - JIT (threshold=5)
Benchmarks - No JIT (threshold=9999999)
SummaryLooks like the runtime patching doesn't give the same performance gain as we've seen with nop benchmarks (x4 boost), there's probably another overhead involved. Command used: YKD_SERIALISE_COMPILATION=1 hyperfine -w 2 -m 50 './src/lua ./tests/closure.lua' |
Let's try this on a few other benchmarks, including those that make longer traces (with short traces we wouldn't necessarily expect to see much difference). |
Disabling trace compilation: diff --git ykrt/src/mt.rs ykrt/src/mt.rs
index 8e9f77c4..c02b6420 100644
--- ykrt/src/mt.rs
+++ ykrt/src/mt.rs
@@ -630,11 +630,11 @@ impl MT {
// spin up a new thread for each compilation. This is only acceptable because a)
// `SERIALISE_COMPILATION` is an internal yk testing feature b) when we use it we're
// checking correctness, not performance.
- thread::spawn(do_compile).join().unwrap();
+ // thread::spawn(do_compile).join().unwrap();
return;
}
- self.queue_job(Box::new(do_compile));
+ // self.queue_job(Box::new(do_compile));
}
Does show the difference as before - roughly x4: Benchmark 1: ./src/lua ./stats/matrix/matrix.lua 50
Time (mean ± σ): 330.8 ms ± 7.3 ms [User: 424.2 ms, System: 36.4 ms]
Range (min … max): 321.6 ms … 337.4 ms 10 runs
Benchmark 1: ./src/lua ./stats/matrix/matrix.lua 50
Time (mean ± σ): 1.444 s ± 0.344 s [User: 1.540 s, System: 0.031 s]
Range (min … max): 1.102 s … 1.784 s 10 runs |
ykrt/src/trace/swt/patch.rs
Outdated
let page_size = sysconf(libc::_SC_PAGESIZE) as usize; | ||
|
||
let func_address = ((function_ptr as usize) & !(page_size - 1)) as *mut c_void; | ||
let page_size_aligned = (((function_ptr as usize) + mem::size_of_val(&function_ptr)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use std::alloc::Layout::align
and friends to do this calculation for us.
I'm also not sure what the difference between function_ptr
and func_address
is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also not sure what the difference between function_ptr and func_address is.
My understanding/intention here is that function_ptr
is just the pointer to the function while func_address
is the page address (maybe it should be renamed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated 👉 8e00241
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might get lost but there are some comments on 8e00241.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh I did missed that! sorry
ykrt/src/trace/swt/patch.rs
Outdated
// This unwrap should be safe since we are using a page that is | ||
// based on function_ptr with a known location. | ||
let layout = Layout::from_size_align(start_offset, page_size) | ||
.expect("Failed to create layout for function memory page"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just use unwrap
as all these expect
s really bloat the code, especially as they shouldn't ever trigger. [As this suggests we almost never use expect
: it does have a place, but it's really rare for us.]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
updated 👉 0428dc1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get rid of the other expect
s introduced in this PR too please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any other expect
s only a single unwrap
.
Am I missing it somehow?
ykrt/src/trace/swt/patch.rs
Outdated
} | ||
// Set function memory page as writable. | ||
// Ignoring mprotect call failure. | ||
mprotect(page_address, layout.size(), PROT_READ | PROT_WRITE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to ignore the return code: we just want to abort changing the machine code. So we need something like if mprotect(...) { return ... }
and then recover from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to check: is just return
enough for the system to keep running? If so, let's add a comment to that effect.
Alternatively, if we think "if mprotect
has failed, we've got something wrong", then we might want to panic
if mprotect
fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's ok to return if mrotect
as it should be transactional and not leave the memory in corrupted state on error. Unless we consider the "runtime patching" as a must-have for SWT, then we should definitely panic
.
I tested it with different invalid memory addresses that resulted in mrotect
returning a non-0 result and the test suite passed.
We still risk changing memory that will not necessarily cause mrotect
to return with an error but will cause some other part of the process to have invalid memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think this runtime patching is an integral part of SWT.
Added panic 👉 6f0abb3
Hmm, I've just realised a possible blocker with this: multi-threading. That's going to require some very careful handling. |
Yes, that would be a problem with instruction patching but I thought we only support serialised compilation in YK? |
yk is definitely multi-threaded but we tend to temporarily turn that off because of problems with llvm. When the new JIT codegen is ready, we'll be back to fully multi-threaded. Let's have a chat about this PR in a day or two. It's still very useful! |
ykcapi
toykrt
.ret
instruction to the tracing function when jit transition between tracing and stop-tracing (including side traces).