-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Looking for stuck/deadlocked fibers could be easier #2381
Comments
Sounds like a good idea. Would potentially save us a lot of time in multiple investigations. cc @avikivity @denesb WDYT? |
Maybe we could even put some timestamps into the continuation, to weed out the relevant tasks from coredumps more easily. Copying Both things could be hidden behind |
But such users think they don't need it, until the problem is hit :-) Like you said, the problem might be rare, and we'll wish we had this enabled every time the problem is hit. |
That could probably be implemented like so diff --git a/include/seastar/core/future.hh b/include/seastar/core/future.hh
index 9e0b2061..c0fcf63c 100644
--- a/include/seastar/core/future.hh
+++ b/include/seastar/core/future.hh
@@ -43,6 +43,13 @@
#include <seastar/util/std-compat.hh>
#include <seastar/util/modules.hh>
+[[gnu::always_inline]]
+inline uintptr_t getrip() {
+ uintptr_t rip;
+ asm("leaq 0(%%rip), %0":"=r"(rip));
+ return rip;
+}
+
namespace seastar {
struct nested_exception : public std::exception {
@@ -1274,13 +1281,14 @@ class [[nodiscard]] future : private internal::future_base {
future_base::schedule(tws, &tws->_state);
}
template <typename Pr, typename Func, typename Wrapper>
- void schedule(Pr&& pr, Func&& func, Wrapper&& wrapper) noexcept {
+ void schedule(Pr&& pr, Func&& func, Wrapper&& wrapper, uintptr_t rip) noexcept {
// If this new throws a std::bad_alloc there is nothing that
// can be done about it. The corresponding future is not ready
// and we cannot break the chain. Since this function is
// noexcept, it will call std::terminate if new throws.
memory::scoped_critical_alloc_section _;
auto tws = new continuation<Pr, Func, Wrapper, T>(std::move(pr), std::move(func), std::move(wrapper));
+ tws->_static_id = rip;
// In a debug build we schedule ready futures, but not in
// other build modes.
#ifdef SEASTAR_DEBUG
@@ -1422,9 +1430,9 @@ class [[nodiscard]] future : private internal::future_base {
requires std::invocable<Func, T>
|| (std::same_as<void, T> && std::invocable<Func>)
Result
- then(Func&& func) noexcept {
+ then(Func&& func, uintptr_t rip = getrip()) noexcept {
#ifndef SEASTAR_TYPE_ERASE_MORE
- return then_impl(std::move(func));
+ return then_impl(std::move(func), rip);
#else
using func_type = typename internal::future_result<Func, T>::func_type;
noncopyable_function<func_type> ncf;
@@ -1434,7 +1442,7 @@ class [[nodiscard]] future : private internal::future_base {
return futurize_invoke(func, std::forward<decltype(args)>(args)...);
});
}
- return then_impl(std::move(ncf));
+ return then_impl(std::move(ncf), rip);
#endif
}
@@ -1471,7 +1479,7 @@ class [[nodiscard]] future : private internal::future_base {
// Keep this simple so that Named Return Value Optimization is used.
template <typename Func, typename Result>
- Result then_impl_nrvo(Func&& func) noexcept {
+ Result then_impl_nrvo(Func&& func, uintptr_t rip) noexcept {
using futurator = futurize<internal::future_result_t<Func, T>>;
typename futurator::type fut(future_for_get_promise_marker{});
using pr_type = decltype(fut.get_promise());
@@ -1486,13 +1494,13 @@ class [[nodiscard]] future : private internal::future_base {
return internal::future_invoke(func, std::move(state).get_value());
});
}
- });
+ }, rip);
return fut;
}
template <typename Func, typename Result = futurize_t<internal::future_result_t<Func, T>>>
Result
- then_impl(Func&& func) noexcept {
+ then_impl(Func&& func, uintptr_t rip) noexcept {
#ifndef SEASTAR_DEBUG
using futurator = futurize<internal::future_result_t<Func, T>>;
if (failed()) {
@@ -1501,7 +1509,7 @@ class [[nodiscard]] future : private internal::future_base {
return futurator::invoke(std::forward<Func>(func), get_available_state_ref().take_value());
}
#endif
- return then_impl_nrvo<Func, Result>(std::forward<Func>(func));
+ return then_impl_nrvo<Func, Result>(std::forward<Func>(func), rip);
}
public:
diff --git a/include/seastar/core/task.hh b/include/seastar/core/task.hh
index 8dbfb789..eba69cb1 100644
--- a/include/seastar/core/task.hh
+++ b/include/seastar/core/task.hh
@@ -60,6 +60,7 @@ class task {
#else
void make_backtrace() noexcept {}
#endif
+ uintptr_t _static_id = 0;
};
inline |
Having some code pointers in the task could also make printing tasktraces in runtime more practical. Today, they look like this:
If you spend some time parsing those type names by hand, you can probably eventually figure out which lambdas in the source code they refer to. But it's not easy. I avoid doing that if I can. Perhaps there exists some tool that makes parsing those type names easier, but I don't know it. (And for coroutines we actually don't print any information aside from the return type, but that's something that can be fixed on its own). |
No matter how much time you spend trying to decipher those lambda names, you won't be able to make sense of one which is big enough to fill your screen. And we have those. I'm all for adding more metadata to the task objects. I'd love if we could clearly identify a task-chain (continuation-chain or green thread) by some id in memory, as well as walk through all the task objects that are part of it. I think finding deadlocked tasks automatically is not achievable. But a command which would list all green threads on a shard, similar to how Footnotes |
In ScyllaDB, it happens from time to time that some startup/shutdown/maintenance fiber gets stuck. The first step to debugging a problem like this is finding out what fiber got stuck and where, and this usually involves repeatedly adding debug logs and hoping that the bug reproduces again. But that's time-consuming, and not guaranteed to succeed. It would be better to be able to find the stuck fiber in the coredump.
It's already possible to find chains of awaiting
seastar::task
s in the coredump, but AFAIK it's not possible to guess what points in the source code they correspond to. For coroutines, we can find the relevant coroutine, but not point out the relevantco_await
inside it. Forthen
continuations, we can try to guess the relevant function based on the type of the continuation, but it's hard (if possible at all) to do that reliably.So maybe we could, for example, capture the program counter (
rip
) at the call sites ofthen
(and its variants) andco_await
, and put it into the awaiting task (continuation/coroutine)? Combined with debug info, this should allow us to see the source locations of the awaiting continuation points.And it should be cheap enough. (Probably.)
The text was updated successfully, but these errors were encountered: