Fix memory leak for static_partitioner #1404

pavelkumbrasev · 2024-06-13T09:46:53Z

Description

With global_control we limit the concurrency that all internal arenas share. Therefore, a particular arena does not know its actual concurrency limit, only the one explicitly set during construction; for the default arena (the one used, for example, by a plain parallel_for call) the concurrency is that of the whole machine.
When parallel_for starts with static_partitioner, it creates as many internal proxy tasks as normal tasks (proxy tasks are used to assign tasks to specific threads). A proxy task must be executed twice: the first call to execute returns the actual task it carries, and after the second call the proxy task can be deleted.
In the test case, each parallel_for call with static_partitioner creates hardware_concurrency proxy tasks, but because global_control is present, the concurrency of the default arena is never fully satisfied. Some proxy tasks are therefore executed only once and are never destroyed.
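The leak mechanism described above can be illustrated with a toy model that does not use oneTBB at all. Everything here is hypothetical: `ToyProxy` and `leaked_proxies` are made-up names, and the model simply assumes that a proxy gets its second execution only when a thread occupies the slot it was assigned to. It is a sketch of the reasoning, not of the scheduler's real data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a proxy task: it may be freed only after two executions.
struct ToyProxy {
    int executions = 0;
    bool can_be_freed() const { return executions >= 2; }
};

// Models one parallel_for round with a static partitioner.
//   proxies_created: how many proxy tasks the partitioner creates
//   occupied_slots:  how many arena slots actually have a thread
// Returns the number of proxies executed only once, i.e. never freed.
std::size_t leaked_proxies(std::size_t proxies_created, std::size_t occupied_slots) {
    std::vector<ToyProxy> proxies(proxies_created);
    for (std::size_t slot = 0; slot < proxies.size(); ++slot) {
        // First execution: the proxy hands over the real task it carries.
        ++proxies[slot].executions;
        // Second execution: modelling assumption, it happens only if a
        // thread occupies the slot this proxy was assigned to.
        if (slot < occupied_slots) {
            ++proxies[slot].executions;
        }
    }
    const auto leaked = std::count_if(proxies.begin(), proxies.end(),
        [](const ToyProxy& p) { return !p.can_be_freed(); });
    return static_cast<std::size_t>(leaked);
}
```

In this model, `leaked_proxies(8, 4)` (eight proxies, four threads allowed) leaves four proxies behind per round, while `leaked_proxies(4, 4)` leaves none, which is the effect of capping the number of proxies at the allowed parallelism.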

This fix does not help if global_control is set concurrently with the execution of a parallel algorithm, although that scenario is probably less likely.

Fixes # - issue number(s) if exists
#1403

Type of change

Choose one or multiple, leave empty if none of the other choices apply

Add a respective label(s) to PR if you have permissions

bug fix - change that fixes an issue
new feature - change that adds functionality
tests - change in tests
infrastructure - change in infrastructure and CI
documentation - documentation update

Tests

added - required for new features and some bug fixes
not needed

Documentation

updated in # - add PR number
needs to be updated
not needed

Breaks backward compatibility

Yes
No
Unknown

Notify the following users

@lennoxho

Other information

Signed-off-by: pavelkumbrasev <[email protected]>

include/oneapi/tbb/partitioner.h

pavelkumbrasev · 2024-06-13T12:46:13Z

Potentially, idle workers can drain tasks from free slots.

isaevil · 2024-06-13T15:09:03Z

include/oneapi/tbb/partitioner.h

+        unsigned max_threads_in_arena = unsigned(std::min(static_cast<std::size_t>(max_concurrency()),
+            tbb::global_control::active_value(tbb::global_control::max_allowed_parallelism)));


Looks a little bit clumsy, don't you think? Does it make sense to move calculation of maximum_arena_concurrency into some dedicated function that could be reused in get_initial_auto_partitioner_divisor as well?

I'm not sure we need to introduce a new method for these 2 lines. (What we check is not the arena concurrency, min(arena_concurrency, allowed_concurrency), but rather the number of available workers.)

I agree to move it into something like get_num_possible_workers() to highlight that the solution is based on the immediate value of workers, as std::min(...) does not mean much to the reader.
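A minimal sketch of what such a helper might look like, written against plain parameters so it compiles without oneTBB. The name `get_num_possible_workers` comes from the suggestion above; its signature is an assumption. In the patch itself the two inputs would be `max_concurrency()` and `tbb::global_control::active_value(tbb::global_control::max_allowed_parallelism)`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: the number of threads that can actually take part,
// given the arena's own limit and the currently active global limit.
// Both values are passed in explicitly here; this signature is an assumption.
inline unsigned get_num_possible_workers(int arena_max_concurrency,
                                         std::size_t allowed_parallelism) {
    assert(arena_max_concurrency > 0);
    // Same expression as in the diff: widen the arena limit to std::size_t
    // so both std::min arguments have the same type, then narrow the result.
    return static_cast<unsigned>(
        std::min(static_cast<std::size_t>(arena_max_concurrency), allowed_parallelism));
}
```

The result is a snapshot: as noted in the description, it can become stale if another global_control is created while the algorithm is running.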

Remove from affinity

pavelkumbrasev · 2024-06-14T13:23:04Z

Potentially, idle workers can drain tasks from free slots.

This would require additional synchronization on the mailbox because of concurrent access to it. So while this solution would solve all potential problems, it also brings potential performance problems.

aleksei-fedotov

Questions:

How does the patch fix memory leaks in case global_control is instantiated in the middle of a parallel algorithm's work?
Write such a test?

aleksei-fedotov · 2024-06-26T12:22:49Z

test/tbb/test_task.cpp

+    tbb::global_control gbl_ctrl{ tbb::global_control::max_allowed_parallelism, std::size_t(tbb::this_task_arena::max_concurrency() / 2) };
+
+    size_t current_memory_usage = 0, previous_memory_usage = 0, stability_counter = 0;
+    bool no_memory_leak = false;
+    std::size_t num_iterations = 100;
+    for (std::size_t i = 0; i < num_iterations; ++i) {
+        for (std::size_t j = 0; j < 100; ++j) {
+            tbb::parallel_for(0, 1000, [] (int) {}, tbb::static_partitioner{});
+        }
+
+        current_memory_usage = utils::GetMemoryUsage();
+        stability_counter = current_memory_usage==previous_memory_usage ? stability_counter + 1 : 0;
+        // If the amount of used memory has not changed during 5% of executions,
+        // then we can assume that the check was successful
+        if (stability_counter > num_iterations / 20) {
+            no_memory_leak = true;
+            break;
+        }
+        previous_memory_usage = current_memory_usage;
+    }
+    REQUIRE_MESSAGE(no_memory_leak, "Seems we get memory leak here.");
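The stability check in this test can be isolated from oneTBB to see when it passes. The sketch below replays the same counter logic over a pre-recorded list of memory readings; `memory_usage_stabilizes` is a hypothetical name, and the vector stands in for the repeated `utils::GetMemoryUsage()` calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true once the same reading has repeated for more than 5% of the
// total number of samples in a row, mirroring the loop in the test above.
bool memory_usage_stabilizes(const std::vector<std::size_t>& samples) {
    const std::size_t num_iterations = samples.size();
    std::size_t previous = 0;
    std::size_t stability_counter = 0;
    for (std::size_t current : samples) {
        // Any change in the reading resets the counter.
        stability_counter = (current == previous) ? stability_counter + 1 : 0;
        if (stability_counter > num_iterations / 20) {
            return true;
        }
        previous = current;
    }
    return false;
}
```

With 100 samples the check needs more than five consecutive unchanged readings, so a reading that grows on every iteration never passes, while one that grows for a while and then plateaus does.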


Wrap it into the loop with gradual decrease of the global_control's limit? E.g.,

    std::size_t current_limit = std::size_t(tbb::this_task_arena::max_concurrency());
    while (current_limit /= 2) {
        tbb::global_control gc{ tbb::global_control::max_allowed_parallelism, current_limit };
        // iterations loop goes here {
            // repetitions loop goes here {
            // }
        // }
    }
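To see which limits the suggested `while (current_limit /= 2)` loop would exercise, the halving can be run on its own. `halved_limits` is a hypothetical helper for illustration only; it takes the starting concurrency as a parameter instead of querying oneTBB.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Collects the values that current_limit takes inside the loop body of
// `while (current_limit /= 2)`, starting from max_concurrency.
std::vector<std::size_t> halved_limits(std::size_t max_concurrency) {
    std::vector<std::size_t> limits;
    std::size_t current_limit = max_concurrency;
    // The limit is halved before each iteration; the loop ends at zero.
    while (current_limit /= 2) {
        limits.push_back(current_limit);
    }
    return limits;
}
```

Starting from 8 the loop visits 4, 2, 1; starting from 6 it visits 3, 1. Starting from 1 the body never runs, so on a single-core machine a test written this way would check nothing unless that case is handled separately.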

aleksei-fedotov · 2024-06-26T12:27:39Z

include/oneapi/tbb/partitioner.h

+        unsigned max_threads_in_arena = unsigned(std::min(static_cast<std::size_t>(max_concurrency()),
+            tbb::global_control::active_value(tbb::global_control::max_allowed_parallelism)));


I agree to move it into something like get_num_possible_workers() to highlight that the solution is based on the immediate value of workers, as std::min(...) does not mean much to the reader.

pavelkumbrasev added 2 commits June 13, 2024 10:40

Fix memory leak for static_partitioner

9d8348f

Signed-off-by: pavelkumbrasev <[email protected]>

Update copyright

975f9bf

Signed-off-by: pavelkumbrasev <[email protected]>

pavelkumbrasev requested review from aleksei-fedotov, dnmokhov, kboyarinov and isaevil June 13, 2024 09:46

pavelkumbrasev linked an issue Jun 13, 2024 that may be closed by this pull request

static_partitioner + global_control triggers an unbounded memory leak #1403

Open

github-actions bot added the bug fix label Jun 13, 2024

Align expression types

8ba0d13

isaevil reviewed Jun 13, 2024

aleksei-fedotov reviewed Jun 26, 2024
