HDDS-11410. Refactoring more tests from TestContainerBalancerTask #7156

Open: wants to merge 4 commits into master

@Montura (Contributor) commented Sep 4, 2024

In the PR for HDDS-9889, Siddhant Sangwan and I discussed that tests from org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerTask could be refactored using the MockedSCM class (introduced in HDDS-9889).

Some of this work has already been done in:

  1. HDDS-9889. Refactor tests related to dynamical adaptation for datanode limits in ContainerBalancer #5758
  2. HDDS-10699. Refactor ContainerBalancerTask and tests in TestContainerBalancerTask #6537
  3. HDDS-10917. Refactoring more tests from TestContainerBalancerTask #6734

What changes were proposed in this pull request?

  1. Refactor some tests from org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerTask

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11410

How was this patch tested?

Ran the refactored tests standalone.

@Montura (Contributor Author) commented Sep 4, 2024

@Tejaskriya, please take a look.

@Tejaskriya (Contributor) left a comment

Overall the PR looks good to me; just one small comment below.

if (nodeCount < DATANODE_COUNT_LIMIT_FOR_SMALL_CLUSTER) {
  config.setMaxDatanodesPercentageToInvolvePerIteration(100);
}
config.setThreshold(10);
Contributor

Setting the threshold has been removed here and in a few more tests below, but the newly introduced ContainerBalancerConfigBuilder doesn't set it either. Is this intentional?

Contributor Author

@Tejaskriya, thanks for the review. I forgot about the threshold; I'll fix it.
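For illustration only, a minimal sketch of the kind of fix discussed here: a test-side config builder that always applies a threshold, so a test cannot silently lose it when the explicit config.setThreshold(10) call is removed. The class BalancerTestConfigSketch, its methods, and the placeholder limit/percentage values in main are hypothetical; this is not the actual ContainerBalancerConfigBuilder from the PR, whose API is not shown in the thread.

```java
// Hypothetical stand-in for a balancer test config builder.
// NOT the real ContainerBalancerConfiguration / ContainerBalancerConfigBuilder API.
final class BalancerTestConfigSketch {
  private final double threshold;
  private final int maxDatanodesPercentage;

  private BalancerTestConfigSketch(double threshold, int maxDatanodesPercentage) {
    this.threshold = threshold;
    this.maxDatanodesPercentage = maxDatanodesPercentage;
  }

  double getThreshold() {
    return threshold;
  }

  int getMaxDatanodesPercentage() {
    return maxDatanodesPercentage;
  }

  /** Every config gets an explicit threshold unless a test overrides it. */
  static final class Builder {
    // Assumed default, mirroring the removed config.setThreshold(10) line.
    private double threshold = 10;
    private final int maxDatanodesPercentage;

    Builder(int nodeCount, int smallClusterLimit, int defaultPercentage) {
      // Same rule as the original test code: small clusters may involve 100% of datanodes.
      this.maxDatanodesPercentage = nodeCount < smallClusterLimit ? 100 : defaultPercentage;
    }

    Builder setThreshold(double threshold) {
      this.threshold = threshold;
      return this;
    }

    BalancerTestConfigSketch build() {
      return new BalancerTestConfigSketch(threshold, maxDatanodesPercentage);
    }
  }

  public static void main(String[] args) {
    // 15 (small-cluster limit) and 20 (default percentage) are placeholder values.
    BalancerTestConfigSketch small = new Builder(4, 15, 20).build();
    System.out.println(small.getThreshold() + " " + small.getMaxDatanodesPercentage());
  }
}
```

Putting the default in the builder means a test only mentions the threshold when it deliberately wants a different value.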

Contributor Author

/ready

@adoroszlai (Contributor)

/pending set threshold in tests

@github-actions bot left a comment

Marking this issue as un-mergeable as requested.

Please use /ready comment when it's resolved.

Please note that the PR will be closed after 21 days of inactivity from now. (But can be re-opened anytime later...)


@Montura (Contributor Author) commented Nov 7, 2024

@adoroszlai, do you mean explicitly writing config.setThreshold(...) in each test case?

@adoroszlai (Contributor)

@Montura no, I only meant that, based on this conversation, something needs to be done about setting the threshold. If you have addressed that, please reply /ready to remove the pending label.

@Montura (Contributor Author) commented Nov 7, 2024

/ready

@github-actions bot dismissed its stale review on November 7, 2024 16:14

Blocking review request is removed.

@github-actions bot removed the pending label Nov 7, 2024

@Tejaskriya (Contributor) left a comment

LGTM, thanks for the patch @Montura.

@adoroszlai (Contributor)

Looks like this increases test run time from:

[INFO] Tests run: 240, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 34.125 s - in org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit
[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.686 s - in org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerTask

to:

[INFO] Tests run: 304, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 70.262 s - in org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.735 s - in org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerTask

So TestContainerBalancerDatanodeNodeLimit now takes 2x as long and TestContainerBalancerTask half as long, but the problem is that the increase is an order of magnitude larger than the decrease.
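The ratios can be rechecked from the two Surefire summaries alone. A throwaway helper (class and method names are made up for this illustration) shows that TestContainerBalancerDatanodeNodeLimit's total time grows about 2.06x while its test count grows only about 1.27x (240 to 304), so the average time per test rises from about 0.14 s to about 0.23 s; the +36.1 s increase is roughly 12 times the 3.0 s saved in TestContainerBalancerTask.

```java
/** Recomputes the ratios quoted above from the Surefire summary lines. */
final class TestTimeRatios {
  static double ratio(double beforeSeconds, double afterSeconds) {
    return afterSeconds / beforeSeconds;
  }

  static double perTest(double seconds, int tests) {
    return seconds / tests;
  }

  public static void main(String[] args) {
    // TestContainerBalancerDatanodeNodeLimit: 240 tests in 34.125 s -> 304 tests in 70.262 s
    System.out.printf("NodeLimit: %.2fx total, %.3f s -> %.3f s per test%n",
        ratio(34.125, 70.262), perTest(34.125, 240), perTest(70.262, 304));
    // TestContainerBalancerTask: 10 tests in 5.686 s -> 6 tests in 2.735 s
    System.out.printf("Task: %.2fx total%n", ratio(5.686, 2.735));
    // Absolute change in each class, in seconds
    System.out.printf("Delta: %+.3f s vs %+.3f s%n", 70.262 - 34.125, 2.735 - 5.686);
  }
}
```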

The following test cases take more than 1 second:

org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit.checkIterationResultException(MockedSCM)[1]  Time elapsed: 1.028 s
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit.checkIterationResultException(MockedSCM)[2]  Time elapsed: 1.024 s
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit.checkIterationResultException(MockedSCM)[3]  Time elapsed: 1.024 s
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit.checkIterationResultException(MockedSCM)[4]  Time elapsed: 1.025 s
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit.checkIterationResultException(MockedSCM)[5]  Time elapsed: 1.025 s
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit.checkIterationResultException(MockedSCM)[6]  Time elapsed: 1.025 s
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit.checkIterationResultTimeout(MockedSCM)[1]  Time elapsed: 1.026 s
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit.checkIterationResultTimeout(MockedSCM)[2]  Time elapsed: 1.024 s
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit.checkIterationResultTimeout(MockedSCM)[3]  Time elapsed: 1.029 s
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit.checkIterationResultTimeout(MockedSCM)[4]  Time elapsed: 1.026 s
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit.checkIterationResultTimeout(MockedSCM)[5]  Time elapsed: 1.029 s
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit.checkIterationResultTimeout(MockedSCM)[6]  Time elapsed: 1.03 s
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit.checkIterationResultTimeout(MockedSCM)[7]  Time elapsed: 1.035 s
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit.checkIterationResultTimeout(MockedSCM)[8]  Time elapsed: 1.05 s
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit.checkIterationResultTimeout(MockedSCM)[9]  Time elapsed: 1.039 s
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit.checkIterationResultTimeout(MockedSCM)[10]  Time elapsed: 1.078 s
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit.checkIterationResultTimeout(MockedSCM)[11]  Time elapsed: 1.046 s
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit.checkIterationResultTimeout(MockedSCM)[12]  Time elapsed: 1.053 s
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit.checkIterationResultTimeout(MockedSCM)[13]  Time elapsed: 1.062 s
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit.checkIterationResultTimeout(MockedSCM)[14]  Time elapsed: 1.074 s
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit.checkIterationResultTimeout(MockedSCM)[15]  Time elapsed: 1.094 s
org.apache.hadoop.hdds.scm.container.balancer.TestContainerBalancerDatanodeNodeLimit.checkIterationResultTimeout(MockedSCM)[16]  Time elapsed: 1.157 s
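Summing the times listed above shows that these 22 parameterized cases (6 checkIterationResultException plus 16 checkIterationResultTimeout) together take about 23 s, roughly a third of the new 70.262 s run and about two thirds the size of the 36 s increase. Every case sits just above 1 s, which suggests (an inference from the timings, not something stated in the thread) that each one waits out a timeout of about 1 s. The helper class is illustrative only.

```java
import java.util.stream.DoubleStream;

/** Sums the per-case times, in seconds, of the slow parameterized cases listed above. */
final class SlowCaseTotals {
  // checkIterationResultException(MockedSCM)[1..6]
  static final double[] EXCEPTION_CASES = {1.028, 1.024, 1.024, 1.025, 1.025, 1.025};
  // checkIterationResultTimeout(MockedSCM)[1..16]
  static final double[] TIMEOUT_CASES = {
      1.026, 1.024, 1.029, 1.026, 1.029, 1.03, 1.035, 1.05,
      1.039, 1.078, 1.046, 1.053, 1.062, 1.074, 1.094, 1.157};

  static double total() {
    return DoubleStream.concat(DoubleStream.of(EXCEPTION_CASES), DoubleStream.of(TIMEOUT_CASES))
        .sum();
  }

  public static void main(String[] args) {
    double increase = 70.262 - 34.125;
    System.out.printf("%d slow cases take %.3f s; the class grew by %.3f s%n",
        EXCEPTION_CASES.length + TIMEOUT_CASES.length, total(), increase);
  }
}
```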

@Montura (Contributor Author) commented Nov 19, 2024

There are three possible solutions:

  1. Reduce the total number of arguments for the checkIterationResultTimeout and checkIterationResultException tests.
  2. Decrease the timeout limit.
  3. Do both 1 and 2.
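The three options can be compared with a deliberately crude cost model: assume each slow case spends roughly one timeout waiting, so the slow cases cost about cases × timeout. With the 22 cases and roughly 1 s per case seen above, that gives about 22 s. The reduced case count (8) and timeout (0.2 s) below are hypothetical values chosen only to show the shape of the trade-off, not proposals from this thread, and the model ignores per-case setup time.

```java
/**
 * Crude cost model (an assumption, not a measurement): each slow parameterized case
 * waits about one timeout, so the slow cases take about cases * timeout seconds.
 */
final class TimeoutCostModel {
  static double estimateSeconds(int cases, double timeoutSeconds) {
    return cases * timeoutSeconds;
  }

  public static void main(String[] args) {
    int cases = 22;        // 6 exception cases + 16 timeout cases listed above
    double timeout = 1.0;  // observed per-case time is just over 1 s
    // Hypothetical reductions, for illustration only.
    int fewerCases = 8;
    double shorterTimeout = 0.2;
    System.out.printf("current:  %.1f s%n", estimateSeconds(cases, timeout));
    System.out.printf("option 1: %.1f s%n", estimateSeconds(fewerCases, timeout));
    System.out.printf("option 2: %.1f s%n", estimateSeconds(cases, shorterTimeout));
    System.out.printf("option 3: %.1f s%n", estimateSeconds(fewerCases, shorterTimeout));
  }
}
```

Under this model the two levers multiply, which is why option 3 gives the largest saving; how far the timeout can actually shrink is bounded by how much headroom the tests need to stay non-flaky.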

@adoroszlai (Contributor)

Can you please explore how much timeout can be decreased without introducing flakiness? Depending on that, we'd need to go with one of the solutions.

- initializeIterationShouldUpdateUnBalancedNodesWhenThresholdChanges (reduce iterations 50 -> 10)
- checkIterationResultTimeout (increase maxEnteringSize, reduce timeouts)
@Montura (Contributor Author) commented Nov 23, 2024

Update:

  • initializeIterationShouldUpdateUnBalancedNodesWhenThresholdChanges : reduce iterations 50 -> 10
  • checkIterationResultTimeout : increase maxEnteringSize, decrease timeouts
