HDDS-11243. SCM SafeModeRule Support EC. #7008

slfan1989 · 2024-07-31T02:02:27Z

What changes were proposed in this pull request?

We aim for SCM to immediately switch to leader once it exits safe mode. Currently, due to certain issues, we need to wait for at least one full container report from a DataNode before proceeding with the switch.

Currently, SCM SafeMode has the following issues:

Issue1

DataNodeSafeModeRule cannot effectively verify the registration status of DataNodes. In most cases, as long as more than one DataNode is present, this rule passes. Therefore, we need to strengthen this rule.

Issue2

ContainerSafeModeRule does not support verification of EC (Erasure Coding) Containers. EC Containers differ significantly from RATIS/THREE Containers because EC Containers require determining how many replicas are needed based on the EC type. For instance, for EC-6-3-1024K, we need to ensure that the Container reports having all 6 replicas before it can provide services.

This PR aims to enhance and improve the above two points.

Code improvements:

Enhance DataNodeSafeModeRule

For the registration of Datanodes, we need to obtain the complete list of Datanodes from SCM. This list can be retrieved from the Pipeline. I pass PipelineManager as a parameter into DataNodeSafeModeRule to calculate the number of Datanodes.

Enhance ContainerSafeModeRule

Enhance replica validation for EC containers. Obtain the required replicas based on ECReplicationConfig. Consider container reporting complete only when sufficient replicas have been reported.
Modify the message sending location of ContainerSafeModeRule.

ozone/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMDatanodeProtocolServer.java

Lines 241 to 256 in 311245b

    
    // TODO : Return the list of Nodes that forms the SCM HA.
    RegisteredCommand registeredCommand = scm.getScmNodeManager()
        .register(datanodeDetails, nodeReport, pipelineReportsProto,
            layoutInfo);
    if (registeredCommand.getError()
        == SCMRegisteredResponseProto.ErrorCode.success) {
      eventPublisher.fireEvent(CONTAINER_REPORT,
          new SCMDatanodeHeartbeatDispatcher.ContainerReportFromDatanode(
              datanodeDetails, containerReportsProto));
      eventPublisher.fireEvent(SCMEvents.NODE_REGISTRATION_CONT_REPORT,
          new NodeRegistrationContainerReport(datanodeDetails,
              containerReportsProto));
      eventPublisher.fireEvent(PIPELINE_REPORT,
              new PipelineReportFromDatanode(datanodeDetails,
                      pipelineReportsProto));
    }

There are some issues in this part of the code. NODE_REGISTRATION_CONT_REPORT and CONTAINER_REPORT are handled asynchronously, so there is a scenario where NODE_REGISTRATION_CONT_REPORT processing completes but CONTAINER_REPORT processing does not. This can still lead to the insufficient EC replicas issue.

I adjusted the sending position of NODE_REGISTRATION_CONT_REPORT (requiring the message to be sent only after CONTAINER_REPORT processing completes) and introduced a new type, CONTAINER_REGISTRATION_REPORT, to distinguish it.

Page display:

What is the link to the Apache JIRA

JIRA: HDDS-11243: SCM SafeModeRule Support EC.

How was this patch tested?

Junit Test & Production environment validation

slfan1989 · 2024-08-04T15:03:05Z

...ds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerReportHandler.java

@@ -199,6 +201,11 @@ public void onMessage(final ContainerReportFromDatanode reportFromDatanode,
        // list
        processMissingReplicas(datanodeDetails, expectedContainersInDatanode);
        containerManager.notifyContainerReportProcessing(true, true);
+        if (reportFromDatanode.isRegister()) {


After CONTAINER_REPORT processing completes, we fire the CONTAINER_REGISTRATION_REPORT event, which ensures that the container count is accurate.
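The ordering described here can be illustrated with a minimal sketch. It does not use the Ozone event queue; `RegistrationOrderSketch`, `onContainerReport`, and `onRegistrationReport` are hypothetical names chosen for illustration. The only point shown is that the safe mode notification is triggered from inside the container report handler, after processing, and only for reports that accompany a registration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the report flow; not the actual Ozone classes. */
final class RegistrationOrderSketch {
  /** Records the order in which the steps ran, for inspection. */
  final List<String> trace = new ArrayList<>();

  /** Mirrors the idea of the isRegister() check in the diff above. */
  void onContainerReport(String datanode, boolean isRegister) {
    trace.add("processed container report from " + datanode);
    // Only after the replicas are recorded is the safe mode rule notified,
    // and only when the report came with a datanode registration.
    if (isRegister) {
      onRegistrationReport(datanode);
    }
  }

  /** Assumed consumer of the registration report event. */
  void onRegistrationReport(String datanode) {
    trace.add("safe mode rule notified for " + datanode);
  }

  public static void main(String[] args) {
    RegistrationOrderSketch sketch = new RegistrationOrderSketch();
    sketch.onContainerReport("dn-1", true);  // registration: rule is notified
    sketch.onContainerReport("dn-1", false); // regular report: no notification
    sketch.trace.forEach(System.out::println);
  }
}
```

Because the notification is issued by the handler itself, it cannot overtake the processing of the same report, which is the race the PR description points out.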

slfan1989 · 2024-08-06T09:25:49Z

@errose28 @siddhantsangwan Can you help review this PR? Thank you very much!

siddhantsangwan · 2024-08-12T08:50:43Z

@slfan1989 Thanks for taking this up, I was earlier thinking of fixing this myself. I'll review the PR soon.

slfan1989 · 2024-08-22T00:26:28Z

@siddhantsangwan Can you help review this PR? Thank you very much! The unit test errors are not caused by our changes.

siddhantsangwan

@slfan1989 I've reviewed this partly. Have some comments below.

siddhantsangwan · 2024-09-09T10:06:28Z

...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/ContainerSafeModeRule.java

+          if (replicationConfig != null && replicationConfig instanceof ECReplicationConfig) {
+            ECReplicationConfig ecReplicationConfig = (ECReplicationConfig) replicationConfig;
+            int data = ecReplicationConfig.getData();
+            if (uuids != null && uuids.size() > data) {


For Ratis, just one replica per container is required. So for EC, data number of Datanodes should be sufficient. What do you think?

You are right. For EC, having the data number of replicas reported is already sufficient. I will improve the code.
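A minimal sketch of the check agreed on in this exchange, assuming an EC container counts as reported once `data` distinct datanodes have reported a replica. `EcReplicaTrackerSketch` and its methods are hypothetical names, and the sketch counts distinct datanodes the same way the quoted diff does; it is not the merged implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical tracker for illustration; not the Ozone implementation. */
final class EcReplicaTrackerSketch {
  private final int dataBlocks; // e.g. 6 for EC-6-3-1024K
  private final Map<Long, Set<UUID>> reportedDns = new HashMap<>();
  private final Set<Long> readyContainers = new HashSet<>();

  EcReplicaTrackerSketch(int dataBlocks) {
    this.dataBlocks = dataBlocks;
  }

  /** Records one replica report and returns true once the container is ready. */
  boolean report(long containerId, UUID datanode) {
    Set<UUID> dns = reportedDns.computeIfAbsent(containerId, k -> new HashSet<>());
    dns.add(datanode); // a datanode reporting twice is counted once
    if (dns.size() >= dataBlocks) { // ">=" here, where the quoted diff has ">"
      readyContainers.add(containerId);
    }
    return readyContainers.contains(containerId);
  }

  int readyCount() {
    return readyContainers.size();
  }

  public static void main(String[] args) {
    EcReplicaTrackerSketch tracker = new EcReplicaTrackerSketch(6);
    boolean ready = false;
    for (int i = 0; i < 6; i++) {
      ready = tracker.report(1L, UUID.randomUUID());
    }
    System.out.println("container 1 ready: " + ready);
  }
}
```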

siddhantsangwan · 2024-09-09T10:11:07Z

...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/ContainerSafeModeRule.java

+      if (ratisContainerMap.containsKey(containerID)) {
+        ratisContainerDNsMap.computeIfAbsent(containerID, key -> Sets.newHashSet());
+        ratisContainerDNsMap.get(containerID).add(datanodeUUID);
+        if (!reportedConatinerIDSet.contains(containerID)) {
+          Set<UUID> uuids = ratisContainerDNsMap.get(containerID);
+          if (uuids != null && uuids.size() >= 1) {
+            ratisContainerWithMinReplicas.getAndAdd(1);
+            reportedConatinerIDSet.add(containerID);
+            getSafeModeMetrics()
+                .incCurrentContainersWithOneReplicaReportedCount();
+          }
+        }
+      }


I didn't really understand this change. It seems correct, but is there any reason this logic isn't the same as before? Why do we need to track Datanodes in a set for Ratis containers? Is it because ratisContainerDNsMap and reportedConatinerIDSet are going to be used somewhere else as well? Or is it done this way just so it's similar to the EC logic?

Thank you for the question!

I didn't really understand this change. It seems correct, but is there any reason this logic isn't the same as before?

The previous logic was correct. I made this modification for two reasons:

To align with the EC's logic and improve code readability.

To facilitate the retrieval of additional data in a future PR. For example, this will allow users not only to understand the progress but also to identify which containers have not reported and which DataNodes are included in the reported containers.

Why do we need to track Datanodes in a set for Ratis containers? Is it because ratisContainerDNsMap and reportedConatinerIDSet are going to be used somewhere else as well?

The type of ratisContainerDNsMap is Map<Long, Set<UUID>>, where the key is the ContainerId. The reason for using a Set as the value is to avoid retaining duplicate DN information, as we may encounter the same DN registering multiple times.

Or is it done this way just so it's similar to the EC logic?

That is one of the reasons, as explained in the previous comment.

Can we modify it this way? The original code does not retain enough information.

@siddhantsangwan Can you help review this PR again? Thank you very much!

I improved some of the code, made it less repetitive, and added some comments.

...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/ContainerSafeModeRule.java

siddhantsangwan · 2024-09-09T11:01:51Z

...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/ContainerSafeModeRule.java

+    long ratisCutOff = (long) Math.ceil(ratisMaxContainer * safeModeCutoff);
+    long ecCutOff = (long) Math.ceil(ecMaxContainer * safeModeCutoff);
+
+    getSafeModeMetrics().setNumContainerWithOneReplicaReportedThreshold(ratisCutOff);


Let's set EC metrics as well.

I will improve this part of the code.
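As a worked example of the cutoff arithmetic in the diff above: each threshold is the container count for that replication type multiplied by the cutoff ratio, rounded up. The class name, method name, and the numbers below are assumptions chosen for illustration.

```java
/** Worked example of the cutoff arithmetic; names and values are hypothetical. */
final class SafeModeCutoffSketch {
  /** Smallest whole number of containers that satisfies the cutoff ratio. */
  static long cutoff(long totalContainers, double safeModeCutoff) {
    return (long) Math.ceil(totalContainers * safeModeCutoff);
  }

  public static void main(String[] args) {
    double safeModeCutoff = 0.5; // illustrative ratio, not a recommended setting
    long ratisCutOff = cutoff(7, safeModeCutoff); // ceil(3.5) = 4
    long ecCutOff = cutoff(10, safeModeCutoff);   // ceil(5.0) = 5
    // Both values would be published, one per metric, as the review asks.
    System.out.println("ratis=" + ratisCutOff + " ec=" + ecCutOff);
  }
}
```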

siddhantsangwan · 2024-09-09T11:02:15Z

...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/ContainerSafeModeRule.java

  private void reInitializeRule() {
-    containerMap.clear();
+


Looks like most of the code inside this method is the same as before. If possible, let's refactor this to avoid repetition.

I will improve this part of the code.

siddhantsangwan · 2024-09-09T11:04:31Z

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/SafeModeMetrics.java

@@ -75,10 +79,18 @@ public void setNumContainerWithOneReplicaReportedThreshold(long val) {
    this.numContainerWithOneReplicaReportedThreshold.set(val);
  }

+  public void setNumContainerWithECDataReplicaReportedThreshold(long val) {
+    this.numContainerWithECDataReplicaReportedThreshold.incr(val);


Should use set() instead of incr().
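The difference matters when the threshold is published more than once, for example when the rule is re-initialized. A small sketch, using a hypothetical `ThresholdGaugeSketch` backed by an `AtomicLong` rather than the real metrics class:

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical gauge stand-in; it is not the Hadoop metrics class. */
final class ThresholdGaugeSketch {
  private final AtomicLong value = new AtomicLong();

  void set(long v) {
    value.set(v);
  }

  void incr(long delta) {
    value.addAndGet(delta);
  }

  long get() {
    return value.get();
  }

  public static void main(String[] args) {
    ThresholdGaugeSketch withIncr = new ThresholdGaugeSketch();
    ThresholdGaugeSketch withSet = new ThresholdGaugeSketch();
    // Simulate the rule publishing a threshold of 90 on two initializations.
    for (int i = 0; i < 2; i++) {
      withIncr.incr(90);
      withSet.set(90);
    }
    System.out.println("incr: " + withIncr.get()); // 180, the threshold drifts
    System.out.println("set: " + withSet.get());   // 90, the threshold is stable
  }
}
```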

slfan1989 · 2024-09-11T00:03:03Z

@slfan1989 I've reviewed this partly. Have some comments below.

Thank you very much for reviewing this PR! I will respond to your questions as soon as possible.

sadanand48 · 2024-10-03T06:30:45Z

hadoop-hdds/common/src/main/resources/ozone-default.xml

@@ -1695,6 +1695,15 @@
    </description>
  </property>

+  <property>
+    <name>hdds.scm.safemode.reported.datanode.pct</name>
+    <value>0.90</value>


I think 90% is too much and a significant difference from the previous config. Say a cluster has under-utilized DNs, with no data on ~30-40% of the total DNs; safe mode would still wait for these to register. IMO DataNodeSafeModeRule is there to ensure there are datanodes available for a write to go through. We already check whether enough containers are available for reading in ContainerSafeModeRule.

cc @nandakumar131

Thank you very much for helping review the code! From my personal perspective, I believe we should still have an optional configuration to control this. You made a valid point: 0.9 might be a relatively large value, but if only one DN is registered and the rule passes, it seems a bit too lenient. What if we set the default value to 0.1? Do you think that would be acceptable?

I think 0.1 sounds good, thanks

I also think that the Datanode safe mode rule is meant to ensure writes work. So that means we only need one Datanode to be present as Ozone still allows single replica writes.
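To make the trade-off between 0.9 and 0.1 concrete, here is a sketch of how a percentage setting could translate into a required datanode count. The formula is an assumption (round up, and never require fewer than one datanode); the thread does not spell out the exact calculation, and `DatanodeThresholdSketch` is a hypothetical name.

```java
/** Hypothetical threshold calculation; the real rule may compute it differently. */
final class DatanodeThresholdSketch {
  /**
   * Assumed formula: ceil(pipelineDatanodes * reportedPct), with a floor of
   * one datanode so that an empty pipeline list still requires a registration.
   */
  static int requiredDatanodes(int pipelineDatanodes, double reportedPct) {
    int byPct = (int) Math.ceil(pipelineDatanodes * reportedPct);
    return Math.max(1, byPct);
  }

  public static void main(String[] args) {
    int pipelineDatanodes = 100; // illustrative cluster size
    for (double pct : new double[] {0.9, 0.1}) {
      System.out.println("pct=" + pct + " -> required="
          + requiredDatanodes(pipelineDatanodes, pct));
    }
  }
}
```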

slfan1989 · 2024-10-03T08:06:04Z

@errose28 @siddhantsangwan @sadanand48 @adoroszlai

Thank you all very much for paying attention to this PR!

To facilitate a better review of this PR, I'll summarize some additional information for your reference.

Background:

In our production environment, we use EC (Erasure Coding) strategy, and we have written a lot of EC data.
Sometimes we need to restart the SCM.
After the SCM restarts, it can exit safe mode quickly, but when we switch, we encounter an issue where user applications report an error stating: There are insufficient datanodes to read the EC block.

Solution Process:

Our hope is that once the SCM meets the safe mode criteria, it can switch to become the leader SCM, and users' access will no longer report errors.

We found that there are some issues with the two rules for safe mode.

ContainerSafeModeRule is missing the EC validation.

From reading the code, we found that ContainerSafeModeRule.java does not handle EC Containers. As a result, an EC Container with only one replica reported is counted as successfully reported, whereas EC requires an assessment based on the replication config. For EC-6-3-1024K, we need 6 replicas to meet the criteria.

The rules in DataNodeSafeModeRule are lenient.

In DataNodeSafeModeRule, the default condition for exiting the rule is that only one DataNode is registered. I think it would be better if we could configure a proportional parameter to control when we can exit.

Therefore, we added the parameter hdds.scm.safemode.reported.datanode.pct and calculated the actual required number of registered DataNodes based on the number of DataNodes stored in the pipeline.

This is an example of the actual usage effect.

slfan1989 · 2024-10-03T08:16:20Z

...-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/DataNodeSafeModeRule.java

+    if (pipeLineDnSet.contains(dnUUID) || !registeredDnSet.contains(dnUUID)) {
+      registeredDnSet.add(dnUUID);
+      registeredDns = registeredDnSet.size();
+      unRegisteredDn.remove(dnUUID);


@errose28 Regarding the issue we discussed together in HDDS-11481, I plan to add a variable here to store unRegisteredDn and display it in the status. Do you think this approach is acceptable?

This is using the persisted pipeline membership to determine which nodes have not been seen yet? That should work. It won't catch the cases where a new DN not in any pipelines has not registered yet, but it at least provides more information.

I'm not sure that putting all nodes back in the unregistered list on refresh is the correct behavior though, since nodes that have already registered should remain accounted for by the rule on refresh.

Thank you for your message! I have made adjustments to this part of the logic. During re-registration, only DNs that are not in the registeredDnSet will be placed in the unRegisteredDn.
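The adjusted refresh behavior can be sketched as follows. This is an illustration under assumed names (`UnregisteredDnSketch`, `refresh`, `onRegister`), not the code in the PR: on refresh, only pipeline members that have not registered yet go into the unregistered set, and already registered nodes stay accounted for.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Hypothetical bookkeeping for registered and unregistered datanodes. */
final class UnregisteredDnSketch {
  final Set<UUID> registered = new HashSet<>();
  final Set<UUID> unregistered = new HashSet<>();

  /** Rebuilds the unregistered view from the persisted pipeline membership. */
  void refresh(Set<UUID> pipelineDns) {
    unregistered.clear();
    for (UUID dn : pipelineDns) {
      if (!registered.contains(dn)) { // registered nodes are not put back
        unregistered.add(dn);
      }
    }
  }

  /** Marks a datanode as registered and drops it from the unregistered view. */
  void onRegister(UUID dn) {
    registered.add(dn);
    unregistered.remove(dn);
  }

  public static void main(String[] args) {
    UnregisteredDnSketch sketch = new UnregisteredDnSketch();
    UUID dn1 = UUID.randomUUID();
    UUID dn2 = UUID.randomUUID();
    Set<UUID> pipelineDns = new HashSet<>();
    pipelineDns.add(dn1);
    pipelineDns.add(dn2);

    sketch.refresh(pipelineDns);
    sketch.onRegister(dn1);
    sketch.refresh(pipelineDns); // dn1 stays registered after the refresh
    System.out.println("still unregistered: " + sketch.unregistered.size());
  }
}
```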

slfan1989 · 2024-10-04T13:27:26Z

@siddhantsangwan A friendly ping! Supporting EC Container recognition is crucial for SafeMode. I’d like to continue contributing to gain recognition for this change. What additional tasks can I pursue? I would appreciate any guidance or suggestions you can provide.

siddhantsangwan · 2024-10-08T18:30:40Z

@slfan1989 thanks for updating! I'll be able to review this and have a discussion with you next week, probably on Tuesday.

siddhantsangwan · 2024-10-16T11:02:06Z

...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/ContainerSafeModeRule.java

+   * @param isEcContainer true, means ECContainer, false, means not ECContainer.
+   */
+  private void recordReportedContainer(long containerID, boolean isEcContainer) {
+    if (!reportedContainerIDSet.contains(containerID)) {


Is it possible to get rid of reportedContainerIDSet and just use the ratis and ec maps?

Thank you for your suggestion! We can indeed remove reportedContainerIDSet, and I will improve the code.

siddhantsangwan · 2024-10-16T11:04:18Z

...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/ContainerSafeModeRule.java

+    return 1;
+  }
+
+  private void initContainerDNsMap(long containerID, Map<Long, Set<UUID>> containerDNsMap,


How about renaming this to putInContainerDNsMap?

I will improve the code.

siddhantsangwan · 2024-10-16T12:42:42Z

...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/ContainerSafeModeRule.java

-
-  private AtomicLong containerWithMinReplicas = new AtomicLong(0);
+  private Set<Long> reportedContainerIDSet = new HashSet<>();
+  private Map<Long, ContainerInfo> ratisContainerMap;


Do we really need ratisContainerMap and ecContainerMap to be maps at all? As far as I can see, all we need are the container IDs. We can then just use container manager to get the container info object when needed. There's probably at least a couple of GBs of overhead for storing references to billions of ContainerInfo objects in the map, which we don't really need. It can just be a set of container IDs.

Going a step further, I feel like we don't need the List of ContainerInfo objects that's being passed into the constructor of this class. Ultimately, all we need is a mapping from container id to container info for all container IDs. So the constructor should either have that as an argument, or just the container manager, since the container manager can simply be used to get any information we need about the containers in the system.

This suggestion is also very reasonable. I’ve been using ContainerInfo solely to retrieve the minimum replica count of the container. I can optimize ratisContainerMap and ecContainerMap so that these two variables only store the mapping between ContainerID and its corresponding minimum replica count.

siddhantsangwan · 2024-10-16T12:46:41Z

...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/ContainerSafeModeRule.java

+      ReplicationConfig replicationConfig = container.getReplicationConfig();
+
+      if (checkContainerState(containerState) && container.getNumberOfKeys() > 0) {
+        if (replicationConfig instanceof RatisReplicationConfig) {


It's more intuitive to do something like:

container.getReplicationType().equals(HddsProtos.ReplicationType.RATIS)

siddhantsangwan · 2024-10-16T12:48:30Z

...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/ContainerSafeModeRule.java

      }
    }
  }

-  private void reInitializeRule() {
-    containerMap.clear();


This method was also clearing this map but the new code isn't; can you check if we need to clear the map?

I carefully read the code, and indeed we should also preserve the logic for clearing the map. I have added the relevant logic in the initializeRule method.

adoroszlai · 2024-11-04T07:37:12Z

Thanks @slfan1989 for working on this.

I will add unit tests and continue to improve the code.

Converted the PR to draft until then.

Also removed some from requested reviewer list. I don't think having 8 people review the same patch makes sense.

slfan1989 · 2024-11-11T10:30:19Z

@siddhantsangwan Thank you very much for reviewing the code! I have made the changes according to your suggestions and added unit tests. The unit tests cover common EC types, such as EC-3-2-1024K and EC-6-3-1024K. I would appreciate it if you could find some time to review this PR again.

I have submitted the code to my personal repository, and the CI(https://github.com/slfan1989/ozone/actions/runs/11771095727) shows that all checks have passed. I have now changed the status of this PR to "Ready for Review."

adoroszlai · 2024-11-17T20:15:57Z

@nandakumar131 can you please review as well?

slfan1989 · 2024-11-19T14:11:13Z

@nandakumar131 can you please review as well?

@adoroszlai Thank you very much for reviewing this PR! This improvement is very important to us. Currently, when we restart the SCM, it cannot determine whether the EC Container has finished reporting because, similar to the Ratis 3-replica Container, the SCM considers the Container ready as soon as just one replica reports successfully. This results in an issue where we are unable to promote the SCM to leader when it has just restarted and has already exited safe mode.

This PR has been in use internally for several months, and I personally believe it has met expectations. Currently, we have fully transitioned our internal Ozone cluster to the EC-6-3-1024K strategy (meaning there is almost no 3-replica data in the cluster, with only a small amount, less than 10PB, as exceptions). This decision was driven by cost considerations, as we have already stored over 100PB of data.

I sincerely hope we can continue to push this PR forward. If there are any suggestions for improvement, I will continue to make the necessary changes.

cc: @siddhantsangwan @sadanand48 @errose28

adoroszlai

I would like to suggest relatively simple code changes to reduce duplication in ContainerSafeModeRule: create an instance for each replication type. See: adoroszlai@fb815f3
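The linked commit is not reproduced here. As a rough sketch of the general idea only, one rule class can be parameterized by replication type so that each instance tracks just the containers of its own type. `PerTypeRuleSketch` and its nested `ReplicationType` enum are hypothetical stand-ins, not Ozone classes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical rule that is instantiated once per replication type. */
final class PerTypeRuleSketch {
  /** Stand-in enum for illustration. */
  enum ReplicationType { RATIS, EC }

  private final ReplicationType type;
  private final Set<Long> tracked = new HashSet<>();

  PerTypeRuleSketch(ReplicationType type, Map<Long, ReplicationType> containers) {
    this.type = type;
    // Each instance keeps only the containers that match its own type.
    containers.forEach((id, t) -> {
      if (t == type) {
        tracked.add(id);
      }
    });
  }

  ReplicationType type() {
    return type;
  }

  boolean tracks(long containerId) {
    return tracked.contains(containerId);
  }

  int trackedCount() {
    return tracked.size();
  }

  public static void main(String[] args) {
    Map<Long, ReplicationType> containers = new HashMap<>();
    containers.put(1L, ReplicationType.RATIS);
    containers.put(2L, ReplicationType.EC);
    containers.put(3L, ReplicationType.EC);

    PerTypeRuleSketch ratisRule = new PerTypeRuleSketch(ReplicationType.RATIS, containers);
    PerTypeRuleSketch ecRule = new PerTypeRuleSketch(ReplicationType.EC, containers);
    System.out.println(ratisRule.type() + " tracks " + ratisRule.trackedCount());
    System.out.println(ecRule.type() + " tracks " + ecRule.trackedCount());
  }
}
```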

slfan1989 · 2024-11-20T04:49:13Z

I would like to suggest relatively simple code changes to reduce duplication in ContainerSafeModeRule: create an instance for each replication type. See: adoroszlai@fb815f3

@adoroszlai Thank you very much for the code modifications you provided! I am currently reviewing this part of the code and optimizing the PR based on your suggestions.

siddhantsangwan

@slfan1989 thanks for your sustained efforts. I've left some more comments below, mostly regarding maintaining unnecessary data in memory. At high scale, these memory optimisations will make a big difference!

siddhantsangwan · 2024-11-22T11:57:00Z

...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/ContainerSafeModeRule.java

-  private double maxContainer;
-
-  private AtomicLong containerWithMinReplicas = new AtomicLong(0);
+  private Map<Long, Integer> ratisContainerMinReplicaMap;


This map is not needed as far as I can tell. It can just be a Set of Ratis container ids (long), since the min replica count for ratis containers is always 1.

Thank you for the suggestion! I have improved the relevant code.

siddhantsangwan · 2024-11-22T11:59:01Z

...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/ContainerSafeModeRule.java

-
-  private AtomicLong containerWithMinReplicas = new AtomicLong(0);
+  private Map<Long, Integer> ratisContainerMinReplicaMap;
+  private Map<Long, Set<UUID>> ratisContainerDNsMap;


Why maintain a mapping for ratis containers? If we use the set of container id that I mentioned above, we can simply remove a container id from the set when a datanode reports having a replica of that container.

I have improved the code based on the method you suggested.
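A minimal sketch of the set-based approach suggested in this comment, assuming one reported replica is enough for a Ratis container. `RatisPendingSetSketch` is a hypothetical name; the merged code may differ in detail.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical tracker that holds only the IDs of containers not yet reported. */
final class RatisPendingSetSketch {
  private final Set<Long> pending;
  private final long total;

  RatisPendingSetSketch(Set<Long> ratisContainerIds) {
    this.pending = new HashSet<>(ratisContainerIds);
    this.total = ratisContainerIds.size();
  }

  /** Any replica report is enough, so the ID is simply dropped from the set. */
  void onReplicaReported(long containerId) {
    pending.remove(containerId); // no-op for unknown or already reported IDs
  }

  long reportedCount() {
    return total - pending.size();
  }

  public static void main(String[] args) {
    RatisPendingSetSketch rule =
        new RatisPendingSetSketch(new HashSet<>(Arrays.asList(1L, 2L, 3L)));
    rule.onReplicaReported(2L);
    rule.onReplicaReported(2L); // a second replica of the same container
    System.out.println("reported: " + rule.reportedCount());
  }
}
```

The set shrinks as reports arrive, so memory use falls during startup and no per-container datanode set is retained.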

siddhantsangwan · 2024-11-22T12:05:55Z

...hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/ContainerSafeModeRule.java

-  private AtomicLong containerWithMinReplicas = new AtomicLong(0);
+  private Map<Long, Integer> ratisContainerMinReplicaMap;
+  private Map<Long, Set<UUID>> ratisContainerDNsMap;
+  private Map<Long, Integer> ecContainerMinReplicaMap;


We don't need to maintain this mapping either, as far as I can tell. Whenever we have the container id and need the replication factor of that container, we can use a container manager method to get it. That'll be a constant time (O(1)) lookup for container manager on average.

This part of the logic has also been improved.

siddhantsangwan · 2024-11-25T08:53:30Z

@slfan1989 I'm not sure about the Datanode safe mode rule related improvements in this pull request. Logically it's a separate change from adding EC safe mode support, and so it should have a different jira and pull request. I feel it requires more thinking and I'm not sure how carefully others have reviewed it. It's best suited for a different PR.

So in the interest of time, I suggest removing those changes from this PR and introducing them in a separate PR. That way, we'll be able to merge this PR sooner.

siddhantsangwan

The latest commits look good to me. With this, all the EC container safe mode related changes are ready to be merged IMO. If we can remove the datanode rule related changes and have a green CI run, we'll be able to merge the PR.

slfan1989 · 2024-11-25T09:20:43Z

The latest commits look good to me. With this, all the EC container safe mode related changes are ready to be merged IMO. If we can remove the datanode rule related changes and have a green CI run, we'll be able to merge the PR.

@siddhantsangwan Thank you for your continued improvement suggestions! I will remove the datanode rule related changes in this PR.

siddhantsangwan · 2024-11-25T10:43:37Z

@slfan1989 test failure looks related, can you take a look? Also please let me know once it's ready for a final review.

slfan1989 · 2024-11-25T11:12:33Z

@slfan1989 test failure looks related, can you take a look? Also please let me know once it's ready for a final review.

@siddhantsangwan I have fixed the errors in the unit tests and am waiting for the CI to pass. Once that’s done, I will ask you to help with another review. Thank you again!

slfan1989 · 2024-11-25T15:35:37Z

@siddhantsangwan I have rechecked the code, and this version is the final one. I have also rebased the code. Could you please review it again? Thank you very much!

adoroszlai · 2024-11-25T15:44:50Z

Please try to avoid force-push when updating the PR. Here are some great articles that explain why:

https://developers.mattermost.com/blog/submitting-great-prs/#4-avoid-force-pushing
https://www.freecodecamp.org/news/optimize-pull-requests-for-reviewer-happiness#request-a-review

slfan1989 · 2024-11-25T15:51:00Z

Please try to avoid force-push when updating the PR. Here are some great articles that explain why:

https://developers.mattermost.com/blog/submitting-great-prs/#4-avoid-force-pushing
https://www.freecodecamp.org/news/optimize-pull-requests-for-reviewer-happiness#request-a-review

@adoroszlai Thank you for providing this information! I will pay attention to this detail in future development to avoid issues caused by force-pushing.

siddhantsangwan

Some comments on the tests.

...dds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/safemode/TestSCMSafeModeManager.java

siddhantsangwan

LGTM, pending green CI.

siddhantsangwan · 2024-11-26T13:32:15Z

Merged, thanks everyone!

slfan1989 · 2024-11-26T14:01:46Z

Thank you all for your support in helping us complete this PR. I greatly appreciate everyone’s valuable feedback throughout the process. @siddhantsangwan's professionalism left a strong impression on me—many thanks once again. I’d also like to extend my gratitude to @adoroszlai for their continued assistance. I’ve learned a lot throughout this process.

slfan1989 marked this pull request as draft July 31, 2024 02:15

slfan1989 mentioned this pull request Jul 31, 2024

HDDS-10985. EC Reconstruction failed because the size of currentChunks was not equal to checksumBlockDataChunks. #7009

Merged

errose28 mentioned this pull request Jul 31, 2024

HDDS-11209. Avoid insufficient EC pipelines in the container pipeline cache #6974

Merged

slfan1989 changed the title ~~HDDS-11243: SCM SafeModeRule Support EC.~~ HDDS-11243. SCM SafeModeRule Support EC. Aug 1, 2024

slfan1989 marked this pull request as ready for review August 4, 2024 13:04

slfan1989 commented Aug 4, 2024

slfan1989 force-pushed the HDDS-11243 branch from 871ded5 to b65a565 Compare August 5, 2024 14:54

slfan1989 force-pushed the HDDS-11243 branch from b365587 to 1b204ed Compare August 21, 2024 17:00

slfan1989 force-pushed the HDDS-11243 branch from 7fc7745 to 1e1f8c3 Compare August 29, 2024 22:36

siddhantsangwan requested review from sodonnel, adoroszlai and sadanand48 September 9, 2024 05:04

siddhantsangwan reviewed Sep 9, 2024

slfan1989 requested a review from siddhantsangwan September 24, 2024 02:32

siddhantsangwan requested review from aswinshakil, swamirishi and ashishkumar50 October 1, 2024 06:28

slfan1989 force-pushed the HDDS-11243 branch from 11005b0 to 57f59f9 Compare October 3, 2024 01:53

sadanand48 reviewed Oct 3, 2024

slfan1989 commented Oct 3, 2024

slfan1989 force-pushed the HDDS-11243 branch from 062dcc3 to 18594c8 Compare October 3, 2024 08:40

siddhantsangwan reviewed Oct 16, 2024

adoroszlai marked this pull request as draft November 4, 2024 07:33

slfan1989 force-pushed the HDDS-11243 branch 2 times, most recently from 5db5824 to 1c16cb2 Compare November 11, 2024 05:10

slfan1989 marked this pull request as ready for review November 11, 2024 10:23

slfan1989 requested a review from siddhantsangwan November 11, 2024 10:30

adoroszlai reviewed Nov 19, 2024

siddhantsangwan reviewed Nov 22, 2024

slfan1989 referenced this pull request in adoroszlai/ozone Nov 24, 2024

reduce duplication by creating two instances of ContainerSafeModeRule

fb815f3

siddhantsangwan reviewed Nov 25, 2024

HDDS-11243. SCM SafeModeRule Support EC.

8291802

slfan1989 force-pushed the HDDS-11243 branch from e3b150c to 8291802 Compare November 25, 2024 13:27

slfan1989 requested a review from siddhantsangwan November 25, 2024 15:36

siddhantsangwan reviewed Nov 26, 2024

HDDS-11243. Improve Junit Test.

366413e

siddhantsangwan approved these changes Nov 26, 2024

siddhantsangwan merged commit a99ab27 into apache:master Nov 26, 2024
40 checks passed
