
Multi-domain mesh creation #1332

Open

tmarrinan opened this issue Oct 31, 2024 · 11 comments

@tmarrinan

Hello.
I have data that is distributed among N processes and I want to create a blueprint mesh for it. I thought I was doing it correctly, but when I call the partition function I get unexpected results. I'm not sure whether I am creating the mesh incorrectly or calling the partition function incorrectly.
Any assistance would be appreciated!

Example (12 processes, each owning a 4x4 subregion of an overall 16x12 grid):

+-----+-----+-----+-----+
|  0  |  1  |  2  |  3  |
|     |     |     |     |
+-----+-----+-----+-----+
|  4  |  5  |  6  |  7  |
|     |     |     |     |
+-----+-----+-----+-----+
|  8  |  9  | 10  | 11  |
|     |     |     |     |
+-----+-----+-----+-----+

Code:

// this rank's 4x4 block position within the 4x3 grid of domains
int rows = 3;
int columns = 4;
int local_width = 4;
int local_height = 4;
int row = rank / columns;
int column = rank % columns;
int origin_x = local_width * column;
int origin_y = local_height * row;

double values[16] = {...};

conduit::Node mesh;
mesh["state/domain_id"] = rank;   // one domain per MPI rank

mesh["coordsets/coords/type"] = "uniform";
mesh["coordsets/coords/dims/i"] = local_width;
mesh["coordsets/coords/dims/j"] = local_height;

mesh["coordsets/coords/origin/x"] = origin_x;
mesh["coordsets/coords/origin/y"] = origin_y;
mesh["coordsets/coords/spacing/dx"] = 1;
mesh["coordsets/coords/spacing/dy"] = 1;

mesh["topologies/topo/type"] = "uniform";
mesh["topologies/topo/coordset"] = "coords";

mesh["fields/scalar1/association"] = "vertex";
mesh["fields/scalar1/topology"] = "topo";
mesh["fields/scalar1/values"].set(values, 16);

I then want to repartition the mesh to access the whole thing on process 0. So I tried the following:

conduit::Node options, selections, output;
conduit::Node &selection = selections.append();
selection["type"] = "logical";
selection["start"] = {0u, 0u, 0u}; // for some reason this failed if I only used 2 dimensions
selection["end"] = {16u, 12u, 1u}; // for some reason this failed if I only used 2 dimensions
options["target"] = 1;
options["selections"] = selections;

conduit::blueprint::mpi::mesh::partition(mesh, options, output, MPI_COMM_WORLD);

However, the resulting output mesh still only has size 4x4 and only contains the data from process 0.

As a side note, I am setting "target" to 1 (i.e., a single output domain), but how do I specify which process receives it (e.g., what if I want it on process 3 instead)?

@tmarrinan
Author

OK - after a bit more reading and testing - I think I have it working!

There were 2 key things I needed to change (one with the mesh and one with the partition options):

  1. Change scalar value association from "vertex" to "element" (this also meant that the coordinate dims needed to be increased by 1)
  2. Have an array of selections (one per process), adding a proper "domain_id" and using start and end values that match the local data

Final solution:

conduit::Node mesh;
mesh["state/domain_id"] = rank;   // one domain per MPI rank

mesh["coordsets/coords/type"] = "uniform";
mesh["coordsets/coords/dims/i"] = local_width + 1;    // dims are vertex counts: elements + 1
mesh["coordsets/coords/dims/j"] = local_height + 1;

mesh["coordsets/coords/origin/x"] = origin_x;
mesh["coordsets/coords/origin/y"] = origin_y;
mesh["coordsets/coords/spacing/dx"] = 1;
mesh["coordsets/coords/spacing/dy"] = 1;

mesh["topologies/topo/type"] = "uniform";
mesh["topologies/topo/coordset"] = "coords";

mesh["fields/scalar1/association"] = "element";   // one value per element (4x4 = 16)
mesh["fields/scalar1/topology"] = "topo";
mesh["fields/scalar1/values"].set(values, 16);

conduit::Node options, selections, output;
// one logical selection per source domain, each tagged with the domain it comes from
for (int i = 0; i < num_processes; i++)
{
    conduit::Node &selection = selections.append();
    selection["type"] = "logical";
    selection["domain_id"] = i;
    selection["start"] = {0u, 0u, 0u};
    selection["end"] = {local_width, local_height, 1u};   // note: "end" is inclusive (see the ghost-cell discussion below)
}
options["target"] = 1;   // combine everything into a single output domain
options["fields"] = {"scalar1"};
options["selections"] = selections;
options["mapping"] = 0;

conduit::blueprint::mpi::mesh::partition(mesh, options, output, MPI_COMM_WORLD);

This resulted in the following output (each process filled its local data with floating point values equal to its process id):

state:
  domain_id: 0
coordsets:
  coords:
    type: "uniform"
    origin:
      x: 0.0
      y: 0.0
    dims:
      i: 17
      j: 13
topologies:
  topo:
    type: "uniform"
    coordset: "coords"
fields:
  scalar1:
    topology: "topo"
    association: "element"
    values: [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0, 5.0, 6.0, 6.0, 6.0, 6.0, 7.0, 7.0, 7.0, 7.0, 4.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0, 5.0, 6.0, 6.0, 6.0, 6.0, 7.0, 7.0, 7.0, 7.0, 4.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0, 5.0, 6.0, 6.0, 6.0, 6.0, 7.0, 7.0, 7.0, 7.0, 4.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0, 5.0, 6.0, 6.0, 6.0, 6.0, 7.0, 7.0, 7.0, 7.0, 8.0, 8.0, 8.0, 8.0, 9.0, 9.0, 9.0, 9.0, 10.0, 10.0, 10.0, 10.0, 11.0, 11.0, 11.0, 11.0, 8.0, 8.0, 8.0, 8.0, 9.0, 9.0, 9.0, 9.0, 10.0, 10.0, 10.0, 10.0, 11.0, 11.0, 11.0, 11.0, 8.0, 8.0, 8.0, 8.0, 9.0, 9.0, 9.0, 9.0, 10.0, 10.0, 10.0, 10.0, 11.0, 11.0, 11.0, 11.0, 8.0, 8.0, 8.0, 8.0, 9.0, 9.0, 9.0, 9.0, 10.0, 10.0, 10.0, 10.0, 11.0, 11.0, 11.0, 11.0]
  original_element_ids:
    topology: "topo"
    association: "element"
    values:
      domains: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
      ids: [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 8, 9, 10, 11, 8, 9, 10, 11, 8, 9, 10, 11, 8, 9, 10, 11, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 8, 9, 10, 11, 8, 9, 10, 11, 8, 9, 10, 11, 8, 9, 10, 11, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 8, 9, 10, 11, 8, 9, 10, 11, 8, 9, 10, 11, 8, 9, 10, 11, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15]
  original_vertex_ids:
    topology: "topo"
    association: "vertex"
    values:
      domains: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
      ids: [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 9, 10, 11, 12, 13, 10, 11, 12, 13, 10, 11, 12, 13, 10, 11, 12, 13, 14, 15, 16, 17, 18, 15, 16, 17, 18, 15, 16, 17, 18, 15, 16, 17, 18, 19, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 9, 10, 11, 12, 13, 10, 11, 12, 13, 10, 11, 12, 13, 10, 11, 12, 13, 14, 15, 16, 17, 18, 15, 16, 17, 18, 15, 16, 17, 18, 15, 16, 17, 18, 19, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 9, 10, 11, 12, 13, 10, 11, 12, 13, 10, 11, 12, 13, 10, 11, 12, 13, 14, 15, 16, 17, 18, 15, 16, 17, 18, 15, 16, 17, 18, 15, 16, 17, 18, 19, 20, 21, 22, 23, 20, 21, 22, 23, 20, 21, 22, 23, 20, 21, 22, 23, 24]
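
If it is useful to inspect the combined result on disk, something along these lines should work on whichever rank ends up with the data (a sketch; it assumes conduit::relay::io::blueprint::save_mesh from conduit_relay_io_blueprint.hpp and an HDF5-enabled build):

// with "target" = 1, only one rank receives the combined domain;
// the output node is empty on the other ranks
if (!output.dtype().is_empty())
{
    conduit::relay::io::blueprint::save_mesh(output, "combined_mesh", "hdf5");
}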

@JustinPrivitera
Member

I'm glad you got this working. Do you have suggestions on how we can improve the documentation?

@tmarrinan
Author

The tricky part was realizing that I needed to explicitly specify which data came from which domain_id via an array of "selections", rather than just specifying the desired region and letting Conduit determine which process owned that data.

There were no examples in the documentation with multiple selections, so it took a bit of trial and error. Having code that matches the M:N redistribution picture (where the target is 10, 4, and 2) might be helpful.
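
For reference, the kind of example I have in mind would look roughly like the following (a sketch of an M:N redistribution, going from the 12 per-rank domains above down to 4 output domains; as far as I can tell the explicit selections are optional when whole domains are redistributed, but I have not tested this exact snippet):

// M:N redistribution sketch: 12 single-domain ranks -> 4 output domains
conduit::Node options, output;
options["target"] = 4;   // number of domains in the repartitioned mesh
conduit::blueprint::mpi::mesh::partition(mesh, options, output, MPI_COMM_WORLD);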

@tmarrinan
Author

Well, now I'm running into a different issue.
If my data contains "ghost cells" (border cells that contain data from a neighbor), then I am receiving the following warning when repartitioning: Unable to combine domains as uniform, using unstructured.

In the example above:

+-----+-----+-----+-----+
|  0  |  1  |  2  |  3  |
|     |     |     |     |
+-----+-----+-----+-----+
|  4  |  5  |  6  |  7  |
|     |     |     |     |
+-----+-----+-----+-----+
|  8  |  9  | 10  | 11  |
|     |     |     |     |
+-----+-----+-----+-----+

I now have each process with ghost cells from its neighbors. This means the actual data size for each process is as follows when the overall grid is 16x12 (see the sketch after the list):

  • 0: 5x5 (0 ghost cells left, 1 ghost cell right, 0 ghost cells above, 1 ghost cell below)
  • 1: 6x5 (1 ghost cell left, 1 ghost cell right, 0 ghost cells above, 1 ghost cell below)
  • 2: 6x5 (1 ghost cell left, 1 ghost cell right, 0 ghost cells above, 1 ghost cell below)
  • 3: 5x5 (1 ghost cell left, 0 ghost cells right, 0 ghost cells above, 1 ghost cell below)
  • 4: 5x6 (0 ghost cells left, 1 ghost cell right, 1 ghost cell above, 1 ghost cell below)
  • 5: 6x6 (1 ghost cell left, 1 ghost cell right, 1 ghost cell above, 1 ghost cell below)
  • 6: 6x6 (1 ghost cell left, 1 ghost cell right, 1 ghost cell above, 1 ghost cell below)
  • 7: 5x6 (1 ghost cell left, 0 ghost cells right, 1 ghost cell above, 1 ghost cell below)
  • 8: 5x5 (0 ghost cells left, 1 ghost cell right, 1 ghost cell above, 0 ghost cells below)
  • 9: 6x5 (1 ghost cell left, 1 ghost cell right, 1 ghost cell above, 0 ghost cells below)
  • 10: 6x5 (1 ghost cell left, 1 ghost cell right, 1 ghost cell above, 0 ghost cells below)
  • 11: 5x5 (1 ghost cell left, 0 ghost cells right, 1 ghost cell above, 0 ghost cells below)
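
The ghost layout above follows directly from each rank's position in the 4x3 grid. A small sketch of how I compute it, reusing the row/column variables from my first snippet (the ghost_* names are just for illustration):

// a one-cell ghost layer exists on each side that has a neighboring domain
int ghost_left   = (column > 0)           ? 1 : 0;
int ghost_right  = (column < columns - 1) ? 1 : 0;
int ghost_top    = (row > 0)              ? 1 : 0;
int ghost_bottom = (row < rows - 1)       ? 1 : 0;

// local array size including ghosts, e.g. 6x6 for rank 5 and 5x5 for rank 0
int ghosted_width  = local_width  + ghost_left + ghost_right;
int ghosted_height = local_height + ghost_top  + ghost_bottom;

// the coordset then spans the full ghosted array
mesh["coordsets/coords/dims/i"] = ghosted_width + 1;    // vertex counts
mesh["coordsets/coords/dims/j"] = ghosted_height + 1;
mesh["coordsets/coords/origin/x"] = origin_x - ghost_left;
mesh["coordsets/coords/origin/y"] = origin_y - ghost_top;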

Accordingly, I update the "start" and "end" in each selection to account for the desired data sometimes being offset one cell to the right or down. I also update the "origin/{x,y}" of the coordset in the mesh.

Any ideas why the uniform domain cannot be maintained?

@tmarrinan
Author

Wait, never mind... I just realized that "end" is inclusive. It didn't matter without the ghost cells, since the selection was cropped to the data size, but with ghost cells I was grabbing the ghost cells to the right/below because I assumed "end" was exclusive.
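
For anyone who hits the same thing, the corrected per-domain selection looks roughly like this (a sketch inside the selection loop from my earlier code; "start" skips the leading ghost layer and "end" is the last real cell, inclusive):

conduit::Node &selection = selections.append();
// skip the leading ghost column/row if this domain has one
unsigned int start_x = (i % columns > 0) ? 1u : 0u;
unsigned int start_y = (i / columns > 0) ? 1u : 0u;
selection["type"] = "logical";
selection["domain_id"] = i;
selection["start"] = {start_x, start_y, 0u};
// "end" is inclusive: the last real (non-ghost) cell in each direction
selection["end"] = {start_x + local_width - 1, start_y + local_height - 1, 0u};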

@JustinPrivitera
Member

Conduit Blueprint currently has no notion of ghost cells or nodes, but that support will likely be added in the future.

@JustinPrivitera
Member

We should enhance the documentation for partitioning and provide more and better examples.

@tmarrinan
Author

Hello!
I have one more question relating to partitioning. I am now attempting to accomplish the same thing, but using Python instead of C++. I don't see many examples, but when I try output = conduit.blueprint.mpi.mesh.partition(mesh, options, comm), I get an error about conduit blueprint not having a member named "mpi".
How could I achieve the same thing in Python?

@JustinPrivitera
Member

@cyrush can correct me if I'm wrong, but I don't believe MPI is enabled for the Python interface to blueprint. I'm not sure why that's the case. We should add it.

@cyrush
Member

cyrush commented Nov 18, 2024

Your read of the situation is correct; we can add that support.

@JustinPrivitera
Member

#1333
