In the paper, why FPS in Table 1 is only 24.5? #46

Open
gwxxx opened this issue Sep 4, 2024 · 24 comments

Comments

@gwxxx

gwxxx commented Sep 4, 2024

No description provided.

@TQTQliu
Owner

TQTQliu commented Sep 4, 2024

The FPS in Table 1 represents the inference time of the entire feed-forward network, including feature encoding, decoding and rendering time.
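
For reference, a minimal sketch of what that timing would cover, assuming a PyTorch-style pipeline; the method names (`encode`, `decode`, `render`) are placeholders, not the repo's actual API:

```python
import time
import torch

def measure_fps(model, source_views, target_cameras):
    """Time the full feed-forward pass per target view: encoding + decoding + rendering."""
    torch.cuda.synchronize()
    start = time.time()
    with torch.no_grad():
        for cam in target_cameras:
            feats = model.encode(source_views)     # feature encoding
            gaussians = model.decode(feats, cam)   # decode 3D Gaussians for this view
            image = model.render(gaussians, cam)   # rasterize the novel view
    torch.cuda.synchronize()
    return len(target_cameras) / (time.time() - start)
```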

@gwxxx
Author

gwxxx commented Sep 4, 2024

> The FPS in Table 1 represents the inference time of the entire feed-forward network, including feature encoding, decoding and rendering time.

Thanks for your reply! So in Table 3, the FPS only includes rendering time?

@TQTQliu
Owner

TQTQliu commented Sep 4, 2024

Yes, because for the per-scene optimization (Table 3), we discard the network part and only need to optimize the Gaussian point cloud, in which case the FPS is the rendering speed.

@gwxxx
Author

gwxxx commented Sep 4, 2024

> Yes, because for the per-scene optimization (Table 3), we discard the network part and only need to optimize the Gaussian point cloud, in which case the FPS is the rendering speed.

Do you mean that the network part is not used for the per-scene optimization? My previous understanding was that a new scene first goes through the network and is then optimized.

@TQTQliu
Owner

TQTQliu commented Sep 4, 2024

The pre-trained generalizable model provides a point cloud as the initialization of 3DGS (per-scene optimization). Like 3DGS, we only optimize the 3D Gaussians, not the feed-forward network.
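
An illustrative sketch (not the repo's exact code) of how the fused point cloud could be written out as the 3DGS initialization; `xyz` and `rgb` are assumed to come from the feed-forward model's depth fusion, and the default filename is just an example:

```python
import numpy as np
import open3d as o3d

def save_init_point_cloud(xyz: np.ndarray, rgb: np.ndarray, path: str = "points3d.ply"):
    """Write the fused point cloud to a PLY file used to initialize per-scene 3DGS."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(xyz.astype(np.float64))           # (N, 3) positions
    pcd.colors = o3d.utility.Vector3dVector(rgb.astype(np.float64) / 255.0)   # (N, 3) colors in [0, 1]
    o3d.io.write_point_cloud(path, pcd)
```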

@gwxxx
Author

gwxxx commented Sep 4, 2024

> The pre-trained generalizable model provides a point cloud as the initialization of 3DGS (per-scene optimization). Like 3DGS, we only optimize the 3D Gaussians, not the feed-forward network.

So for Table 1, a new view is generated by this pipeline: input 2 or 3 images -> network -> 3d gaussian point cloud -> render a new view?
And for Table 3, the pipeline is: input all images -> network -> initial point cloud -> optimization -> 3d gaussian point cloud -> render all new views?

@TQTQliu
Owner

TQTQliu commented Sep 4, 2024

Yes, you are right!
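
A condensed summary of the two settings as confirmed above (function names are placeholders, not the actual repo API):

```python
# Table 1: generalizable / feed-forward setting (FPS covers encode + decode + render)
#   for each target view:
#       src       = nearest_source_views(target, k=2 or 3)
#       gaussians = network(src)
#       image     = render(gaussians, target)
#
# Table 3: per-scene optimization (FPS covers rendering only)
#   gaussians = fuse(network(source_views))           # initialization from the pre-trained model
#   gaussians = optimize_3dgs(gaussians, all_views)   # network stays frozen
#   for each target view:
#       image = render(gaussians, target)
```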

@gwxxx
Author

gwxxx commented Sep 4, 2024

> Yes, you are right!

Thanks! Another question: why use only 2 or 3 images for the first pipeline? Is it because using all images would result in a much lower FPS?

@TQTQliu
Owner

TQTQliu commented Sep 4, 2024

A lot of images as input will definitely slow down the FPS. Actually, our method mainly focuses on sparse (few-shot) view reconstruction.

@gwxxx
Author

gwxxx commented Sep 4, 2024

> A lot of images as input will definitely slow down the FPS. Actually, our method mainly focuses on sparse (few-shot) view reconstruction.

Can I use this pipeline: input all images -> network -> 3d gaussian point cloud -> render all new views?

@TQTQliu
Owner

TQTQliu commented Sep 4, 2024

I think that's a great idea! You can try it.
If you encounter any questions in the future, feel free to contact us.

@gwxxx
Author

gwxxx commented Sep 5, 2024

> I think that's a great idea! You can try it. If you encounter any questions in the future, feel free to contact us.

Thank you!

@gwxxx
Author

gwxxx commented Sep 5, 2024

I want to ask about using custom data to render new views. The resolution of my data is 3840*2160, so should I change the resolution in colmap_eval.yaml?

@TQTQliu
Owner

TQTQliu commented Sep 5, 2024

Here.
And our model requires that H and W be integer multiples of 32.
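
A quick way to pick a valid size is to round each dimension down to the nearest multiple of 32; the values below are just an example for 3840×2160:

```python
def floor_to_multiple_of_32(x: int) -> int:
    """Round a dimension down to the nearest multiple of 32."""
    return (x // 32) * 32

print(floor_to_multiple_of_32(3840), floor_to_multiple_of_32(2160))  # 3840 2144
```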

@gwxxx
Author

gwxxx commented Sep 5, 2024

> Here. And our model requires that H and W be integer multiples of 32.

So I need to resize my data first (such as 3584 * 2016) and then change the config to [2016, 3584]?

@TQTQliu
Owner

TQTQliu commented Sep 5, 2024

You do not need to resize the images in advance; you can set the size directly in the configuration file, and the code will automatically resize the images to that size.
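
A minimal sketch of that resize step, assuming an OpenCV-based loader; the actual repo code may differ, and `cfg_h, cfg_w` stand for the [H, W] values set in the config:

```python
import cv2

def load_and_resize(path: str, cfg_h: int, cfg_w: int):
    """Load an image and resize it to the (H, W) configured in the yaml file."""
    img = cv2.imread(path)                  # original resolution, BGR
    img = cv2.resize(img, (cfg_w, cfg_h),   # note: cv2.resize takes (width, height)
                     interpolation=cv2.INTER_AREA)
    return img
```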

@ZhenyuSun-Walker

Wait, so for the first pipeline, doesn't it mean that: 16 source views -> (mlp) -> gaussian splatting -> 4 target views->

> A lot of images as input will definitely slow down the FPS. Actually, our method mainly focuses on sparse (few-shot) view reconstruction.

> Can I use this pipeline: input all images -> network -> 3d gaussian point cloud -> render all new views?

so what does the "put all images as input" mean?

@gwxxx
Author

gwxxx commented Sep 5, 2024

> You do not need to resize the image in advance, you can set the size directly in the configuration file and the code will automatically resize the image to the set size.

Thanks for your reply! If I have multi-view images and their depths (obtained by other MVS methods), can I use fusion.py to obtain initial point_cloud and then do optimization?

@gwxxx
Author

gwxxx commented Sep 5, 2024

> Wait, so for the first pipeline, doesn't it mean that: 16 source views -> (mlp) -> gaussian splatting -> 4 target views->

> A lot of images as input will definitely slow down the FPS. Actually, our method mainly focuses on sparse (few-shot) view reconstruction.

> Can I use this pipeline: input all images -> network -> 3d gaussian point cloud -> render all new views?

> so what does the "put all images as input" mean?

I think the first pipeline is: input 2 or 3 images -> network -> 3d gaussian point cloud -> render a new view.

@ZhenyuSun-Walker

> Wait, so for the first pipeline, doesn't it mean that: 16 source views -> (mlp) -> gaussian splatting -> 4 target views->

> A lot of images as input will definitely slow down the FPS. Actually, our method mainly focuses on sparse (few-shot) view reconstruction.

> Can I use this pipeline: input all images -> network -> 3d gaussian point cloud -> render all new views?

> so what does the "put all images as input" mean?

> I think the first pipeline is: input 2 or 3 images -> network -> 3d gaussian point cloud -> render a new view.

As for the example dataset, there are 20 images; 16 of them are used as source views, and the rest are used as the ground truth for the target views.

@TQTQliu
Owner

TQTQliu commented Sep 5, 2024

> You do not need to resize the image in advance, you can set the size directly in the configuration file and the code will automatically resize the image to the set size.

> Thanks for your reply! If I have multi-view images and their depths (obtained by other MVS methods), can I use fusion.py to obtain initial point_cloud and then do optimization?

Of course!

@gwxxx
Author

gwxxx commented Sep 5, 2024

> Wait, so for the first pipeline, doesn't it mean that: 16 source views -> (mlp) -> gaussian splatting -> 4 target views->

> A lot of images as input will definitely slow down the FPS. Actually, our method mainly focuses on sparse (few-shot) view reconstruction.

> Can I use this pipeline: input all images -> network -> 3d gaussian point cloud -> render all new views?

> so what does the "put all images as input" mean?

> I think the first pipeline is: input 2 or 3 images -> network -> 3d gaussian point cloud -> render a new view.

> As for the example dataset, there are 20 images; 16 of them are used as source views, and the rest are used as the ground truth for the target views.

Yes, but for each new view, only the nearest 2 or 3 views are used as inputs.
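
For illustration, nearest-view selection by camera-center distance could look like the sketch below; the repo's actual selection strategy may differ (e.g. it could also weigh viewing direction):

```python
import numpy as np

def nearest_source_views(target_center: np.ndarray,
                         source_centers: np.ndarray,
                         k: int = 3) -> np.ndarray:
    """Return indices of the k source cameras closest to the target camera center."""
    dists = np.linalg.norm(source_centers - target_center, axis=1)
    return np.argsort(dists)[:k]
```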

@gwxxx
Author

gwxxx commented Sep 5, 2024

> You do not need to resize the image in advance, you can set the size directly in the configuration file and the code will automatically resize the image to the set size.

> Thanks for your reply! If I have multi-view images and their depths (obtained by other MVS methods), can I use fusion.py to obtain initial point_cloud and then do optimization?

> Of course!

Thanks for your reply! But my depths are images; could you tell me how to modify fusion.py to obtain the initial point_cloud?

@TQTQliu
Owner

TQTQliu commented Sep 5, 2024

@gwxxx
If your depth map is single channel, it's okay to read it directly.
If your depth map is three-channel (for visualization purposes), you will need to save the original depth when using other MVS methods before reading it.
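
A hedged example of reading a single-channel depth map before fusion (the filename and the millimeter scale are assumptions); a three-channel color-mapped depth image is only for visualization and cannot be reliably converted back to metric depth:

```python
import cv2
import numpy as np

# Filename is an example; IMREAD_UNCHANGED keeps the original bit depth (e.g. uint16).
depth = cv2.imread("depth_0000.png", cv2.IMREAD_UNCHANGED)
assert depth.ndim == 2, "expected a single-channel depth map"
depth = depth.astype(np.float32)
# If the depth was stored as uint16 millimeters, convert to meters:
# depth = depth / 1000.0
```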
