
Epipolar cost volume and depth prediction #42

Open
chenjiajie9811 opened this issue Jun 25, 2024 · 2 comments

Comments

@chenjiajie9811

Hi there,

thank you for your great work! I have a question about the use of an epipolar cost volume for depth prediction.

This method makes sense in regions where the two images overlap: we can search along the epipolar line and take the position with the highest feature similarity as the depth prediction.
But for pixels in regions without overlap, there is no true correspondence on the epipolar line, and the constructed cost volume may contain uniformly low similarities. How can the network learn depth in this case?
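To make the ambiguity concrete, here is a toy sketch of soft-argmax depth regression from a per-pixel cost volume (this is a generic illustration, not the MVSplat implementation; all names and shapes are made up). When one candidate along the epipolar line matches well, the soft-argmax recovers its depth; when nothing matches, the scores are near-uniform and the prediction collapses toward an uninformative average:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def depth_from_cost(ref_feat, src_feats, depth_candidates):
    # ref_feat: (C,) feature of one reference pixel.
    # src_feats: (D, C) source-view features sampled along the
    # epipolar line, one per depth candidate.
    scores = src_feats @ ref_feat           # matching score per candidate
    probs = softmax(scores)                 # matching distribution
    return float(probs @ depth_candidates)  # soft-argmax depth

rng = np.random.default_rng(0)
depths = np.linspace(1.0, 10.0, 32)  # hypothetical near=1, far=10
ref = rng.normal(size=16)

# Overlapping case: one candidate matches the reference feature well,
# so the soft-argmax locks onto that candidate's depth.
src = rng.normal(size=(32, 16))
src[20] = ref * 4.0  # strong match at candidate 20
print(depth_from_cost(ref, src, depths))

# Non-overlap case: no candidate matches; the scores are near-uniform,
# so the soft-argmax drifts toward the mean of the depth candidates.
src_no = rng.normal(size=(32, 16)) * 0.01
print(depth_from_cost(ref, src_no, depths))
```

In the second case the matching distribution carries almost no signal, which is exactly why some extra mechanism (e.g. learned context propagation) is needed for non-overlapping pixels.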

I am a little confused and look forward to your reply.

Regards

@fangchuan

Same concerns here.
I am also confused about how the approach in the paper obtains a real-scale depth prediction.
Looking through the code, it does not seem significantly different from other learning-based stereo approaches. But when I export the PLY of Gaussians, the point cloud scale is almost the same as the real-world one, which suggests the predicted depth is correct in scale. I wonder how the cost-volume depth encoder network can recover the scale without considering the stereo baseline?

@donydchen
Owner

Hi, @chenjiajie9811, thanks for your interest in our work.

Our project does indeed assume significant overlap between the input views, and this is how the data is structured during testing (see the index-generating code here).

For regions without overlap, MVSplat relies on the subsequent UNet to propagate matching information (see "Cost volume refinement" in Sec. 3.1 of the paper). However, this solution is heuristic and only partially eases the issue; improving accuracy in those non-overlapping regions is a promising direction for future work.


Hi, @fangchuan. Your findings are quite interesting. May I ask how you confirmed that "the point cloud scale is almost the same as the real world one"? As far as I remember, there is no ground-truth 3D data for the RE10K or ACID datasets. In fact, MVSplat is not intended to predict real-scale (i.e., metric) depth, which would be quite difficult to achieve without additional regularization. Instead, MVSplat merely predicts a relative depth bounded by the predefined near and far planes.
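A small sketch of why such a prediction is only relative (a generic illustration of cost-volume-style depth regression, not the MVSplat code; the sampling scheme and names are assumptions). The depth candidates are sampled between the near and far planes, so the soft-argmax output is always bounded by them, and rescaling the planes rescales the prediction proportionally:

```python
import numpy as np

def soft_depth(scores, near, far, D=64):
    # Candidates sampled uniformly in inverse depth (disparity)
    # between the predefined near and far planes, a common choice
    # in cost-volume depth estimators.
    disp = np.linspace(1.0 / near, 1.0 / far, D)
    depth_cand = 1.0 / disp
    p = np.exp(scores - scores.max())
    p /= p.sum()
    return float(p @ depth_cand)  # always lies inside [near, far]

rng = np.random.default_rng(1)
scores = rng.normal(size=64)  # stand-in for learned matching scores

d1 = soft_depth(scores, near=1.0, far=100.0)
d2 = soft_depth(scores, near=2.0, far=200.0)  # same scores, scaled planes
# d2 is exactly 2 * d1: the network output is only defined relative to
# the chosen near/far bounds, so no metric scale is recovered.
print(d1, d2, d2 / d1)
```

So an apparently "correct" scale in the exported point cloud would come from the chosen near/far planes (and the dataset's camera poses) matching the real-world scale, not from the network inferring metric depth.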
