Questions when finetune SAM2 using MOSE #368

Closed
lyu-yx opened this issue Oct 10, 2024 · 1 comment

Comments

@lyu-yx

lyu-yx commented Oct 10, 2024

Hi there, thanks for your great work! I want to fine-tune SAM2 on my own dataset, but as a first step I am trying to fine-tune it on MOSE. I configured the environment, downloaded the MOSE dataset, and changed the path in sam2.1_hiera_b+_MOSE_finetune.yaml. I am using Python 3.12.4 + torch 2.3.1 + CUDA 12.2. When I start training, I get the following error:

    [rank0]: Traceback (most recent call last):
    [rank0]:   File "/scratch/hp2173/sam2/training/train.py", line 270, in <module>
    [rank0]:     main(args)
    [rank0]:   File "/scratch/hp2173/sam2/training/train.py", line 240, in main
    [rank0]:     single_node_runner(cfg, main_port)
    [rank0]:   File "/scratch/hp2173/sam2/training/train.py", line 53, in single_node_runner
    [rank0]:     single_proc_run(local_rank=0, main_port=main_port, cfg=cfg, world_size=num_proc)
    [rank0]:   File "/scratch/hp2173/sam2/training/train.py", line 41, in single_proc_run
    [rank0]:     trainer.run()
    [rank0]:   File "/scratch/hp2173/sam2/training/trainer.py", line 515, in run
    [rank0]:     self.run_train()
    [rank0]:   File "/scratch/hp2173/sam2/training/trainer.py", line 532, in run_train
    [rank0]:     outs = self.train_epoch(dataloader)
    [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    [rank0]:   File "/scratch/hp2173/sam2/training/trainer.py", line 740, in train_epoch
    [rank0]:     for data_iter, batch in enumerate(train_loader):
    [rank0]:   File "/scratch/hp2173/sam2/training/dataset/sam2_datasets.py", line 64, in __next__
    [rank0]:     raise e
    [rank0]:   File "/scratch/hp2173/sam2/training/dataset/sam2_datasets.py", line 56, in __next__
    [rank0]:     item = next(self._iter_dls[dataset_idx])
    [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    [rank0]:   File "/ext3/miniconda3/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    [rank0]:     data = self._next_data()
    [rank0]:            ^^^^^^^^^^^^^^^^^
    [rank0]:   File "/ext3/miniconda3/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 1344, in _next_data
    [rank0]:     return self._process_data(data)
    [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^
    [rank0]:   File "/ext3/miniconda3/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 1370, in _process_data
    [rank0]:     data.reraise()
    [rank0]:   File "/ext3/miniconda3/lib/python3.12/site-packages/torch/_utils.py", line 706, in reraise
    [rank0]:     raise exception
    [rank0]: UnboundLocalError: Caught UnboundLocalError in DataLoader worker process 0.
    [rank0]: Original Traceback (most recent call last):
    [rank0]:   File "/ext3/miniconda3/lib/python3.12/site-packages/torch/utils/data/_utils/worker.py", line 309, in _worker_loop
    [rank0]:     data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
    [rank0]:            ^^^^^^^^^^^^^^^^^^^^
    [rank0]:   File "/ext3/miniconda3/lib/python3.12/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    [rank0]:     data = [self.dataset[idx] for idx in possibly_batched_index]
    [rank0]:             ~~~~~~~~~~~~^^^^^
    [rank0]:   File "/scratch/hp2173/sam2/training/dataset/utils.py", line 104, in __getitem__
    [rank0]:     return self.dataset[self.epoch_ids[idx]]
    [rank0]:            ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
    [rank0]:   File "/ext3/miniconda3/lib/python3.12/site-packages/torch/utils/data/dataset.py", line 350, in __getitem__
    [rank0]:     return self.datasets[dataset_idx][sample_idx]
    [rank0]:            ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
    [rank0]:   File "/scratch/hp2173/sam2/training/dataset/vos_dataset.py", line 132, in __getitem__
    [rank0]:     return self._get_datapoint(idx)
    [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^
    [rank0]:   File "/scratch/hp2173/sam2/training/dataset/vos_dataset.py", line 74, in _get_datapoint
    [rank0]:     datapoint = self.construct(video, sampled_frms_and_objs, segment_loader)
    [rank0]:                                ^^^^^
    [rank0]: UnboundLocalError: cannot access local variable 'video' where it is not associated with a value

There are also warnings like these:

    WARNING:root:Loading failed (id=790); Retry 0 with exception: invalid literal for int() with base 10: '._00078'
    WARNING:root:Loading failed (id=1097); Retry 1 with exception: invalid literal for int() with base 10: '._00002'
    WARNING:root:Loading failed (id=712); Retry 2 with exception: invalid literal for int() with base 10: '._00031'
    WARNING:root:Loading failed (id=43); Retry 3 with exception: invalid literal for int() with base 10: '._00050'
    WARNING:root:Loading failed (id=941); Retry 4 with exception: invalid literal for int() with base 10: '._00031'
    WARNING:root:Loading failed (id=522); Retry 5 with exception: invalid literal for int() with base 10: '._00002'
    WARNING:root:Loading failed (id=1237); Retry 6 with exception: invalid literal for int() with base 10: '._00078'
    WARNING:root:Loading failed (id=220); Retry 7 with exception: invalid literal for int() with base 10: '._00050'
    WARNING:root:Loading failed (id=754); Retry 8 with exception: invalid literal for int() with base 10: '._00002'
    WARNING:root:Loading failed (id=266); Retry 9 with exception: invalid literal for int() with base 10: '._00031'
    WARNING:root:Loading failed (id=134); Retry 10 with exception: invalid literal for int() with base 10: '._00031'

I think this error occurred during data loading, but I am not sure whether it is related to the dataset I downloaded, even though I fetched MOSE from its official repo. Could you help me out?
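
The `._00078`-style names in the warnings look like macOS AppleDouble sidecar files (hidden `._*` files that often appear when an archive is packed or copied on macOS), which cannot be parsed as integer frame indices. A minimal sketch for checking whether a dataset copy contains such files is shown here; `find_appledouble_files` is a hypothetical helper name, not part of SAM2, and the dataset root is whatever path the config points at.

```python
import os


def find_appledouble_files(root):
    """Return sorted paths of all files under `root` whose name starts with '._'.

    Hypothetical helper for inspecting a dataset directory; not SAM2 code.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith("._"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

If the returned list is non-empty, those files are a likely source of the `invalid literal for int()` warnings. Whether to delete them or skip them at load time is a separate choice.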

@jayisaking

Solved in PR #370
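
For readers hitting the same traceback: an `UnboundLocalError` of this kind typically arises when a variable is assigned only inside a `try` block and every retry fails, so the code after the loop reads a name that was never bound. The sketch below shows one way to avoid that pattern in isolation. `load_frame_index`, `is_hidden`, and `load_with_retries` are illustrative assumptions, not SAM2's code and not a description of what PR #370 changes.

```python
import os


def load_frame_index(filename):
    """Parse a frame index from a name such as '00078.jpg' (hypothetical helper)."""
    stem = os.path.splitext(filename)[0]
    return int(stem)  # raises ValueError for names like '._00078'


def is_hidden(filename):
    """True for dot-prefixed names, including AppleDouble '._*' files."""
    return filename.startswith(".")


def load_with_retries(candidates, max_retries=3):
    """Try up to `max_retries` candidates and return the first parsed index.

    Raises a clear error when nothing loads, instead of falling through
    to code that uses a variable which was never assigned.
    """
    last_error = None
    for name in candidates[:max_retries]:
        try:
            return load_frame_index(name)
        except ValueError as exc:
            last_error = exc
    raise RuntimeError("no candidate could be loaded") from last_error
```

Filtering first, for example `visible = [n for n in names if not is_hidden(n)]`, keeps the hidden files from ever reaching the parser.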

@lyu-yx lyu-yx closed this as completed Oct 11, 2024