FileNotFoundError: [Errno 2] No such file or directory: './checkpoint/test/ckpt_eai_dct_n30_out30_dctn60_best.pth.tar' #2

Open
johndpope opened this issue Sep 9, 2024 · 3 comments

Running test.py fails with the FileNotFoundError in the title.

johndpope commented Sep 9, 2024

Does the model need to be trained first?
There's also an error in the training code, a stray "Falsecd":
https://github.com/Dingpx/EAI/blob/main/train.py#L168

UPDATE - how should these environment variables be defined?
    rank = int(os.environ["RANK"])
               ~~~~~~~~~~^^^^^^^^
  File "", line 679, in __getitem__
KeyError: 'RANK'

UPDATE - found these:

export RANK=0
export PORT=12345
export LOCAL_RANK=0
export MASTER_ADDR=127.0.0.1
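
For anyone who would rather not export these by hand, here is a minimal sketch of setting the same values from Python before the script reads them. The variable names and values just mirror the exports above. The helper name `apply_single_process_defaults` is hypothetical, and I have not checked which of these variables train.py actually reads beyond RANK.

```python
import os

# Values for a single-process run on one machine; they mirror the exports above.
SINGLE_PROCESS_DEFAULTS = {
    "RANK": "0",
    "LOCAL_RANK": "0",
    "PORT": "12345",
    "MASTER_ADDR": "127.0.0.1",
}


def apply_single_process_defaults(env=os.environ):
    """Fill in missing variables without overriding ones a launcher already set."""
    for name, value in SINGLE_PROCESS_DEFAULTS.items():
        env.setdefault(name, value)
    return env


apply_single_process_defaults()
rank = int(os.environ["RANK"])  # no longer raises KeyError when run directly
```

Using `setdefault` means a real multi-process launcher that sets RANK itself still wins. The defaults only apply when the script is started directly.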

UPDATE

I replaced the distributed training with accelerate:
https://github.com/johndpope/EAI/blob/main/train2.py

It's training...
Screenshot from 2024-09-10 05-37-28

How long does training take, and on how large a GPU cluster?

usage: train.py [-h] [--device DEVICE] [--grab_data_dict GRAB_DATA_DICT] [--exp EXP] [--ckpt CKPT]
                [--model_type MODEL_TYPE] [--max_norm] [--linear_size LINEAR_SIZE] [--num_stage NUM_STAGE]
                [--num_body NUM_BODY] [--num_lh NUM_LH] [--num_rh NUM_RH] [--lr LR] [--lr_decay LR_DECAY]
                [--lr_gamma LR_GAMMA] [--input_n INPUT_N] [--output_n OUTPUT_N] [--all_n ALL_N] [--actions ACTIONS]
                [--epochs EPOCHS] [--dropout DROPOUT] [--train_batch TRAIN_BATCH] [--val_batch VAL_BATCH]
                [--test_batch TEST_BATCH] [--job JOB] [--seed SEED] [--local_rank LOCAL_RANK] [--W_pg W_PG]
                [--W_p W_P] [--is_load] [--is_debug] [--is_exp] [--sample_rate SAMPLE_RATE] [--is_norm_dct]
                [--is_norm] [--is_using_saved_file] [--is_hand_norm] [--is_hand_norm_split] [--is_part]
                [--part_type PART_TYPE] [--is_boneloss] [--is_weighted_jointloss] [--is_using_noTpose2]
                [--is_using_raw] [--J J]
train.py: error: unrecognized arguments: --local-rank=0
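
The usage text lists `--local_rank` (underscore), while the launcher passed `--local-rank` (hyphen). As I understand it, newer PyTorch versions of `torch.distributed.launch` pass the hyphenated form, so that is my assumption about the cause. A minimal sketch of an argparse setup that accepts both spellings is below. `build_parser` is a hypothetical name and only this one argument is shown, not the full train.py parser.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="train.py")
    # Register both spellings for the same destination so either launcher style works.
    parser.add_argument(
        "--local_rank", "--local-rank",
        dest="local_rank", type=int, default=0,
    )
    return parser


# Simulate the argument that triggered the error above.
args = build_parser().parse_args(["--local-rank=0"])
print(args.local_rank)
```

The other common route is to launch with `torchrun`, which sets the `LOCAL_RANK` environment variable instead of passing a command-line flag.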

Dingpx commented Sep 17, 2024

Sorry, it's been a long time since I last ran this code. It seems I used 8 A100/V100 GPUs to train this project. As for the checkpoint, please bear with me: I am busy with my current project, and I will release the checkpoint in a few weeks.

johndpope commented

My accelerate code got me by - thanks.

I'm interested in taking two body poses:
https://github.com/johndpope/EAI/blob/main/pose_vis.py

and, using COCO-WholeBody:
https://github.com/johndpope/EAI/blob/main/test.png
interpolating between them (with correct, human-like joint movement).

My reading is that this codebase could be suitable - did you do any work in this direction?
I found more repos doing gesture / fusion, but I just want a sequence of poses to feed into Stable Diffusion.
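
To make the goal concrete, here is a rough sketch of the naive baseline: straight linear interpolation between two sets of 2D keypoints. This is not what EAI does. It is only the simplest possible stand-in, and the function name `interpolate_poses` and the toy coordinates are made up for illustration.

```python
def interpolate_poses(pose_a, pose_b, num_frames):
    """Linearly blend two poses given as equal-length lists of (x, y) keypoints.

    Returns num_frames poses; the first equals pose_a and the last equals pose_b.
    """
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must have the same number of keypoints")
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    frames = []
    for i in range(num_frames):
        t = i / (num_frames - 1)  # blend weight running from 0.0 to 1.0
        frames.append([
            ((1 - t) * xa + t * xb, (1 - t) * ya + t * yb)
            for (xa, ya), (xb, yb) in zip(pose_a, pose_b)
        ])
    return frames


# Toy example with 3 keypoints; a COCO-WholeBody pose has 133.
start = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
end = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
sequence = interpolate_poses(start, end, num_frames=5)
```

Blending coordinates like this does not preserve bone lengths or joint limits, so the in-between frames can look unnatural. That gap is what I am hoping a learned motion model could fill.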
