NVIDIA Megatron-LM · Discussions · GitHub

Sorted by latest activity

Discussions

[QUESTION] About Optimizer & Params Offload

shh2000 asked Jul 24, 2024 in Q&A · Unanswered

[QUESTION] Calculations regarding calculate_per_token_loss parameter

clarence-lee-sheng asked Jul 19, 2024 in Q&A · Unanswered

[QUESTION] Has standalone_embedding_stage been supported yet in core?

JiwenJ asked Jun 26, 2024 in Q&A · Unanswered

[QUESTION] add_position_embedding=False in checkpoint_args during Llama3 8B training

NEU-rzh asked Jul 17, 2024 in Q&A · Unanswered

[QUESTION] Why not use tensor parallel APIs of pytorch

GuWei007 asked May 16, 2024 in Q&A · Unanswered · Label: stale (no activity in 60 days on issue or PR)

Question with forward_backward_pipelining_without_interleaving in Megatron-LM Pipeline

Hongjie1Chu asked May 17, 2024 in Q&A · Unanswered · Label: stale (no activity in 60 days on issue or PR)

[QUESTION] how to profile bubble time in pipeline parallelism?

starstream asked May 15, 2024 in Q&A · Unanswered · Label: stale (no activity in 60 days on issue or PR)

[QUESTION] How does tensor_parallel coop with q/k_layernorm

cryoco asked May 10, 2024 in Q&A · Unanswered · Label: stale (no activity in 60 days on issue or PR)

[QUESTION]

woson asked Jul 8, 2024 in Q&A · Unanswered

function missing

ywb2018 asked Jul 8, 2024 in Q&A · Unanswered

[QUESTION] RuntimeError: Timed out initializing process group in store based barrier on rank: 0, for key: store_based_barrier_key:1 (world_size=2, worker_count=1, timeout=0:10:00)

JanryPei asked Apr 16, 2024 in Q&A · Unanswered · Label: stale (no activity in 60 days on issue or PR)

[QUESTION] Why is expert parallelism not supported during fp16 training?

yutian-mt asked May 7, 2024 in Q&A · Unanswered · Label: stale (no activity in 60 days on issue or PR)

[QUESTION] Does Megatron-Core supports LLAMA models?

noob-ctrl asked May 3, 2024 in Q&A · Unanswered · Label: stale (no activity in 60 days on issue or PR)

[QUESTION] How to pre-build the dataset's index ?

etiennemlb asked Apr 24, 2024 in Q&A · Unanswered · Label: stale (no activity in 60 days on issue or PR)

[QUESTION] bf16 Parameters and fp32 Gradients

pluiez asked Apr 30, 2024 in Q&A · Unanswered · Label: stale (no activity in 60 days on issue or PR)

[BUG] Question about helpers.cpp in version core_v0.7.0

longzhang418 asked Jun 28, 2024 in Q&A · Unanswered

[QUESTION] Getting tools/preprocess_data.py to work is painful

sambar1729 asked Jun 26, 2024 in Q&A · Unanswered

[QUESTION] Why megatron-core seems slower and use more gpu mem than legacy for gpt_pretrain?

REIGN12 asked Apr 9, 2024 in Q&A · Unanswered · Label: stale (no activity in 60 days on issue or PR)

[QUESTION] Gloo connectFullMesh failed when the number of nodes setting "export GLOO_SOCKET_IFNAME=bond4" exceeds 60

Genlovy-Hoo asked Jun 19, 2024 in Q&A · Unanswered

[QUESTION] How to time the code

Weifan1226 asked Jun 16, 2024 in Q&A · Unanswered

[QUESTION] Using segformer segmentation models

cporrasn asked Jun 14, 2024 in Q&A · Unanswered

[QUESTION] why the _p2p_ops functions has the condition branches for get_pipeline_model_parallel_rank()

lichenlu asked Jun 14, 2024 in Q&A · Unanswered

[QUESTION] why replace F.embedding() with [] on VocabParallelEmbedding class?

starkhu asked Apr 9, 2024 in Q&A · Unanswered · Label: stale (no activity in 60 days on issue or PR)

[QUESTION] Where does the attention_mask come from when the gpt_model is not the first or last pipeline stage?

janelu9 asked Jun 8, 2024 in Q&A · Unanswered

Incorrect shuffling of documents across epochs in GPTDataset

argitrage asked Feb 20, 2024 in Q&A · Unanswered · Label: stale (no activity in 60 days on issue or PR)
