forked from NVIDIA/Megatron-LM
Pull requests: microsoft/Megatron-DeepSpeed
Pull requests list
[Bug]Fix init issue for layer_norm in sequence_parallel for non-CUDA device.
#450 opened Sep 29, 2024 by ys950902
fix --use-cpu-initialization error when expert is not tensor-parallel
#413 opened Jul 3, 2024 by taozhiwei
Fix ConstantGradScaler and loss-scale argument not match
#376 opened Apr 12, 2024 by BeingGod
Simplify SP - Opportunity to improve SP scalability
#301 opened Nov 28, 2023 by RezaYazdaniAminabadi