Sync torch titan #7

Draft · philippguevorguian wants to merge 12 commits into main

Conversation

philippguevorguian (Collaborator)

No description provided.

awgu and others added 12 commits August 20, 2024 11:06
ghstack-source-id: 28a28926bec3c1a6671a18b403deab0fc096d218
Pull Request resolved: pytorch#538
ghstack-source-id: ceb4fa54121be241633daf06a0ca2eb407667274
Pull Request resolved: pytorch#535
closes: pytorch#548

> Nvidia Ada Lovelace GPUs (e.g., RTX 4090, L20, L40) with SM89 also
support FP8 MMA, so it is recommended to relax the CUDA architecture
limitations to enable FP8 training on a broader range of devices.
>
> and the [CUDA 12.0 announcement](https://developer.nvidia.com/blog/cuda-toolkit-12-0-released-for-general-availability/) says that it supports the Lovelace architecture:
> '*CUDA 12.0 exposes programmable functionality for many features of
the NVIDIA Hopper and NVIDIA Ada Lovelace architectures: ...32x Ultra
xMMA (including FP8 and FP16)*'
> 
> - https://developer.nvidia.com/blog/cuda-toolkit-12-0-released-for-general-availability/
> - https://nvidia.github.io/TensorRT-LLM/reference/support-matrix.html
> - https://github.com/NVIDIA/cutlass/blob/c4e3e122e266644c61b4af33d0cc09f4c391a64b/include/cutlass/arch/mma_sm89.h#L57
> 
>
![image](https://github.com/user-attachments/assets/3c11736c-2e84-4bd6-a49c-5af8b0e3e6ac)

After relaxing the CUDA architecture limitations for FP8, my environment
with **4 x L40 GPUs (SM89)** can successfully train Llama in float8
precision.


![image](https://github.com/user-attachments/assets/1337e041-0d0d-49b5-8c11-00e67f4df41f)
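For context, a minimal sketch of what the relaxed capability check amounts to; the helper name `fp8_mma_supported` is made up here, and this is not torchtitan's exact guard:

```python
import torch

def fp8_mma_supported() -> bool:
    """Whether the current CUDA device exposes FP8 MMA instructions.

    Hopper (SM90) and Ada Lovelace (SM89, e.g. RTX 4090 / L20 / L40) both
    support FP8 MMA, so the guard accepts SM89 and newer instead of
    requiring SM90 only.
    """
    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability()
    return (major, minor) >= (8, 9)
```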

---------

Co-authored-by: Andrew Gu <[email protected]>
ghstack-source-id: ab6a7cec6ba4f4690f5834d22bc16d8d9f2bdba8
Pull Request resolved: pytorch#555
In this PR, we mostly measured the performance and loss curves for the
405B model with some optimization techniques we recently developed. We
also want to log the actual peak TFLOPs used for the MFU calculation,
for cross-validation. We should also get the device information from the
system rather than from the device name, because the device name does
not contain "NVL" or "SXM".

<img width="496" alt="image"
src="https://github.com/user-attachments/assets/ba822de5-cf23-4ecd-b29c-70f9aac38290">
As titled. We have updated the peak FLOPs for H100, so we need to use the
correct number here.
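As a rough illustration of why the peak figure and the device variant matter for MFU (the table values and the helper below are assumptions for illustration, not torchtitan's actual constants):

```python
# Peak dense BF16 throughput in TFLOPS. Placeholder values: the real peak
# differs between H100 variants (SXM vs. NVL vs. PCIe), which is why the
# variant must be detected from the system rather than guessed from a
# bare device name.
PEAK_BF16_TFLOPS = {
    "NVIDIA H100 SXM": 989.0,
    "NVIDIA A100": 312.0,
}

def mfu(model_flops_per_sec: float, device_variant: str) -> float:
    """Model FLOPs Utilization: achieved model FLOPs/s over the device peak."""
    peak_flops = PEAK_BF16_TFLOPS[device_variant] * 1e12
    return model_flops_per_sec / peak_flops
```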
The lspci command is part of the `pciutils` package, which provides
tools for listing and querying PCI devices, but somehow `pciutils` is
not installed on the CI machines. This PR first unblocks the CI failure;
we can then decide whether to make `pciutils` a requirement for Titan.
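A rough sketch of the fallback pattern described above; the helper name and the lspci parsing are illustrative only:

```python
import shutil
import subprocess

import torch

def device_description() -> str:
    """Best-effort GPU identification.

    Prefer the PCI description from lspci (part of pciutils), which can
    include the form factor (e.g. SXM or NVL); fall back to the plain CUDA
    device name when lspci is unavailable, as on CI machines without
    pciutils installed.
    """
    if shutil.which("lspci") is not None:
        out = subprocess.run(["lspci"], capture_output=True, text=True).stdout
        for line in out.splitlines():
            if "NVIDIA" in line and ("3D controller" in line or "VGA" in line):
                return line.split(": ", 1)[-1]
    return torch.cuda.get_device_name()
```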
Somehow, when rebasing, the legacy float8 enabling flag stayed in the
405B toml. Let's remove it. This does not affect the perf numbers we
obtained, because the old flag is just a no-op after the rebase.
ghstack-source-id: 3ece57ae6d8dbf7ff66e3c41f1804ddb08078ba4
Pull Request resolved: pytorch#525
Latest torch titan changes
philippguevorguian marked this pull request as draft on September 2, 2024 16:23