
Computation swp #1 (Open)

Wants to merge 10 commits into base branch swp-base-may-28.

Commits on Jun 14, 2024

  1. [Tutorial] fix autotune for flash attention (triton-lang#4046)

    Tuning configs should depend on HEAD_DIM. Also fix the input layout of v for fp8.
    manman-ren committed Jun 14, 2024 (SHA 0620175)
  2. do not segfault for reduce on forOp arguments (a PR in draft mode)

    manman-ren committed Jun 14, 2024 (SHA c6f8a81)
  3. fp8 FA: head dim 128, seq len 16384, causal is false

    Summary: fixed tuning config, fp8 only
    manman-ren committed Jun 14, 2024 (SHA bf89787)
  4. fp8 FA: support TMA with fixed block size

    manman-ren committed Jun 14, 2024 (SHA bee081c)
  5. add two env vars for comp SWP

    Summary:
    SWP_FIRST_DOT + no PEEL_EPILOGUE
    SWP_FIRST_DOT + PEEL_EPILOGUE
    SWP_FIRST_DOT + PEEL_EPILOGUE + MERGE_FIRST_PEEL
    manman-ren committed Jun 14, 2024 (SHA 3790057)
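The autotune fix above notes that the tuning configs should depend on HEAD_DIM. A minimal plain-Python sketch of that idea (the config values and pruning threshold are hypothetical, not the tutorial's actual numbers): candidate configs that would be too large for bigger head dimensions are filtered out before autotuning.

```python
# Hypothetical sketch of HEAD_DIM-dependent autotune config selection.
# In a Triton tutorial this list would feed @triton.autotune; it is kept
# as plain Python here so the filtering logic is visible without a GPU.

def candidate_configs():
    # A small cross-product of tile sizes, stage counts, and warp counts.
    return [
        {"BLOCK_M": bm, "BLOCK_N": bn, "num_stages": s, "num_warps": w}
        for bm in (64, 128)
        for bn in (32, 64)
        for s in (3, 4)
        for w in (4, 8)
    ]

def keep_config(conf, head_dim):
    # Larger head dims use more shared memory per tile, so prune the
    # configs that would exceed the budget (threshold is illustrative).
    if head_dim >= 128 and conf["BLOCK_M"] == 128 and conf["num_stages"] > 3:
        return False
    return True

def configs_for(head_dim):
    return [c for c in candidate_configs() if keep_config(c, head_dim)]
```

With this filter, HEAD_DIM=64 keeps all candidates while HEAD_DIM=128 searches a smaller, memory-safe subset.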

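The "add two env vars for comp SWP" commit above gates pipelining behavior behind SWP_FIRST_DOT, PEEL_EPILOGUE, and MERGE_FIRST_PEEL. A hedged sketch of how such flags could be read and combined into the modes listed in the commit message (the parsing rules are assumptions; the actual compiler may simply check for the variable's presence):

```python
import os

def _flag(name):
    # Treat "1"/"true"/"on" (case-insensitive) as enabled; this parsing
    # is an assumption, not necessarily what the compiler does.
    return os.environ.get(name, "").lower() in ("1", "true", "on")

def swp_mode():
    """Return the pipelining mode implied by the env-var combination."""
    first_dot = _flag("SWP_FIRST_DOT")
    peel = _flag("PEEL_EPILOGUE")
    merge = _flag("MERGE_FIRST_PEEL")
    if not first_dot:
        return "default"
    if not peel:
        return "first-dot"
    return "first-dot+peel+merge" if merge else "first-dot+peel"
```

The three non-default return values correspond one-to-one to the combinations enumerated in the commit summary.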
Commits on Jun 20, 2024

  1. N_CTX: tl.constexpr

    manman-ren committed Jun 20, 2024 (SHA 93f1f61)
  2. check correctness

    manman-ren committed Jun 20, 2024 (SHA 0bc5294)
  3. fix FIRST_DOT by adding dependency from tmaCopy to wait

    manman-ren committed Jun 20, 2024 (SHA 14feee3)
  4. [fp8 FA] base uses TMA, the other implementation does not

    Summary: We compare the two implementations
    manman-ren committed Jun 20, 2024 (SHA c9fd91b)
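The `N_CTX: tl.constexpr` commit above makes the context length a Triton compile-time constant, so the compiler specializes (and caches) a separate kernel per distinct N_CTX value and can constant-fold loop bounds that depend on it. A rough pure-Python analogy of that per-value specialization (entirely illustrative; this is not Triton code):

```python
from functools import lru_cache

# Analogy for `N_CTX: tl.constexpr`: Triton compiles one kernel per
# distinct constexpr value, so N_CTX is a constant inside the kernel
# body rather than a runtime argument. lru_cache plays the role of the
# per-value kernel cache here.

@lru_cache(maxsize=None)
def specialize(n_ctx):
    # Each distinct n_ctx yields its own "compiled" function with the
    # value frozen in (so trip counts, masks, etc. could be folded).
    def kernel(x):
        return list(x[:n_ctx])
    return kernel

k16 = specialize(16)   # specialized for N_CTX=16
k32 = specialize(32)   # a different specialization for N_CTX=32
```

Calling `specialize` again with the same value returns the cached specialization, mirroring how Triton reuses an already-compiled kernel.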

Commits on Jul 11, 2024

  1. LOAD_DIFFERENT_STAGE, remove comparison of results

    manman-ren committed Jul 11, 2024 (SHA 01419b1)