Lesson Title #2

Open
tkphd opened this issue Mar 3, 2022 · 11 comments

Comments

@tkphd
Member

tkphd commented Mar 3, 2022

hpc-novice is overly generic and invites confusion: which is more fundamental, hpc-intro or hpc-novice?

Please rename to reflect the scope of the planned lesson. I suggest hpc-python-programming for this episode, which would provide a good thematic fit and obvious connection with hpc-python-data as we rename the other lesson to better match its scope.

@bkmgit
Contributor

bkmgit commented Mar 3, 2022

I would like to include other languages as well (C and Fortran with OpenMP, MPI, and SYCL), so having Python in the title is inappropriate. Maybe "HPC-Pi" or "HPC-Monte-Carlo", but other suggestions are welcome.

@tkphd
Member Author

tkphd commented Mar 3, 2022

Covering multiple languages is good and appropriate, but I'm not sure doing more than one in a single lesson is wise. I believe people who come for Python will die when you introduce Fortran, and in general, learners will tune out when you're not teaching the language they're here for. Since hpc-intro focuses on Python, and this repository is where we intend to migrate a lot of material we cut from there, I suggest keeping Python as the primary language for this lesson. We can develop hpc-fortran-programming (and -profiling, etc.) as well, but separately.

@bkmgit
Contributor

bkmgit commented Mar 3, 2022

The lesson sequence would introduce compiled languages and discuss the resulting speedups. Even if someone only uses Python, many languages are compiled, and it is good to expose this along with the likely reduction in execution time. As an example, I have introduced SYCL following the Shell and Git lessons: https://github.com/VCCA2021HPC/simple-md

@bkmgit
Contributor

bkmgit commented Mar 3, 2022

There is a heavy focus on Python in https://github.com/carpentries-incubator/lesson-gpu-programming/ and there will likely be more Python in a lesson on big data. One could probably combine https://carpentries-incubator.github.io/lesson-parallel-python/ with some of the material in http://www.hpc-carpentry.org/hpc-python/

@tkphd
Member Author

tkphd commented Mar 3, 2022

Sure, compiled code offers gains. The question really is what skills your learners will take away from the lesson: if they're not already familiar with C or Fortran, showing them that those languages are faster is nice, but intangible. The goal is to have them type something to develop familiarity and practice. With high-level languages like Python, you can lean on The Carpentries' Python lessons for fundamentals, introduce the idea of writing Python in a text editor instead of a Notebook, and then take a serial Python code and add MPI calls with relatively few keystrokes. The learners will get the experience of typing in the function calls and the explanation of what MPI is and what those calls do using the scatter/gather illustrations, and will quickly see (from the scaling study) that those few lines do something to help performance.

I take the existence of other Python-based lessons in The Incubator as a reason to keep pushing Python, since there's an established ecosystem of lessons we can depend on, refer to, and insert into workshops when needed.

There is no official Carpentries lesson on C, Fortran, or SYCL, so the lesson would have to require familiarity with at least one of those languages as a prerequisite, then teach a crash course for learners who know one or the other, then finally get into typing out the programs, which can be very time-consuming. Porting md.c to md_sycl.cpp, you've added 50 lines of code, switched languages, and introduced namespaces (which are not native to C). I can imagine that extending md.c for MPI would take about as much typing. This can be managed in the allotted time (3 to 5 hours), but it's a lot of typing up front before seeing any benefit.

In Python, @mikerenfro showed that going from serial to MPI takes just 9 lines of Python.
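To make the shape of that change concrete, here is a minimal sketch of a Monte Carlo pi estimate arranged in the scatter/compute/gather pattern. This is not @mikerenfro's code and it does not call MPI: `count_inside`, `split_work`, and `estimate_pi` are hypothetical names, and the "workers" are simulated by a plain loop in a single process. In an MPI version, each rank would run `count_inside` on its own share, and the partial counts would be combined with a gather or reduce call.

```python
import random


def count_inside(n_samples, seed):
    """Count random points in the unit square that fall inside the quarter circle."""
    rng = random.Random(seed)  # independent, reproducible stream per worker
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside


def split_work(n_samples, n_workers):
    """Stand-in for the scatter step: divide the samples as evenly as possible."""
    base, extra = divmod(n_samples, n_workers)
    return [base + (1 if i < extra else 0) for i in range(n_workers)]


def estimate_pi(n_samples, n_workers=1, seed=0):
    """Scatter sample counts, compute partial counts, then gather and reduce."""
    chunks = split_work(n_samples, n_workers)  # "scatter"
    # Each list element plays the role of one worker's result (run serially here).
    partial = [count_inside(n, seed + i) for i, n in enumerate(chunks)]
    return 4.0 * sum(partial) / n_samples  # "gather" followed by a sum reduction


if __name__ == "__main__":
    print(estimate_pi(200_000, n_workers=1))
    print(estimate_pi(200_000, n_workers=4))
```

The serial case is just `n_workers=1`, which is the point of the argument: the per-worker function stays the same, and only the splitting and combining steps are new.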

At the moment, I would like to keep our development effort focused on a set of 3 or 4 lessons, each building on the skills learned in the previous one. I'm not saying that we can't teach compiled languages: I certainly believe that we should! But for a lesson like this one, which takes the Python-based hpc-intro as its basis, going into compiled languages looks like a massive expansion of scope. I propose that developing that material is a valuable exercise to undertake after we have a whole Python-based curriculum, which supports the stated goal of the community to petition for Carpentries membership.

@bkmgit
Contributor

bkmgit commented Mar 3, 2022

It would be good to incorporate the existing Python lessons and encourage their contributors to continue contributing. As such, I see no reason to develop another Python lesson. The main contribution from HPC-Intro is the parallel pi walk-through example, profiling, and performance comparison. This enables participants to make choices about where to run and what speedups to expect. People with a more advanced programming background would probably benefit from a more involved example or an examination of MPI in greater depth. For people who need to run existing community science codes, this lesson would give some idea of what they should choose to run on.

@tkphd
Member Author

tkphd commented Mar 3, 2022

OK. Let's rewind: the emphasis of this lesson is not "novice," or "pi," or "Monte Carlo," or Python/C/Fortran, really. What's a better name to reflect the intended scope?

  • Is this lesson for novice, intermediate, or experienced HPC practitioners?
  • Is the intended audience wide (general principles) or narrow (specific techniques)?
  • How much time do you plan this lesson to take?
  • What is the core skillset you want this lesson to encompass:
    • Optimizing code or compiler calls?
    • Profiling personal or community code?
    • Scaling studies with "real" exemplars?
    • Parallelization architectures (shared vs. distributed vs. platform portable)?
    • Accelerator architectures?
    • Advanced scheduler features (arrays, ...)?

@bkmgit
Contributor

bkmgit commented Mar 3, 2022

This lesson should be accessible to someone who has some scripting background as gained in HPC Shell and HPC Intro.

It should be half a day, about 4 hours.

It would:

  • Introduce one parallel programming model, Monte Carlo/MapReduce, which is one of several described at https://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf; all the others require more knowledge than the prerequisites provide
  • Explain differences between compiled and interpreted codes
  • Start with a basic code and slowly make changes to examine impact on time to solution
  • Examine the effects of these changes by profiling the example codes and doing scaling studies where appropriate
  • Different programming models and architectures would be demonstrated to aid in understanding performance tradeoffs and allow learners to decide if they want to learn more about them. As an example, a single GPU core is slow, but since there are many of them, one can improve time to solution for some applications
  • Introduce performance, portability and productivity tradeoffs
  • This would be helpful for lessons that examine specific community codes that are developed in future, but the present lesson would not examine a specific community code.
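As a rough sketch of what the scaling studies in the list above would compute, the two usual quantities are speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by the number of workers). The function names below are hypothetical, and the timings in the usage example are invented placeholders, not measurements from any lesson code.

```python
def speedup(t_serial, t_parallel):
    """Ratio of the one-worker wall time to the wall time on several workers."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_workers):
    """Speedup per worker; 1.0 means perfect scaling."""
    return speedup(t_serial, t_parallel) / n_workers


def scaling_table(timings):
    """Build (workers, time, speedup, efficiency) rows.

    `timings` maps worker count to wall time in seconds and must include
    an entry for 1 worker, which serves as the serial baseline.
    """
    t1 = timings[1]
    return [
        (p, t, speedup(t1, t), efficiency(t1, t, p))
        for p, t in sorted(timings.items())
    ]


if __name__ == "__main__":
    # Hypothetical wall times, invented for illustration only.
    example = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0}
    for p, t, s, e in scaling_table(example):
        print(f"{p:>2} workers  {t:6.1f} s  speedup {s:5.2f}  efficiency {e:5.2f}")
```

Learners would replace the placeholder dictionary with wall times they record themselves, which ties the profiling and scaling bullets to a single small table.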

@tkphd
Member Author

tkphd commented Mar 3, 2022

Sounds good. It looks to me like this is the core:

Introduce performance, portability and productivity tradeoffs

The target audience would be somebody with a program that runs reasonably well on their laptop, and they want to scale it up for the cluster. Perhaps the new lesson could be hpc-scaling-performance?

@reid-a
Member

reid-a commented Mar 3, 2022

My recollection of the discussion from the co-working meeting on March 3 was that the successor lessons to HPC Intro would have a data focus (the implication being that the existing parallel-novice, with its Python/Dask focus, would be a good starting point for that), and that the other repo (this one, by implication) would be more about programming.

I think "hpc-novice" is too broad of a name for this repo, though I confess I have not fully digested all of the discussion above. Maybe "parallel-programming" for this one? Or "Intro to parallel programming", since parallel programming is a big topic?

@bkmgit
Contributor

bkmgit commented Mar 4, 2022

Maybe HPC-Pi-Performance-Productivity-Portability or HPC-Performance-Productivity-Portability-Intro or some shorter version of these? The module does introduce parallel programming, but much more is required for a parallel programming course. In parallel programming courses for people with programming backgrounds, an in-depth project reinforces the self-learning skills needed for future work. The target audience is a domain scientist who needs to assess how well the code they are using is running and whether they are being efficient enough with the computational resources they want to use.

3 participants