Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How-to guide for Folds and ThreadsX #40

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
/config.mk
8 changes: 7 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
include: config.mk

.PHONY: serve test clean instantiate

JULIA_CMD = julia --color=yes --startup-file=no
JULIA ?= julia
JULIA_CMD ?= $(JULIA) --color=yes --startup-file=no

serve: instantiate
bin/serve
Expand All @@ -13,3 +16,6 @@ clean:

instantiate:
$(JULIA_CMD) -e 'using Pkg; Pkg.activate("."); Pkg.instantiate()'

config.mk:
ln -s default-config.mk $@
1 change: 1 addition & 0 deletions default-config.mk
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
JULIA = julia1.6
199 changes: 199 additions & 0 deletions src/howto/high-level-api.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,199 @@
# How to find data-parallel algorithms

\note{Work in progress}

[Alternative intro (maybe this is better)]

When there is a library function that provides an algorithm for what you want to
compute, it is usually a good idea to use it. Although this is true for parallel
programming as well, it may not be obvious what functions are provided by the
libraries as it requires the knowledge of what can be parallelized. This how-to
guide tries to provide a simplified overview for finding data-parallel library
functions.

---

Julia provides `@spawn` as a basic building block for parallelism and
[FLoops.jl](https://github.com/JuliaFolds/FLoops.jl) provides generic parallel
`for` loop syntax. However, they are often too "low-level" as an API for
building parallel programs. To make writing parallel programs simpler,
[Folds.jl](https://github.com/JuliaFolds/Folds.jl) and
[ThreadsX.jl](https://github.com/tkf/ThreadsX.jl) provide basic "high-level"
algorithms similar to Julia's
[`Base`](https://docs.julialang.org/en/v1/base/base/) library as described
below. High-level APIs can be used for expressing *what* you want to compute
which aids not only understandability and maintainability of the program but
also the efficiency of the program because the library implementers can use
adequate strategy for *how* to compute the result.

If you are familiar with the functions listed in Folds.jl and ThreadsX.jl, no
further explanations are required for start using them. However, since not all
`Base` functions are parallelizable, it may be hard to remember which functions
are supported by these libraries. This how-to guide tries to provide a simple
"guide map" for navigating the functions provided by Folds.jl and ThreadsX.jl,
as summarized in the following picture:

![guide map](https://gist.githubusercontent.com/tkf/5012bd2e12f584ec8b6092476912c907/raw/folds.svg)

TODO: `cumprod!(ys)` -> `cumprod!(ys, xs)`

<!--

```
map(f, xs)
map!(f, ys, xs)
foreach(f, xs)

reduce(op, xs)
sum(xs)
prod(xs)
all(xs)
any(xs)
count(xs)
maximum(xs)
minimum(xs)
extrema(xs)
issorted(xs)

mapreduce(f, op, xs)
sum(f, xs)
prod(f, xs)
all(f, xs)
any(f, xs)
count(f, xs)
maximum(f, xs)
minimum(f, xs)
extrema(f, xs)
issorted(xs; by = f)

findmax(f, xs)
findmin(f, xs)
argmax(f, xs)
argmin(f, xs)
findall(f, xs)
findfirst(f, xs)
findlast(f, xs)

collect(xs)
unique(xs)
set(xs)
dict(xs)

accumulate(op, xs)
accumulate!(op, ys, xs)
cumsum(xs)
cumsum!(ys, xs)
cumprod(xs)
cumprod!(ys)
scan!(op, xs)
```
-->

The above picture is only very superficially accurate and you may need to
"unlearn" it to use these libraries in full extent. However, it is a good
approximation for start using

\tableofcontents

## Independent computation (`map`-family)

Given an array `xs` with elements $x_1, x_2, ..., x_n$...

| Function | Returns |
| :--- | :--- |
| `map(f, xs)` | $[f(x_1), f(x_2), ..., f(x_n)]$ |
| `map!(f, ys, xs)` | ditto, but stores the result in `ys` |
| `foreach(f, xs)` | runs $f(x_i)$ in parallel for side-effect |

TODO: explanations, examples, ...

## Simple reductions (`reduce`-family)

| Function | Returns |
| :--- | :--- |
| `reduce(⊗, xs)` | $x_1 \otimes x_2 \otimes ... \otimes x_n$ |
| `sum(xs)` | $x_1 + x_2 + ... + x_n$ |
| `prod(xs)` | $x_1 * x_2 * ... * x_n$ |
| `all(xs)` | $x_1 \;\&\; x_2 \;\&\; ... \;\&\; x_n$ |
| `any(xs)` | $x_1 \;|\; x_2 \;|\; ... \;|\; x_n$ |
| `count(xs)` | $x_1 + x_2 + ... + x_n$ |
| `maximum(xs)` | $\texttt{max}(x_1, x_2, ..., x_n)$ |
| `minimum(xs)` | $\texttt{min}(x_1, x_2, ..., x_n)$ |
| `extrema(xs)` | `(minimum(xs), maximum(xs))` |
| `issorted(xs)` | $(x_1 \le x_2) \;\&\; (x_2 \le x_3) \;\&\; ... \;\&\; (x_{n-1} \le x_n)$ |

## Element-wise computation fused with simple reductions (`mapreduce`-family)

| Function | Returns |
| :--- | :--- |
| `mapreduce(f, ⊗, xs)` | $f(x_1) \otimes f(x_2) \otimes ... \otimes f(x_n)$ |
| `sum(f, xs)` | $f(x_1) + f(x_2) + ... + f(x_n)$ |
| `prod(f, xs)` | $f(x_1) * f(x_2) * ... * f(x_n)$ |
| `all(f, xs)` | $f(x_1) \;\&\; f(x_2) \;\&\; ... \;\&\; f(x_n)$ |
| `any(f, xs)` | $f(x_1) \;|\; f(x_2) \;|\; ... \;|\; f(x_n)$ |
| `count(f, xs)` | $f(x_1) + f(x_2) + ... + f(x_n)$ |
| `maximum(f, xs)` | $\texttt{max}(f(x_1), f(x_2), ..., f(x_n))$ |
| `minimum(f, xs)` | $\texttt{min}(f(x_1), f(x_2), ..., f(x_n))$ |
| `extrema(f, xs)` | `(minimum(xs), maximum(xs))` |
| `issorted(xs; by = f)` | $(f(x_1) \le f(x_2)) \;\&\; (f(x_2) \le f(x_3)) \;\&\; ... \;\&\; (f(x_{n-1}) \le f(x_n))$ |

## Searching

| Function | Returns |
| :--- | :--- |
| `findmax(f, xs)` | `(x, index)` s.t. `xs[index] == x == maximum(f, xs)` |
| `findmin(f, xs)` | `(x, index)` s.t. `xs[index] == x == minimum(f, xs)` |
| `argmax(f, xs)` | `x` s.t. `f(x) == maximum(f, xs)` |
| `argmin(f, xs)` | `x` s.t. `f(x) == minimum(f, xs)` |
| `findall(f, xs)` | `indices` such that `f(xs[i])` holds iff `i` in `indices` |
| `findfirst(f, xs)` | first `i` s.t. `f(xs[i])` holds; `nothing` if not found |
| `findlast(f, xs)` | last `i` s.t. `f(xs[i])` holds; `nothing` if not found |

## Scan (`accumulate`-family)

| Function | Returns |
| :--- | :--- |
| `accumulate(⊗, xs)` | $[x_1, x_1 \otimes x_2, x_1 \otimes x_2 \otimes x_3, ...]$ |
| `accumulate!(⊗, ys, xs)` | ditto, but stores the result in `ys` |
| `scan!(⊗, xs)` | ditto, but stores the result in `xs` |
| `cumsum(xs)` | $[x_1, x_1 + x_2, x_1 + x_2 + x_3, ...]$ |
| `cumsum!(ys, xs)` | ditto, but stores the result in `ys` |
| `cumprod(xs)` | $[x_1, x_1 * x_2, x_1 * x_2 * x_3, ...]$ |
| `cumprod!(ys, xs)` | ditto, but stores the result in `ys` |

## Generating data structure (`collect`-family)

In above sections, we pretended that `xs` is an array. However, there various
*collection* of items in Julia. We can turn sufficiently well-behaving such
collections into an array in parallel using `Folds.collect`. It is also
possible to collect only unique elements using `Folds.unique`. The output data
structure does not have to be an array. We can use `Folds.set` to create a set
of elements. If there is no need to preserve the ordering of the input, it is
more efficient than `Folds.unique`. Given a collection of `Pair`s, we can use
`Folds.dict` to create a dictionary.

| Function | Returns |
| :--- | :--- |
| `collect(xs)` | collect elements of `xs` into an array |
| `unique(xs)` | collect unique elements of `xs` into an array, preserving the order |
| `set(xs)` | collect elements of `xs` into a set |
| `dict(xs)` | collect elements of `xs` into a dictionary |

## Flexible preprocessing

```
xs = (f(y) for x in collection if p(x) for y in g(x))
# ---- ------- -------------
# mapping filtering flattening
```

See also:
[A quick introduction to data parallelism in Julia](../../tutorials/quick-introduction)

## Folds.jl vs ThreadsX.jl: which one to use?

ThreadsX.jl provides a similar interface like Folds.jl but it is specific to
multi-core -based parallelism. Furthermore, it provides algorithms such as
`sort!` that cannot be easily expressed as a simple invocation of `reduce`.

Folds.jl focuses on algorithms expressible in terms of a generalized `reduce`.