LB2の再実装 + SIMD拡張。
Ubuntu 18.04で動作確認済みの手順を以下に示す。
- Java, SBTのインストール
$ sudo apt-get install openjdk-8-jdk
$ echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
$ curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x2EE0EA64E40A89B84B2DF73499E82A75642AC823" | sudo apt-key add
$ sudo apt-get update
$ sudo apt-get install sbt
- 生成コードのビルド環境の準備(gcc, make, libomp, etc.)
$ sudo apt-get install build-essential
$ sudo apt-get install libomp-dev
sbtのrunコマンドの後に、動作モードとクエリを指定して実行する。コード生成からビルド、実行まで一通り行われる。
$ git clone https://github.com/jnmt/lb2c.git
$ cd lb2c
$ sbt
> run compiler "Print(Scan(\"a.tbl\"))"
- 動作モード
モード | 説明 |
---|---|
compiler | 通常モード。対応済みオペレータは下記参照のこと。 |
simd | SIMDモード。オペレータは通常モードと同様。AVX-512用のフラグをつけてビルドし、インテルのエミュレータ(SDE)経由で実行 |
parallel | 並列処理コード生成モード。オペレータはSIMD対応を除き通常モードと同様。不安定。 |
interpreter | インタープリタで実行。一部オペレータのみ。 |
- サポート済みオペレータ
オペレータ | 説明 |
---|---|
結果を標準出力へ出力。 | |
Time | 実行時間を計測して出力。 |
Scan | パイプ"|"区切りのテキストファイルをスキャン。 |
Filter | 指定した述語に基づいタプルを選択。 |
FilterV | 指定した述語に基づいタプルを選択。SIMD利用。 |
Project | 指定した属性を射影。 |
Calculate | 指定した計算式を評価。 |
CalculateV | 指定した計算式を評価。SIMD利用。 |
NestedLoopJoin | ネストループ結合方式で関係結合。 |
HashJoin | ハッシュ結合方式で関係結合。 |
Aggregate | 集約。count/min/max/sum/avgをサポート。 |
Sort | クイックソートでソート。昇順のみサポート。 |
CaseWhen | CASE WHEN文。 |
Preload | 事前にテキストファイルをスキャンしてメモリ上にロード。 |
ScanTable | Preloadしたテーブルをスキャン |
- Scan
Print( Scan(\"/Users/jun/src/tpch-dbgen/nation.tbl\", int n_nationkey, string n_name, int n_regionkey, string n_comment) )
- Filter
Print( Filter( Scan(\"/Users/jun/src/tpch-dbgen/nation.tbl\", int n_nationkey, string n_name, int n_regionkey, string n_comment), n_regionkey=3) )
- HashJoin
Print(
HashJoin (
Filter( Scan(\"/Users/jun/src/tpch-dbgen/nation.tbl\", int n_nationkey, string n_name, int n_regionkey, string n_comment), n_regionkey=3),
Scan(\"/Users/jun/src/tpch-dbgen/customer.tbl\", int c_custkey, string c_name, string c_address, int c_nationkey, string c_phone, double c_acctbal, string c_mktsegment, string c_comment),
n_nationkey, c_nationkey
)
)
- Project
Print(
Project(
NestedLoopJoin (
Filter( Scan(\"/Users/jun/src/tpch-dbgen/nation.tbl\", int n_nationkey, string n_name, int n_regionkey, string n_comment), n_regionkey=3),
Scan(\"/Users/jun/src/tpch-dbgen/customer.tbl\", int c_custkey, string c_name, string c_address, int c_nationkey, string c_phone, double c_acctbal, string c_mktsegment, string c_comment),
n_nationkey, c_nationkey
),
c_custkey, c_name, n_name )
)
- Aggregate
Print(
Aggregate(
HashJoin(
Scan(\"/Users/jun/src/tpch-dbgen/partsupp.tbl\", int ps_partkey, int ps_suppkey, int ps_availqty, double ps_supplycost, string ps_comment),
Scan(\"/Users/jun/src/tpch-dbgen/lineitem.tbl\", int l_orderkey, int l_partkey, int l_suppkey, int l_linenumber, double l_quantity, double l_extendedprice, double l_discount, double l_tax, string l_returnflag, string l_linestatus, string l_shipdate, string l_commitdate, string l_receiptdate, string l_shipinstruct, string l_shipmode, string l_comment),
ps_suppkey, l_suppkey
),
keys(l_orderkey), count(*) )
)
- Calculate, Aggregate & Project
Print(
Project(
Aggregate(
Calculate(
Scan(\"/Users/jun/src/tpch-dbgen/lineitem.tbl\", int l_orderkey, int l_partkey, int l_suppkey, int l_linenumber, double l_quantity, double l_extendedprice, double l_discount, double l_tax, string l_returnflag, string l_linestatus, string l_shipdate, string l_commitdate, string l_receiptdate, string l_shipinstruct, string l_shipmode, string l_comment),
l_extendedprice*(1-l_discount)*(1+l_tax) as charge
),
keys(l_returnflag, l_linestatus), sum(charge) ),
sum_charge
)
)
- Sort
Print(
Sort(
Scan(\"/Users/jun/src/tpch-dbgen/part.tbl\", int p_partkey, string p_name, string p_mfgr, string p_brand, string p_type, int p_size, string p_container, double p_retailprice, string p_comment),
p_size, p_retailprice
)
)
- CaseWhen
Print(
CaseWhen(
Scan(\"/Users/jun/src/tpch-dbgen/nation.tbl\", int n_nationkey, string n_name, int n_regionkey, string n_comment),
n_regionkey = 3 then 1 else 0 as x
)
)
- Q1
Print(
Sort(
Aggregate(
Calculate(
Filter(
Scan(\"/Users/jun/src/tpch-dbgen/lineitem.tbl\", int l_orderkey, int l_partkey, int l_suppkey, int l_linenumber, double l_quantity, double l_extendedprice, double l_discount, double l_tax, string l_returnflag, string l_linestatus, date l_shipdate, date l_commitdate, date l_receiptdate, string l_shipinstruct, string l_shipmode, string l_comment),
l_shipdate <= \"1998-09-02\"
),
l_extendedprice*(1-l_discount) as disc_price,
l_extendedprice*(1-l_discount)*(1+l_tax) as charge
),
keys(l_returnflag, l_linestatus), sum(l_quantity), sum(l_extendedprice), sum(disc_price), sum(charge), avg(l_quantity), avg(l_extendedprice), avg(l_discount), count(*)
),
l_returnflag, l_linestatus
)
)
- Q1 (Preload & Exec)
PreloadExec(
Scan(\"/Users/jun/src/tpch-dbgen/lineitem.tbl\", int l_orderkey, int l_partkey, int l_suppkey, int l_linenumber, double l_quantity, double l_extendedprice, double l_discount, double l_tax, string l_returnflag, string l_linestatus, date l_shipdate, date l_commitdate, date l_receiptdate, string l_shipinstruct, string l_shipmode, string l_comment) as lineitem,
Time(Print(
Sort(
Aggregate(
Calculate(
Filter(ScanTable(lineitem), l_shipdate <= \"1998-09-02\"),
l_extendedprice*(1-l_discount) as disc_price,
l_extendedprice*(1-l_discount)*(1+l_tax) as charge
),
keys(l_returnflag, l_linestatus), sum(l_quantity), sum(l_extendedprice), sum(disc_price), sum(charge), avg(l_quantity), avg(l_extendedprice), avg(l_discount), count(*)
),
l_returnflag, l_linestatus
)
))
)
- Q6
Print(
Aggregate(
Calculate(
Filter(
Scan(\"/Users/jun/src/tpch-dbgen/lineitem.tbl\", int l_orderkey, int l_partkey, int l_suppkey, int l_linenumber, double l_quantity, double l_extendedprice, double l_discount, double l_tax, string l_returnflag, string l_linestatus, date l_shipdate, date l_commitdate, date l_receiptdate, string l_shipinstruct, string l_shipmode, string l_comment),
l_shipdate >= \"1994-01-01\"
and l_shipdate < \"1995-01-01\"
and l_discount >= 0.05
and l_discount <= 0.07
and l_quantity < 24
),
l_extendedprice*l_discount as revenue
),
keys(), sum(revenue)
)
)
- Q14
Calculate(
Aggregate(
CaseWhen(
Calculate(
HashJoin(
Scan(\"/Users/jun/src/tpch-dbgen/part.tbl\", int p_partkey, string p_name, string p_mfgr, string p_brand, string p_type, int p_size, string p_container, double p_retailprice, string p_comment),
Filter(
Scan(\"/Users/jun/src/tpch-dbgen/lineitem.tbl\", int l_orderkey, int l_partkey, int l_suppkey, int l_linenumber, double l_quantity, double l_extendedprice, double l_discount, double l_tax, string l_returnflag, string l_linestatus, date l_shipdate, date l_commitdate, date l_receiptdate, string l_shipinstruct, string l_shipmode, string l_comment),
l_shipdate >= \"1995-09-01\" and l_shipdate < \"1995-10-01\"),
p_partkey, l_partkey
),
l_extendedprice*(1-l_discount) as x
),
p_type like \"PROMO%\" then x else 0 as cw
),
keys(), sum(cw), sum(x)
),
100*sum_cw/sum_x as promo_revenue
)
- Ruby Y. Tahboub, Grégory M. Essertel, Tiark Rompf, "How to Architect a Query Compiler, Revisited", SIGMOD '18