release-2.2 branch merge master (#1078)
* 2.2.0 -> 2.3.0 (#947)

* Add tests for primary key (#948)

* add changelog (#955)

* add multi-column tests (#954)

* fix range partition throwing UnsupportedSyntaxException error (#960)

* fix view parsing problem (#953)

* make tispark able to read from a hash partition table (#966)

* increase ci worker number (#965)

* update readme for tispark-2.1.2 release (#968)

* update document for pyspark (#975)

* fix one jar bug (#972)

* adding common port numbers used by spark cluster (#973)

* fix cost model in table scan (#977)

* create an UninitializedType for TypeDecimal (#979)

* update sparkr doc (#976)

* use spark-2.4.3 to run ut (#978)

* use spark-2.4.3 to run ut

* fix ci

* a better design for getting auto table id (#980)

* fix bug: ci SpecialTiDBTypeTestSuite failed with tidb-3.0.1 (#984)

* improve TiConfiguration getPdAddrsString function (#963)

* bump grpc to 1.17 (#982)

* Add multiple-column PK tests (#970)

* add retry for batchGet (#986)

* use tispark self-made m2 cache file (#990)

* add spark sql document for batch write (#991)

* add auto mode for test.data.load (#994)

* fix typo (#996)

* fix index scan bug (#995)

* refine doc (#1003)

* add tidb-3.0 compatibility document (#998)

* add tidb-3.0 compatibility document

* address code review

* address code review

* add log4j config document (#1008)

* refactor batch write region pre-split (#999)

* add ci simple mode (#1012)

* clean up redundant code (#997)

* prohibit agg or groupby pushdown on double read (#1004)

* remove split region code (#1015)

* add supported scala version (#1013)

* Fix scala compiler version (#1010)

* fix reflection bug for hdp release (#1017) (#1018)

(cherry picked from commit 118b12e)

* check by grammarly (#1022)

* add benchmark result for batch write (#1025)

* release tispark 2.1.3 (#1026) (#1035)

(cherry picked from commit 107eb2b)

* support setting random seed in daily regression test (#1032)

* Remove create in tisession (#1021)

* lower tikv region size from 96M to 1M (#1031)

* adding unique indices test for batch write (#1014)

* use one unique seed (#1043)

* remove unused code (#1030)

* adding batch write pk insertion test (#1044)

* fix table not found bug in TiSession because of synchronization (#1041)

* fix test failure (#1051)

* fix reflection bug: pass in different arguments for different versions of the same function (#1037) (#1052)

(cherry picked from commit a5462c2)

* Adding pk and unique index test for batch write (#1049)

* fix distinct without alias bug: disable pushdown aggregate with alias (#1054)

* improve the doc (#1053)

* Refactor RegionStoreClient logic (#989)

* use stream rather than removeIf (#1057)

* Remove redundant pre-write/commit logic in LockResolverTest (#1062)

* adding recreate flag when creating tisession (#1064)

* fix issue 1047 (#1066)

* cleanup code in TiBatchWrite (#1067)

* release tispark-2.1.4 (#1068) (#1069)

(cherry picked from commit fd8068a)

* update document for tispark-2.1.4 release (#1070)
marsishandsome authored Aug 29, 2019
1 parent b4007e4 · commit 5aeb1b8
Showing 200 changed files with 6,258 additions and 2,457 deletions.
3 changes: 1 addition & 2 deletions .ci/build.groovy
@@ -7,13 +7,12 @@ def call(ghprbActualCommit, ghprbPullId, ghprbPullTitle, ghprbPullLink, ghprbPul

catchError {
node ('build') {
def ws = pwd()
deleteDir()
container("java") {
stage('Checkout') {
dir("/home/jenkins/git/tispark") {
sh """
archive_url=http://172.16.30.25/download/builds/pingcap/tiflash/cache/tiflash-m2-cache_latest.tar.gz
archive_url=http://fileserver.pingcap.net/download/builds/pingcap/tispark/cache/tispark-m2-cache-latest.tar.gz
if [ ! "\$(ls -A /maven/.m2/repository)" ]; then curl -sL \$archive_url | tar -zx -C /maven || true; fi
"""
if (sh(returnStatus: true, script: '[ -d .git ] && [ -f Makefile ] && git rev-parse --git-dir > /dev/null 2>&1') != 0) {
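Both CI scripts now warm the local Maven cache from a TiSpark-specific archive on the named fileserver host instead of the old hard-coded IP. A minimal sketch of that prefetch step in pipeline context, lifted from the new scripts:

```groovy
// Populate /maven/.m2/repository only when it is empty; any download or
// extraction failure is swallowed (|| true) so a cold cache never fails the build.
sh """
archive_url=http://fileserver.pingcap.net/download/builds/pingcap/tispark/cache/tispark-m2-cache-latest.tar.gz
if [ ! "\$(ls -A /maven/.m2/repository)" ]; then curl -sL \$archive_url | tar -zx -C /maven || true; fi
"""
```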
82 changes: 51 additions & 31 deletions .ci/integration_test.groovy
@@ -6,45 +6,48 @@ def call(ghprbActualCommit, ghprbCommentBody, ghprbPullId, ghprbPullTitle, ghprb
def TIDB_BRANCH = "master"
def TIKV_BRANCH = "master"
def PD_BRANCH = "master"
def MVN_PROFILE = ""
def PARALLEL_NUMBER = 9
def MVN_PROFILE = "-Pjenkins"
def TEST_MODE = "simple"
def PARALLEL_NUMBER = 18

// parse tidb branch
def m1 = ghprbCommentBody =~ /tidb\s*=\s*([^\s\\]+)(\s|\\|$)/
if (m1) {
TIDB_BRANCH = "${m1[0][1]}"
}
m1 = null
println "TIDB_BRANCH=${TIDB_BRANCH}"

// parse pd branch
def m2 = ghprbCommentBody =~ /pd\s*=\s*([^\s\\]+)(\s|\\|$)/
if (m2) {
PD_BRANCH = "${m2[0][1]}"
}
m2 = null
println "PD_BRANCH=${PD_BRANCH}"

// parse tikv branch
def m3 = ghprbCommentBody =~ /tikv\s*=\s*([^\s\\]+)(\s|\\|$)/
if (m3) {
TIKV_BRANCH = "${m3[0][1]}"
}
m3 = null
println "TIKV_BRANCH=${TIKV_BRANCH}"

// parse mvn profile
def m4 = ghprbCommentBody =~ /profile\s*=\s*([^\s\\]+)(\s|\\|$)/
if (m4) {
MVN_PROFILE = "-P${m4[0][1]}"
MVN_PROFILE = MVN_PROFILE + " -P${m4[0][1]}"
}

// parse test mode
def m5 = ghprbCommentBody =~ /mode\s*=\s*([^\s\\]+)(\s|\\|$)/
if (m5) {
TEST_MODE = "${m5[0][1]}"
}

def readfile = { filename ->
def file = readFile filename
return file.split("\n") as List
}

def remove_last_str = { str ->
return str.substring(0, str.length() - 1)
}

def get_mvn_str = { total_chunks ->
def mvnStr = " -DwildcardSuites="
for (int i = 0 ; i < total_chunks.size() - 1; i++) {
@@ -65,8 +68,7 @@ def call(ghprbActualCommit, ghprbCommentBody, ghprbPullId, ghprbPullTitle, ghprb
println "${NODE_NAME}"
container("golang") {
deleteDir()
def ws = pwd()


// tidb
def tidb_sha1 = sh(returnStdout: true, script: "curl ${FILE_SERVER_URL}/download/refs/pingcap/tidb/${TIDB_BRANCH}/sha1").trim()
sh "curl ${FILE_SERVER_URL}/download/builds/pingcap/tidb/${tidb_sha1}/centos7/tidb-server.tar.gz | tar xz"
@@ -90,23 +92,38 @@ def call(ghprbActualCommit, ghprbCommentBody, ghprbPullId, ghprbPullTitle, ghprb
sh """
cp -R /home/jenkins/git/tispark/. ./
git checkout -f ${ghprbActualCommit}
find core/src -name '*Suite*' > test
find core/src -name '*Suite*' | grep -v 'MultiColumnPKDataTypeSuite' > test
shuf test -o test2
mv test2 test
"""

if(TEST_MODE != "simple") {
sh """
find core/src -name '*MultiColumnPKDataTypeSuite*' >> test
"""
}

sh """
sed -i 's/core\\/src\\/test\\/scala\\///g' test
sed -i 's/\\//\\./g' test
sed -i 's/\\.scala//g' test
shuf test -o test2
mv test2 test
split test -n r/$PARALLEL_NUMBER test_unit_ -a 1 --numeric-suffixes=1
split test -n r/$PARALLEL_NUMBER test_unit_ -a 2 --numeric-suffixes=1
"""

for (int i = 1; i <= PARALLEL_NUMBER; i++) {
sh """cat test_unit_$i"""
if(i < 10) {
sh """cat test_unit_0$i"""
} else {
sh """cat test_unit_$i"""
}
}

sh """
cd tikv-client
./scripts/proto.sh
cd ..
cp .ci/log4j-ci.properties core/src/test/resources/log4j.properties
bash core/scripts/version.sh
bash core/scripts/fetch-test-data.sh
mv core/src/test core-test/src/
bash tikv-client/scripts/proto.sh
"""
}
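Because `split` now runs with `-a 2 --numeric-suffixes=1`, the 18 chunks are named `test_unit_01` through `test_unit_18`, which is why the loops above and below branch on `i < 10` to restore the leading zero. A hypothetical equivalent without the branch, shown only as a sketch:

```groovy
def PARALLEL_NUMBER = 18
for (int i = 1; i <= PARALLEL_NUMBER; i++) {
    // split's two-character numeric suffixes are zero-padded: 01, 02, ..., 18
    def suffix = String.format('%02d', i)
    sh "cat test_unit_${suffix}"
}
```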

@@ -120,31 +137,35 @@ def call(ghprbActualCommit, ghprbCommentBody, ghprbPullId, ghprbPullTitle, ghprb

def run_tispark_test = { chunk_suffix ->
dir("go/src/github.com/pingcap/tispark") {
run_chunks = readfile("test_unit_${chunk_suffix}")
if(chunk_suffix < 10) {
run_chunks = readfile("test_unit_0${chunk_suffix}")
} else {
run_chunks = readfile("test_unit_${chunk_suffix}")
}

print run_chunks
def mvnStr = get_mvn_str(run_chunks)
sh """
archive_url=http://172.16.30.25/download/builds/pingcap/tiflash/cache/tiflash-m2-cache_latest.tar.gz
archive_url=http://fileserver.pingcap.net/download/builds/pingcap/tispark/cache/tispark-m2-cache-latest.tar.gz
if [ ! "\$(ls -A /maven/.m2/repository)" ]; then curl -sL \$archive_url | tar -zx -C /maven || true; fi
"""
sh """
cp .ci/log4j-ci.properties core/src/test/resources/log4j.properties
export MAVEN_OPTS="-Xmx6G -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=51M"
mvn compile ${MVN_PROFILE} -DskipCloneProtoFiles=true
mvn test ${MVN_PROFILE} -Dtest=moo ${mvnStr} -DskipCloneProtoFiles=true
mvn compile ${MVN_PROFILE}
mvn test ${MVN_PROFILE} -Dtest=moo ${mvnStr}
"""
}
}

def run_tikvclient_test = { chunk_suffix ->
dir("go/src/github.com/pingcap/tispark") {
sh """
archive_url=http://172.16.30.25/download/builds/pingcap/tiflash/cache/tiflash-m2-cache_latest.tar.gz
archive_url=http://fileserver.pingcap.net/download/builds/pingcap/tispark/cache/tispark-m2-cache-latest.tar.gz
if [ ! "\$(ls -A /maven/.m2/repository)" ]; then curl -sL \$archive_url | tar -zx -C /maven || true; fi
"""
sh """
export MAVEN_OPTS="-Xmx6G -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512M"
mvn test ${MVN_PROFILE} -am -pl tikv-client -DskipCloneProtoFiles=true
mvn test ${MVN_PROFILE} -am -pl tikv-client
"""
unstash "CODECOV_TOKEN"
sh 'curl -s https://codecov.io/bash | bash -s - -t @CODECOV_TOKEN'
@@ -155,7 +176,6 @@ def call(ghprbActualCommit, ghprbCommentBody, ghprbPullId, ghprbPullTitle, ghprb
node("test_java") {
println "${NODE_NAME}"
container("java") {
def ws = pwd()
deleteDir()
unstash 'binaries'
unstash 'tispark'
@@ -167,17 +187,17 @@ def call(ghprbActualCommit, ghprbCommentBody, ghprbPullId, ghprbPullTitle, ghprb
killall -9 tikv-server || true
killall -9 pd-server || true
sleep 10
bin/pd-server --name=pd --data-dir=pd &>pd.log &
bin/pd-server --name=pd --data-dir=pd --config=go/src/github.com/pingcap/tispark/config/pd.toml &>pd.log &
sleep 10
bin/tikv-server --pd=127.0.0.1:2379 -s tikv --addr=0.0.0.0:20160 --advertise-addr=127.0.0.1:20160 &>tikv.log &
bin/tikv-server --pd=127.0.0.1:2379 -s tikv --addr=0.0.0.0:20160 --advertise-addr=127.0.0.1:20160 --config=go/src/github.com/pingcap/tispark/config/tikv.toml &>tikv.log &
sleep 10
ps aux | grep '-server' || true
curl -s 127.0.0.1:2379/pd/api/v1/status || true
bin/tidb-server --store=tikv --path="127.0.0.1:2379" --config=go/src/github.com/pingcap/tispark/config/tidb.toml &>tidb.log &
sleep 60
"""

timeout(60) {
timeout(120) {
run_test(chunk_suffix)
}
} catch (err) {
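All of the overrides above (tidb/tikv/pd branches, extra maven profile, test mode) are parsed from the GitHub trigger comment with the same regex pattern. A standalone sketch of that mechanism; the comment text here is illustrative only:

```groovy
// A trigger comment such as this one overrides the defaults; keys that are
// absent fall back to "master", "-Pjenkins", and "simple" respectively.
def ghprbCommentBody = "/run-integration-test tidb=release-3.0 mode=full"

def TIDB_BRANCH = "master"
def m = ghprbCommentBody =~ /tidb\s*=\s*([^\s\\]+)(\s|\\|$)/
if (m) {
    TIDB_BRANCH = "${m[0][1]}"
}
assert TIDB_BRANCH == "release-3.0"
```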
2 changes: 2 additions & 0 deletions .ci/log4j-ci.properties
@@ -24,3 +24,5 @@ log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR

# tispark
log4j.logger.com.pingcap=ERROR
log4j.logger.com.pingcap.tispark.utils.ReflectionUtil=DEBUG
log4j.logger.org.apache.spark.sql.test.SharedSQLContext=DEBUG
2 changes: 2 additions & 0 deletions .ci/tidb_config-for-daily-test.properties
@@ -0,0 +1,2 @@
# The seed used to generate test data (0 means random).
test.data.generate.seed=0
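The new property feeds the daily regression test's data generator; per the comment, `0` means a fresh random seed on every run. A hypothetical sketch of how a harness could honor that contract (the file path is real; the surrounding logic is illustrative, not from the TiSpark code base):

```groovy
// Load the daily-test properties and resolve the data-generation seed.
def props = new Properties()
new File('.ci/tidb_config-for-daily-test.properties').withInputStream { props.load(it) }

long seed = (props.getProperty('test.data.generate.seed') ?: '0') as long
if (seed == 0L) {
    seed = new Random().nextLong()  // 0 means: pick a random seed for this run
}
println "generating test data with seed ${seed}"  // log it so a failing run can be replayed
def rng = new Random(seed)
```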
147 changes: 147 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,147 @@
# TiSpark Changelog
All notable changes to this project will be documented in this file.

## [TiSpark 2.1.4] 2019-08-27
### Fixes
- Fix distinct without alias bug: disable pushdown aggregate with alias [#1055](https://github.com/pingcap/tispark/pull/1055)
- Fix reflection bug: pass in different arguments for different version of same function [#1037](https://github.com/pingcap/tispark/pull/1037)

## [TiSpark 2.1.3] 2019-08-15
### Fixes
- Fix cost model in table scan [#1023](https://github.com/pingcap/tispark/pull/1023)
- Fix index scan bug [#1024](https://github.com/pingcap/tispark/pull/1024)
- Prohibit aggregate or group by pushdown on double read [#1027](https://github.com/pingcap/tispark/pull/1027)
- Fix reflection bug for HDP release [#1017](https://github.com/pingcap/tispark/pull/1017)
- Fix scala compiler version [#1019](https://github.com/pingcap/tispark/pull/1019)

## [TiSpark 2.2.0]
### New Features
* Natively support writing data to TiKV using Spark Data Source API
* Support select from partition table [#916](https://github.com/pingcap/tispark/pull/916)
* Release one tispark jar (supporting both Spark-2.3.x and Spark-2.4.x) instead of two [#933](https://github.com/pingcap/tispark/pull/933)
* Add spark version to tispark udf ti_version [#943](https://github.com/pingcap/tispark/pull/943)

## [TiSpark 2.1.2] 2019-07-29
### Fixes
* Fix improper response with region error [#922](https://github.com/pingcap/tispark/pull/922)
* Fix view parsing problem [#953](https://github.com/pingcap/tispark/pull/953)

## [TiSpark 1.2.1]
### Fixes
* Fix count error: if advanceNextResponse is empty, we should read the next region (#899)
* Use fixed version of proto (#898)

## [TiSpark 2.1.1]
### Fixes
* Add TiDB/TiKV/PD version and Spark version supported for each latest major release (#804) (#887)
* Fix incorrect timestamp of tidbMapDatabase (#862) (#885)
* Fix column size estimation (#858) (#884)
* Fix count error: if advanceNextResponse is empty, we should read the next region (#878) (#882)
* Use fixed version of proto instead of master branch (#843) (#850)

## [TiSpark 2.1]
### Features
* Support range partition pruning (Beta) (#599)
* Support show columns command (#614)

### Fixes
* Fix build key ranges with xor expression (#576)
* Fix failure to initialize pd when using ipv6 address (#587)
* Fix default value bug (#596)
* Fix possible IndexOutOfBoundException in KeyUtils (#597)
* Fix outputOffset is incorrect when building DAGRequest (#615)
* Fix incorrect implementation of Key.next() (#648)
* Fix partition parser failing to parse numerical value 0 (#651)
* Fix prefix length possibly being larger than the value used (#668)
* Fix retry logic when scan meet lock (#666)
* Fix inconsistent timestamp (#676)
* Fix tempView may be unresolved when applying timestamp to plan (#690)
* Fix concurrent DAGRequest issue (#714)
* Fix downgrade scan logic (#725)
* Fix integer type default value should be parsed to long (#741)
* Fix index scan on partition table (#735)
* Fix KeyNotInRegion error that may occur when retrieving rows by handle (#755)
* Fix encode value long max (#761)
* Fix MatchErrorException that may occur when unsigned BigInt appears in group-by columns (#780)
* Fix IndexOutOfBoundException when trying to get pd member (#788)

## [TiSpark 2.0]
### Features
* Work with Spark 2.3
* Support use `$database` statement
* Support show databases statement
* Support show tables statement
* No need to use `TiContext.mapTiDBDatabase`, use `$database.$table` to identify a table instead
* Support data type SET and ENUM
* Support data type YEAR
* Support data type TIME
* Support isolation level settings
* Support describe table command
* Support cache tables and uncache tables
* Support read from a TiDB partition table
* Support use TiDB as metastore

### Fixes
* Fix JSON parsing (#491)
* Fix count on empty table (#498)
* Fix ScanIterator unable to read from adjacent empty regions (#519)
* Fix possible NullPointerException when setting show_row_id true (#522)

### Improved
* Make ti version usable without selecting database (#545)

## [TiSpark 1.2]
### Fixes
* Fix compatibility with PDServer (#480)

## [TiSpark 1.1]
### Fixes
* Fix daylight saving time (DST) (#347)
* Fix count(1) result is always 0 if subquery contains limit (#346)
* Fix incorrect totalRowCount calculation (#353)
* Fix request fail with Key not in region after retrying NotLeaderError (#354)
* Fix ScanIterator logic where index may be out of bound (#357)
* Fix tispark-sql dbName (#379)
* Fix StoreNotMatch (#396)
* Fix utf8 prefix index (#400)
* Fix decimal decoding (#401)
* Refactor not leader logic (#412)
* Fix global temp view not visible in thriftserver (#437)

### Adds
* Allow TiSpark retrieve row id (#367)
* Decode json to string (#417)

### Improvements
* Improve PD connection issue's error log (#388)
* Add DB prefix option for TiDB tables (#416)

## [TiSpark 1.0.1]
* Fix unsigned index
* Compatible with TiDB both before and after commit 48a42f

## [TiSpark 1.0 GA]
### New Features
TiSpark provides distributed computing of TiDB data using Apache Spark.

* Provide a gRPC communication framework to read data from TiKV
* Provide encoding and decoding of TiKV component data and communication protocol
* Provide calculation pushdown, which includes:
- Aggregate pushdown
- Predicate pushdown
- TopN pushdown
- Limit pushdown
* Provide index related support
- Transform predicate into Region key range or secondary index
- Optimize Index Only queries
- Adaptive downgrade index scan to table scan per region
* Provide cost-based optimization
- Support statistics
- Select index
- Estimate broadcast table cost
* Provide support for multiple Spark interfaces
- Support Spark Shell
- Support ThriftServer/JDBC
- Support Spark-SQL interaction
- Support PySpark Shell
- Support SparkR
10 changes: 0 additions & 10 deletions R/.gitignore

This file was deleted.

11 changes: 0 additions & 11 deletions R/DESCRIPTION

This file was deleted.

1 change: 0 additions & 1 deletion R/NAMESPACE

This file was deleted.
