- Create a virtualenv;
- Install the requirements:
pip3 install -r requirements.txt
; - Get the
final1.json.gz
,final2.json.gz
files and save them in thedata/
; - If you want to import data from the RDB dump, get the
routerank.rdb
file and save it in thedata/
;
Start redis in a docker container:
docker run --rm --name cache -p 6379:6379 redis:latest
Load the data into redis:
Note: for Windows unzip the files first because the decompression does not work.
python3 loader.py data/final1.json.gz data/final2.json.gz
You can check how many keys have been loaded:
$ docker run -it \
--rm \
--link cache:cache \
-v $PWD/data:/data \
redis redis-cli -h cache INFO Keyspace
# Keyspace
db0:keys=3050074,expires=0,avg_ttl=0
Export the data in RDB format:
docker run -it \
--rm \
--link cache:cache \
-v $PWD/data:/data \
redis redis-cli -h cache --rdb /data/routerank.rdb
sending REPLCONF capa eof
sending REPLCONF rdb-only 1
SYNC sent to master, writing 247701866 bytes to '/data/routerank.rdb'
Transfer finished with success.
Note that:
docker run
is run without--rm
this time, because we need to stop and restart the container- when copying the data the destinastion file must be named
dump.rdb
Load the .rdb
file:
docker run --name cache -p 6379:6379 redis:latest
docker stop cache
docker cp data/routerank.rdb cache:/data/dump.rdb
Then you can restart the container:
$ docker start -a cache
1:C 22 May 2021 21:39:43.988 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 22 May 2021 21:39:43.988 # Redis version=6.2.1, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 22 May 2021 21:39:43.988 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
1:M 22 May 2021 21:39:43.989 * monotonic clock: POSIX clock_gettime
1:M 22 May 2021 21:39:43.989 * Running mode=standalone, port=6379.
1:M 22 May 2021 21:39:43.989 # Server initialized
1:M 22 May 2021 21:39:43.990 * Loading RDB produced by version 6.2.1
1:M 22 May 2021 21:39:43.990 * RDB age 140 seconds
1:M 22 May 2021 21:39:43.990 * RDB memory usage when created 481.09 Mb
1:M 22 May 2021 21:39:46.317 * DB loaded from disk: 2.328 seconds
1:M 22 May 2021 21:39:46.317 * Ready to accept connections
You can check how many keys are loaded from the dump:
$ docker run -it \
--rm \
--link cache:cache \
-v $PWD/data:/data \
redis redis-cli -h cache INFO Keyspace
# Keyspace
db0:keys=3050074,expires=0,avg_ttl=0
this should match the number of keys as above.
When you are done you can stop and remove the container:
docker stop cache
docker rm cache
Note: This will require a recent version of docker-compose
.
$ docker-compose --version
docker-compose version 1.29.2, build 5becea4c
This docker compose will bring up two containers, one with Redis and another with the loader script. The latter will insert the data in Redis.
$ docker-compose --profile loading up
Creating network "offer-cache_cache-network" with the default driver
Creating offer-cache_cache_1 ... done
Creating offer-cache_loader_1 ... done
Attaching to offer-cache_cache_1, offer-cache_loader_1
...
To export the data in RDB format issue the following command. You may need
to change the network name offer-cache_cache-network
, use the same name
as it appears in the output of docker-compose up
:
$ docker run -it \
--rm \
--network cache-network \
--link cache:cache \
-v "$PWD"/data:/data \
redis redis-cli -h cache --rdb /data/routerank.rdb
sending REPLCONF capa eof
sending REPLCONF rdb-only 1
SYNC sent to master, writing 249875542 bytes to '/data/routerank.rdb'
Transfer finished with success.
If you bring up a docker-compose again, it will conserve its data:
$ docker-compose up
Starting offer-cache_cache_1 ... done
Attaching to offer-cache_cache_1
cache_1 | 1:C 22 May 2021 22:06:42.051 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
cache_1 | 1:C 22 May 2021 22:06:42.051 # Redis version=6.2.1, bits=64, commit=00000000, modified=0, pid=1, just started
cache_1 | 1:C 22 May 2021 22:06:42.051 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
cache_1 | 1:M 22 May 2021 22:06:42.051 * monotonic clock: POSIX clock_gettime
cache_1 | 1:M 22 May 2021 22:06:42.052 * Running mode=standalone, port=6379.
cache_1 | 1:M 22 May 2021 22:06:42.052 # Server initialized
cache_1 | 1:M 22 May 2021 22:06:42.052 * Loading RDB produced by version 6.2.1
cache_1 | 1:M 22 May 2021 22:06:42.052 * RDB age 1759 seconds
cache_1 | 1:M 22 May 2021 22:06:42.052 * RDB memory usage when created 481.09 Mb
cache_1 | 1:M 22 May 2021 22:06:44.382 * DB loaded from disk: 2.331 seconds
cache_1 | 1:M 22 May 2021 22:06:44.383 * Ready to accept connections
and with Ctrl+C you can stop it again:
[Ctrl+C is pressed]
Gracefully stopping... (press Ctrl+C again to force)
Stopping offer-cache_cache_1 ... done
If you remove the docker-compose containers with docker-compose down
:
$ docker-compose down
Removing offer-cache_cache_1 ... done
Removing network cache-network
to re-import the data again first launch docker-compose:
$ docker-compose up
Creating network "cache-network" with the default driver
Creating offer-cache_cache_1 ... done
Attaching to offer-cache_cache_1
cache_1 | 1:C 22 May 2021 22:13:11.274 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
cache_1 | 1:C 22 May 2021 22:13:11.274 # Redis version=6.2.1, bits=64, commit=00000000, modified=0, pid=1, just started
cache_1 | 1:C 22 May 2021 22:13:11.274 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
cache_1 | 1:M 22 May 2021 22:13:11.275 * monotonic clock: POSIX clock_gettime
cache_1 | 1:M 22 May 2021 22:13:11.277 * Running mode=standalone, port=6379.
cache_1 | 1:M 22 May 2021 22:13:11.277 # Server initialized
cache_1 | 1:M 22 May 2021 22:13:11.277 * Ready to accept connections
stop the container:
[Ctrl+C pressed]
Gracefully stopping... (press Ctrl+C again to force)
Stopping offer-cache_cache_1 ... done
Import the .rdb
file:
docker cp data/routerank.rdb offer-cache_cache_1:/data/dump.rdb
Restart the docker-compose file:
$ docker-compose up
Starting offer-cache_cache_1 ... done
Attaching to offer-cache_cache_1
cache_1 | 1:C 22 May 2021 22:14:54.166 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
cache_1 | 1:C 22 May 2021 22:14:54.166 # Redis version=6.2.1, bits=64, commit=00000000, modified=0, pid=1, just started
cache_1 | 1:C 22 May 2021 22:14:54.166 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
cache_1 | 1:M 22 May 2021 22:14:54.167 * monotonic clock: POSIX clock_gettime
cache_1 | 1:M 22 May 2021 22:14:54.168 * Running mode=standalone, port=6379.
cache_1 | 1:M 22 May 2021 22:14:54.168 # Server initialized
cache_1 | 1:M 22 May 2021 22:14:54.169 * Loading RDB produced by version 6.2.1
cache_1 | 1:M 22 May 2021 22:14:54.169 * RDB age 2251 seconds
cache_1 | 1:M 22 May 2021 22:14:54.169 * RDB memory usage when created 481.09 Mb
cache_1 | 1:M 22 May 2021 22:14:57.163 * DB loaded from disk: 2.994 seconds
cache_1 | 1:M 22 May 2021 22:14:57.163 * Ready to accept connections
and you can check as before:
$ docker run -it \
--rm \
--network cache-network \
--link offer-cache_cache_1:cache \
-v $PWD/data:/data \
redis redis-cli -h cache INFO Keyspace
# Keyspace
db0:keys=3050074,expires=0,avg_ttl=0
Retrieve some request ids:
$ python3 query.py '#31:9819-#25:22859' '#30:7097-#25:16673'
* request id: #31:9819-#25:22859
- offer id: 6990564355760000
- offer id: 6990564287782000
- offer id: 6990537605906000
---
* request id: #30:7097-#25:16673
- offer id: 7000245235962000
- offer id: 7000245253032000
- offer id: 7000245277584000
---
$ python3 query.py --help
usage: query.py [-h] [-H REDIS_HOST] [--port REDIS_PORT]
<request_id> [<request_id> ...]
positional arguments:
<request_id> Request ids to query.
optional arguments:
-h, --help show this help message and exit
-H REDIS_HOST, --host REDIS_HOST
Redis hostname [default: localhost].
--port REDIS_PORT Redis port [default: 6379].