Users define a scrape_config.json
file in the config
folder to instruct subscrape
what accounts or types of info they are interested in. To get started, just rename sample_scrape_config.json
to scrape_config.json
since it already shows you what syntax to use.
{
"_version": 1,
"kusama": {
"_db_connection_string": "sqlite:///data/cache/default.db",
"_auto_hydrate": false,
"extrinsics": {
"_filter": [{"block_timestamp": [{"<":1644796800}]}],
"system": [
"remark"
],
"utility":{
"_skip": true,
"batch":{},
"batch_all":{}
}
},
"extrinsics-list":[
"14238250-2"
],
"events": {
"crowdloan": [
"created"
]
},
"events-list":[
"14238250-39"
],
},
"moonriver": {
"transactions": {
"0xaa30ef758139ae4a7f798112902bf6d65612045f": [
"0xe8e33700"
]
},
"account_transactions": {
"accounts": [
"0x1a93b23281cc1cde4c4741353f3064709a16197d"
],
"_filter": [{"timeStamp": [{">=": 1638169446}, {"<=": 1638248544}]}]
}
}
}
The SQLAlchemy connection string to the database. The default is sqlite:///data/cache/default.db
.
The Subscan API has two different calls per entity type from which it delivers
extrinsics and events data. e.g. the events
call has more parameters, but the
event
call gives more data. Thus, when calling events
, it makes sense to
automatically hydrate the events with the missing info from the event
call.
Hydration takes considerably more time (up to 100x).
The default is true
. Set to false
to disable.
If set to true
, the scraper will stop scraping when it encounters an extrinsic or event that it has already scraped. This is useful for incremental scraping.
The default is true
. Set to false
to disable.
Scrapes extrinsics by using their module
and name
. module
can be None
to scrape all extrinsics. name
can also be None
to scrape all extrinsics of a module.
Scrapes extrinsics by using a list of extrinsic indexes.
Scrapes events by using their module
and name
. module
can be None
to scrape all events. name
can also be None
to scrape all events of a module.
Scrapes events by using a list of event indexes.
subscrape
currently supports the following top-level "operations" when analysing Moonriver/beam activities:
transactions
operation- For each contract and contract method,
subscrape
will build a list of all accounts that have called that contract method. This is useful for collecting a list of wallets who have added liquidity to a DEX. - Inside
transactions
should be a list of contract addresses. For each contract address, one or more hex-formatted contract method ids should be specified. These can be found by scrolling to the input of a transaction on Moonscan and copying the method id. Example: https://moonriver.moonscan.io/tx/0x35c7d7bdd33c77c756e7a9b41b587a6b6193b5a94956d2c8e4ea77af1df2b9c3
- For each contract and contract method,
account_transactions
operation- For each account listed, extract a list of all transactions by that wallet. The script will determine what type of activity has occurred and extract additional information if possible. For instance, if the transaction was a contract interaction with a DEX swapping tokens,
subscrape
can determine the names of the tokens and what exact quantities were swapped. Not yet supported are basic ERC-20 token transfers, adding DEX liquidty, and staking. But these can easily be added in the future without requiring any changes to your config file. Once these additional analysis features are incorporated, the data for each transaction will be updated to include a richer set of information about the transaction. Eventually, that can be pumped out to a spreadsheet to create a clean list of taxable events.
- For each account listed, extract a list of all transactions by that wallet. The script will determine what type of activity has occurred and extract additional information if possible. For instance, if the transaction was a contract interaction with a DEX swapping tokens,
When scraping either Substrate chains or EVM chains, the following additional modifiers can be applied at any level to help curate what data is extracted.
Will skip the current scope of the config.
Use the _filter
keyword to specify conditions that transactions must meet in order to not be ignored. For instance:
"_filter": [{"block_timestamp": [{">=": 1638169446}, {"<=": 1638248544}]}]
Multiple types of filters can be applied at the same level, and the following operators are supported: ==
, <,
<=
, >
, >=
. If the filter is improperly configured, no transactions will be filtered out.
This allows you to set params which are sent to the API. For example, you can set
the address
param to a specific address to start scraping all extrinsics or
events for that address. This is a very powerful feature that allows you to tap
into the full power of the Subscan API.
This will be useful in the future if breaking changes are needed. But for now, just leave it as 1
.