- Title
- Organization
- Location
  - Country
  - Region
- URLs:
  - Base URL
  - Action API URL
  - Index API URL
  - SPARQL Endpoint URL (machine endpoint)
  - SPARQL UI URL (human-usable UI)
  - Special:Version URL
Query:
query MyQuery {
wikibase(wikibaseId: 10) {
id
title
organization
location {
country
region
}
urls {
baseUrl
actionApi
indexApi
sparqlEndpointUrl
sparqlUrl
specialVersionUrl
}
}
}
Result:
{
"data": {
"wikibase": {
"id": "10",
"title": "ELTEdata",
"organization": "Digital Humanities Department of ELTE BTK (Eötvös Loránd University Faculty of Humanities)",
"location": {
"country": "Hungary",
"region": "Europe"
},
"urls": {
"baseUrl": "https://eltedata.elte-dh.hu",
"actionApi": "https://eltedata.elte-dh.hu/w/api.php",
"indexApi": "https://eltedata.elte-dh.hu/w/index.php",
"sparqlEndpointUrl": "https://query.elte-dh.hu/proxy/wdqs/bigdata/namespace/wdq/sparql",
"sparqlUrl": "https://query.elte-dh.hu/",
"specialVersionUrl": "https://eltedata.elte-dh.hu/wiki/Special:Version"
}
}
}
}
All observations return `observationDate`, the date the observation was attempted, and `returnedData`, a boolean signifying whether the observation attempt was successful. All data fields are null if `returnedData` is false.
For each type of observation, the most recent successful observation (the one with the maximum `observationDate` where `returnedData == True`) is returned as `mostRecent`, and all observations, successful or not, are returned in a collection labelled `allObservations`. `mostRecent` will be `null` if there are no successful observations.
The relationship between `mostRecent` and `allObservations` is demonstrated below in Connectivity Observations, but omitted elsewhere for brevity.
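The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the service's actual implementation; the function name `most_recent` and the failed observation with id `"99"` are hypothetical.

```python
from datetime import datetime


def most_recent(observations):
    """Return the successful observation with the latest observationDate, or None."""
    successful = [o for o in observations if o["returnedData"]]
    if not successful:
        # Mirrors the documented behaviour: mostRecent is null without a success.
        return None
    return max(successful, key=lambda o: datetime.fromisoformat(o["observationDate"]))


# Two dates taken from the connectivity example; id "99" is a made-up failed attempt.
all_observations = [
    {"id": "1", "observationDate": "2024-06-20T12:13:08", "returnedData": True},
    {"id": "12", "observationDate": "2024-06-24T09:01:31", "returnedData": True},
    {"id": "99", "observationDate": "2024-07-01T00:00:00", "returnedData": False},
]
print(most_recent(all_observations)["id"])  # prints 12
```

The failed attempt is newer than observation 12, but it is skipped because `returnedData` is false.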
Please see connectivity_notes for further details.
We want to measure the connectivity of the network of Wikidata items in the Wikibase. Using SPARQL, we query the Wikibase for direct links between Wikidata items. We then calculate the following:
- Returned Links: total number of links returned in our query. These are not necessarily unique.
- Total Connections: total number of connections between items, direct or indirect*
- Average Connection Distance: Say a returned link `a -> b` has length `1`. An indirect* connection using two such returned links, `a -> b -> c`, then has length `2`. This figure represents the average length of all connections, direct or indirect.
- Connectivity: In theory, each item could link to every other item in the network. So we take the actual number of connections and divide by the number of possible connections: `k / (n * (n - 1))`, where `k` is the number of connections (direct or indirect) and `n` is the total number of items.
- Relationship Item Counts: If we retrieve `a -> b, a -> c`, we say that the item `a` links to `2` objects, and items `b` and `c` link to `0` objects. We then aggregate further and say that `1` item has `2` relationships and `2` items have `0` relationships.
- Relationship Object Counts: If we retrieve `a -> b, a -> c`, we say that the object `a` is linked to by `0` items, the object `b` is linked to by `1` item, and the object `c` is linked to by `1` item. We then aggregate further and say that `1` object has `0` relationships and `2` objects have `1` relationship.
* The SPARQL query returns directional links `a -> b`, so we say there's a direct connection between `a` and `b`. If `b -> c` is also returned, then we say `a` is indirectly connected to `c`: `a -> b -> c`. Note that when we say directional, we mean that we do not assume `b -> a` if `a -> b`; we would need a separate `b -> a` connection.
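The following Python sketch shows one way the figures defined above could be computed from a list of directed links. It makes several assumptions that the text does not confirm: the length of a connection is taken to be the shortest directed path, the item set `n` is taken to be the items that appear in at least one link, and the function and key names are hypothetical.

```python
from collections import Counter, deque


def connectivity_metrics(links):
    """links: list of (source, target) directed pairs; duplicates are allowed."""
    links = list(links)
    nodes = {node for link in links for node in link}
    adjacency = {node: set() for node in nodes}
    for source, target in links:
        adjacency[source].add(target)

    # Breadth-first search from every item yields the shortest directed distance
    # to each item it can reach, directly or indirectly.
    distances = []
    for start in nodes:
        seen = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen[neighbour] = seen[current] + 1
                    queue.append(neighbour)
        distances.extend(d for node, d in seen.items() if node != start)

    n, k = len(nodes), len(distances)
    incoming = Counter(target for _, target in set(links))
    return {
        "returned_links": len(links),
        "total_connections": k,
        "average_connection_distance": sum(distances) / k if k else None,
        "connectivity": k / (n * (n - 1)) if n > 1 else 0.0,
        # relationship count -> number of items linking to that many objects
        "relationship_item_counts": dict(Counter(len(adjacency[x]) for x in nodes)),
        # relationship count -> number of objects linked to by that many items
        "relationship_object_counts": dict(Counter(incoming.get(x, 0) for x in nodes)),
    }


metrics = connectivity_metrics([("a", "b"), ("a", "c")])
print(metrics["relationship_item_counts"] == {2: 1, 0: 2})    # prints True
print(metrics["relationship_object_counts"] == {0: 1, 1: 2})  # prints True
```

For the `a -> b, a -> c` example used in the definitions, the sketch reports one item with two relationships and two items with none, matching the text.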
Query:
query MyQuery {
wikibase(wikibaseId: 43) {
id
connectivityObservations {
mostRecent {
...WikibaseConnectivityObservationStrawberryModelFragment
}
allObservations {
...WikibaseConnectivityObservationStrawberryModelFragment
}
}
}
}
fragment WikibaseConnectivityObservationStrawberryModelFragment on WikibaseConnectivityObservationStrawberryModel {
id
observationDate
returnedData
returnedLinks
totalConnections
averageConnectedDistance
connectivity
relationshipItemCounts {
relationshipCount
itemCount
}
relationshipObjectCounts {
relationshipCount
objectCount
}
}
Result:
{
"data": {
"wikibase": {
"id": "43",
"connectivityObservations": {
"mostRecent": {
"id": "12",
"observationDate": "2024-06-24T09:01:31",
"returnedData": true,
"returnedLinks": 210,
"totalConnections": 205,
"averageConnectedDistance": 1.725531914893617,
"connectivity": 0.06429548563611491,
"relationshipItemCounts": [
{
"relationshipCount": 0,
"itemCount": 2
},
{
"relationshipCount": 1,
"itemCount": 38
},
...
],
"relationshipObjectCounts": [
{
"relationshipCount": 0,
"objectCount": 36
},
{
"relationshipCount": 1,
"objectCount": 28
},
...
]
},
"allObservations": [
{
"id": "1",
"observationDate": "2024-06-20T12:13:08",
"returnedData": true,
"returnedLinks": 210,
...
},
{
"id": "6",
"observationDate": "2024-06-20T16:48:27",
"returnedData": true,
"returnedLinks": 210,
...
},
...
]
}
}
}
}
Data abbreviated for brevity.
Using the Action API, we query for the first log and the logs from the last 30 days.
- First Log:
  - Date
- Last Log:
  - Date
  - User Type: Bot, Missing, None, User
- Last Month:
  - All Users: Count distinct users
  - Human Users: Count distinct (probably) human users
  - Log Count
Query:
query MyQuery {
wikibase(wikibaseId: 10) {
logObservations {
mostRecent {
id
observationDate
returnedData
firstLog {
date
}
lastLog {
date
userType
}
lastMonth {
allUsers
humanUsers
logCount
}
}
}
}
}
Result:
{
"data": {
"wikibase": {
"logObservations": {
"mostRecent": {
"id": "39",
"observationDate": "2024-07-03T21:18:08",
"returnedData": true,
"firstLog": {
"date": "2021-03-19T09:20:21"
},
"lastLog": {
"date": "2024-07-03T14:45:01",
"userType": "BOT"
},
"lastMonth": {
"allUsers": 2,
"humanUsers": 1,
"logCount": 387
}
}
}
}
}
}
Using SPARQL, we query for all properties in the Wikibase, and the number of times they are used in the data. We return that list.
- Property URL: the format the properties are returned in, as many are not specific to this particular wikibase
- Usage Count: Number of times the property is used
Query:
query MyQuery {
wikibase(wikibaseId: 43) {
id
propertyPopularityObservations {
mostRecent {
id
observationDate
returnedData
propertyPopularityCounts {
id
propertyUrl
usageCount
}
}
}
}
}
Result:
{
"data": {
"wikibase": {
"id": "43",
"propertyPopularityObservations": {
"mostRecent": {
"id": "1",
"observationDate": "2024-06-24T13:05:05",
"returnedData": true,
"propertyPopularityCounts": [
{
"id": "110",
"propertyUrl": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
"usageCount": 936
},
{
"id": "1",
"propertyUrl": "http://wikiba.se/ontology#rank",
"usageCount": 323
},
...
{
"id": "112",
"propertyUrl": "http://schema.org/dateModified",
"usageCount": 135
},
...
{
"id": "31",
"propertyUrl": "http://modelling.dissco.tech/prop/P15",
"usageCount": 1
},
...
]
}
}
}
}
}
Data abbreviated for brevity.
Using SPARQL, we query for the total number of items, lexemes, properties, and triples in the data.
- Total Items
- Total Lexemes
- Total Properties
- Total Triples
Query:
query MyQuery {
wikibase(wikibaseId: 43) {
id
quantityObservations {
mostRecent {
id
observationDate
returnedData
totalItems
totalLexemes
totalProperties
}
}
}
}
Result:
{
"data": {
"wikibase": {
"id": "43",
"quantityObservations": {
"mostRecent": {
"id": "76",
"observationDate": "2024-06-24T08:58:24",
"returnedData": true,
"totalItems": 86,
"totalLexemes": 0,
"totalProperties": 48
}
}
}
}
}
Data abbreviated for brevity.
This data is parsed from the Wikibase's Special:Statistics page. The following fields are fetched:
- Page Statistics:
  - Content Pages
  - Pages
  - Uploaded Files
- Edit Statistics:
  - Page Edits
- User Statistics:
  - Registered Users
  - Active Users
  - Administrators
- Other Statistics:
  - Words in Content Pages
Some Wikibases remain on MediaWiki versions that do not include all of these statistics; we return null for each statistic that cannot be located.
Query:
query MyQuery {
wikibase(wikibaseId: 1) {
id
statisticsObservations {
mostRecent {
id
observationDate
returnedData
edits {
editsPerPageAvg
totalEdits
}
files {
totalFiles
}
pages {
contentPageWordCountAvg
contentPageWordCountTotal
contentPages
totalPages
}
users {
activeUsers
totalAdmin
totalUsers
}
}
}
}
}
`editsPerPageAvg` and `contentPageWordCountAvg` are calculated at query time.
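The sample result for this query is consistent with `editsPerPageAvg` being `totalEdits / totalPages` and `contentPageWordCountAvg` being `contentPageWordCountTotal / contentPages`. That the API uses exactly these divisions is inferred from the sample numbers, not stated in the text, and the helper name `safe_ratio` is hypothetical. A minimal Python sketch:

```python
import math


def safe_ratio(numerator, denominator):
    """Divide two statistics, returning None when either value is unavailable."""
    if numerator is None or not denominator:
        # Covers a null statistic as well as a zero page count.
        return None
    return numerator / denominator


# Values copied from the sample result for wikibaseId 1.
edits_per_page = safe_ratio(36150983, 12655622)
words_per_content_page = safe_ratio(27750, 851723)

print(math.isclose(edits_per_page, 2.8565157050360703, rel_tol=1e-9))            # prints True
print(math.isclose(words_per_content_page, 0.032581015189210576, rel_tol=1e-9))  # prints True
```

Returning `None` for a missing input keeps the average null when the underlying statistic could not be located.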
Result:
{
"data": {
"wikibase": {
"id": "1",
"statisticsObservations": {
"mostRecent": {
"id": "96",
"observationDate": "2024-08-21T19:29:12",
"returnedData": true,
"edits": {
"editsPerPageAvg": 2.8565157050360703,
"totalEdits": 36150983
},
"files": {
"totalFiles": 30
},
"pages": {
"contentPageWordCountAvg": 0.032581015189210576,
"contentPageWordCountTotal": 27750,
"contentPages": 851723,
"totalPages": 12655622
},
"users": {
"activeUsers": 5,
"totalAdmin": 17,
"totalUsers": 465
}
}
}
}
}
}
This data is parsed from the Wikibase's Special:Version page. For each installed piece of software (MediaWiki, Elasticsearch, etc.), skin, library, and extension, the following fields are fetched:
- Software Name
- Version: If any identifiable version string exists in the table row; may be semver, date, docker-tag-like string, hash, a combination, or nothing
- Version Date: If any identifiable, parsable date exists in the table row
- Version Hash: If any identifiable commit hash exists in the table row
Query:
query MyQuery {
wikibase(wikibaseId: 43) {
id
softwareVersionObservations {
mostRecent {
id
observationDate
returnedData
installedExtensions {
...WikibaseSoftwareVersionStrawberryModelFragment
}
installedLibraries {
...WikibaseSoftwareVersionStrawberryModelFragment
}
installedSkins {
...WikibaseSoftwareVersionStrawberryModelFragment
}
installedSoftware {
...WikibaseSoftwareVersionStrawberryModelFragment
}
}
}
}
}
fragment WikibaseSoftwareVersionStrawberryModelFragment on WikibaseSoftwareVersionStrawberryModel {
id
softwareName
version
versionDate
versionHash
}
Result:
{
"data": {
"wikibase": {
"id": "43",
"softwareVersionObservations": {
"mostRecent": {
"id": "93",
"observationDate": "2024-06-26T18:41:28",
"returnedData": true,
"installedExtensions": [
{
"id": "13933",
"softwareName": "Babel",
"version": "1.12.0",
"versionDate": null,
"versionHash": null
},
{
"id": "13948",
"softwareName": "CLDR",
"version": "4.10.0",
"versionDate": null,
"versionHash": null
},
...
],
"installedLibraries": [
{
"id": "13955",
"softwareName": "christian-riesen/base32",
"version": "1.4.0",
"versionDate": null,
"versionHash": null
},
{
"id": "13956",
"softwareName": "composer/installers",
"version": "1.12.0",
"versionDate": null,
"versionHash": null
},
...
],
"installedSkins": [
{
"id": "13930",
"softwareName": "Vector",
"version": null,
"versionDate": null,
"versionHash": null
}
],
"installedSoftware": [
{
"id": "13928",
"softwareName": "Elasticsearch",
"version": "6.8.23",
"versionDate": null,
"versionHash": null
},
{
"id": "13927",
"softwareName": "ICU",
"version": "67.1",
"versionDate": null,
"versionHash": null
},
...
]
}
}
}
}
}
Data abbreviated for brevity.
This data is fetched from the Action API. We return the total number of users registered in the Wikibase, and for each group, we save the following data:
- Group Name
- Wikibase Default: Whether or not the group is part of the default list from a stock Wikibase install
- Group Implicit: Whether the group is implicitly applied to users
- User Count
We do not save the names of any users in the database.
Query:
query MyQuery {
wikibase(wikibaseId: 43) {
id
userObservations {
mostRecent {
id
observationDate
returnedData
totalUsers
userGroups {
id
group {
id
groupName
wikibaseDefault
}
groupImplicit
userCount
}
}
}
}
}
Result:
{
"data": {
"wikibase": {
"id": "43",
"userObservations": {
"mostRecent": {
"id": "43",
"observationDate": "2024-06-17T13:41:14.013073",
"returnedData": true,
"totalUsers": 22,
"userGroups": [
{
"id": "312",
"group": {
"id": "1",
"groupName": "*",
"wikibaseDefault": true
},
"groupImplicit": true,
"userCount": 22
},
...
{
"id": "316",
"group": {
"id": "5",
"groupName": "bureaucrat",
"wikibaseDefault": true
},
"groupImplicit": false,
"userCount": 2
},
...
]
}
}
}
}
}
Data abbreviated for brevity.
A paginated list of the Wikibase instances.
Arguments:
- Page Number: 1-indexed page number
- Page Size: number of Wikibases per page
Results:
- Meta:
  - Page Number: same as the input
  - Page Size: same as the input
  - Total Count: total number of Wikibases
  - Total Pages: total number of pages, with the given total and page size
- Data: list of Wikibases, ordered by id ascending. Every field noted above in Individual Wikibase Instances is accessible here.
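The relationship between the arguments and the Meta fields can be sketched in Python as follows. This is an illustration only: the function name `paginate` is hypothetical, Total Pages is assumed to be the total count divided by the page size and rounded up, and the contiguous ids 1 to 43 are made up for the example.

```python
import math


def paginate(items, page_number, page_size):
    """Return one 1-indexed page of items together with pagination metadata."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be at least 1")
    total_count = len(items)
    start = (page_number - 1) * page_size
    return {
        "meta": {
            "pageNumber": page_number,
            "pageSize": page_size,
            "totalCount": total_count,
            # Assumption: a final partial page still counts as a page.
            "totalPages": math.ceil(total_count / page_size),
        },
        "data": items[start:start + page_size],
    }


wikibase_ids = list(range(1, 44))  # 43 hypothetical ids, ordered ascending
page = paginate(wikibase_ids, page_number=2, page_size=10)
print(page["meta"]["totalPages"])  # prints 5
print(page["data"][0], page["data"][-1])  # prints 11 20
```

With 43 entries and a page size of 10, the sketch yields 5 pages, which agrees with the `totalCount` and `totalPages` values in the sample result.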
Query:
query MyQuery {
wikibaseList(pageNumber: 2, pageSize: 10) {
meta {
pageNumber
pageSize
totalCount
totalPages
}
data {
id
title
urls {
baseUrl
}
quantityObservations {
mostRecent {
totalItems
totalLexemes
totalProperties
}
}
}
}
}
Result:
{
"data": {
"wikibaseList": {
"meta": {
"pageNumber": 2,
"pageSize": 10,
"totalCount": 43,
"totalPages": 5
},
"data": [
{
"id": "11",
"title": "Kunstmuseum API",
"urls": {
"baseUrl": "https://api.kunstmuseum.nl"
},
"quantityObservations": {
"mostRecent": {
"totalItems": 37267,
"totalLexemes": 0,
"totalProperties": 113
}
}
},
...
{
"id": "20",
"title": "Safer Nicotine Wiki",
"urls": {
"baseUrl": "https://safernicotine.wiki/mediawiki"
},
"quantityObservations": {
"mostRecent": null
}
}
]
}
}
}
Data abbreviated for brevity.
Data aggregated from the `mostRecent` record for each Wikibase.
This aggregates the number of wikibases created by year. The creation date is derived from the first log date of each wikibase.
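As a rough illustration, the aggregation by year could look like the Python sketch below. The function name `aggregate_created` and the ascending year order are assumptions, and every date except the first one (taken from the log example above) is made up.

```python
from collections import Counter
from datetime import datetime


def aggregate_created(first_log_dates):
    """Count wikibases per creation year, skipping those without a first log date."""
    years = Counter(
        datetime.fromisoformat(date).year
        for date in first_log_dates
        if date is not None
    )
    return [
        {"wikibaseCount": count, "year": year}
        for year, count in sorted(years.items())
    ]


first_log_dates = [
    "2021-03-19T09:20:21",  # first log date from the log observation example
    "2021-11-02T10:00:00",  # hypothetical
    "2019-05-30T08:15:00",  # hypothetical
    None,                   # hypothetical wikibase with no successful log observation
]
print(aggregate_created(first_log_dates))
# prints [{'wikibaseCount': 1, 'year': 2019}, {'wikibaseCount': 2, 'year': 2021}]
```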
Query:
query MyQuery {
aggregateCreated {
wikibaseCount
year
}
}
Result:
{
"data": {
"aggregateCreated": [
{
"wikibaseCount": 1,
"year": 2005
},
{
"wikibaseCount": 1,
"year": 2006
},
{
"wikibaseCount": 1,
"year": 2011
},
...
]
}
}
Data abbreviated for brevity.
All four work exactly the same way, so they are outlined together here.
Aggregated from the Software Version Observations above. The data is paginated as above in Wikibase List; the Page Number and Page Size arguments are the same, and the queries return Meta and Data as above.
- Software Name
- Wikibase Count: Number of Wikibases in which software is installed
- Versions: List of versions, ordered by wikibase count descending
  - Version: Version string (if existent)
  - Version Date: Version date (if existent)
  - Version Hash: Version commit hash (if existent)
  - Wikibase Count: Number of Wikibases with this specific version
- If a version is parsable as a SemVer version, it is additionally organized into a major/minor/patch version tree
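To make the major/minor/patch idea concrete, here is a small Python sketch that groups version strings into a nested tree. It is not the service's parser: the regular expression accepts only plain `MAJOR.MINOR.PATCH` strings (a simplification of SemVer that ignores pre-release and build suffixes), the function names are hypothetical, and the `1.39.7` entry and its count are made up.

```python
import re

# Simplified SemVer pattern: three dot-separated integers and nothing else.
SEMVER_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse_semver(version):
    """Return (major, minor, patch) as integers, or None if the string does not fit."""
    if version is None:
        return None
    match = SEMVER_PATTERN.match(version)
    return tuple(int(part) for part in match.groups()) if match else None


def version_tree(version_counts):
    """Build {major: {minor: {patch: wikibase_count}}} from (version, count) pairs."""
    tree = {}
    for version, count in version_counts:
        parsed = parse_semver(version)
        if parsed is None:
            continue  # versions that are not SemVer-like stay out of the tree
        major, minor, patch = parsed
        patches = tree.setdefault(major, {}).setdefault(minor, {})
        patches[patch] = patches.get(patch, 0) + count
    return tree


counts = [("1.41.0", 3), ("1.39.7", 2), ("67.1", 11), (None, 1)]
print(version_tree(counts))  # prints {1: {41: {0: 3}, 39: {7: 2}}}
```

A two-part version such as ICU's `67.1` does not match the pattern, so it would appear only in the flat list of versions.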
Query:
query MyQuery {
aggregateSoftwarePopularity(pageSize: 10, pageNumber: 1) {
meta {
totalCount
}
data {
softwareName
wikibaseCount
versions {
version
versionDate
versionHash
wikibaseCount
}
}
}
}
Result:
{
"data": {
"aggregateSoftwarePopularity": {
"meta": {
"totalCount": 12
},
"data": [
{
"softwareName": "MediaWiki",
"wikibaseCount": 41,
"versions": [
{
"version": "1.41.0",
"versionDate": "2024-02-07T06:39:00",
"versionHash": "5498056",
"wikibaseCount": 3
},
...
]
},
...
{
"softwareName": "ICU",
"wikibaseCount": 40,
"versions": [
{
"version": "67.1",
"versionDate": null,
"versionHash": null,
"wikibaseCount": 11
},
{
"version": "72.1",
"versionDate": null,
"versionHash": null,
"wikibaseCount": 6
},
...
]
},
...
]
}
}
}
Aggregated from the Property Popularity Observations above. The data is paginated as above in Wikibase List; the Page Number and Page Size arguments are the same, and the queries return Meta and Data as above.
- Property URL
- Usage Count: Sum of usages in all Wikibases
- Wikibase Count: Number of Wikibases that use this property
The data is ordered by Wikibase Count descending, then Usage Count descending, then alphabetically by Property URL.
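The three-level ordering can be expressed as a single sort key. The Python sketch below is an illustration rather than the actual implementation; the function name is hypothetical, and the two `example.org` rows are invented to exercise the tie-breaking rule.

```python
def sort_property_popularity(rows):
    """Order by wikibaseCount desc, then usageCount desc, then propertyUrl asc."""
    return sorted(
        rows,
        key=lambda row: (-row["wikibaseCount"], -row["usageCount"], row["propertyUrl"]),
    )


rows = [
    {"propertyUrl": "http://example.org/prop/b", "usageCount": 5, "wikibaseCount": 1},
    {"propertyUrl": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "usageCount": 1944635733, "wikibaseCount": 17},
    {"propertyUrl": "http://example.org/prop/a", "usageCount": 5, "wikibaseCount": 1},
    {"propertyUrl": "http://schema.org/description",
     "usageCount": 3149752997, "wikibaseCount": 17},
]
for row in sort_property_popularity(rows):
    print(row["propertyUrl"])
# prints, in order:
# http://schema.org/description
# http://www.w3.org/1999/02/22-rdf-syntax-ns#type
# http://example.org/prop/a
# http://example.org/prop/b
```

Negating the two counts lets one ascending sort handle the descending numeric fields and the ascending URL together.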
Query:
query MyQuery {
aggregatePropertyPopularity(pageNumber: 1, pageSize: 10) {
meta {
totalCount
}
data {
propertyUrl
usageCount
wikibaseCount
}
}
}
Result:
{
"data": {
"aggregatePropertyPopularity": {
"meta": {
"totalCount": 79501
},
"data": [
{
"propertyUrl": "http://schema.org/description",
"usageCount": 3149752997,
"wikibaseCount": 17
},
{
"propertyUrl": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
"usageCount": 1944635733,
"wikibaseCount": 17
},
...
]
}
}
}
Data aggregated from the Quantity Observations above.
Query:
query MyQuery {
aggregateQuantity {
totalItems
totalLexemes
totalProperties
totalTriples
wikibaseCount
}
}
Result:
{
"data": {
"aggregateQuantity": {
"totalItems": 123291236,
"totalLexemes": 1314076,
"totalProperties": 20368,
"totalTriples": 17745095804,
"wikibaseCount": 17
}
}
}
Aggregated from the Statistics Observations above. As indicated above, averages (edits per page and words per content page) are calculated at query time.
Query:
query MyQuery {
aggregateStatistics {
wikibaseCount
edits {
editsPerPageAvg
totalEdits
}
files {
totalFiles
}
pages {
contentPageWordCountAvg
contentPageWordCountTotal
contentPages
totalPages
}
users {
activeUsers
totalAdmin
totalUsers
}
}
}
Result:
{
"data": {
"aggregateStatistics": {
"wikibaseCount": 39,
"edits": {
"editsPerPageAvg": 4.912389909272705,
"totalEdits": 102551430
},
"files": {
"totalFiles": 107936
},
"pages": {
"contentPageWordCountAvg": 5.910143746962713,
"contentPageWordCountTotal": 23289542,
"contentPages": 3940605,
"totalPages": 20876077
},
"users": {
"activeUsers": 912,
"totalAdmin": 347,
"totalUsers": 284827
}
}
}
}
Data aggregated from the User Observations above.
Query:
query MyQuery {
aggregateUsers {
totalAdmin
totalUsers
wikibaseCount
}
}
Result:
{
"data": {
"aggregateUsers": {
"totalAdmin": 366,
"totalUsers": 227834,
"wikibaseCount": 43
}
}
}