Summary of meetings held during Biodiversity Next in Leiden, October 2019

TDWG (Biodiversity Information Standards) met as part of the Biodiversity Next conference in Leiden from 20-25 October 2019. The opportunity to run extensive workshops at the conference was limited for several reasons, and several of the DQIG Task Group Leaders were unable to attend. The TDWG Data Quality Interest Group met informally on the Sunday, and the Vocabularies of Values Task Group (TG4) ran a workshop on the Monday as part of the Pre-Conference Meetings. In addition, the Interest Group ran a symposium session, which included a 30-minute discussion period.

The informal meeting of the Interest Group was held at the Vlot Grand Café, Leiden, on the Sunday (20 October) and ran for five hours. Two GBIF staff (Tim Robertson and Dmitry Schigel) joined us, and the discussions between the two groups were extremely valuable to both. It was a pity that a larger Interest Group discussion was not possible during the main part of the conference.

TG1 – Framework for Data Quality

Report: Unfortunately, the Task Group Leader, Allan Veiga, was unable to attend the meetings in Leiden. It was decided that Task Group 1 should be wound up. Allan and Antonio Saraiva agreed to write a final report and to tidy up the existing documentation on GitHub. A new Task Group will be established with the aim of developing a TDWG Standard based on the Framework. Paul Morris has agreed to lead this Task Group, and reported that an RDF document was under development and would form the basis of a Standard. It was agreed that the Vocabulary for the Framework be combined with the Vocabulary that has been developed for Task Group 2, as there was considerable overlap and there should be consistency across both Task Groups.

TG2 – Tests and Assertions

Report: Unfortunately, Lee was not able to be present, but we were able to have extensive discussions around some of the outstanding issues regarding the Tests and Assertions. Several difficult issues were discussed, and a plan was agreed for moving forward.

The main issue was how we deal with tests that link to vocabularies, and whether many of these tests need Parameterization. Tim Robertson from GBIF explained where GBIF was heading, with plans for a much more rapid turnover of data incorporation, and how tests (for example on taxon data) could check on the fly against the GBIF Backbone, an Australian Taxonomy, ITIS for North America, etc., depending on users’ needs. This confirmed our need to be able to parameterize many of the tests, thus allowing those running the tests to choose the vocabulary that best suits their needs. Such tests would need to carry an “expectation”: for example, if the Atlas of Living Australia runs the tests using the Australian Taxonomy, then the expectation is that only Australian taxon names would be reported, and nothing outside that list would apply. More work will be needed to formalise this within the tests.

It was agreed that a meeting needed to be held soon to sort out some of these issues, to finalise how we deal with some of the final coding, and to determine what is needed to develop the tests as a TDWG Standard. In discussing options for such a meeting, John Wieczorek and Paula Zermoglio suggested that Bariloche in Argentina would be ideal. It was agreed that plans be put in place to organise such a meeting around January 2020. TDWG is to be approached to see what funding may be available; informal discussions during the week indicated that TDWG may be able to fund one person. Funding opportunities will need to be pursued as soon as possible.

It was reported that one of the ongoing issues with respect to the tests was the need for Vocabularies for many (about 29) of the tests. These vocabularies are also needed for the Darwin Core.
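The idea of a parameterized test discussed above can be sketched as follows. This is only an illustration under stated assumptions, not the Task Group's implementation: the function name, the result labels, and the tiny in-memory name lists standing in for the GBIF Backbone and an Australian Taxonomy are all hypothetical. Only the Darwin Core term scientificName comes from an existing standard.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

# Hypothetical stand-ins for real authorities. A real test would query
# the chosen source rather than a hard-coded set of names.
AUTHORITIES: Dict[str, FrozenSet[str]] = {
    "GBIF_BACKBONE": frozenset(
        {"Macropus giganteus", "Puma concolor", "Quercus alba"}
    ),
    "AUSTRALIAN_TAXONOMY": frozenset(
        {"Macropus giganteus", "Eucalyptus regnans"}
    ),
}


@dataclass(frozen=True)
class Assertion:
    """Outcome of one test run, including the parameter it was run with."""
    test: str
    authority: str  # records the "expectation" the result depends on
    result: str     # illustrative labels, not an agreed vocabulary
    comment: str


def validate_scientific_name(
    record: dict, authority: str = "GBIF_BACKBONE"
) -> Assertion:
    """Check dwc:scientificName against the authority chosen by the caller."""
    test = "VALIDATION_SCIENTIFICNAME_FOUND"  # hypothetical test label
    if authority not in AUTHORITIES:
        raise ValueError(f"Unknown authority: {authority}")
    name = (record.get("scientificName") or "").strip()
    if not name:
        return Assertion(test, authority, "PREREQUISITES_NOT_MET",
                         "scientificName is empty")
    if name in AUTHORITIES[authority]:
        return Assertion(test, authority, "COMPLIANT",
                         f"'{name}' found in {authority}")
    return Assertion(test, authority, "NOT_COMPLIANT",
                     f"'{name}' not found in {authority}")


# The same record gives different results depending on the parameter.
record = {"scientificName": "Puma concolor"}
default_run = validate_scientific_name(record)
australian_run = validate_scientific_name(record, "AUSTRALIAN_TAXONOMY")
```

Because each assertion carries the authority it was run against, a consumer can see that a name was rejected relative to the Australian list only, which is one possible way of making the “expectation” explicit.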
Preliminary discussions were held with two software vendors (Specify and Symbiota), and both expressed an interest in incorporating the tests into their software. This would be a major step, as it would provide opportunities for much of the Data Quality testing to be done in collections institutions, thus taking the data quality control closer to the source. iNaturalist and eBird also expressed an interest in incorporating relevant tests into their data quality control systems.

Tasks still needed

Finalise how we deal with Parameterization
Determine how we code the tests that require Parameterization
Determine how we handle tests that require Vocabularies that may not be currently available.
- Extract terms of value from the tests as a first step in preparation of a Vocabulary.
Load into an HTML/RDF document in preparation for a standard (Kurator has some scripts; will work with iDigBio and Lee)
Prepare test dataset (this will be a real dataset with synthetic modifications). An explanation can be found at https://github.com/tdwg/bdq/wiki/TG2---Proposal-for-identifying-synthetic-data
Explore (especially in conjunction with GBIF, iDigBio and ALA) how best to handle Annotations. There was some discussion on using a centralized (in the cloud) model versus a semi-distributed model and the possibility of having a centralized test Sandbox. An issue (#154) has been raised on GitHub.
Explore what is needed to advance the Tests toward a TDWG Standard. It was thought that either a Best Current Practice or an Applicability Statement would be the most appropriate standard type. It was noted, however, that the code needs to be written before development of a Standard can be considered, so that the assumptions made can be exercised in real-world situations.

TG3 – Use Case libraries

It was agreed that Task Group 3 be wound up. The paper currently being prepared for BISS would largely constitute a final report.

TG4 – Best Practice for Development of Vocabularies of Values

Paula Zermoglio, Task Group leader, ran a workshop on the morning of Monday 21st October. As an introduction, she gave a PowerPoint presentation that covered most of the issues. This presentation can be found at http://tiny.cc/d8uvez. She covered topics such as

The difference between
- Controlled Vocabularies
- Thesauri
- Ontologies.
The need for Vocabularies of Values
- For Darwin Core Terms
- For the Tests as part of Task Group 2.
and laid out the process to be followed over the next period.

She talked about the TDWG Standards Documentation Specification. She noted that Standards have several parts (see the PowerPoint presentation), and that these include

Human readable documents
- Landing page
- Descriptive documents
- Vocabulary descriptions
Machine readable documents

She also mentioned the importance of following the W3C SKOS Primer guidelines. Other working documents can all be found on the Data Quality Interest Group Wiki. This was followed by extensive discussion and a short period in which people could begin developing a vocabulary. One suggestion from the floor was the need for a Vocabularies of Values Quick Reference Guide – i.e. easy-to-follow guidelines for someone who wants to begin the process of developing a vocabulary of values.

Next steps:

Biodiversity Next Symposium

A symposium session entitled “Biodiversity Data Quality – how it affects science” was held on Wednesday 23rd. Five talks were given, all very relevant to the topic. Around 100 people attended, leaving standing room only. A lively half-hour discussion followed. It was good to have several GBIF staff present, and they were able to contribute significantly to the debate.

Biodiversity Data Quality featured heavily in many of the sessions throughout the Conference, including in several of the keynote addresses.

Article in Biodiversity Information Science and Standards (BISS)

The paper “Developing Standards for Improved Data Quality and for Selecting Fit for Use Biodiversity Data” is nearing completion and will be submitted within the next month or two. The big hold-up has been the case study using MCZ data and the TIME tests, but this has now been completed and is just awaiting final write-up.

Next Steps

Final edits by key authors
Open up the document for a final check by all authors
Authors to check that their previous comments have been satisfactorily resolved.
The paper has still not been moved from the Biodiversity Data Journal to BISS, despite several requests from Gail Kampmeier. This still needs to be done.
Submit document

Around the Rooms (DQ during the SPNHC and TDWG Meetings)

Throughout the week of Symposia and Workshops, there was barely a session in which Data Quality was not a major topic. Conveners of the Interest Group and Task Group leaders were active in discussions, including on Data Quality, in sessions involving Traits, Machine recorded data, Citizen Science, Invasive Species, Interactions, Feedback to the data custodians and others. In many of these sessions there was vigorous debate, and the DQIG people took many notes on the issues raised for further discussion and action.

Possible Future Standards

Throughout the week, the possible standards arising from the work of the DQIG and the Task Groups were thought to include

Framework - Technical Specification (RDF)
Data Quality Vocabulary for the Framework and for the Tests and Assertions - Data Standard
Tests and Assertions - Best Current Practices Document OR Applicability Statement to the Framework TS (or maybe to Darwin Core)
Vocabularies of Values - Best Current Practices Document
