Data Validation & Verification
Merseyside BioBank exists in large part to facilitate data flow and to enhance the impact of data for the benefit of local biodiversity. This includes both species and habitat data and may involve the use of data to advise development, planning, and mid- to longer-term strategy, and to target conservation efforts such as identifying species or habitats that should be considered conservation priorities at the local level, habitat management or species recovery. Biodiversity data can therefore have an enormous impact on decision making and conservation action.
However, we must consider that the vast majority of data comes from largely volunteer-driven sources, with comparatively little (particularly of species data) coming from commissioned or ‘professional’ sources. The high level of impact, coupled with the largely voluntary sourcing of the data, might be considered a potential risk: poor-quality data could lead to poor decisions.
As a result, all data submitted to the Local Environmental Records Centre (LERC) must be subject to quality control processes that include standardisation, validation and independent verification.
Standardisation
The standardisation of data is often the first step for data received by Merseyside BioBank. The process overlaps a little with validation checks but it is fundamentally different. Standardisation typically involves the conversion or processing of a data source into something that can be managed and shared alongside other distinct data sources. With regard to species data this typically means ensuring compatibility with the Recorder 6 database and may involve conversion to a suitable import format, allocation of scientific species names, formatting of dates or grid references and a general review and tidying of the source data, which could be in almost *any* format.
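As a rough illustration of the date and grid reference formatting mentioned above, the following is a minimal Python sketch. It is not the Recorder 6 import format or any MBB tool; the function names and the list of accepted date layouts are assumptions made for the example.

```python
import re
from datetime import datetime

# Hypothetical list of date layouts that might appear in incoming data.
DATE_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%Y-%m-%d", "%d %B %Y", "%d %b %Y")


def standardise_date(raw):
    """Return the date as an ISO string (YYYY-MM-DD), or None if unrecognised."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next layout
    return None


def standardise_grid_ref(raw):
    """Remove whitespace from a grid reference and convert it to upper case."""
    return re.sub(r"\s+", "", raw).upper()
```

A value that cannot be parsed is returned as None rather than guessed at, so that it can be reviewed by hand during tidying.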
Validation
The validation checks are the first pass of quality checks undertaken on all data received. Validation is undertaken at Merseyside BioBank (MBB) and involves programmatic checks. Building on standardisation, we check things such as whether the date is a true date and whether the grid reference is a proper grid reference for the area. Increasingly, validation checks have become more advanced with technological improvements and can now also involve more informed checks on things such as whether or not a species occurs in the area (its UK distribution) or whether the stage of the life cycle matches when the observation was made.
These rules are currently updated by national schemes and societies and made available via the National Biodiversity Network (NBN) Record Cleaner. They are considered first-pass checks to identify records that are outliers.
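The two basic checks described above (a true date, and a grid reference that is proper for the area) could be sketched in Python as follows. This is not the NBN Record Cleaner or its rule format. The function names are hypothetical, and the set of 100 km grid squares treated as covering the area is an assumption made for illustration.

```python
import re
from datetime import date

# Assumption for illustration: 100 km squares taken to cover the area of interest.
AREA_SQUARES = {"SD", "SJ"}

# Two letters followed by an even number of digits (2 to 10), e.g. SJ39 or SJ345912.
GRID_REF_PATTERN = re.compile(r"^([A-Z]{2})((?:\d\d){1,5})$")


def is_true_date(year, month, day, today=None):
    """True if the values form a real calendar date that is not in the future."""
    today = today or date.today()
    try:
        return date(year, month, day) <= today
    except ValueError:
        return False  # e.g. 30 February


def is_valid_grid_ref(grid_ref, squares=AREA_SQUARES):
    """True if the reference is well formed and lies in one of the given squares."""
    match = GRID_REF_PATTERN.match(grid_ref)
    return bool(match) and match.group(1) in squares
```

Checks of this kind only flag records for attention. Whether a flagged record is actually wrong remains a matter for verification.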
Verification
Verification is the application of knowledge in the judgement of a biological record. It involves the review of a record or records by a specialist in that taxonomic group who is independent from the records centre. Where possible a verifier is a local representative of the appropriate national scheme or society, and most of the larger recording schemes have such ‘vice-county recorders’. Where a vice-county recorder is not available we would defer to a national scheme organiser; this is the case for some of the invertebrate families. Where neither a vice-county nor a national recorder is available, we work with a specialist in the group drawn from other areas. This could include museum curators, researchers or highly experienced naturalists recognised by local natural history societies.
Ultimately the key requirements of a verifier are that they are highly knowledgeable in that species group in the local area (North Merseyside) and that they are independent of the records centre.
Verification itself can be undertaken in a number of different ways and ultimately we defer to the preference of the verifier in terms of both data handling and how they choose to judge a record.
Outline verification process
The process of verification is typically as follows:
- Records which the verifier wants to see will be exported from the Recorder 6 database;
- Exported records are pre-formatted into a batch Excel worksheet that assists the verifier in applying one of the ‘Determination Types’, and are then emailed with instructions to the verifier;
- The verifier reviews the records using their own judgement, based on the information presented to them in the spreadsheet, and applies appropriate determinations;
- The completed worksheet is returned and made ready for import into the Recorder 6 database;
- Any manual changes to records are made in the Recorder 6 database as requested by the verifier (e.g. taxonomic or grid reference amendments);
- The remaining records are updated using their unique database key with the new determination type provided by the verifier;
- From this point verification is complete and the new status is live. Failed records are not removed and so remain logged as failed. This is important as it means we retain an administrative log of the process and why and how the record was failed. Also, should new information be provided to support the initial identification then a record may still be re-verified.
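The key-based update step in the list above can be sketched in Python as follows. This is an illustration only and does not use the Recorder 6 database or its API: a plain dictionary stands in for the database, and the field names (`key`, `determination`, `comment`) and the function name are assumptions.

```python
def apply_determinations(records, worksheet_rows):
    """Update records in place by key and return a log of the changes made.

    records: dict mapping record key -> record dict (stand-in for the database)
    worksheet_rows: iterable of dicts with 'key' and 'determination' fields
    """
    log = []
    for row in worksheet_rows:
        key = row["key"]
        new_status = row.get("determination")
        if not new_status or key not in records:
            continue  # not reviewed by the verifier, or unknown key: leave untouched
        old_status = records[key].get("determination")
        records[key]["determination"] = new_status
        # Keep an administrative trail of what changed and why.
        log.append((key, old_status, new_status, row.get("comment", "")))
    return log
```

Records are only ever updated, never deleted, so a failed record stays in the store with its new status and can be re-verified later.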
Any record determined as Incorrect, Considered Incorrect, Duplicate or Superseded is immediately removed from circulation and all LERC services. Any record determined as Correct or Considered Correct is considered positively verified and now carries more weight and confidence. Any record that has not yet been verified or was not reviewed by the verifier remains in use* but carries a lower level of confidence than a verified record. Unchecked data may also be immediately invalidated without going through the verification process should their veracity be questioned; questioning of a verified record will likely prompt a more in-depth review.
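The rules in the paragraph above amount to a simple mapping from determination type to circulation status. A minimal Python sketch follows. The determination names are taken from the text, while the function name and the confidence labels are hypothetical.

```python
# Determination types named in the text.
REMOVED = {"Incorrect", "Considered Incorrect", "Duplicate", "Superseded"}
VERIFIED = {"Correct", "Considered Correct"}


def record_status(determination):
    """Return (in_circulation, confidence) for a determination type.

    None or an empty string stands for a record that has not been verified.
    Any other unrecognised value is also treated as unverified in this sketch.
    """
    if determination in REMOVED:
        return (False, "failed")
    if determination in VERIFIED:
        return (True, "verified")
    return (True, "unverified")
```

For example, `record_status("Duplicate")` gives `(False, "failed")`, while a record with no determination gives `(True, "unverified")`.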
Technology has of course enabled other means of verification. There is a well-recognised and growing problem around verification due to the pressure put on a small number of individuals, often volunteers, who are asked to verify ever-increasing amounts of ‘data’. One technological solution has arisen in the form of iRecord, which is maintained by the Biological Records Centre (BRC) and the UK Centre for Ecology & Hydrology (UKCEH). The system centralises data flow and makes it easier to review original observations and their evidence and to contact the original recorder as necessary. It also means that a verifier can apply verification at a single point instead of using a range of methods and formats.
* Unchecked records are used, despite not being verified, for a range of largely pragmatic reasons. The verification process takes a long time, sometimes years, to complete, even where vice-county recorders (VCRs) exist and have time, and is indeed never fully complete as new information is constantly provided. Where no VCR exists things can take longer, and for some (invertebrate) groups there is simply no one to undertake the verification. As a records centre we have a responsibility to share and provide data for the benefit of biodiversity. To have data and not share it is considered a much higher risk than sharing the occasional duff record. This is particularly so because the main uses of the data are generally as indications of presence, which should always be interpreted by a professional ecologist (consultant, planning or both) who may apply their own professional judgement to any data provided.