Enabling automated herbarium sheet image post-processing using neural network models for color reference chart detection

Appl Plant Sci. 2020 Mar 2;8(3):e11331. doi: 10.1002/aps3.11331. eCollection 2020 Mar.

Abstract

Premise: Large-scale efforts to digitize herbaria have resulted in more than 18 million publicly available Plantae images on sites such as iDigBio. Automating image post-processing will save time in the digitization of biological specimens and improve data quality. Here, new and modified neural network methodologies were developed to automatically detect color reference charts (CRCs), enabling the future automation of various post-processing tasks.

Methods and results: We used 1000 herbarium specimen images from 52 herbaria to test our novel neural network model, ColorNet, which was developed to identify CRCs smaller than 4 cm², resulting in a 30% increase in accuracy over the performance of other state-of-the-art models such as Faster R-CNN. For larger CRCs, we propose modifications to Faster R-CNN to increase inference speed.

Conclusions: Our proposed neural networks detect a range of CRCs, which may enable the automation of post-processing tasks found in herbarium digitization workflows, such as image orientation or white balance correction.
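
As an illustration of one such post-processing task, the following is a minimal sketch of white balance correction driven by a detected CRC. It is not the authors' implementation. It assumes a detector has already returned a bounding box covering the chart's white patch. The function name `white_balance_from_patch`, the `(top, left, bottom, right)` box layout, and the nested-list image representation are all hypothetical choices made for this example.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]   # (R, G, B), each in 0..255
Image = List[List[Pixel]]            # rows of pixels


def white_balance_from_patch(
    image: Image,
    box: Tuple[int, int, int, int],
    target: float = 255.0,
) -> Image:
    """Scale each channel so the mean color inside `box` becomes neutral.

    `box` is (top, left, bottom, right) with bottom/right exclusive. It is
    ASSUMED to cover the white patch of a detected color reference chart.
    """
    top, left, bottom, right = box
    patch = [image[r][c] for r in range(top, bottom) for c in range(left, right)]
    if not patch:
        raise ValueError("empty patch box")

    # Mean R, G, B of the reference patch.
    means = [sum(px[ch] for px in patch) / len(patch) for ch in range(3)]
    if min(means) <= 0:
        raise ValueError("patch has a zero channel; cannot compute gains")

    # Per-channel gain that maps the patch mean to the target white level.
    gains = [target / m for m in means]

    # Apply the gains to every pixel, clipping at the maximum value.
    return [
        [tuple(min(255.0, px[ch] * gains[ch]) for ch in range(3)) for px in row]
        for row in image
    ]


if __name__ == "__main__":
    # Tiny 1x2 image: a tinted "white" patch and one specimen pixel.
    img = [[(200.0, 250.0, 125.0), (100.0, 125.0, 50.0)]]
    print(white_balance_from_patch(img, box=(0, 0, 1, 1)))
```

Per-channel scaling of this kind is only one simple approach. A production workflow would more likely operate on array-based images and might use several chart patches rather than the white patch alone.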

Keywords: automation; digitization; herbarium; machine learning; natural history collections; specimen images.