TERRA-REF Data Processing Infrastructure

July 22, 2018
Figures from manuscript: 1. Field Scanalyzer System operating in Maricopa, Arizona. 2. Data flow and processing diagram. 3. Field-level mosaic from RGB camera. 4. Table of sensors. 5. Databases and interfaces. 6. Data analysis workbench.

The Transportation Energy Resources from Renewable Agriculture Phenotyping Reference Platform (TERRA-REF) provides a data and computation pipeline responsible for collecting, transferring, processing, and distributing large volumes of crop sensing and genomic data from genetically informative germplasm sets. The primary source of these data is a field scanner system built over an experimental field at the University of Arizona Maricopa Agricultural Center. The scanner carries several sensors that observe the field frequently and at high resolution: RGB stereo, thermal, pulse-amplitude modulated chlorophyll fluorescence, and imaging spectrometer cameras, a 3D laser scanner, and environmental monitors. In addition, data from sensors mounted on tractors and UAVs, data from an indoor controlled-environment facility, and manually collected measurements are integrated into the pipeline. Up to two TB of data per day are collected and transferred to the National Center for Supercomputing Applications (NCSA) at the University of Illinois, where they are processed.
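To give a sense of scale for the two TB per day figure, the short calculation below estimates the average network rate needed to move one day of data within 24 hours. It is a back-of-the-envelope sketch, not a description of the actual transfer setup: the function name is hypothetical, and decimal units (1 TB = 10^12 bytes) and an evenly spread transfer are assumptions.

```python
# Back-of-the-envelope estimate of the sustained transfer rate.
# Assumptions (not from the manuscript): decimal units, 1 TB = 10**12 bytes,
# and a transfer spread evenly over a full 24-hour day.
BYTES_PER_TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_mbit_per_s(tb_per_day: float) -> float:
    """Average rate in Mbit/s needed to move `tb_per_day` within one day."""
    bits_per_day = tb_per_day * BYTES_PER_TB * 8
    return bits_per_day / SECONDS_PER_DAY / 1e6


if __name__ == "__main__":
    # Peak collection volume quoted in the text: two TB per day.
    print(f"{sustained_rate_mbit_per_s(2.0):.0f} Mbit/s")
```

Under these assumptions the peak volume corresponds to roughly 185 Mbit/s sustained around the clock; any transfer confined to a shorter daily window would need a proportionally higher rate.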

In this paper we describe the technical architecture of the TERRA-REF data and computing pipeline. This modular and scalable pipeline provides a suite of components to convert raw imagery to standard formats, geospatially subset data, and identify biophysical and physiological plant features related to crop productivity, resource use, and stress tolerance. Derived data products are uploaded to the Clowder content management system and the BETYdb traits and yields database, where they can be queried to support research at the experimental plot level. All software is open source under a BSD 3-clause or similar license, and the data products are open access (currently available for evaluation, with a full release planned for fall 2019). In addition, we provide computing environments in which users can explore data and develop new tools. The goal of this system is to enable scientists to evaluate and use data, create new algorithms, and advance the science of digital agriculture and crop improvement.
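As an illustration of what geospatial subsetting to the plot level can involve, the sketch below clips a georeferenced raster to a plot bounding box using plain Python. It is not the TERRA-REF implementation: the `Raster` and `BBox` types, the `clip_to_plot` function, and the north-up grid with square pixels are all assumptions made for this example.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BBox:
    """Hypothetical plot boundary in map coordinates."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


@dataclass
class Raster:
    """Hypothetical north-up raster with square pixels."""
    pixels: List[List[float]]  # pixels[0] is the northernmost row
    x_origin: float            # x of the left edge of column 0
    y_origin: float            # y of the top edge of row 0
    pixel_size: float          # pixel edge length in map units


def _clamp(value: int, low: int, high: int) -> int:
    return max(low, min(high, value))


def clip_to_plot(raster: Raster, plot: BBox) -> List[List[float]]:
    """Return the pixels whose cells overlap the plot bounding box."""
    n_rows = len(raster.pixels)
    n_cols = len(raster.pixels[0]) if n_rows else 0
    size = raster.pixel_size
    # Map coordinates to pixel indices; y decreases as the row index grows.
    col0 = _clamp(math.floor((plot.xmin - raster.x_origin) / size), 0, n_cols)
    col1 = _clamp(math.ceil((plot.xmax - raster.x_origin) / size), 0, n_cols)
    row0 = _clamp(math.floor((raster.y_origin - plot.ymax) / size), 0, n_rows)
    row1 = _clamp(math.ceil((raster.y_origin - plot.ymin) / size), 0, n_rows)
    return [row[col0:col1] for row in raster.pixels[row0:row1]]


if __name__ == "__main__":
    # 4 x 4 toy raster covering x in [0, 4] and y in [0, 4].
    grid = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    raster = Raster(pixels=grid, x_origin=0.0, y_origin=4.0, pixel_size=1.0)
    subset = clip_to_plot(raster, BBox(xmin=1.0, ymin=1.0, xmax=3.0, ymax=3.0))
    print(subset)
```

A plot-level trait, such as a mean value over the plot, could then be computed from the returned subset. Clamping the indices keeps the function safe for plots that only partly overlap, or fall outside, the raster.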