The landscape of soil carbon data: Emerging questions, synergies and databases

Soil carbon has been measured for over a century in applications ranging from understanding biogeochemical processes in natural ecosystems to quantifying the productivity and health of managed systems. Consolidating diverse soil carbon datasets is increasingly important to maximize their value, particularly with growing anthropogenic and climate change pressures. In this progress report, we describe recent advances in soil carbon data led by the International Soil Carbon Network and other networks. We highlight priority areas of research requiring soil carbon data, including (a) quantifying boreal, arctic and wetland carbon stocks, (b) understanding the timescales of soil carbon persistence using radiocarbon and chronosequence studies, (c) synthesizing long-term and experimental data to inform carbon stock vulnerability to global change, (d) quantifying root influences on soil carbon and (e) identifying gaps in model–data integration. We also describe the landscape of soil datasets currently available, highlighting their strengths, weaknesses and synergies. Now more than ever, integrated soil data are needed to inform climate mitigation, land management and agricultural practices. This report will aid new data users in navigating various soil databases and encourage scientists to make their measurements publicly available and to join forces to find soil-related solutions.


I Introduction
Soil carbon is a key component in our understanding of the biosphere's response to global change. There is a long history of soil carbon measurements that, together with other types of soil and ecosystem data, contribute to our understanding of the health and functioning of natural and managed ecosystems (Harden et al., 2018). To better utilize this body of work, the International Soil Carbon Network (ISCN) was formed in 2012 to connect soil carbon researchers and their data. Here, we present recent international efforts consolidating soil carbon data to address urgent questions in soil carbon science. We highlight advances in soil databases, led by the ISCN or other organizations, to synthesize datasets from diverse sources. Examples include data from boreal, arctic and wetland soils, long-term soil experiments, chronosequences, soil radiocarbon observations and root-soil linkages. These new data will help us to understand soil carbon stocks, change and vulnerability via syntheses and model-data integration.

What is the ISCN?
The ISCN is a science-based network that provides: (a) scientific and logistical infrastructure for sharing knowledge, information and data; (b) opportunities for synthesis activities; (c) data products beneficial to stakeholders and scientists; and (d) a framework for common scientific protocols and collaborative decision support tools.

Why soil carbon?
Soil carbon storage and cycling are measures of soil health, where soil health is defined as the capacity of soil to maintain a range of functions such as food and fiber provision (Lal, 2004; Banwart et al., 2014). Soil carbon is also directly linked to exchanges of carbon dioxide and other trace gases between land-water and land-air systems and is therefore a key component in regulating the global climate system (Ciais et al., 2013). Because soils are a focal point of terrestrial carbon cycling, current research prioritizes quantifying global and ecosystem-specific carbon stocks. In addition to stocks, understanding the processes that control soil carbon timescales and vulnerability to global change is also critical (Figure 1). These research priorities require diverse data types synthesized across broad scales.

Why now?
Land is increasingly under pressure to maintain healthy ecosystems while providing food and fiber to growing human populations. Over one-third of the global land surface is currently grazed, forested or cropped (Erb et al., 2007), rendering three-quarters or more of the soil carbon down to a meter depth under human management (Harden et al., 2018). Past land management has depleted soil carbon and organic matter (Sanderman et al., 2018). However, the reestablishment and build-up of this organic matter through best practices can improve soil productivity and resilience to extreme climate events while removing carbon dioxide from the atmosphere (Minasny et al., 2017; Batjes, 2019).
Although scientific research on soil carbon has led to numerous sources of data and information, such information is disparate and difficult to access (Harden et al., 2018). Communities interested in making carbon cycle projections or improving agricultural land management need synthesized data to evaluate soil carbon persistence and vulnerabilities to environmental change (Blankinship et al., 2018). With emerging technological advances in data, computing and instrumentation, we see an opportunity to inform and empower land managers with timely, relevant data and information for decision support.

ISCN data holdings
The ISCN database (latest version ISCN3; Nave et al., 2017) contains data from >70,000 soil profiles from a range of data sources, including the United States Department of Agriculture and the Northern Circumpolar Soil Carbon Database. More than 200 soil variables are present in the database, including the percentage of organic carbon, particle size distribution, pH and the percentage of nitrogen. Details of the data types and their calculations can be found at https://iscn.fluxdata.org/data/data-information/. The inclusion of a range of supporting measurements (describing, for example, the geography, soil properties and landform type) in the ISCN database makes it possible to investigate soil carbon as part of a dynamic cycle in addition to quantifying stocks. The strengths of the ISCN database include extensive coverage for soil profiles, horizons and depth internationally (with particularly strong representation of the US from United States Department of Agriculture data), making ISCN one of the largest, most wide-ranging and diverse repositories of measured soil data.
Figure 1. Key questions in soil carbon science and corresponding data requirements. Emergent questions about soil carbon vulnerability (e.g., the response to abiotic/climate or biotic/land cover change, management or disturbance) must be underpinned by questions of soil carbon change (timescales, persistence and stability, factors controlling microbial access, quality and fraction), which, in turn, are rooted in questions of carbon stocks (e.g., spatial variability, ecosystem-specific storage or depth variation). The syntheses described in this paper represent a range of data/efforts directed toward addressing each of these knowledge gaps. For example, research questions on soil carbon vulnerability may utilize data from experimental manipulations and plant trait databases; carbon change questions from databases such as the ISCN3 database, chronosequence and radiocarbon syntheses; and carbon stock questions from global or ecosystem-specific survey data.

II Recent advances in the ISCN database
2.1 Shift from template-only to script-based data ingestion
Historically, the ISCN has taken a template-based approach to data harmonization, in which data providers and curators manually input data into the ISCN database. Given that this approach can be both labor intensive and prone to errors, the ISCN is adding a scripted option for data users and providers. The SOC-DRaHR (Soil Organic Carbon Data Rescue and Harmonization Repository; https://github.com/ISCN/SOC-DRaHR) is a script repository with an associated R package designed to aid data ingestion and downloading. The SOC-DRaHR also provides a community platform to develop an R library to access and harmonize different data collections.
The SOC-DRaHR identifies and downloads soil carbon datasets that are publicly available, provides data harmonization scripts to integrate these datasets into R and provides output scripts for a harmonized data product. In short, these scripts match the variable names of the dataset to be ingested with those contained in the ISCN template. The SOC-DRaHR is not a data repository or archive, but is an open source software project that facilitates access to data and harmonizes units and naming conventions across data collections. One limitation of a script-based approach is that it may reduce accessibility for data users and providers who lack experience with R or other programming languages. To address this, we plan to keep the template option for users who prefer it.
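The matching of variable names and units described above can be illustrated with a minimal sketch. It is written in Python rather than R, uses only the standard library and does not reproduce any SOC-DRaHR code. The provider column names, the template names and the unit conversions are all hypothetical placeholders, not entries from the ISCN template.

```python
# Hypothetical mapping from one provider's column names to template names.
# None of these names are taken from the actual ISCN template.
NAME_MAP = {
    "SOC_pct": "organic_carbon_percent",
    "depth_top_m": "layer_top_cm",
    "depth_bot_m": "layer_bottom_cm",
    "pH_H2O": "ph_h2o",
}

# Multiplicative unit conversions keyed by template name (metres to centimetres).
UNIT_FACTOR = {"layer_top_cm": 100.0, "layer_bottom_cm": 100.0}


def harmonize_record(record, name_map=NAME_MAP, unit_factor=UNIT_FACTOR):
    """Rename known variables, convert their units and report unknown ones."""
    harmonized = {}
    unmatched = []
    for source_name, value in record.items():
        target = name_map.get(source_name)
        if target is None:
            # Keep track of variables with no template equivalent so that a
            # curator can decide what to do with them.
            unmatched.append(source_name)
            continue
        if value is not None and target in unit_factor:
            value = float(value) * unit_factor[target]
        harmonized[target] = value
    return harmonized, unmatched


# One made-up soil layer as it might arrive from a data provider.
raw = {"SOC_pct": 2.4, "depth_top_m": 0.0, "depth_bot_m": 0.25, "site_notes": "ridge"}
harmonized, unmatched = harmonize_record(raw)
print(harmonized)
print(unmatched)
```

Returning the unmatched names separately, rather than silently dropping them, keeps a record of which provider variables still need a template equivalent.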
Led by Katherine Todd-Brown, the ISCN hosted two data hackathons (New Orleans, LA, USA in 2016; College Station, TX, USA in 2017) to train potential contributors and users of ISCN data on our scripted approach. We also provided guidance and expertise to other science communities building soil or ecological databases (Table 1).

2.2 Shift toward open data
The ISCN3 database contains data from sources with varying data-use policies (http://iscn.fluxdata.org/data/data-information/data-policy-and-use/). In the future, the ISCN4 database and subsequent versions will only contain data that are open source under a Creative Commons Attribution (CC-BY) License. The key update under this license will be that the requirements of data-provider involvement will be removed, but data attribution will be required as before. Previous versions of data bound to sharing restrictions will be retained, but will only be available through the ISCN3 database. The ISCN4 database will include the open source data from ISCN3 plus new datasets (Table 1). We consider this open source shift an important step in making the ISCN data easily accessible and usable.

2.3 ISCN-led community activities
We held our most recent all-hands meeting at the American Geophysical Union (AGU) fall meeting in December 2017. The meeting included updates from the ISCN as well as breakout groups on root-soil linkages, wetland soil carbon, the turnover times of soil carbon and reconciling multi-scale data (http://iscn.fluxdata.org/summary-of-pre-agu-2017-activities/). We also organized oral and poster sessions at AGU 2017 and 2018. In February 2017, we organized a workshop (Loisel et al., 2017) to discuss and define research and data priorities for soil carbon science and for the ISCN. We drafted an article highlighting the converging needs of the soil carbon science and soil health communities and the way forward for the ISCN (Harden et al., 2018). The ISCN plans to continue to coordinate and host workshops, data hackathons and scientific sessions at international meetings (e.g., the AGU and the European Geosciences Union).

III New datasets and emerging ISCN partnerships
3.1 Advances in northern and wetland soil carbon data
Northern peatlands and permafrost soils are rich in carbon that is vulnerable to increased rates of warming and other feedbacks with climate change (Gorham, 1991; Oechel et al., 1993; Frolking et al., 2006; Tarnocai et al., 2009; Schuur et al., 2015). The drivers of soil carbon storage in organic soils can vary considerably relative to those in mineral soils (Limpens et al., 2008; Loranty et al., 2018; Malhotra et al., 2018a; Schuur and Mack, 2018). To better place these soils in a global context, the ISCN is including more data from peatlands (Treat et al., 2016, data from C-PEAT; https://github.com/ISCN/soilDataR/blob/master/R/readCPEAT.R) in the next version of the database (ISCN4; Table 1). We plan to also include Canadian forest soil surveys representing a decade of data (Shaw et al., 2018). Although not always organic soils, these northern forest soils are also expected to undergo warming (Meehl et al., 2007) and provide opportunities for contrasting studies of mineral and organic soils across climate gradients.

3.2 Bridging gaps in soil data types
The strengths of the ISCN3 database lie in global survey data that are reported with a range of supporting measurements and are best suited to investigating the mechanisms of soil carbon change (Figure 1). However, other types of data are necessary for carbon stock and vulnerability questions (Figure 1): for example, data from coastal systems, radiocarbon measurements, soil chronosequences, experiments (field manipulations), long-term repeat measurements or root-soil linkages. In an effort to increase our representation of diverse data types, we have built informal (e.g., sharing best practices and data harmonization scripts) or formal (memoranda of understanding) synergies with the various groups discussed in the following sections.
3.2.1 Coastal wetland carbon. Coastal wetlands are highly productive and, because they form soil as a dynamic response to sea-level rise (Kirwan and Megonigal, 2013), they act as long-term carbon sinks. "Blue carbon" syntheses have been used to support local greenhouse gas mitigation efforts (Kroeger et al., 2017), to include coastal wetlands in national-scale greenhouse gas inventories (Crooks et al., 2018; Holmquist et al., 2018) and to complete terrestrial-aquatic interface carbon budgets (Najjar et al., 2018). There is a tremendous need for a transparent, well-sourced and living synthesis of coastal carbon stocks. The Coastal Carbon Research Coordination Network (CC-RCN) is currently building such a dataset iteratively, producing standards for data formatting, assisting researchers in creating citable open data releases (Reichman et al., 2011; Wilson et al., 2017) and compiling public data releases into a central data clearinghouse. CC-RCN personnel are available (until at least 2021) to help providers prepare datasets for submission. To date, the CC-RCN has synthesized data from 3117 cores from salt marshes, mangroves and tidal freshwater wetlands of the contiguous USA (Holmquist et al., 2018) and from around the world. The ISCN and CC-RCN share lessons learned on database best practices through workshops and hackathons. In the future, we aspire to formally link our databases through the SOC-DRaHR.
3.2.2 Soil radiocarbon data. The International Soil Radiocarbon Database (ISRaD) is an open source community-based project that brings together soil radiocarbon data and associated datasets (Lawrence et al., 2019). Radiocarbon data are an important tool for understanding the soil carbon cycle and can be used to constrain rates of carbon cycling in models (He et al., 2016) and to assess the timescales and persistence of soil carbon (Sierra et al., 2018).
In particular, the application of radiocarbon methodology to improve our understanding of soil carbon dynamics has emphasized the need to conceptualize soils as a consortium of different carbon types, stabilized in soils via a variety of mechanisms. As such, there is a growing abundance of soil data collected from specific soil "fractions" that have been physically (e.g., density or particle size separation), chemically (e.g., chemical extraction) or biologically (e.g., soil incubation) partitioned from the bulk soil (Poeplau et al., 2018). Although these data may provide insights into the nature of a particular soil, it is often challenging to compare fractions across different soils because the fractionation methods vary widely. The ISRaD also seeks to improve our ability to compare soil fractions and standardize fractionation methods, in addition to making soil radiocarbon data more accessible.
The data within ISRaD are structured hierarchically and include bulk soil radiocarbon data (about 500 sites and 1700 profiles), fractionation schemes (>3600 data points entered), flux measurements (>2100), incubations (>1900), interstitial gases and dissolved organics. Users can add data through a template, which is structured to reflect this hierarchy, or use a scripted approach for larger datasets. In addition to the dataset, the ISRaD also offers an associated R package, which includes quality-control checks and tools for exploring the data. Ongoing synthesis activities have compiled radiocarbon data from carbon fluxes in the arctic region to look at the potential release of old permafrost carbon, from soil incubations to assess rates of fast-cycling soil carbon, and from different soil fractions. Although radiocarbon is the focus of the database, it is not a requirement, allowing the template, data structure and associated tools to be used for other synthesis efforts related to soil carbon. The ISRaD data template builds upon the ISCN template and profile-level soil data will be shared between the ISRaD and the ISCN databases.

3.2.3 Soil chronosequence data.
Understanding long-term soil carbon dynamics is important for constraining the capacity of soils to store carbon and the spatiotemporal variations in soil carbon related to pedogenic mineralogy. The chronosequence approach has been traditionally used to study the role of time in pedogenesis (Stevens and Walker, 1970). As a result, many chronosequence studies have reported soil carbon data along with other soil and environmental variables. Comparisons of several chronosequences have been used to determine general patterns in soil and ecosystem development and to investigate the effects of other soil-forming factors on carbon, nutrients and mineralogy (Wardle et al., 2004). A recent effort synthesized data from soil chronosequences with the goal of determining the controls on long-term soil carbon dynamics during soil development. The structure of this dataset follows the hierarchical structure of the ISCN dataset and draws upon the ISRaD dataset in terms of the included variables and tools for data analysis. Upon completion, data from this synthesis will be ingested into the ISCN database.

3.2.4 Experimental and long-term data.
Cross-site analysis is a central goal of the Long-term Ecological Research (LTER) program and significant advances have been made in synthesizing cross-site data in hydrology, vegetation dynamics, diversity and climate (Peters et al., 2013). Although soil carbon has been measured at almost all the LTER sites as well as at sites from other research networks, cross-network data have, to our knowledge, never been synthesized, compared, modeled or archived in standardized ways across sites (Weintraub et al., 2019). A new synthesis project is addressing this gap by synthesizing long-term soil carbon data not just from the LTER sites, but also from the National Ecological Observatory Network (NEON), the Critical Zone Observatory (CZO), the Detritus Input and Removal Treatment (DIRT) and the Nutrient Network (NutNet) sites. This project uses a scripted approach similar to the ISCN and ISRaD databases and involves researchers who developed soil models such as MIMICS and CORPSE (Sulman et al., 2014; Wieder et al., 2015), as well as the principal investigators who collected the soil carbon data. The model-data synthesis aims to answer questions such as the roles microbial and plant community composition have in transferring microbial byproducts to persistent soil organic matter and how nitrogen deposition affects soil organic matter composition across a range of climate types and mineralogies. Practical implications include outreach to land managers concerned with soil carbon consequences of specific practices.
3.2.5 Linking root traits to soil carbon. Plant root inputs are more likely to be stabilized as long-term soil carbon relative to above-ground plant inputs (Jackson et al., 2017; Sokol and Bradford, 2018). Despite their recognized importance in soil carbon dynamics, data on root attributes or traits (e.g., root biomass or rooting depth) are severely lacking in soil databases (Harden et al., 2018). Root observations from across the globe have recently been compiled into the Fine-Root Ecology Database (FRED; Iversen et al., 2017). FRED version 2.0 includes more than 100,000 root trait observations and relevant ancillary data such as soil properties, providing an opportunity to harmonize soil and root data. The ISCN has held breakout group discussions and a workshop to develop a framework linking root traits with soil carbon across the globe (Malhotra et al., 2018b). The root trait working group will continue their efforts, focusing on the three main stages of root-soil interactions, namely rhizosphere engineering by living roots, root inputs to soil organic matter via turnover and the decay of root necromass throughout the soil profile.
3.2.6 Mechanisms of soil carbon storage and stability. In coordination with the ISCN, the U.S. Geological Survey (USGS) and the U.S. Department of Agriculture recently supported a series of USGS Powell Center workshops targeted toward improving our understanding of the mechanisms controlling soil carbon storage and stability. Several products were derived from these workshops, including an exploration of how soil measurements, models and theories are linked to better integrate rapidly expanding soil research efforts (Blankinship et al., 2018) and a re-evaluation of soil carbon controls using existing databases (Rasmussen et al., 2018). The results of these workshops highlight the crucial importance of including ancillary soil data in soil carbon syntheses and provide further opportunities to better coordinate future soil measurements with models and theory.
3.2.7 Model-data integration. Soil data synthesis efforts strive to inform model development and validation. Model evaluation is an important goal of the International Land Model Benchmarking Project (ILAMB; Collier et al., 2018). The ISCN participated in ILAMB's soil organic carbon working group (Oak Ridge National Laboratory, October 2018) to develop ISCN-derived data products that would be useful for model benchmarking.
Beyond benchmarking, there is a growing potential to use synthesized datasets for model-data integration to develop our understanding of soil carbon dynamics (Bloom et al., 2016; Luo et al., 2016). Model-data integration activities can help to determine model structures and parameterizations that are consistent with observations of carbon stocks, soil ages (radiocarbon data), above- and below-ground litter inputs and local conditions (soil texture, moisture and temperature), weighted according to measurement errors. Advances in computing power and algorithm development allow model calibration and evaluation across very large datasets, facilitating our capacity to simulate soil processes both regionally and globally. A key request from the model-data integration community is that soil databases include a clear quantification of all sources of measurement error (to allow for Bayesian modeling approaches). In addition, if point data have been converted to gridded products, the upscaling error is a key factor in model-data integration.
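To illustrate why reported measurement errors matter, the following minimal Python sketch weights each observation by its stated uncertainty when comparing a model with data. It assumes independent Gaussian errors and a one-pool model at steady state (stock = input / decay rate). Both are simplifying assumptions made for illustration and are not the structure of any model cited here, and all numerical values are made up.

```python
import math


def gaussian_log_likelihood(observed, predicted, sigma):
    """Log-likelihood of observations given predictions, assuming independent
    Gaussian errors with a known standard deviation for each observation."""
    total = 0.0
    for y, m, s in zip(observed, predicted, sigma):
        total += -0.5 * math.log(2.0 * math.pi * s ** 2) - 0.5 * ((y - m) / s) ** 2
    return total


def steady_state_stock(litter_input, decay_rate):
    """One-pool model at steady state: stock = input / decay rate."""
    return litter_input / decay_rate


# Illustrative, made-up values for three sites.
inputs = [0.30, 0.45, 0.60]   # litter inputs, kg C m-2 yr-1
stocks = [6.1, 8.8, 12.5]     # observed stocks, kg C m-2
errors = [0.5, 0.5, 2.0]      # 1-sigma measurement errors, kg C m-2


def log_likelihood_for(decay_rate):
    predicted = [steady_state_stock(i, decay_rate) for i in inputs]
    return gaussian_log_likelihood(stocks, predicted, errors)


# Simple grid search over candidate decay rates (yr-1).
candidates = [0.030 + 0.001 * i for i in range(41)]
best_k = max(candidates, key=log_likelihood_for)
print(round(best_k, 3))
```

In this sketch the third site, which has the largest stated error, has the least influence on the selected decay rate. A full Bayesian treatment would add prior distributions and sample the posterior instead of searching a grid, but it relies on the same per-observation error terms.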

IV Navigating the landscape of soil data
The landscape of soil data is complicated and contains a range of databases representing different regions and variables (Figure 2). To a new data user (e.g., a graduate student), it may be daunting to select the right dataset to answer a research question or the best database to target for their data contributions. One of the ISCN's missions is to inform data users of the strengths and weaknesses of each database and to circumvent the issues related to multiple soil databases that are difficult to harmonize. Our recent synergies with the CC-RCN, ISRaD, chronosequences and the LTER/NEON/CZO data syntheses were therefore initiated with the intention of sharing information on best practices, standardizing controlled vocabularies and providing resources such as R scripts to ingest or harmonize data.
In addition, the ISCN and the International Soil Reference and Information Centre (ISRIC) developed a formal agreement to ensure that the ISCN soil profile data are fed into ISRIC's spatially extensive database on a regular basis, following a screening for possible duplicate profiles (Ribeiro et al., 2018). If users are interested in global carbon stocks, they may use the entire World Soil Information Service (WoSIS) database (Batjes et al., 2017) or its derived products (SoilGrids250m; Hengl et al., 2017). Alternatively, if users are interested in abiotic or mechanistic controls of soil carbon, the ISCN database may be more appropriate because it provides more ancillary data on soil properties and ecology than the ISRIC database.

V Future directions
In the short term, our goal is to provide data infrastructure that enables interoperability, not just between the ISCN's data sources, but also across the synthesis efforts mentioned here. This is a non-trivial task, but the community is ready and the need for harmonized soil datasets is clear.
In the long term, in addition to maintaining the aforementioned data and infrastructure, we would also like to consolidate new data sources and types. Most urgently, given that the extent of managed soils exceeds that of unmanaged soils globally (Harden et al., 2018), the ISCN would like to include more data from agricultural and other managed systems. We hope to continue our discussions with entities such as FarmOS (https://farmos.org/) and the Coordination of International Research Cooperation on soil CArbon Sequestration in Agriculture (CIRCASA; www.circasa-project.eu/) to consolidate agricultural data into a central repository. This first step is necessary to link management practices with the resulting soil properties.
Figure 2. Navigating the landscape of soil data: the ISCN3 database and its current links to other large soil databases. The ISCN3 database comprises various independent data sources that are globally extensive, but with a strong US focus. The data sources include the Natural Resources Conservation Service (NRCS), the United States Geological Survey (USGS) and the Northern Circumpolar Soil Carbon Database (NCSCD). The ISCN publishes a new data version periodically (e.g., ISCN4 will contain new northern and peatland data). In turn, the ISCN data are regularly ingested into the World Soil Information Service database (WoSIS), a larger database focused on nationally reported profile data. Global gridded products such as SoilGrids are derived from profiles held in WoSIS, a set of environmental covariates and digital soil mapping. The ISCN database maintains synergies with various other data synthesis groups (e.g., the International Soil Radiocarbon Database (ISRaD), the Coastal Carbon Research Coordination Network (CC-RCN) and the Long-term Ecological Research (LTER) program; see Section 3.2), which encompass data types not well represented by the ISCN database (e.g., radiocarbon, coastal carbon and experimental manipulations).
Activities summarized in this report highlight emerging priorities within soil carbon science. We especially highlight recent advances in high-latitude soils and at the terrestrial-aquatic interface as well as in experimental, longterm, chronosequence or radiocarbon data. In a complex landscape of soil carbon data and applications, the ISCN and our partners strive to provide resources, data and opportunities for disparate soil carbon communities to exchange ideas and solutions. Promoting healthy soils and finding creative solutions for climate change mitigation and adaptation will require collaboration among land managers, policy-makers and scientists. Our report serves as a call for input from these diverse and often disconnected communities. In addition to data contributions, we welcome and encourage working groups that can synthesize existing data to bridge the gap between soil carbon science and soil management.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: A subset of the workshops and synthesis efforts mentioned in this report were funded by Bonanza Creek LTER, Stanford University's Cox Fellowship and The Bolin Centre for Climate Research at Stockholm University. G.