CGIAR BigData Articles in Refereed Journals
Permanent URI for this collection: https://hdl.handle.net/10568/89421
Recent Submissions
Responsible artificial intelligence in agriculture requires systemic understanding of risks and externalities (Journal Article, 2022-02) Tzachor, Asaf; Devare, Medha; King, Brian; Avin, Shahar; Ó hÉigeartaigh, Sean
AgroFIMS: A tool to enable digital collection of standards-compliant FAIR data (Journal Article, 2021-10) Devare, Medha; Aubert, Céline; Benites Alfaro, Omar Eduardo; Pérez Masias, Ivan Omar; Laporte, Marie-Angélique
Agricultural research has traditionally been driven by linear approaches dictated by hypothesis testing. With the advent of powerful data science capabilities, predictive, empirical approaches that operate over large data pools to discern patterns have become possible. Such data pools need to contain well-described, machine-interpretable, and openly available data (represented by high-scoring Findable, Accessible, Interoperable, and Reusable, or FAIR, resources). CGIAR's Platform for Big Data in Agriculture has developed several solutions to help researchers generate open and FAIR outputs, determine their FAIRness in quantitative terms¹, and create high-value data products drawing on these outputs. By accelerating the speed and efficiency of research, these approaches facilitate innovation, allowing the agricultural sector to respond agilely to farmer challenges. In this paper, we describe the Agronomy Field Information Management System, or AgroFIMS, a web-based, open-source tool that helps generate data that is “born FAIRer” by addressing data interoperability to enable aggregation and easier value derivation from data. Although the choice of license, which determines accessibility, is at the discretion of the user, AgroFIMS provides consistent and rich metadata, helping users more easily comply with institutional, funder and publisher FAIR mandates.
The tool enables the creation of fieldbooks through a user-friendly interface that allows the entry of metadata tied to the Dublin Core standard schema, and of trial details via picklists or autocomplete fields based on semantic standards such as the Agronomy Ontology (AgrO). Choices are organized by field operations or measurements of relevance to an agronomist, with specific terms drawn from ontologies. Once users have stepped through the required fields and the desired modules describing their trial management practices and measurement parameters, they can download the fieldbook and use it as a standalone Excel-driven file, or with the free Android-based KDSmart, Fieldbook, or ODK applications for digital data collection. Collected data can be imported back into AgroFIMS for statistical analysis and reports. Development plans for 2021 include new features such as the ability to clone fieldbooks and to create agronomic questionnaires. AgroFIMS will also allow FAIR data, after collection and analysis, to be archived from a database to repository platforms for wider sharing.
Artificial intelligence, systemic risks, and sustainability (Journal Article, 2021-11) Galaz, Victor; Centeno, Miguel A; Callahan, Peter W.; Causevic, Amar; Patterson, Thayer; Brass, Irina; Baum, Seth; Farber, Darryl; Fischer, Joern; Garcia, David; McPhearson, Timon; Jiménez, Daniel; King, Brian; Larcey, Paul; Levy, Karen
Automated decision making and predictive analytics through artificial intelligence, in combination with rapid progress in technologies such as sensor technology and robotics, are likely to change the way individuals, communities, governments and private actors perceive and respond to climate and ecological change. Methods based on various forms of artificial intelligence are already being applied in a number of research fields related to climate change and environmental monitoring.
Investments in applications of these technologies in agriculture, forestry and the extraction of marine resources also seem to be increasing rapidly. Despite growing interest in, and deployment of, AI technologies in domains critical for sustainability, few studies have explored possible systemic risks in depth. This article offers a global overview of the progress of such technologies in sectors with high impact potential for sustainability, such as farming, forestry and the extraction of marine resources. We also identify possible systemic risks in these domains, including a) algorithmic bias and allocative harms; b) unequal access and benefits; c) cascading failures and external disruptions; and d) trade-offs between efficiency and resilience. We explore these emerging risks, identify critical questions, and discuss the limitations of current governance mechanisms in addressing AI sustainability risks in these sectors.
CGIAR modeling approaches for resource-constrained scenarios: I. Accelerating crop breeding for a changing climate (Journal Article, 2020-03-04) Ramírez Villegas, Julián Armando; Molero Milan, Anabel; Alexandrov, Nickolai; Asseng, Senthold; Challinor, Andrew J.; Crossa, José; Eeuwijk, Fred A. van; Ghanem, Michel Edmond; Grenier, Cécile; Heinemann, Alexandre B.; Wang, Jiankang; Juliana, Philomin; Kehel, Zakaria; Kholová, Jana; Koo, Jawoo; Pequeno, Diego Notelo Luz; Quiróz, Roberto; Rebolledo, Maria C.; Sukumaran, Sivakumar; Vadez, Vincent; White, Jeffrey W.; Reynolds, Matthew P.
Crop improvement efforts aimed at increasing crop production (quantity, quality) and adapting to climate change have been the subject of active research in recent years. However, the question remains: ‘to what extent can breeding gains be achieved under a changing climate, at a pace sufficient to usefully contribute to climate adaptation, mitigation and food security?’
Here, we address this question by critically reviewing how model-based approaches can be used to assist breeding activities, with particular focus on all CGIAR (formerly the Consultative Group on International Agricultural Research but now known simply as CGIAR) breeding programs. Crop modeling can underpin breeding efforts in many different ways, including assessing genotypic adaptability and stability, characterizing and identifying target breeding environments, identifying trade-offs among traits for such environments, and predicting the likely breeding value of genotypes. Crop modeling science within the CGIAR has contributed to all of these. However, much progress remains to be made if modeling is to contribute effectively to more targeted and impactful breeding programs under changing climates. At a time when CGIAR breeding programs are undergoing a major modernization process, crop modelers will need to be part of crop improvement teams, with a common understanding of breeding pipelines and of model capabilities and limitations, and with common data standards and protocols, to ensure that they deliver according to clearly defined breeding products. This will, in turn, enable more rapid and better-targeted crop modeling activities, thus directly contributing to accelerated and more impactful breeding efforts.
CGIAR modeling approaches for resource-constrained scenarios: II. Models for analyzing socioeconomic factors to improve policy recommendations (Journal Article, 2020-05-01) Kruseman, Gideon K.; Bairagi, Subir; Komarek, Adam M.; Molero Milan, Anabel; Nedumaran, Swamikannu; Petsakos, Athanasios; Prager, Steven D.; Yigezu, Yigezu Atnafe
International crop-related research as conducted by the CGIAR uses crop modeling for a variety of purposes.
By linking crop models with economic models and approaches, crop model outputs can be used effectively as inputs into socioeconomic modeling efforts for priority setting and policy advice, using ex-ante impact assessment of technologies and scenario analysis. This requires interdisciplinary collaboration and, very often, collaboration across a variety of research organizations. This study highlights the key topics, purposes, and approaches of socioeconomic analysis within the CGIAR related to cropping systems. Although each CGIAR center has a different mission, all CGIAR centers share a common strategy of striving toward a world free of hunger, poverty, and environmental degradation. This means research is mostly focused on resource-constrained smallholder farmers. The review covers approaches ranging from global modeling efforts using the IMPACT model to farm-household bio-economic models for assessing the potential impact of new technologies on farming systems and livelihoods. Although the CGIAR addresses all aspects of food systems, this review focuses on crop commodities and the economic analysis linked to crop-growth model results. This study, while not a comprehensive review, provides insights into the richness of socioeconomic modeling endeavors within the CGIAR. It highlights the need for interdisciplinary approaches to address the challenges this type of modeling faces.
A suite of global accessibility indicators (Journal Article, 2019) Nelson, Andy; Weiss, Daniel J.; Etten, Jacob van; Cattaneo, Andrea; McMenomy, Teresa S.; Koo, Jawoo
Good access to resources and opportunities is essential for sustainable development. Improving access, especially in rural areas, requires useful measures of current access to the locations where these resources and opportunities are found. Recent work has developed a global map of travel times to cities with more than 50,000 people in the year 2015.
However, the provision of resources and opportunities will differ across the broad spectrum of settlements, which range from small towns to megacities, and access to this spectrum of settlement sizes should also be measured. Here we present a suite of nine global travel-time accessibility indicators for the year 2015, at approximately one-kilometre spatial resolution, for a range of settlement size classes. We validated the travel-time estimates against journey times from a Google driving directions application across 1,511 tiles of 2° × 2°, representing 47,812 journeys. We observed very good agreement, although our estimates were more often shorter than those from the Google application, with a median difference of −13.7 minutes and a median percentage difference of −16.9%.
A scalable scheme to implement data-driven agriculture for small-scale farmers (Journal Article, 2019-12) Jiménez, Daniel; Delerce, Sylvain Jean; Dorado, Hugo Andres; Cock, James H.; Muñoz, Luis Armando; Agamez, Alejandro; Jarvis, Andy
The Colombian Ministry of Agriculture, an international research center and a national farmers’ organization developed a data-driven agricultural program that: (i) compiles information from multiple sources; (ii) interprets that data; and (iii) presents the knowledge to farmers through the local advisory services. Data were collected from multiple sources, including small-scale farmers. Machine learning algorithms combined with expert opinion defined how variation in weather, soils and management practices interacts with and affects the maize yield of small-scale farmers. This knowledge was then used to provide guidelines on management practices likely to produce high, stable yields. The effectiveness of the practices was confirmed in on-farm trials.
The principles established can be applied to rainfed crops produced by small-scale farmers, helping them manage their crops with less risk of failure.
Applying FAIR Principles to plant phenotypic data management in GnpIS (Journal Article, 2019-01) Pommier, C.; Michotey, Célia; Cornut, Guillaume; Roumet, P.; Duchêne, E.; Flores, R.; Lebreton, A.; Alaux, M.; Durand, S.; Kimmel, E.; Letellier, T.; Merceron, G.; Laine, M.; Guerche, C.; Loaec, M.; Steinbach, D.; Laporte, Marie-Angélique; Arnaud, Elizabeth; Quesneville, H.; Adam-Blondon, Anne-Françoise
GnpIS is a data repository for plant phenomics that stores whole field and greenhouse experimental data, including environmental measurements. It allows long-term access to datasets following the FAIR principles (Findable, Accessible, Interoperable, and Reusable) by using a flexible and original approach. It is based on a generic, ontology-driven data model and an innovative software architecture that decouples data integration, storage, and querying. It takes advantage of international standards including the Crop Ontology, MIAPPE, and the Breeding API. GnpIS can handle data for a wide range of species and experiment types, including multi-year experimental networks of perennial plants or annual plant trials, with either raw data (i.e., direct measurements) or computed traits. It also ensures integration and interoperability among phenotyping datasets and with genotyping data. This is achieved through careful curation and annotation of the key resources, conducted in close collaboration with the communities providing data. Our repository follows Open Science data publication principles by ensuring that each dataset is citable. Finally, GnpIS's compliance with international standards enables its interoperability with other data repositories, allowing data links between phenotypes and other data types.
GnpIS can therefore contribute to emerging international federations of information systems.
AgBioData consortium recommendations for sustainable genomics and genetics databases for agriculture (Journal Article, 2018-01-01) Harper, L.; Campbell, J.; Cannon, Ethalinda K.S.; Jung, S.; Poelchau, M.; Walls, R.; Andorf, C.; Arnaud, Elizabeth; Berardini, T.Z.; Birkett, Clay L.; Cannon, S.; Carson, J.; Condon, B.; Cooper, Laurel D.; Dunn, N.; Elsik, C.G.; Farmer, A; Ficklin, S.P.; Grant, D.; Grau, E.; Herndon, N.; Hu, Z.L.; Humann, J.; Jaiswal, P.; Jonquet, C.; Laporte, Marie-Angélique; Larmande, Pierre; Lazo, G.; McCarthy, F.; Menda, N.; Mungall, C.J.; Muñoz Torres, Mónica Cecilia; Naithani, S.; Nelson, R.; Nesdill, D.; Park, C.; Reecy, J.; Reiser, L.; Sanderson, Lacey-Anne; Sen, T.Z.; Staton, M.; Subramaniam, S.; Tello-Ruiz, M.K.; Unda, V.; Unni, D.; Wang, L.; Ware, D.; Wegrzyn, J.; Williams, J.; Woodhouse, M.T.; Yu, J.; Main, D.
The future of agricultural research depends on data. The sheer volume of agricultural biological data being produced today makes excellent data management essential. Governmental agencies, publishers and science funders require data management plans for publicly funded research. Furthermore, the value of data increases exponentially when they are properly stored, described, integrated and shared, so that they can be easily utilized in future analyses. AgBioData (https://www.agbiodata.org) is a consortium of people working at agricultural biological databases, data archives and knowledgebases who strive to identify common issues in database development, curation and management, with the goal of creating database products that are more Findable, Accessible, Interoperable and Reusable. We strive to promote authentic, detailed, accurate and explicit communication between all parties involved in scientific data.
As a step toward this goal, we present the current state of biocuration, ontologies, metadata and persistence, database platforms, programmatic (machine) access to data, communication and sustainability with regard to data curation. Each section describes challenges and opportunities for these topics, along with recommendations and best practices.
Role of Modelling in International Crop Research: Overview and Some Case Studies (Journal Article, 2018) Reynolds, Matthew P.; Kropff, Martin; Crossa, José; Koo, Jawoo; Kruseman, Gideon K.; Molero Milan, Anabel; Rutkoski, Jessica; Schulthess, Urs C.; Balwinder-Singh, Poonia, S.; Sonder, Kai; Tonnang, Henri E.Z.; Vadez, Vincent
Crop modelling has the potential to contribute to global food and nutrition security. This paper briefly examines the history of crop modelling by the international crop research centres of the CGIAR (formerly the Consultative Group on International Agricultural Research but now known simply as CGIAR), whose primary focus is on less developed countries. Basic principles of crop modelling, building up to a Genotype × Environment × Management × Socioeconomic (G × E × M × S) paradigm, are explained. Modelling has contributed to a better understanding of crop performance and yield gaps, better prediction of pest and insect outbreaks, and more efficient crop management, including irrigation systems and optimization of planting dates. New developments include, for example, the use of remotely sensed data and mobile phone technology linked to crop management decision support models, data sharing in the new era of big data, and the use of genomic selection and crop simulation models linked to environmental data to help make crop breeding decisions. Socio-economic applications, including foresight analysis of agricultural systems under global change scenarios and the consequences of potential food system shocks, are also described.
These approaches are discussed in this paper, which also calls for closer collaboration among disciplines in order to better serve the crop research and development communities by providing model-based recommendations ranging from policy development at the level of governmental agencies to direct crop management support for resource-poor farmers.
Climate change impact on Mexico wheat production (Journal Article, 2018-12) Hernández Ochoa, Ixchel M.; Asseng, Senthold; Kassie, Belay T.; Xiong, Wei; Robertson, Ricky; Pequeno, Diego Notelo Luz; Sonder, Kai; Reynolds, Matthew P.; Babar, Md Ali; Molero Milan, Anabel; Hoogenboom, Gerrit
AgroPortal: a vocabulary and ontology repository for agronomy (Journal Article, 2018-01) Jonquet, C.; Toulet, A.; Arnaud, Elizabeth; Aubin, S.; Dzale-Yeumo, E.; Emonet, V.; Graybeal, J.; Laporte, Marie-Angélique; Musen, M.A.; Pesce, V.; Larmande, Pierre
Many vocabularies and ontologies are produced to represent and annotate agronomic data. However, these ontologies are scattered, in different formats, of different sizes, with different structures and from overlapping domains. There is therefore a need for a common platform to receive and host them, align them, and enable their use in agro-informatics applications. By reusing the National Center for Biomedical Ontology (NCBO) BioPortal technology, we have designed AgroPortal, an ontology repository for the agronomy domain. The AgroPortal project reuses the biomedical domain’s semantic tools and insights to serve agronomy, but also food, plant, and biodiversity sciences. We offer a portal that features ontology hosting, search, versioning, visualization, comment, and recommendation; enables semantic annotation; stores and exploits ontology alignments; and enables interoperation with the semantic web.
AgroPortal specifically satisfies the requirements of the agronomy community in terms of ontology formats (e.g., SKOS vocabularies and trait dictionaries) and supported features (offering detailed metadata and advanced annotation capabilities). In this paper, we present our platform’s content and features, including the additions to the original technology, as well as preliminary outputs of five driving agronomic use cases that participated in the design and orientation of the project to anchor it in the community. By building on the experience and existing technology acquired from the biomedical domain, we can present in AgroPortal a robust and feature-rich repository of great value for the agronomic domain.
Developing data interoperability using standards: A wheat community use case (Journal Article, 2017) Dzale-Yeumo, E.; Alaux, M.; Arnaud, Elizabeth; Aubin, S.; Baumann, U.; Buche, P.; Cooper, Laurel D.; Cwiek-Kupczynska, H.; Davey, R.P.; Fulss, R.A.; Jonquet, C.; Laporte, Marie-Angélique; Larmande, Pierre; Pommier, C.; Protonotarios, V.; Reverte, C.; Shrestha, R.; Subirats, I.; Venkatesan, A.; Whan, A.; Quesneville, H.
In this article, we present a joint effort of the wheat research community, along with data and ontology experts, to develop wheat data interoperability guidelines. Interoperability is the ability of two or more systems and devices to cooperate, exchange data, and interpret that shared information. Interoperability is a growing concern to the wheat scientific community, and to agriculture in general, as the need to interpret the deluge of data obtained through high-throughput technologies grows. Agreeing on common data formats, metadata, and vocabulary standards is an important step toward the level of data interoperability required to add value by encouraging data sharing, and subsequently to facilitate the extraction of new information from existing and new datasets.
Over a period of more than 18 months, the RDA Wheat Data Interoperability Working Group (WDI-WG) surveyed the wheat research community about its use of data standards, then discussed and selected a set of recommendations based on consensual criteria. The recommendations promote standards for the data types identified by the wheat research community as the most important for the coming years: nucleotide sequence variants, genome annotations, phenotypes, germplasm data, gene expression experiments, and physical maps. For each of these data types, the guidelines recommend best practices in terms of use of data formats, metadata standards and ontologies. In addition to the best practices, the guidelines provide examples of tools and implementations that are likely to facilitate adoption of the recommendations. To maximize adoption, the WDI-WG used a community-driven approach that involved the wheat research community from the start, took into account their needs and practices, and provided them with a framework to keep the recommendations up to date. We also report on this approach’s potential to be generalized to other (agricultural) domains.
The Planteome database: an integrated resource for reference ontologies, plant genomics and phenomics (Journal Article, 2018-01-04) Cooper, Laurel D.; Meier, A.; Laporte, Marie-Angélique; Elser, J.L.; Mungall, C.; Sinn, B.T.; Cavaliere, D.; Carbon, S.; Dunn, N.A.; Smith, B.; Qu, B.; Preece, J.; Zhang, E.; Todorovic, S.; Gkouto, G.; Doonan, J.H.; Stevenson, D.W.; Arnaud, Elizabeth; Jaiswal, P.
The Planteome project (http://www.planteome.org) provides a suite of reference and species-specific ontologies for plants, and annotations to genes and phenotypes. Ontologies serve as common standards for semantic integration of a large and growing corpus of plant genomics, phenomics and genetics data.
The reference ontologies include the Plant Ontology, Plant Trait Ontology and the Plant Experimental Conditions Ontology developed by the Planteome project, along with the Gene Ontology, Chemical Entities of Biological Interest, Phenotype and Attribute Ontology, and others. The project also provides access to species-specific Crop Ontologies developed by various plant breeding and research communities from around the world. We provide integrated data on plant traits, phenotypes, and gene function and expression from 95 plant taxa, annotated with reference ontology terms. The Planteome project is developing a plant gene annotation platform, Planteome Noctua, to facilitate community engagement. All the Planteome ontologies are publicly available and are maintained at the Planteome GitHub site (https://github.com/Planteome) for sharing, tracking revisions and new requests. The annotated data are freely accessible from the ontology browser (http://browser.planteome.org/amigo) and our data repository.
Open Access and Open Data at CGIAR: challenges and solutions (Journal Article, 2017) Devare, Medha; Zandstra, Megan; Clobridge, Abby; Fotsy, Michelle; Abreu, David; Arnaud, Elizabeth; Baraka, Paul; Bonaiuti, Enrico; Chukka, Srinivasa Rao; Dieng, Ibnou; Dreher, Kate; Erlita, Sufiet; Juarez, H.; Kim, Soonho; Koo, Jawoo; Muchlish, Usman; Müller, Martin; Mwanzia, Leroy; Poole, Elizabeth J.; Siddiqui, Salman
CGIAR is a global research partnership of 15 geographically and scientifically diverse Centers dedicated to reducing poverty, enhancing food and nutrition security, and improving natural resource management. The Centers are charged with accelerating innovation to tackle challenges at a variety of scales, from the local to the global. This requires data and other research outputs to be findable, accessible, interoperable, and reusable (that is, open via FAIR principles) and interlinked where relevant.
CGIAR Centers have made strong progress in implementing publication and data repositories; however, many of these still represent silos whose contents are not easily discoverable or interlinked (e.g., agronomic trial data with socioeconomic or adoption data in the same geographies). In the absence of such interoperability-mediated discovery, “open” is of limited utility. The overall goal is for CGIAR’s trove of research data and associated information to be indexed and interlinked through a demand-driven cyberinfrastructure for agriculture, ensuring that research outputs are discoverable by humans and machines, and reusable via appropriate licensing to enhance innovation, uptake and impact. There are challenges to achieving this goal, not only across CGIAR but for the agricultural domain in general. Among the foremost hurdles is that “open” tends to remain an unfunded mandate, making it difficult to operationalize effectively. Further, scientists still have significant concerns about making data open, largely centered on issues of trust, time, and quality, with the result that repositories frequently expose metadata rather than the data sets themselves. While being able to find metadata about resources is an improvement, metadata-only exposure still imposes barriers to data access, discoverability, integration, and analysis, without which complex challenges to global agricultural development cannot be effectively addressed. CGIAR is addressing the urgent need to create a data sharing culture and an enabling environment for Open Access and Open Data (OA/OD), which includes projects planning for OA/OD and allocating funds to support it, in parallel with the technical infrastructure mentioned above.
While the technology necessary to enable FAIR outputs exists, success requires trust and buy-in from data providers and consumers, agreement on and adherence to interoperability standards and/or mapping across varied approaches, and compliance with guidelines (including those on citation and licensing governing content reuse). Agricultural institutions, including CGIAR, are only now beginning to address these issues systematically, to agree on and adopt standards-based systems and processes, and to build crosswalks across differing schemas. Through its Open Access and Open Data initiative funded by the Bill and Melinda Gates Foundation, and via plans for an ambitious Big Data and ICT Platform, CGIAR is developing technical and cultural approaches that will enable research content to be consistently and seamlessly discovered, interlinked, and analyzed across its Centers. This paper describes the strategy used to identify the specific contexts and challenges faced by Centers in building an infrastructure and culture for OA/OD across CGIAR, with the ultimate goal of achieving greater impact in agricultural research for development.
Data management and best practice for plant science (Journal Article, 2017) Leonelli, S.; Davey, R.P.; Arnaud, Elizabeth; Parry, G.; Bastow, R.