A Comprehensive Database of CGIAR Climate-Related Journal Articles (2012–2023)
Citation
Alan Orth, Caroline K. Bosire, Laura Rabago, Shrijana Vaidya, Sitashma Rajbhandari, Prajal Pradhan, Aditi Mukherji. (5/12/2024). A Comprehensive Database of CGIAR Climate-Related Journal Articles (2012–2023) [Bibliographic metadata].
Abstract/Description
This dataset contains bibliographic metadata for 3,466 peer-reviewed journal articles used in the 2024 synthesis of CGIAR work on climate change. The metadata was retrieved from eight CGIAR institutional repositories, processed using a Python-based extract, transform, and load (ETL) pipeline, and screened for climate change relevance in Rayyan.
Harvesting identified 5,487 journal articles in CGIAR repositories that matched the following inclusion criteria:
- Issue date between 2012 and 2023
- The words "climate change" in the title, abstract, or keywords
- English language
- DOI assigned
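The four criteria above can be expressed as a single predicate over a harvested record. The following is a minimal sketch, not the code from the project's repository: the record layout (a dict with `issued`, `title`, `abstract`, `keywords`, `language`, and `doi` keys) and the accepted language codes are assumptions made for illustration.

```python
from datetime import date


def matches_inclusion_criteria(record: dict) -> bool:
    """Return True if a record meets all four inclusion criteria.

    The field names used here are hypothetical; real repository
    exports will use their own metadata schema.
    """
    # Criterion 1: issue date between 2012 and 2023 (inclusive).
    issued = record.get("issued")  # assumed to be a datetime.date or None
    if issued is None or not (2012 <= issued.year <= 2023):
        return False

    # Criterion 2: "climate change" in the title, abstract, or keywords.
    # Each field is checked separately so the phrase cannot be matched
    # across the boundary between two fields.
    fields = [record.get("title") or "", record.get("abstract") or ""]
    fields.extend(record.get("keywords") or [])
    if not any("climate change" in field.lower() for field in fields):
        return False

    # Criterion 3: English language (accepted codes are an assumption).
    if (record.get("language") or "").lower() not in {"en", "eng", "english"}:
        return False

    # Criterion 4: a DOI is assigned.
    return bool(record.get("doi"))
```

A record passes only if every check succeeds, so missing or empty fields lead to exclusion rather than an error.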
The bibliographic metadata was merged and normalized to ensure consistent use of date formats, multi-value separators, and identifiers. The ETL pipeline used titles and DOIs to identify and remove duplicates, and it excluded records that had been erroneously included because of incorrect repository metadata, where we could identify them (mislabeled preprints, non-English articles, etc.). We used Crossref, Unpaywall, and OpenAlex to fill in missing metadata such as usage rights (license), access rights, affiliations, and publishers, because this information can be valuable to researchers. Minor normalization was performed on affiliations, countries, and publishers, but all other metadata was used as-is from the respective repositories.
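Deduplication on titles and DOIs could look roughly like the sketch below. It is an illustration under stated assumptions, not the pipeline's actual implementation (which is available in the GitHub repository): the helper names `normalize_doi`, `normalize_title`, and `deduplicate`, the `doi` and `title` keys, and the normalization rules are all hypothetical.

```python
import re


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip a leading resolver URL or 'doi:' prefix."""
    doi = doi.strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace.

    Simplification: characters outside a-z and 0-9 are treated as
    separators, which is crude for titles with non-ASCII letters.
    """
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records: list) -> list:
    """Keep the first record seen for each normalized DOI or title."""
    seen_dois, seen_titles = set(), set()
    unique = []
    for record in records:
        doi = normalize_doi(record.get("doi") or "")
        title = normalize_title(record.get("title") or "")
        # A record is a duplicate if either key has been seen before.
        if (doi and doi in seen_dois) or (title and title in seen_titles):
            continue
        if doi:
            seen_dois.add(doi)
        if title:
            seen_titles.add(title)
        unique.append(record)
    return unique
```

Matching on either key catches the same article deposited in two repositories with differently formatted DOIs or slightly different title punctuation. Exact matching on normalized titles is a simplification; it can merge distinct articles that share a title, so a real pipeline may need additional checks.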
A total of 4,495 journal articles were uploaded to the Rayyan platform for blinded screening of climate change relevance by a team trained in systematic literature review methodology. Reviewers excluded journal articles that they did not deem climate change related or that they identified as further duplicates.
This dataset is useful for understanding CGIAR’s research on climate change. Potential future work includes using machine learning to classify the articles by thematic area.
The Python code used to perform the harvesting and processing of this dataset can be found on GitHub: https://github.com/ilri/cgiar-climate-change-synthesis
Author ORCID identifiers
Aditi Mukherji https://orcid.org/0000-0002-8061-4349