CGIAR BigData Conference Papers
Permanent URI for this collection: https://hdl.handle.net/10568/89426
Recent Submissions
A Natural Language Processing Pipeline to extract phenotypic data from formal taxonomic descriptions with a focus on flagellate plants
(Conference Paper, 2018) Endara L.; Burleigh G.; Cooper L.; Jaiswal, P.; Laporte, Marie-Angélique
Assembling large-scale phenotypic datasets for evolutionary and biodiversity studies of plants can be extremely difficult and time-consuming. New semi-automated Natural Language Processing (NLP) pipelines can extract phenotypic data from taxonomic descriptions, and their performance can be enhanced by incorporating information from ontologies such as the Plant Ontology (PO) and the Plant Trait Ontology (TO). These ontologies are powerful tools for comparing phenotypes across taxa in large-scale evolutionary and ecological analyses, but they are largely focused on terms associated with flowering plants. We describe a bottom-up approach to identify terms from flagellate plants (including bryophytes, lycophytes, ferns, and gymnosperms) that can be added to existing plant ontologies. We first parsed a large corpus of electronic taxonomic descriptions using the Explorer of Taxon Concepts tool (http://taxonconceptexplorer.org/) and identified flagellate plant-specific terms that were missing from the existing ontologies. We extracted new structure and trait terms, and we are currently incorporating the missing structure terms into the PO and modifying the definitions of existing terms to expand their coverage to flagellate plants. We will incorporate trait terms into the TO in the near future.

OOPS: The Ontology of Plant Stress: A semi-automated standardization methodology
(Conference Paper, 2018) Meier A.; Laporte, Marie-Angélique; Elser J.; Cooper L.; Preece, J.; Jaiswal, P.; Poolen, J.
Plant stress traits are important breeding targets for all crop species. Massive amounts of research funding are spent generating data to combat plant diseases and environmental stress. Often this data is used to achieve a single goal and then left in a repository, never to be used again. As a scientific community, we should strive to make all publicly funded data reusable and interoperable. This goal is achievable only through careful annotation using universal data and metadata standards. One such standard is the use of a standardized vocabulary, or ontology. This paper presents a semi-automated method to define and label plant stresses using a combination of web scraping and ontology design patterns. Standardizing the definitions and linking plant stresses to established hierarchies leverages prior work captured in existing knowledge bases, such as taxonomic classifications and other ontologies.