A Natural Language Processing Pipeline to extract phenotypic data from formal taxonomic descriptions with a focus on flagellate plants

cg.authorship.types: CGIAR and advanced research institute [en]
cg.contributor.affiliation: University of Florida [en]
cg.contributor.affiliation: Oregon State University [en]
cg.contributor.affiliation: Bioversity International [en]
cg.contributor.affiliation: University of Arizona [en]
cg.contributor.crp: Big Data
cg.creator.identifier: Marie-Angélique Laporte: 0000-0002-8461-9745 [en]
cg.identifier.url: http://ceur-ws.org/Vol-2285/ICBO_2018_paper_50.pdf [en]
cg.issn: 1613-0073 [en]
cg.reviewStatus: Peer Review [en]
dc.contributor.author: Endara, L. [en]
dc.contributor.author: Burleigh, G. [en]
dc.contributor.author: Cooper, L. [en]
dc.contributor.author: Jaiswal, P. [en]
dc.contributor.author: Laporte, Marie-Angélique [en]
dc.date.accessioned: 2019-04-16T14:00:31Z [en]
dc.date.available: 2019-04-16T14:00:31Z [en]
dc.identifier.uri: https://hdl.handle.net/10568/100813
dc.title: A Natural Language Processing Pipeline to extract phenotypic data from formal taxonomic descriptions with a focus on flagellate plants [en]
dcterms.abstract: Assembling large-scale phenotypic datasets for evolutionary and biodiversity studies of plants can be extremely difficult and time-consuming. New semi-automated Natural Language Processing (NLP) pipelines can extract phenotypic data from taxonomic descriptions, and their performance can be enhanced by incorporating information from ontologies such as the Plant Ontology (PO) and the Plant Trait Ontology (TO). These ontologies are powerful tools for comparing phenotypes across taxa in large-scale evolutionary and ecological analyses, but they focus largely on terms associated with flowering plants. We describe a bottom-up approach to identifying terms from flagellate plants (including bryophytes, lycophytes, ferns, and gymnosperms) that can be added to existing plant ontologies. We first parsed a large corpus of electronic taxonomic descriptions using the Explorer of Taxon Concepts tool (http://taxonconceptexplorer.org/) and identified terms specific to flagellate plants that were missing from the existing ontologies. We extracted new structure and trait terms, and we are currently incorporating the missing structure terms into the PO and modifying the definitions of existing terms to expand their coverage to flagellate plants. We will incorporate the trait terms into the TO in the near future. [en]
dcterms.accessRights: Open Access
dcterms.audience: Scientists [en]
dcterms.bibliographicCitation: Endara, L.; Burleigh, G.; Cooper, L.; Jaiswal, P.; Laporte, M-A.; Cui, H. (2018) A Natural Language Processing Pipeline to extract phenotypic data from formal taxonomic descriptions with a focus on flagellate plants. In: Jaiswal, P.; Cooper, L.; Haendel, M.A.; Mungall, C.J. (eds.) International Conference on Biological Ontology (ICBO 2018), Proceedings of the 9th International Conference on Biological Ontology, Corvallis, Oregon, USA, August 7-10, 2018, 4 p. ISSN: 1613-0073 [en]
dcterms.extent: 4 p. [en]
dcterms.issued: 2018 [en]
dcterms.language: en
dcterms.license: CC0-1.0
dcterms.subject: data processing [en]
dcterms.subject: ontology [en]
dcterms.subject: taxonomy [en]
dcterms.subject: mastigophora [en]
dcterms.subject: phenotypes [en]
dcterms.type: Conference Paper

Files

Original bundle

Name: A natural_Endara_2018.pdf
Size: 2.76 MB
Format: Adobe Portable Document Format
