Trait thesauri and ontologies

If no published trait definition is available that can be referenced, trait-datasets should be accompanied by a dataset-specific glossary of traits, also termed a ‘thesaurus’. In its simplest form, a trait thesaurus would provide a “controlled vocabulary designed to clarify the definition and structuring of key terms and associated concepts in a specific discipline” (Laporte et al. 2013; Garnier et al. 2017). To be unambiguous, any thesaurus should be defining terms based on other well-defined terms from semantic ontologies.

In addition to the mere listing of trait definitions, a trait ‘ontology’ would be providing a formal model of the conceptual objects and relationships, or entities and qualities (Garnier et al. 2017), ideally structured following the guidelines for a semantic web standard (Berners-Lee et al. 2001). This can take rather complex forms, but for trait data might be just adding a hierarchy of broader and narrower terms.

By providing a minimal vocabulary for thesauri and ontologies (https://ecologicaltraitdata.github.io/TraitDataStandard/#terms-for-traitlists-a-trait-thesaurus), we hope to facilitate the publication of trait thesauri developed for the own project context which always should be referenced in the core trait dataset. The thesaurus might accompany the core data in a Darwin Core Archive, or be published on any other stable webservice.

Thus, in its simplest form, a trait thesaurus would assign trait names with A) an unambiguous definition of the trait and B) an expected format (e.g. units or legit factor levels) of measured values or reported facts. A trait ontology would additionally provide semantic relationships between terms for deriving a hierarchical or tree-based classification of traits and analysing traits in a broader taxonomic context.

Minimal terms of a trait thesaurus

A project-specific trait thesaurus may be a table of terms containing the following information:

a human readable, informative trait name (trait)
unique dataset-specific identifier (Identifier), which is referenced in the trait data-set
a short, unambiguous verbal definition (traitDescription) which may make use of standard terms provided in other Ontologies, e.g. the definition for ‘fruit mass’ in TOP reads: “the mass (PATO:mass), either fresh or dried, of a fruit (PO:fruit)”, referring to Phenotypic Characeristics Ontology PATO and Planteome Plant Ontolgy, PO (http://top-thesaurus.org/annotationInfo?viz=1&&trait=Fruit_mass).
constrain the legit factor levels (for categorical data, factorLevels) or expected standard units (expectedUnit for numerical data). The type of values should be differentiated in the field valueType by specifying ‘numerical’, ‘logical’, ‘integer’, ‘categorical’ traits.
link the term to a broader or narrower term (broaderTerm, narrowerTerm), related terms (relatedTerm) or synonyms (synonym), e.g. the definition of ‘femur length of first leg, left side’ is narrower than ‘femur length’ which is narrower than ‘leg trait’ which is narrower than ‘locomotion trait’. This extends the trait list into a semantic web resource, facilitates the classification of traits, and allows for cross-taxon comparative studies at the level of broader terms (Garnier et al. 2017).

defining expected values

Traits are not only defined in terms of their interpretation, but are ideally also standardised in terms of numerical units and, even more important, the use of factor levels. This is challenging, given the range of data types that fall within datasets of functional traits.

Numerical values represent measurements of lengths, volumes, ratios, rates or timespans. Integer values may apply to count data (e.g. eggs per clutch). Binary data (encoded as 0 or 1) or logical data (coded as TRUE or FALSE) may apply to qualitative traits such as specific behaviour during mating (e.g. are territories defended) or specialisation to a given habitat (e.g. species restricted to relicts of primeval forests). Many traits are categorical and allow for a constrained set of factor levels, such as sex differences in wing morphology (both sexes winged, both sexes unwinged, only males winged, only females winged) or unconstrained entries such as color. Categorical traits often are ordinal, i.e. they have a logical sequence as in the case of life stages or hibernation stages, or habitat preference traits such as horizontal stratum use. Finally, there are specific formats of multivariate trait values, e.g. x-y-z coordinates of a landmark measured in a standardised 3D space or relative abundance of chain-lengths in biochemical compounds. To cope with this variety of data types, definitions should refer to other well-defined terms of existing ontologies that describe standard units, morphological body parts, protein characteristics (Protein Ontology) or chemical terms (e.g. the ChEBI, http://www.obofoundry.org/ontology/chebi.html).

Such reference definitions should also refer to methodological handbooks (Perez-Harguindeguy et al. (2013); (???)), which standardise the process of measurement.

semantic web

Online ontologies extend into (machine readable) semantic web resources by providing a hierarchical classification of traits (TOP) or a relational tree of functional traits (e.g. TOP or T-SITA).

Each trait definition may link to a broader or narrower term. For instance, the definition of ‘femur length of first leg, left side’ is narrower than ‘femur length’ which is narrower than ‘leg trait’ which is narrower than ‘locomotion trait’. (Ref semantic database methods) This links traits of similar functional meaning.

These systematic approaches to traits will be very useful for comparing similar traits measured in different taxa at higher hierarchical levels. Furthermore, trait definitions may refer to related terms that describe a similar feature but with other means, or synonyms defined in other trait ontologies. By providing this semantic interlinkage of trait ontologies, a web of definitions is spun across the internet which allows researchers and search engines to relate independent trait measurements with each other.

The benefit of such classifications will increase if open Application Programming Interfaces (APIs) provide a way to extract the definitions and higher-level trait hierarchies programmatically via software tools. To harmonize trait data across databases, future trait standard initiatives should provide this functionality.

Publish trait thesauri or ontologies

The biggest challenge in building semantic web resources for scientific communities is building a consensus vocabulary that is sufficiently broad while being specific enough for all most use cases. Thesauri and ontologies may be published to cover the specific project context only (e.g. ecosystem, region or experiment), or to be of general use and application for an entire taxonomic group or methodology. The claim of generality comes with higher demands on consensus building.

That said, different models of publication of trait thesauri may apply depending on the claim of generality.

publish as supplementary information

If the thesaurus is only meant to translate the specific project dataset, the definition of traits may be published as metadata along with the trait dataset. This can be done by adding it to the Archive uploaded to a low-threshold fileserver. We highly recommend the use of Darwin Core Archives for this use case.

Publication on own website

If the thesaurus is of broader use, e.g. for a specific organism group, a region or ecosystem, or a broader project context, thesauri are rather defined by pragmatic choice than a claim of completeness. Consensus is acheived within a limited community of researchers. In this case, multiple indidual datasets refer to the same thesaurus. Therefore, it should be published as a simple, static web resource.

A simple way to publish a list of trait definitions for a project may be as a public repository on development platforms like Github.

Publication on Ontology servers

Online ontologies hosted with accredited ontology servers have the advantage of providing a persistent and direct link of the term on the internet (a Uniform Resource Identifier, URI). Terminology portals or registries, such as the GFBio Terminology Service, the OBO Foundry, or Ontobee, may provide a central host for trait ontologies.

References

Berners-Lee, T., J. Hendler, and O. Lassila. 2001. The semantic web. Scientific american 284:28–37.

Garnier, E., U. Stahl, M.-A. Laporte, J. Kattge, I. Mougenot, I. Kühn, B. Laporte, et al. 2017. Towards a thesaurus of plant characteristics: An ecological contribution. Journal of Ecology 105:298–309.

Laporte, M.-A., E. Garnier, and I. Mougenot. 2013. A faceted search system for facilitating discovery-driven scientific activities: A use case from functional ecology. Semantics for Biodiversity (S4BioDiv 2013) 25.

Perez-Harguindeguy, N., S. Diaz, E. Garnier, S. Lavorel, H. Poorter, P. Jaureguiberry, M. S. Bret-Harte, et al. 2013. New handbook for standardised measurement of plant functional traits worldwide. Australian Journal of botany 61:167–234.

Creating Trait Thesauri