Standardize format of traitdata: as.traitdata (traitdataform)

Turns wide-table formats (species-trait matrix and occurrence table) into the long-table format. As input, the function requires the names of the columns that contain traits, given as a vector of trait names, and the name of the column that contains the taxon names. For tables containing repeated measurements of traits within the same taxon, an occurrenceID should be given; otherwise, one will be created.
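As a rough illustration of the reshaping described above, the following base-R sketch turns a small species-trait matrix into a long table. The helper `wide_to_long_sketch()` is hypothetical and not part of traitdataform; the output column names (scientificName, traitName, traitValue, traitUnit) are assumed to follow Ecological Trait-data Standard terms, and the toy values are made up.

```r
# Hypothetical helper, NOT the traitdataform implementation: a minimal base-R
# sketch of reshaping a wide trait table into a long table.
# Output column names are assumed to follow Ecological Trait-data Standard terms.
wide_to_long_sketch <- function(x, traits, taxa, units = NA_character_) {
  # a single unit is recycled to all traits; otherwise a named vector is expected
  if (length(units) == 1) units <- setNames(rep(units, length(traits)), traits)
  out <- do.call(rbind, lapply(traits, function(tr) {
    data.frame(
      row            = seq_len(nrow(x)),  # helper index, dropped below
      scientificName = x[[taxa]],
      traitName      = tr,
      traitValue     = x[[tr]],
      traitUnit      = unname(units[tr]),
      stringsAsFactors = FALSE
    )
  }))
  out <- out[!is.na(out$traitValue), ]                 # mimics na.rm = TRUE
  out <- out[order(out$row), setdiff(names(out), "row")]  # group by input row
  rownames(out) <- NULL
  out
}

# Toy input with made-up values: one row per taxon (species-trait matrix)
wide <- data.frame(
  taxon          = c("Taxon A", "Taxon B"),
  body_length    = c(24, 18),
  antenna_length = c(NA, 9)
)
long <- wide_to_long_sketch(wide, traits = c("body_length", "antenna_length"),
                            taxa = "taxon", units = "mm")
```

as.traitdata() itself also handles occurrence and measurement identifiers, metadata and thesaurus mapping, which this sketch leaves out.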

Usage

as.traitdata(
  x,
  traits = attributes(x)$traits,
  taxa = attributes(x)$taxa,
  occurrences = attributes(x)$occurrences,
  datasetID = attributes(x)$datasetID,
  measurements = attributes(x)$measurements,
  units = attributes(x)$units,
  keep = attributes(x)$keep,
  drop = attributes(x)$drop,
  na.rm = TRUE,
  id.vars = names(x)[names(x) %in% keep & !names(x) %in% drop],
  thesaurus = attributes(x)$thesaurus,
  metadata = attributes(x)$metadata,
  longtable = TRUE,
  conformsTo = "v0.10",
  ...
)

Arguments

x: data.frame object, containing at least a column of taxa, and one or more columns of trait measurements.
traits: a vector of column names containing traits.
taxa: the name of the column containing taxon names.
occurrences: either a column name containing identifiers for each individual specimen on which several traits were measured, i.e. an occurrence of this taxon, or a vector of occurrence identifiers which must be of the same length as the number of rows of the table. See 'Details'.
datasetID: a unique name for this dataset (optional). Will be prepended to the occurrence ID and measurement ID.
measurements: either a column name containing identifiers for each individual measurement, or a vector of measurement identifiers. This applies if a single trait measurement spans multiple columns of data, e.g. multivariate traits such as quantitative measures of chemical compounds, wavelengths, or x-y-z coordinates. In most cases, a measurementID will link the data across rows in the long-table format. Make sure that the trait names given reflect the different dimensions of the trait measurement. If measurements is left blank, sequential identifiers will be auto-generated for each measured value.
units: a single character string or named vector giving the units that apply to the traits. If only one unit type is given, it will be applied to all traits.
keep: a vector or named vector containing the names of the input columns to be kept in the output. Vector names will be used to rename the columns. It is recommended to use accepted column names of the traitdata standard for renaming!
drop: a vector acting as the inverse of keep. All columns listed will be removed from the output dataset.
na.rm: logical, defaults to TRUE. If FALSE, all measured values containing NA will be kept in the output table. This is not recommended for most data.
id.vars: a vector of column names to return. Autogenerated from input column names and 'keep' and 'drop'.
thesaurus: an object of class 'thesaurus' as created by function as.thesaurus(). If provided, it supersedes the trait names given in argument traits. The thesaurus will be appended as an attribute and can be revisited by calling attributes(x)$thesaurus.
metadata: a list of class metadata, as created by function as.metadata(). Metadata will be added as attributes to the data table. Possible parameters to the as.metadata() call are: rightsHolder, bibliographicCitation, license, author, datasetID, datasetName, version (see 'Details').
longtable: logical, defaults to TRUE. If FALSE, data will not be converted into the long-table format but will remain in the wide-table format as provided. Note that any columns not indicated in arguments traits, keep, units, taxa, or occurrences will be dropped from the output.
conformsTo: version of the Ecological Trait-data Standard to which the data conform. By default, the output conforms to v0.10. If conformsTo = "v0.9", the output will be converted to Ecological Trait-data Standard v0.9.
...: other arguments, passed on to print function.
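The units, keep and drop arguments can be illustrated with two small base-R helpers. Both `expand_units()` and `select_id_vars()` are hypothetical sketches of the behaviour described above, not functions exported by traitdataform; `select_id_vars()` reuses the documented default expression of id.vars. The toy values are made up.

```r
# Hypothetical helpers, NOT part of traitdataform: base-R sketches of how the
# `units` and `keep`/`drop` arguments are described to behave.

# `units`: a single string is recycled to every trait; a named vector is
# looked up by trait name.
expand_units <- function(units, traits) {
  if (length(units) == 1) {
    return(setNames(rep(units, length(traits)), traits))
  }
  units[traits]
}

# `keep`/`drop`: select columns with the same expression as the documented
# default of `id.vars`, then rename them via the names of `keep`.
select_id_vars <- function(x, keep = character(0), drop = character(0)) {
  id_vars <- names(x)[names(x) %in% keep & !names(x) %in% drop]
  out <- x[, id_vars, drop = FALSE]
  if (!is.null(names(keep))) {
    new_names <- names(keep)[match(id_vars, keep)]
    renamed <- new_names != ""  # unnamed entries keep their original name
    names(out)[renamed] <- new_names[renamed]
  }
  out
}

expand_units("mm", c("body_length", "antenna_length"))
expand_units(c(antenna_length = "mm", body_mass = "mg"),
             c("body_mass", "antenna_length"))

# Toy table with made-up values; "lab" is listed in keep but removed by drop
toy <- data.frame(name = "Taxon A", body_length = 24,
                  note = "dry specimen", lab = "X")
kept <- select_id_vars(toy, keep = c(measurementRemark = "note", "lab"),
                       drop = "lab")
names(kept)
```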

Value

An object of class 'traitdata'.

Details

If occurrences is left blank, the function checks the structure of the input table. If several entries are given for the same taxon, it assumes that the input is an occurrence table, i.e. one with multiple observations of a single taxon, and assigns occurrence identifiers.
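The occurrence check described above can be sketched in base R as follows. `assign_occurrence_ids()` is hypothetical; the identifier format `<datasetID>_<n>` and returning NULL for a species-trait matrix are assumptions made for illustration, not documented package behaviour.

```r
# Hypothetical sketch of the occurrence-detection rule, NOT traitdataform code.
# Assumptions: IDs are "<datasetID>_<row number>", and a table with one row
# per taxon (species-trait matrix) gets no occurrence IDs (NULL).
assign_occurrence_ids <- function(x, taxa, datasetID = NULL) {
  if (anyDuplicated(x[[taxa]]) == 0) {
    return(NULL)  # every taxon appears once: treat as species-trait matrix
  }
  # repeated taxa: treat as occurrence table, one identifier per row
  ids <- as.character(seq_len(nrow(x)))
  if (!is.null(datasetID)) ids <- paste(datasetID, ids, sep = "_")
  ids
}

# Toy tables with made-up values
occ_table    <- data.frame(SpeciesID = c("sp1", "sp1", "sp2"),
                           Body_length = c(5.1, 4.8, 7.0))
matrix_table <- data.frame(name = c("sp1", "sp2"),
                           body_length = c(5.0, 7.0))

assign_occurrence_ids(occ_table, "SpeciesID", datasetID = "demo")
# "demo_1" "demo_2" "demo_3"
assign_occurrence_ids(matrix_table, "name")
# NULL
```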

Metadata will be stored as attributes of the data frame and can be accessed via attributes(). Providing metadata is optional but highly recommended when working with multiple trait data files. When datasets are appended using rbind(), the metadata will be added as additional columns, and dataset attribution will be listed in the attributes.
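A minimal base-R sketch of storing metadata as attributes, and of carrying attribution into columns when tables are stacked. A plain list stands in for an as.metadata() object, and `metadata_to_columns()` is a hypothetical helper; it is not the rbind() method shipped with traitdataform. All values are made up.

```r
# Minimal base-R sketch. A plain list stands in for an as.metadata() object
# (assumption); metadata_to_columns() is hypothetical, NOT traitdataform code.
make_toy <- function(name) {
  d <- data.frame(scientificName = "Taxon A", traitName = "body_length",
                  traitValue = 24, traitUnit = "mm")
  attr(d, "metadata") <- list(datasetName = name, license = "CC0")
  d
}
dat1 <- make_toy("toy dataset A")
dat2 <- make_toy("toy dataset B")

# dataset-level metadata is read back via attributes()
attributes(dat1)$metadata$datasetName

# before stacking, copy each metadata field into a per-row column so that
# dataset attribution survives in the combined table
metadata_to_columns <- function(d) {
  md <- attributes(d)$metadata
  for (field in names(md)) d[[field]] <- md[[field]]
  d
}
combined <- rbind(metadata_to_columns(dat1), metadata_to_columns(dat2))
```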

Examples


if (FALSE) {
# species-trait matrix: one row per taxon

pulldata("carabids")

dataset1 <- as.traitdata(carabids,
  taxa = "name_correct",
  traits = c("body_length", "antenna_length", "metafemur_length"),
  units = "mm",  # a single unit is applied to all three traits
  # named entries rename the kept columns to standard terms
  keep = c(basisOfRecordDescription = "source_measurement", measurementRemark = "note")
  )

# occurrence table: several rows (individual specimens) per taxon

pulldata("heteroptera_raw")

dataset2 <- as.traitdata(heteroptera_raw,
  taxa = "SpeciesID",
  traits = c("Body_length", "Body_width", "Body_height", "Thorax_length",
    "Thorax_width", "Head_width", "Eye_width", "Antenna_Seg1", "Antenna_Seg2",
    "Antenna_Seg3", "Antenna_Seg4", "Antenna_Seg5", "Front.Tibia_length",
    "Mid.Tibia_length", "Hind.Tibia_length", "Front.Femur_length",
    "Hind.Femur_length", "Front.Femur_width", "Hind.Femur_width",
    "Rostrum_length", "Rostrum_width", "Wing_length", "Wing_width"),
  units = "mm",
  keep = c(sex = "Sex", references = "Source", lifestage = "Wing_development"),
  metadata = as.metadata(
    author = "Gossner MM, Simons NK, Höck L and Weisser WW",
    datasetName = "Morphometric traits Heteroptera",
    bibliographicCitation = attributes(heteroptera_raw)$citeAs,
    license = "http://creativecommons.org/publicdomain/zero/1.0/"
    )
)
}
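None of the examples above uses the measurements argument. The base-R sketch below illustrates the idea behind it: a single measurement spread over several columns (here made-up x-y-z coordinates) becomes several rows in the long table that share one measurementID. It is not how as.traitdata() is implemented, and the trait names `landmark_x`, `landmark_y`, `landmark_z` are hypothetical.

```r
# Hypothetical base-R sketch, NOT the traitdataform implementation.
# Toy wide table with made-up values: each row is one measured point,
# recorded as three coordinate columns.
pts <- data.frame(
  taxon = c("Taxon A", "Taxon A"),
  point = c("m1", "m2"),
  x = c(0.1, 0.4), y = c(1.2, 1.0), z = c(3.3, 2.9)
)
dims <- c("x", "y", "z")

long <- do.call(rbind, lapply(dims, function(d) data.frame(
  scientificName = pts$taxon,
  measurementID  = pts$point,               # shared ID links x, y, z of one point
  traitName      = paste0("landmark_", d),  # trait name encodes the dimension
  traitValue     = pts[[d]]
)))
long <- long[order(long$measurementID), ]   # ties keep x, y, z order
rownames(long) <- NULL
```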