Response data

Response data #

The response data include the observations we are trying to model, as well as columns with identifiers (indices, IDs, datetime strings) for all relevant sampling design information. Elements of the sampling design often seen in long-term monitoring data include the following:

Design element        Examples
observational units   transect, quadrat, plot
sampling units        plot, site
stratification        stratum
date / event times    MM/DD/YYYY, YYYY

Format #

The response data are stored as flat files. The key characteristic of a flat file is that each row represents a single observation, while the columns hold the values associated with that observation and the design features described above. Flat files are typically plain text, with no word-processor formatting or markup. The accepted file formats are CSV, XLS, XLSX, GZ, and RDS. We generally recommend CSV because it is easy to read and easy to work with.
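As a rough illustration of the one-row-per-observation layout, the sketch below parses a small CSV using Python's standard `csv` module. The inline text stands in for a file on disk, its contents are illustrative, and the helper `read_flat_file` is a hypothetical name, not part of the analysis pipeline.

```python
import csv
import io

# Illustrative stand-in for a response-data CSV on disk (hypothetical content).
RESPONSE_CSV = """\
Park,MDCATY,SiteName,Year,Transect,Plot,native.forb.rich
LIBI,Upland,LIBI_050,2019,2,A,5
LIBI,Upland,LIBI_050,2019,2,B,4
LIBI,Upland,LIBI_050,2019,2,C,3
"""


def read_flat_file(text):
    """Parse CSV text into a list of dicts, one dict per observation (row)."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_flat_file(RESPONSE_CSV)
for row in rows:
    # Every row carries both the observed value and its design identifiers.
    print(row["SiteName"], row["Year"], row["Transect"], row["Plot"],
          row["native.forb.rich"])
```

Note that `csv.DictReader` returns every value as a string, so numeric columns such as `Year` would need converting before any arithmetic.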

Example #

The response data below contain species richness observations for forb (native.forb.rich) and grass-like (native.gram.rich) species from Little Bighorn Battlefield National Monument (LIBI), in Montana. Here, we see the first and last six rows of the data.

Row   Park  MDCATY   SiteName  Year  Transect  Plot  native.forb.rich  native.gram.rich
1     LIBI  Gulley1  LIBI_001  2011  1         A     6                 5
2     LIBI  Gulley1  LIBI_001  2011  1         B     5                 5
3     LIBI  Gulley1  LIBI_001  2011  1         C     4                 2
4     LIBI  Gulley1  LIBI_001  2011  1         D     6                 8
5     LIBI  Gulley1  LIBI_001  2011  2         A     6                 7
6     LIBI  Gulley1  LIBI_001  2011  2         B     4                 6
1105  LIBI  Upland   LIBI_050  2019  2         A     5                 2
1106  LIBI  Upland   LIBI_050  2019  2         B     4                 6
1107  LIBI  Upland   LIBI_050  2019  2         C     3                 3
1108  LIBI  Upland   LIBI_050  2019  3         A     2                 1
1109  LIBI  Upland   LIBI_050  2019  3         B     2                 1
1110  LIBI  Upland   LIBI_050  2019  3         C     3                 2

Although the column names differ from the generic terms used earlier, many of the design elements appear as columns. Our sampling units are sites (SiteName) within strata (MDCATY). Individual observations are indexed by the unique combinations of Transect and Plot within each site. All of the sites in this dataset come from a single park unit (LIBI). The calendar year in which the observations were made is given in the column Year.
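One way to confirm that the identifier columns really do index individual observations is to look for repeated key combinations. The following is a minimal sketch, not part of the analysis pipeline: the rows are hypothetical, the function name `duplicated_keys` is made up for illustration, and including Year in the key is an assumption made here so that observations from different years stay distinct.

```python
from collections import Counter

# Hypothetical rows, using the column names from the example table.
rows = [
    {"SiteName": "LIBI_001", "Year": 2011, "Transect": 1, "Plot": "A"},
    {"SiteName": "LIBI_001", "Year": 2011, "Transect": 1, "Plot": "B"},
    {"SiteName": "LIBI_001", "Year": 2011, "Transect": 2, "Plot": "A"},
    {"SiteName": "LIBI_050", "Year": 2019, "Transect": 2, "Plot": "A"},
]

# Assumed key: Year is included so observations from different years stay distinct.
KEY_COLUMNS = ("SiteName", "Year", "Transect", "Plot")


def duplicated_keys(rows, key_columns=KEY_COLUMNS):
    """Return the key tuples that identify more than one row."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


duplicates = duplicated_keys(rows)
print("duplicated keys:", duplicates)
```

An empty result suggests each row is a distinct observation. Any key that comes back points to rows that should be inspected before modeling.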

From data to model #

We’ll see how to declare various aspects of the response information in the data block of the analysis config files in another section of this guide. For now, we will leave things as they stand, with two quick notes / caveats:

  • Unlike many design-based approaches, which aggregate observations within or across sampling units (using a mean, for instance), we work with the raw observations themselves. As we begin to develop models for the data, it will be important to know what type of data your observations represent (the richness values above, for example, are counts).
  • If you expect to use covariates in your model, the name of each design element must be consistent across the response and covariate data files. The reason for this requirement is that the analysis pipeline joins the files automatically. If the column containing site information is called SiteName in the response data but Site in the covariate data, the program won’t know they’re intended to be the same. The entries within a column must also match exactly. If a site is called LIBI_001 in the response data, it must have the same name in the covariate data. If the sites appear in the covariate data without the unit code prefix (e.g., 001), the join will fail.
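The naming requirement can be checked before running an analysis. The sketch below is a hypothetical pre-flight check in plain Python, not the pipeline's own join logic: the function `check_join_keys`, the `covariate` column, and its values are all made up for illustration.

```python
def check_join_keys(response_rows, covariate_rows, key_columns):
    """Report key columns or key values that would prevent a clean join."""
    problems = []
    for col in key_columns:
        if not all(col in row for row in response_rows):
            problems.append(f"column {col!r} is missing from the response data")
            continue
        if not all(col in row for row in covariate_rows):
            problems.append(f"column {col!r} is missing from the covariate data")
            continue
        response_values = {row[col] for row in response_rows}
        covariate_values = {row[col] for row in covariate_rows}
        # Response entries that have no exact counterpart in the covariates.
        unmatched = sorted(response_values - covariate_values)
        if unmatched:
            problems.append(f"{col!r} values with no covariate match: {unmatched}")
    return problems


response = [{"SiteName": "LIBI_001"}, {"SiteName": "LIBI_050"}]

# Hypothetical covariate tables; the covariate values are invented.
covariates_ok = [
    {"SiteName": "LIBI_001", "covariate": 1.0},
    {"SiteName": "LIBI_050", "covariate": 2.0},
]
covariates_renamed = [
    {"Site": "LIBI_001", "covariate": 1.0},
    {"Site": "LIBI_050", "covariate": 2.0},
]
covariates_no_prefix = [
    {"SiteName": "001", "covariate": 1.0},
    {"SiteName": "050", "covariate": 2.0},
]

for name, table in [("ok", covariates_ok),
                    ("renamed column", covariates_renamed),
                    ("missing prefix", covariates_no_prefix)]:
    print(name, "->", check_join_keys(response, table, ["SiteName"]))
```

The first table passes with no problems reported. The second is flagged because the column is named Site rather than SiteName, and the third because its entries lack the LIBI_ prefix.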