Response data #

The response data include the observations we are trying to model, as well as columns with identifiers (indices, IDs, datetime strings) for all relevant sampling design information. Elements of the sampling design often seen in long-term monitoring data include the following:

Design element	Examples
observational units	transect, quadrat, plot
sampling units	plot, site
stratification	stratum
date / event times	`MM/DD/YYYY`, `YYYY`

Format #

The response data are stored as flat files. The key characteristic of a flat file is that each row represents a single observation, while the columns hold the values associated with that observation and the design features described above. These files are typically plain text, with no word-processing formatting or markup. The file can be CSV, XLS, XLSX, GZ, or RDS. For ease of use and readability, we generally recommend CSV.
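As a rough illustration of the flat-file layout, the sketch below parses a small CSV with Python's standard `csv` module. The inline text reuses three rows from the LIBI example in this section. The variable names are chosen only for illustration and are not part of the analysis pipeline.

```python
import csv
import io

# Three rows copied from the LIBI example in this guide; in practice this
# text would be read from a CSV file on disk.
csv_text = """Park,MDCATY,SiteName,Year,Transect,Plot,native.forb.rich,native.gram.rich
LIBI,Gulley1,LIBI_001,2011,1,A,6,5
LIBI,Gulley1,LIBI_001,2011,1,B,5,5
LIBI,Gulley1,LIBI_001,2011,1,C,4,2
"""

# Each row becomes one dict: one observation per row, one key per column.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# The csv module reads every field as text, so convert the response
# columns to integers before using them as counts.
for row in rows:
    row["native.forb.rich"] = int(row["native.forb.rich"])
    row["native.gram.rich"] = int(row["native.gram.rich"])

print(len(rows), "observations")
print(rows[0]["SiteName"], rows[0]["Plot"], rows[0]["native.forb.rich"])
```

Because every row carries its own design identifiers (`SiteName`, `Transect`, `Plot`, and so on), no information is lost if the rows are reordered or filtered.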

Example #

The response data below contain species richness observations for native forb (`native.forb.rich`) and native grass-like (`native.gram.rich`) species from Little Bighorn Battlefield National Monument (LIBI), in Montana. The table shows the first and last six rows of the data.

Row	Park	MDCATY	SiteName	Year	Transect	Plot	native.forb.rich	native.gram.rich
1	LIBI	Gulley1	LIBI_001	2011	1	A	6	5
2	LIBI	Gulley1	LIBI_001	2011	1	B	5	5
3	LIBI	Gulley1	LIBI_001	2011	1	C	4	2
4	LIBI	Gulley1	LIBI_001	2011	1	D	6	8
5	LIBI	Gulley1	LIBI_001	2011	2	A	6	7
6	LIBI	Gulley1	LIBI_001	2011	2	B	4	6
…	…	…	…	…	…	…	…	…
1105	LIBI	Upland	LIBI_050	2019	2	A	5	2
1106	LIBI	Upland	LIBI_050	2019	2	B	4	6
1107	LIBI	Upland	LIBI_050	2019	2	C	3	3
1108	LIBI	Upland	LIBI_050	2019	3	A	2	1
1109	LIBI	Upland	LIBI_050	2019	3	B	2	1
1110	LIBI	Upland	LIBI_050	2019	3	C	3	2

Although the column names differ from the generic terms used earlier, many of the design elements appear as columns. Our sampling units are sites (`SiteName`) within strata (`MDCATY`). Individual observations are indexed by the unique combinations of `Transect` and `Plot` within each site. All of the sites in this dataset come from a single park unit (`LIBI`). The calendar year in which the observations were made is given in the column `Year`.
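One way to check that the design columns behave as described is to group sites by stratum and confirm that no observation key is repeated. The sketch below does this for a handful of rows taken from the table above. It assumes that `Year` is also part of the key, since a site may be visited in more than one year; that is an assumption made for this illustration, not something the guide states.

```python
from collections import Counter

# A few rows from the example table, reduced to the design columns:
# (MDCATY, SiteName, Year, Transect, Plot).
records = [
    ("Gulley1", "LIBI_001", 2011, 1, "A"),
    ("Gulley1", "LIBI_001", 2011, 1, "B"),
    ("Gulley1", "LIBI_001", 2011, 2, "A"),
    ("Upland", "LIBI_050", 2019, 2, "A"),
    ("Upland", "LIBI_050", 2019, 3, "A"),
]

# Sampling units (sites) grouped by stratum.
sites_by_stratum = {}
for stratum, site, year, transect, plot in records:
    sites_by_stratum.setdefault(stratum, set()).add(site)

# Count rows per observation key. Including Year in the key is an
# assumption of this sketch; any count above 1 flags a duplicate row.
key_counts = Counter(
    (site, year, transect, plot) for _, site, year, transect, plot in records
)
duplicates = [key for key, n in key_counts.items() if n > 1]

print(sites_by_stratum)
print("duplicate observation keys:", duplicates)
```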

From data to model #

We’ll see how to declare various aspects of the response information in the data block of the analysis config files in another section of this guide. For now, we will leave things as they stand, with two quick notes / caveats:

Unlike many design-based approaches, which aggregate observations within or across sampling units (using a mean, for instance), we work with the raw observations themselves. As we begin to develop models for the data, it will be important to know what type of data your observations represent (counts, for example, as with the richness values above).
If you expect to use covariates in your model, the name of each design element must be consistent across the response and covariate data files. The reason for this requirement is that the analysis pipeline joins these files automatically. If the column containing site information is called `SiteName` in the response data but `Site` in the covariate data, the program won’t know they’re intended to be the same. The entries within a column must match as well. Thus, if a site is called `LIBI_001` in the response data, it must be given the same name in the covariate data. If the sites appear in the covariate data without the unit code prefix (e.g., `001`), the join will fail.
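The two naming requirements above can be checked before running an analysis. The following is a minimal sketch in plain Python, not the pipeline's own join logic. The function `check_join_keys`, the covariate column `x1`, and its values are all hypothetical and exist only for this illustration.

```python
def check_join_keys(response_rows, covariate_rows, key):
    """Return a list of problems that would prevent a join on `key`."""
    problems = []
    # Requirement 1: the column name must exist in both files.
    if not all(key in row for row in response_rows):
        problems.append(f"column {key!r} missing from response data")
    if not all(key in row for row in covariate_rows):
        problems.append(f"column {key!r} missing from covariate data")
    if problems:
        return problems
    # Requirement 2: the entries themselves must match.
    response_ids = {row[key] for row in response_rows}
    covariate_ids = {row[key] for row in covariate_rows}
    unmatched = sorted(response_ids - covariate_ids)
    if unmatched:
        problems.append(f"no covariate rows for {key} values: {unmatched}")
    return problems


response = [{"SiteName": "LIBI_001"}, {"SiteName": "LIBI_050"}]

# Hypothetical covariate tables; "x1" and its values are made up.
renamed_column = [{"Site": "LIBI_001", "x1": 0.1}, {"Site": "LIBI_050", "x1": 0.2}]
missing_prefix = [{"SiteName": "001", "x1": 0.1}, {"SiteName": "050", "x1": 0.2}]
consistent = [{"SiteName": "LIBI_001", "x1": 0.1}, {"SiteName": "LIBI_050", "x1": 0.2}]

print(check_join_keys(response, renamed_column, "SiteName"))
print(check_join_keys(response, missing_prefix, "SiteName"))
print(check_join_keys(response, consistent, "SiteName"))  # []
```

The first call reports the mismatched column name (`Site` versus `SiteName`), the second reports the sites whose entries lack the `LIBI_` prefix, and the third returns an empty list because both the column name and the entries agree.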