Statistics Toolbox
Format of Data Files
General Rules
- Tab-delimited text file
- One observation per row (multiple fields in a row)
- Observations are uninterrupted, i.e., no carriage returns within an observation
- Avoid special characters, e.g., ‘, “, #, &
  - It is usually best to replace special characters with a period (‘.’)
- Nondetects can be identified in either of two ways:
  - Code nondetects as negative values in the result column
  - Include a detect column in the data, where ‘T’ corresponds to a detect and ‘F’ corresponds to a nondetect
- Replace empty cells with ‘NA’ (see examples below)
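
The rules above can be sketched as a small reader. This is only an illustration in Python using the standard library, not part of StatWiz; the function name `read_observations` and its parameters are hypothetical. It assumes that the result column itself is never ‘NA’ and that the magnitude of a negative result is the value to keep for a nondetect, neither of which is stated in the rules.

```python
import csv
import io

def read_observations(text, result_col, detect_col=None):
    """Parse tab-delimited text into a list of dicts, one per observation.

    Hypothetical helper: supports both nondetect conventions described above.
    """
    rows = []
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for rec in reader:
        # 'NA' marks an empty cell; represent it as None.
        rec = {key: (None if val == "NA" else val) for key, val in rec.items()}
        value = float(rec[result_col])  # assumption: the result is never NA
        if detect_col is not None:
            # Convention 2: a detect column holding 'T' or 'F'.
            detected = rec[detect_col] == "T"
        else:
            # Convention 1: a negative result marks a nondetect.
            detected = value >= 0
        rec[result_col] = abs(value)  # assumption: keep the magnitude
        rec["detected"] = detected
        rows.append(rec)
    return rows

# Two rows taken from example 1, using the detect column.
flagged = (
    "siteid\tanalyte\tconcentration\tdetectflag\tx.coord\ty.coord\n"
    "BKG\tArsenic\t34.2\tT\tNA\tNA\n"
    "BKG\tLead\t0.15\tF\t418550.9\t3891761\n"
)
# The same two results recoded with the negative-value convention.
signed = (
    "siteid\tanalyte\tconcentration\n"
    "BKG\tArsenic\t34.2\n"
    "BKG\tLead\t-0.15\n"
)

rows_flagged = read_observations(flagged, "concentration", "detectflag")
rows_signed = read_observations(signed, "concentration")
for row in rows_flagged + rows_signed:
    print(row["analyte"], row["concentration"], row["detected"])
```

Both conventions yield the same detect status for each row, which is why either one can be used in a data file.
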
Examples
The 3 data sets summarized below are included in the Zip file.
1. Layout of a typical environmental site characterization data set. In the analysis, it might be desirable to subset the data using the factors ‘analyte’ and ‘siteid’; StatWiz provides an interface for subsetting the data before analysis. If spatial coordinates are present, some spatial presentations are possible in StatWiz. Summary statistics, exploratory data analysis (plots), hypothesis tests, and confidence intervals are also possible. If there is a detect flag, it can be factored into the analysis in StatWiz.
| siteid | analyte | concentration | detectflag | x.coord | y.coord |
|---|---|---|---|---|---|
| BKG | Arsenic | 21.6 | T | 418550.9 | 3891761 |
| BKG | Arsenic | 34.2 | T | NA | NA |
| BKG | Copper | 30.7 | T | 418882.1 | 3894504 |
| BKG | Copper | 48.5 | T | NA | NA |
| BKG | Lead | 7 | T | 418550.9 | 3891761 |
| BKG | Lead | 0.15 | F | 418550.9 | 3891761 |
| ND02 | Arsenic | 14.9 | T | 421301.17 | 3892240.68 |
| ND02 | Arsenic | 6.6 | T | 421301.17 | 3892240.68 |
| ND02 | Copper | 19.3 | T | 421301.17 | 3892240.68 |
| ND02 | Copper | 12.9 | T | 421301.17 | 3892240.68 |
| ND02 | Lead | 15 | T | 421301.17 | 3892240.68 |
| ND02 | Lead | 1.1 | T | 421301.17 | 3892240.68 |
| ND11B | Arsenic | 0.3 | F | 423498.61 | 3897576.27 |
| ND11B | Arsenic | 1.7 | T | 423488.98 | 3897581.71 |
| ND11B | Copper | 2.5 | F | 423381.78 | 3897488.5 |
| ND11B | Copper | 9.41 | T | 423461 | 3897557 |
| ND11B | Lead | 0.1 | F | 423443.83 | 3897578.28 |
| ND11B | Lead | 26 | T | 423443.83 | 3897578.28 |
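
Subsetting by the factors ‘analyte’ and ‘siteid’ can also be sketched outside StatWiz. The following is a minimal Python illustration using only the standard library; it is not StatWiz's interface. The helper name `subset` is hypothetical, and the inline data are the six Arsenic rows from the table above.

```python
import csv
import io
from statistics import mean

# Arsenic rows copied from the site characterization table (tab-delimited).
DATA = (
    "siteid\tanalyte\tconcentration\tdetectflag\tx.coord\ty.coord\n"
    "BKG\tArsenic\t21.6\tT\t418550.9\t3891761\n"
    "BKG\tArsenic\t34.2\tT\tNA\tNA\n"
    "ND02\tArsenic\t14.9\tT\t421301.17\t3892240.68\n"
    "ND02\tArsenic\t6.6\tT\t421301.17\t3892240.68\n"
    "ND11B\tArsenic\t0.3\tF\t423498.61\t3897576.27\n"
    "ND11B\tArsenic\t1.7\tT\t423488.98\t3897581.71\n"
)

def subset(rows, criteria):
    """Keep the rows whose fields match every key/value pair in criteria."""
    return [r for r in rows if all(r[k] == v for k, v in criteria.items())]

rows = list(csv.DictReader(io.StringIO(DATA), delimiter="\t"))

# Subset on both factors, then summarize the detected results only.
nd02_arsenic = subset(rows, {"siteid": "ND02", "analyte": "Arsenic"})
detects = [float(r["concentration"]) for r in nd02_arsenic
           if r["detectflag"] == "T"]
print(len(nd02_arsenic), "rows; mean of detects:", mean(detects))

# Rows without coordinates cannot be used in a spatial presentation.
no_coords = subset(rows, {"x.coord": "NA"})
print(len(no_coords), "row(s) lack coordinates")
```

Averaging only the detected results is just one simple way to account for the detect flag; it is shown here for illustration and is not necessarily what StatWiz does.
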
2. Layout of a data set for linear modeling. These data are laid out essentially the same way as the previous set; that is, there are continuous responses of interest (similar to concentration). The linear modeling program provides an interface for selecting the dependent and independent variables.
| species | name | dispersaldistance | homerange | homerangeSqrt |
|---|---|---|---|---|
| Canis latrans | coyote | 29400 | 75980000 | 8716.65 |
| Clethrionomys gapperi | vole | 220 | 2500 | 50 |
| Ochotona princeps | pika | 90 | 3500 | 59.16 |
| Odocoileus hemionus | mule deer | 3000 | 590000 | 768.11 |
| Odocoileus virginianus | deer | 15000 | 1960000 | 1400 |
| Peromyscus maniculatus | mouse | 100 | 729 | 27 |
| Tamiasciurus hudsonicus | squirrel | 362.5 | 11000 | 104.88 |
| Vulpes vulpes | fox | 10800 | 10374841 | 3221 |
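
As an illustration of selecting a dependent and an independent variable from this layout, the sketch below fits an ordinary least-squares line in plain Python. It is not the linear modeling program itself. Treating dispersaldistance as the response and the square root of homerange as the predictor is an assumption made only for the example, and `fit_line` is a hypothetical helper.

```python
from math import sqrt

# (name, dispersaldistance, homerange) copied from the table above.
DATA = [
    ("coyote", 29400, 75980000),
    ("vole", 220, 2500),
    ("pika", 90, 3500),
    ("mule deer", 3000, 590000),
    ("deer", 15000, 1960000),
    ("mouse", 100, 729),
    ("squirrel", 362.5, 11000),
    ("fox", 10800, 10374841),
]

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Recompute the homerangeSqrt column as sqrt(homerange).
x = [sqrt(homerange) for _, _, homerange in DATA]
# Assumed response for this example: dispersaldistance.
y = [distance for _, distance, _ in DATA]

slope, intercept = fit_line(x, y)
print("slope:", slope, "intercept:", intercept)
```

Rounded to two decimals, the recomputed square roots match the homerangeSqrt column in the table.
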
3. Layout of an ecotox example. The fathead minnow data set below is an example of the layout of an ANOVA data set for statistical analysis of ecotox data. The ecotox program for this example uses concentration as a factor (predictor) and mortality as the dependent variable (response); n is the count that constrains the range of the mortality data. This layout differs from the other two only in the way n is used.
| concentration | mortality | n | replicate |
|---|---|---|---|
| 0 | 0 | 10 | A |
| 32 | 2 | 10 | A |
| 64 | 1 | 10 | A |
| 128 | 1 | 10 | A |
| 256 | 3 | 10 | A |
| 512 | 6 | 10 | A |
| 0 | 0 | 10 | B |
| 32 | 2 | 10 | B |
| 64 | 0 | 10 | B |
| 128 | 1 | 10 | B |
| 256 | 1 | 10 | B |
| 512 | 7 | 10 | B |
| 0 | 1 | 10 | C |
| 32 | 0 | 10 | C |
| 64 | 0 | 10 | C |
| 128 | 2 | 10 | C |
| 256 | 0 | 10 | C |
| 512 | 6 | 10 | C |
| 0 | 1 | 10 | D |
| 32 | 2 | 10 | D |
| 64 | 0 | 10 | D |
| 128 | 0 | 10 | D |
| 256 | 5 | 10 | D |
| 512 | 8 | 10 | D |
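
To show how n constrains the mortality data, the sketch below checks that each mortality count lies between 0 and n and then pools the replicates into a mortality proportion per concentration. It is a plain Python illustration, not the ecotox program; the function name `mortality_by_concentration` is hypothetical, and pooling replicates is an assumption made for the example.

```python
from collections import defaultdict

# (concentration, mortality, n, replicate) copied from the table above.
DATA = [
    (0, 0, 10, "A"), (32, 2, 10, "A"), (64, 1, 10, "A"),
    (128, 1, 10, "A"), (256, 3, 10, "A"), (512, 6, 10, "A"),
    (0, 0, 10, "B"), (32, 2, 10, "B"), (64, 0, 10, "B"),
    (128, 1, 10, "B"), (256, 1, 10, "B"), (512, 7, 10, "B"),
    (0, 1, 10, "C"), (32, 0, 10, "C"), (64, 0, 10, "C"),
    (128, 2, 10, "C"), (256, 0, 10, "C"), (512, 6, 10, "C"),
    (0, 1, 10, "D"), (32, 2, 10, "D"), (64, 0, 10, "D"),
    (128, 0, 10, "D"), (256, 5, 10, "D"), (512, 8, 10, "D"),
]

def mortality_by_concentration(records):
    """Pool replicates and return {concentration: deaths / organisms exposed}."""
    dead = defaultdict(int)
    exposed = defaultdict(int)
    for conc, mortality, n, replicate in records:
        # n bounds the response: mortality cannot exceed the number exposed.
        if not 0 <= mortality <= n:
            raise ValueError(
                f"mortality {mortality} outside 0..{n} "
                f"(concentration {conc}, replicate {replicate})"
            )
        dead[conc] += mortality
        exposed[conc] += n
    return {conc: dead[conc] / exposed[conc] for conc in sorted(dead)}

proportions = mortality_by_concentration(DATA)
for conc, proportion in proportions.items():
    print(conc, proportion)
```

Because each row carries its own n, the same layout would also work if the number of organisms differed between replicates.
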
