Statistics Toolbox

Format of Data Files

General Rules
  • Tab-delimited text file
  • One observation per row (multiple fields in a row)
  • Each observation occupies a single uninterrupted line, i.e., no line breaks (carriage returns) within an observation
  • Avoid special characters, e.g., ', ", #, &
  • In most cases it is best to replace a special character with a period (.)
  • Nondetects can be identified in either of two ways:
      • Code nondetects as negative values in the result column
      • Include a detect column in the data, where ‘T’ corresponds to a detect and ‘F’ corresponds to a nondetect
  • Replace empty cells with ‘NA’ (see examples below)
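The rules above can be sketched in a few lines of Python. This is only an illustration of the file conventions, not part of the toolbox: the function names `parse_cell` and `read_observations`, the `detected` key, and the inline sample are all assumptions made for the example.

```python
import csv
import io


def parse_cell(text):
    """Return None for 'NA', a float for numeric text, else the text itself."""
    if text == "NA":
        return None
    try:
        return float(text)
    except ValueError:
        return text


def read_observations(handle, result_col, detect_col=None):
    """Read tab-delimited rows (one observation per row) into dicts.

    If detect_col is given, 'T' marks a detect and 'F' a nondetect.
    Otherwise a negative value in result_col marks a nondetect.
    """
    rows = []
    for record in csv.DictReader(handle, delimiter="\t"):
        row = {name: parse_cell(cell) for name, cell in record.items()}
        if detect_col is not None:
            row["detected"] = row[detect_col] == "T"
        else:
            result = row[result_col]
            # A missing result leaves the detect status unknown.
            row["detected"] = None if result is None else result >= 0
        rows.append(row)
    return rows


# Hypothetical two-row sample in the tab-delimited layout described above.
sample = (
    "siteid\tanalyte\tconcentration\tdetectflag\tx.coord\ty.coord\n"
    "BKG\tArsenic\t34.2\tT\tNA\tNA\n"
    "BKG\tLead\t0.15\tF\t418550.9\t3891761\n"
)
rows = read_observations(io.StringIO(sample), "concentration", "detectflag")
print(rows[0]["x.coord"], rows[1]["detected"])
```

Passing `detect_col=None` switches the sketch to the negative-value convention for nondetects.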
Examples
The 3 data sets summarized below are included in the Zip file.
1. Layout of a typical environmental site characterization data set. In the analysis, it might be desirable to subset the data using the factors ‘analyte’ and ‘siteid’; StatWiz provides an interface for subsetting the data before analysis. If spatial coordinates are present, some spatial presentations are possible in StatWiz. Summary statistics, exploratory data analysis (plots), hypothesis tests, and confidence intervals are also possible. If there is a detect flag, it can be factored into the analysis in StatWiz.
siteid  analyte  concentration  detectflag  x.coord    y.coord
BKG     Arsenic  21.6           T           418550.9   3891761
BKG     Arsenic  34.2           T           NA         NA
BKG     Copper   30.7           T           418882.1   3894504
BKG     Copper   48.5           T           NA         NA
BKG     Lead     7              T           418550.9   3891761
BKG     Lead     0.15           F           418550.9   3891761
ND02    Arsenic  14.9           T           421301.17  3892240.68
ND02    Arsenic  6.6            T           421301.17  3892240.68
ND02    Copper   19.3           T           421301.17  3892240.68
ND02    Copper   12.9           T           421301.17  3892240.68
ND02    Lead     15             T           421301.17  3892240.68
ND02    Lead     1.1            T           421301.17  3892240.68
ND11B   Arsenic  0.3            F           423498.61  3897576.27
ND11B   Arsenic  1.7            T           423488.98  3897581.71
ND11B   Copper   2.5            F           423381.78  3897488.5
ND11B   Copper   9.41           T           423461     3897557
ND11B   Lead     0.1            F           423443.83  3897578.28
ND11B   Lead     26             T           423443.83  3897578.28
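Subsetting by ‘siteid’ and ‘analyte’ can be illustrated outside StatWiz with a short Python sketch. It is not the StatWiz interface; the `subset` helper and the tuple layout are assumptions made for the example, and only the Arsenic rows of the data set above are copied in.

```python
from statistics import mean

# (siteid, analyte, concentration, detectflag) for the Arsenic rows above.
records = [
    ("BKG", "Arsenic", 21.6, "T"),
    ("BKG", "Arsenic", 34.2, "T"),
    ("ND02", "Arsenic", 14.9, "T"),
    ("ND02", "Arsenic", 6.6, "T"),
    ("ND11B", "Arsenic", 0.3, "F"),
    ("ND11B", "Arsenic", 1.7, "T"),
]


def subset(rows, siteid=None, analyte=None):
    """Keep rows matching the requested factor levels (None means 'any')."""
    return [
        r for r in rows
        if (siteid is None or r[0] == siteid)
        and (analyte is None or r[1] == analyte)
    ]


bkg_arsenic = subset(records, siteid="BKG", analyte="Arsenic")
# Use the detect flag to keep only detected results before summarizing.
bkg_detects = [r[2] for r in bkg_arsenic if r[3] == "T"]
print(len(bkg_arsenic), mean(bkg_detects))

nd11b_nondetects = [r for r in subset(records, siteid="ND11B") if r[3] == "F"]
print(len(nd11b_nondetects))
```

Averaging only the detects is a simplification for the sketch; how nondetects enter a summary statistic is an analysis decision that this example does not settle.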
2. Layout of a data set for linear modeling. Note that these data are laid out essentially the same way as the previous set: each row is one observation, and there are continuous responses of interest (similar to concentration). The linear modeling program provides an interface for selecting the dependent and independent variables.
species                  name       dispersaldistance  homerange  homerangeSqrt
Canis latrans            coyote     29400              75980000   8716.65
Clethrionomys gapperi    vole       220                2500       50
Ochotona princeps        pika       90                 3500       59.16
Odocoileus hemionus      mule deer  3000               590000     768.11
Odocoileus virginianus   deer       15000              1960000    1400
Peromyscus maniculatus   mouse      100                729        27
Tamiasciurus hudsonicus  squirrel   362.5              11000      104.88
Vulpes vulpes            fox        10800              10374841   3221
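As a rough illustration of choosing a dependent and an independent variable from this layout, the Python sketch below fits a straight line by ordinary least squares. It is not the linear modeling program itself: the `fit_line` helper is hypothetical, and treating dispersaldistance as the response and homerangeSqrt as the predictor is an assumption made only for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Two columns copied from the data set above, keyed by column name.
data = {
    "dispersaldistance": [29400, 220, 90, 3000, 15000, 100, 362.5, 10800],
    "homerangeSqrt": [8716.65, 50, 59.16, 768.11, 1400, 27, 104.88, 3221],
}

# Assumed roles for this example; any two numeric columns could be chosen.
dependent, independent = "dispersaldistance", "homerangeSqrt"
intercept, slope = fit_line(data[independent], data[dependent])
print(f"{dependent} = {intercept:.1f} + {slope:.3f} * {independent}")
```

Because the columns are addressed by name, swapping the roles of the variables only requires changing the `dependent` and `independent` assignments.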
3. Layout of an ecotox example. The fathead minnow data set below shows the layout of an ANOVA data set for statistical analysis of ecotox data. The ecotox program for this example uses concentration as a factor (predictor) and mortality as the dependent variable (response); n is the count that constrains the range of the mortality data, so each mortality value lies between 0 and n. This layout differs from the other two only in the way n is used.
concentration  mortality  n   replicate
0              0          10  A
32             2          10  A
64             1          10  A
128            1          10  A
256            3          10  A
512            6          10  A
0              0          10  B
32             2          10  B
64             0          10  B
128            1          10  B
256            1          10  B
512            7          10  B
0              1          10  C
32             0          10  C
64             0          10  C
128            2          10  C
256            0          10  C
512            6          10  C
0              1          10  D
32             2          10  D
64             0          10  D
128            0          10  D
256            5          10  D
512            8          10  D
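The role of n can be illustrated with a short Python sketch that checks each mortality count against n and pools the replicates into a mortality proportion per concentration. This is not the ecotox program; the variable names and the pooling step are assumptions made for the example, and the counts are copied from the data set above.

```python
from collections import defaultdict

concentrations = [0, 32, 64, 128, 256, 512]
# Mortality counts per replicate, in the same order as `concentrations`.
mortality_by_replicate = {
    "A": [0, 2, 1, 1, 3, 6],
    "B": [0, 2, 0, 1, 1, 7],
    "C": [1, 0, 0, 2, 0, 6],
    "D": [1, 2, 0, 0, 5, 8],
}
N = 10  # n is 10 in every row of the data set

# Rebuild the long layout: one (concentration, mortality, n, replicate) per row.
rows = [
    (conc, dead, N, rep)
    for rep, deaths in mortality_by_replicate.items()
    for conc, dead in zip(concentrations, deaths)
]

totals = defaultdict(lambda: [0, 0])  # concentration -> [deaths, exposed]
for conc, dead, n, rep in rows:
    # n constrains the range of the mortality data.
    if not 0 <= dead <= n:
        raise ValueError(f"mortality {dead} outside 0..{n} in replicate {rep}")
    totals[conc][0] += dead
    totals[conc][1] += n

proportions = {conc: dead / n for conc, (dead, n) in sorted(totals.items())}
for conc, p in proportions.items():
    print(conc, p)
```

Pooling replicates is only one way to summarize these counts; an ANOVA would normally keep the replicate-level values.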