Legacy STORET Data

The original STORET Database operated continuously from
1963 until 1999, first under the Public Health Service
and later under the Environmental Protection Agency (EPA).
It is the largest single collection of water quality
monitoring data in the world.

For effective distribution, these data have been placed
on a series of CD-ROMs, which are available from EPA.

Each CD-ROM covers one of EPA's administrative Regions,
numbered 01-10.  Canadian monitoring data are included
with EPA Region 05, which has responsibility for much of
the US-Canada border waters, including the Great Lakes.

In the root directory of each CD you will find this
_Read_Me.txt file.  In addition, there is a file named 
National_Summary.xls (a spreadsheet in Microsoft(R)
Excel(tm) format), which contains useful information
about the disk space required to extract your STORET
data from the compressed files.  A tab-delimited text
version of this file, named National_Summary.txt, is
provided as well.

Each CD-ROM contains a series of self-extracting
"ZIP" files, one for each state, with names like
"South_Dakota.exe".  Double-clicking on one of these
files will extract its data in uncompressed form.

When run, each of these files creates a directory
or folder named for the state, like "c:\South_Dakota".
Within each folder is a text file containing a matrix
of observation counts, with a row for each Legacy STORET
Parameter Code ever observed in the state and a column
for each of the state's counties.  Each cell gives the
number of STORET observations of that parameter in that
county.  This file has a name like
"South_Dakota_Matrix.txt".
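
As an illustration, the matrix file could be read into a
nested lookup table with a few lines of Python.  This is
only a sketch.  It assumes that the first column holds the
Parameter Code, that the first header row holds the county
names, and that two header rows precede the data.  The
sample text and the name load_matrix are made up for this
example and are not taken from the CD.

```python
import csv
import io

def load_matrix(stream, header_rows=2):
    """Return {parameter_code: {county: count}} from a tab-delimited matrix."""
    rows = [row for row in csv.reader(stream, delimiter="\t") if row]
    counties = rows[0][1:]  # assumed: county names in the first header row
    matrix = {}
    for row in rows[header_rows:]:
        code = row[0]
        # Treat a blank cell as zero observations (an assumption).
        matrix[code] = {
            county: int(cell.strip() or 0)
            for county, cell in zip(counties, row[1:])
        }
    return matrix

# Made-up sample data in the assumed layout.
sample = (
    "Param\tFall River\tCuster\n"
    "Code\tCounty\tCounty\n"
    "00010\t120\t45\n"
    "00300\t\t7\n"
)
matrix = load_matrix(io.StringIO(sample))
fall_river_total = sum(row["Fall River"] for row in matrix.values())
```

For a real file, pass an open file object in place of the
io.StringIO sample, for example
open("South_Dakota_Matrix.txt", newline="").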

In addition, each state folder will contain a series of
files, one for each county, with names like 
"SD_Fall_River_inv.txt" (Fall River County, South Dakota).
Each of these files summarizes the data for one county:
the total number of Legacy STORET stations within the
county and, for each Legacy STORET Parameter Code, how
many observations exist within the county, what overall
range of calendar dates is covered, and, where possible,
the minimum, maximum, and average of all reported values.
This report corresponds to the "PGM=INVENT" data summary
that was supported on the Legacy STORET Mainframe system.

Lastly, each state folder will contain a series of
subfolders, one for each county, with names like
"c:\South_Dakota\SD_Fall_River".  Each subfolder contains
one or more files with the suffix "sta", which hold
detailed descriptions of all the Legacy STORET stations
in the county, and one or more files with the suffix "res",
which hold all the results ever reported from monitoring
activities conducted within the county.  Each of these
files is limited to 50,000 rows, so that it can be
loaded into a typical spreadsheet environment like
Microsoft(R) Excel(tm).
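
Because a county's stations or results may be split across
several files, you may want to collect all the parts before
processing them.  The following Python sketch shows one way
to do this.  It assumes that "suffix" means the file name
ends in "sta" or "res", optionally followed by ".txt".  The
actual naming on your CD may differ, so adjust the test
accordingly.  The function name find_parts and the file
names in the demonstration are made up.

```python
import tempfile
from pathlib import Path

def find_parts(county_dir, kind):
    """List files whose name ends in `kind` ("sta" or "res"), in name order."""
    parts = []
    for path in Path(county_dir).iterdir():
        name = path.name.lower()
        if name.endswith(".txt"):
            name = name[: -len(".txt")]  # ignore an optional .txt extension
        if path.is_file() and name.endswith(kind):
            parts.append(path)
    return sorted(parts)

# Demonstration with made-up file names in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for made_up in ("part1_res.txt", "part2_res.txt", "part1_sta.txt"):
        (Path(tmp) / made_up).write_text("", encoding="utf-8")
    res_names = [p.name for p in find_parts(tmp, "res")]
    sta_names = [p.name for p in find_parts(tmp, "sta")]
```

On an extracted CD you would call find_parts with a county
subfolder such as "c:\South_Dakota\SD_Fall_River".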

These files are all tab-delimited, meaning that fields
within each are separated by tabs, for ease of
conversion into spreadsheets.  The station files are
ordered by Legacy STORET Primary Station Code.  The
result files are ordered by Legacy STORET Parameter Code,
then by Primary Station Code, and within that, by
sample Start Date.
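
One practical consequence of this ordering is that all rows
for a given Parameter Code sit together within a result
file, so they can be grouped in a single pass without
sorting first.  The Python sketch below illustrates the
idea.  The column positions, the date format, and the
sample values are assumptions made for the example, not
the actual file layout.

```python
from itertools import groupby

# Made-up rows: (parameter code, station code, start date, value),
# already in the order described above.
rows = [
    ("00010", "STA001", "1975-06-01", "12.5"),
    ("00010", "STA001", "1975-07-01", "14.0"),
    ("00010", "STA002", "1975-06-15", "11.0"),
    ("00300", "STA001", "1975-06-01", "8.2"),
]

def count_by_parameter(sorted_rows):
    """Count results per parameter code in one pass over ordered rows."""
    return {
        code: sum(1 for _ in group)
        for code, group in groupby(sorted_rows, key=lambda row: row[0])
    }

counts = count_by_parameter(rows)
```

Note that groupby only merges adjacent rows, so this
approach relies on the rows still being in their original
order.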

Also included on your CD are several files containing
codes and definitions used in the data files.  These are
likewise tab-delimited to ease conversion into
spreadsheets.

For your convenience, the first two rows in each file
contain column headers to aid in the interpretation
of the data.
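
If you read the files with a program rather than a
spreadsheet, remember to set the two header rows aside
before treating the rest as data.  A minimal Python sketch
follows.  The function name read_storet_table and the
sample columns are made up for illustration, and the real
column names will differ.

```python
import csv
import io

def read_storet_table(stream, header_rows=2):
    """Split a tab-delimited text stream into header rows and data rows."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]  # skip completely empty lines
    return rows[:header_rows], rows[header_rows:]

# Made-up sample with two header rows followed by one data row.
sample = (
    "Station\tParam\tStart Date\tResult\n"
    "Code\tCode\t\tValue\n"
    "ABC123\t00010\t1975-06-01\t12.5\n"
)
headers, data = read_storet_table(io.StringIO(sample))
```

For a real file, pass an open file object, for example
open(path, newline=""), in place of the sample text.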