Bay 2012

Integrated Environmental Assessment and Management, Volume 8, Number 4, pp. 597–609
© 2012 SETAC
Comparison of National and Regional Sediment Quality Guidelines for Classifying Sediment Toxicity in California
Steven M Bay,*† Kerry J Ritter,† Doris E Vidal-Dorsch,† and L Jay Field‡
† Southern California Coastal Water Research Project, 3535 Harbor Blvd., Suite 110, Costa Mesa, California 92626, USA
‡ National Oceanic and Atmospheric Administration, Office of Response and Restoration, Seattle, Washington, USA
(Submitted 30 January 2009; Returned for Revision 1 April 2009; Accepted 19 June 2012)
EDITOR’S NOTE
This article represents 1 of 6 papers describing development and evaluation of a sediment quality assessment framework to
support implementation of California’s new sediment quality objectives for bays and estuaries, which became effective in 2009.
Over thirty scientists collaborated on this effort by the California State Water Resources Control Board, which resulted in the
establishment of one of the first statewide programs in the US to fully incorporate the sediment quality triad for regulatory
applications.
ABSTRACT
A number of sediment quality guidelines (SQGs) have been developed for relating chemical concentrations in sediment to
their potential for effects on benthic macroinvertebrates, but there have been few studies evaluating the relative effectiveness of
different SQG approaches. Here we apply 6 empirical SQG approaches to assess how well they predict toxicity in California
sediments. Four of the SQG approaches were nationally derived indices that were established in previous studies: effects range
median (ERM), logistic regression model (LRM), sediment quality guideline quotient 1 (SQGQ1), and Consensus. Two
approaches were variations of nationally derived approaches that were recalibrated to California-specific data (CA LRM and CA
ERM). Each SQG approach was applied to a standardized set of matched chemistry and toxicity data for California and an index
of the aggregate magnitude of contamination (e.g., mean SQG quotient or maximum probability of toxicity) was calculated. A
set of 3 thresholds for classification of the results into 4 categories of predicted toxicity was established for each SQG approach
using a statistical optimization procedure. The performance of each SQG approach was evaluated in terms of correlation and
categorical classification accuracy. Each SQG index had a significant, but low, correlation with toxicity and was able to correctly
classify the level of toxicity for up to 40% of samples. The CA LRM had the best overall performance, but the magnitude of
differences in classification accuracy among the SQG approaches was relatively small. Recalibration of the indices using
California data improved performance of the LRM, but not the ERM. The LRM approach is more amenable to revision than other
national SQGs, which is a desirable attribute for use in programs where the ability to incorporate new information or chemicals
of concern is important. The use of a consistent threshold development approach appeared to be a more important factor than
type of SQG approach in determining SQG performance. The relatively small change in classification accuracy obtained with
regional calibration of these SQG approaches suggests that further calibration and normalization efforts are likely to have
limited success in improving classification accuracy associated with biological effects. Fundamental changes to both SQG
components and conceptual approach are needed to obtain substantial improvements in performance. These changes include
updating the guideline values to include current use pesticides, as well as developing improved approaches that account for
changes in contaminant bioavailability. Integr Environ Assess Manag 2012;8:597–609. © 2012 SETAC
Keywords: Sediment Quality Guidelines California Toxicity Chemistry
* To whom correspondence may be addressed: steveb@sccwrp.org
Published online 21 June 2012 in Wiley Online Library (wileyonlinelibrary.com).
DOI: 10.1002/ieam.1330
INTRODUCTION
Many monitoring programs are conducted to evaluate chemical contamination effects on sediment quality, but interpreting these data is often difficult (Wenning et al. 2005). The biological availability of chemicals in sediments is complex and not entirely understood. Moreover, the chemicals are often present in complex mixtures for which potential joint effects are difficult to predict. Sediment quality guidelines (SQGs), consisting of various types of chemical concentration values or relationships, are frequently used as a tool in interpreting chemical data in the context of potential for biological effects. Although SQGs have limited usefulness when used in isolation, they are an important component of most sediment quality assessment frameworks that use multiple lines of evidence (Bay and Weisberg this issue; Wenning et al. 2005).
A number of SQGs for relating chemical concentrations to potential for effects on benthic macroinvertebrates have been developed, generally falling into 2 classes. The first is a mechanistic approach, which models the chemical and biological processes that affect contaminant bioavailability. Current mechanistic SQGs are based on equilibrium partitioning theory and apply to selected classes of contaminants, primarily divalent metals and several types of nonionic organic compounds (USEPA 2003, 2005a, 2008). Although these models are useful for describing contaminant bioavailability and evaluating the potential of specific chemicals to cause sediment toxicity, they are currently applicable only to
specific contaminant types. In addition, some of the parameters needed to apply these guidelines (e.g., sediment acid volatile sulfides and simultaneously extracted metals) are rarely collected in current routine monitoring programs.
Second, the more widely used empirical SQGs are derived from statistical association of matched sediment chemistry and biological effects data. Multiple kinds of empirical SQGs that are based on different statistical approaches have been developed. Examples of empirical SQG approaches for the marine environment include effects range median (ERM), probable effects level (PEL), apparent effects threshold (AET), SQGQ1, and LRM (Barrick et al. 1988; Fairey et al. 2001; Field et al. 2002; Long et al. 1995; MacDonald et al. 1996). Consensus guidelines, which aggregate several different SQGs having a similar narrative intent (e.g., median effect), are an evolution of the empirical approach. Marine consensus SQGs have been developed for some constituents, including metals, polychlorinated biphenyls (PCBs), and polycyclic aromatic hydrocarbons (PAHs) (MacDonald et al. 2000; Swartz 1999; Vidal and Bay 2005).
It is unclear which empirical SQG approach is most effective for describing the potential for biological effects associated with chemical contamination. Numerous studies have shown that each SQG approach has some degree of predictive ability with respect to biological effects, but most studies have been limited to examination of just 1 or 2 approaches and often use variable methods to measure performance (Wenning et al. 2005). Long et al. (2000) applied ERMs and PELs to several data sets and observed different patterns in predictive ability. Vidal and Bay (2005) compared 5 SQG approaches using a common data set and found large differences in predictive ability among some approaches; however, their study did not include the LRM approach. Vidal and Bay (2005) also observed that comparisons of SQG performance can be strongly influenced by the selection of thresholds used to classify the results. Existing studies are inadequate for comparing the performance of empirical SQGs because of their limited scope, lack of comparability in methods, and lack of thresholds derived using a consistent methodology.
It is also unclear whether performance of SQGs is improved when they are calibrated to local conditions. The predictive ability of SQGs has been shown to vary when the same guidelines are applied to data from different regions (Fairey et al. 2001; Long et al. 1998, 2006; O’Connor et al. 1998; Vidal and Bay 2005). These variations in performance may be due to differences in the chemical mixtures between sites or regions, variations in bioavailability due to geochemical factors, or differences in the sensitivity of methods used to measure biological effects. Variation in SQG performance among studies creates uncertainty in determining the threshold of SQG exceedance associated with adverse impacts on sediment quality. The use of SQGs and interpretation thresholds that are derived or calibrated relative to site-specific conditions has been recommended as a way to reduce the uncertainty of SQG interpretation (Fairey et al. 2001; Long et al. 2006; Vidal and Bay 2005).
This study applied 6 empirical SQG approaches to a large California data set of paired sediment chemistry and toxicity measurements to assess: 1) which national SQG approach best classifies the toxicity of California sediments, 2) whether the relationship of national SQGs to sediment toxicity is improved when the SQGs are recalibrated to California data, and 3) whether performance further improves when the SQGs are recalibrated to 2 subregions within California.
METHODS
The study assessed the performance of 6 empirical SQG approaches by applying them to matched chemistry and toxicity data for California and calculating an index of overall contamination based on the mean SQG quotient or the maximum probability of toxicity. Performance of the SQG indices was evaluated in terms of correlation with magnitude of biological response and categorical classification accuracy (Figure 1). Four of the SQG approaches were derived in previous national studies (ERM, LRM, SQGQ1, Consensus) and 2 were variations of nationally derived SQGs that were recalibrated to California-specific data (CA LRM and CA ERM). Thresholds relating each SQG index to toxicity response categories were derived using a standardized statistical approach. Each SQG index was evaluated by determining 3 measures of association between the calculated effect categories and the observed toxicity responses: correlation, weighted kappa, and percent agreement. SQG calibration and performance evaluations were conducted at 2 scales to investigate the influence of regional variations in sediment characteristics: statewide (all California data) and regional (separate northern and southern California data sets).
Figure 1. Schematic of data analyses. [Flowchart boxes: Matched Chemistry and Toxicity Data Compilation; Data Standardization and Categorization of Biological Effects; Calibration Data Set (2/3); Validation Data Set (1/3); Regional LRM and ERM Calibration; Statewide and Regional Threshold Development; SQG Index Evaluation (Correlation, Kappa, Agreement).]
Data
Paired sediment chemistry and toxicity measurements from California marine embayments were compiled from 151 dredging, monitoring, and research studies conducted between 1984 and 2004. The database included stations from marine and estuarine embayments located from 41.94°N (Del
Norte County, CA) to 31.75°N (US–Mexico international border). More information on the studies used to populate this database can be found at http://www.sccwrp.org/view.php?id=519.
The data were screened to select information that was of high quality and comparable. All stations were from locations in enclosed bays or harbors at subtidal depths and only data from surficial sediment (top 30 cm or less) were selected. Toxicity data were limited to information from solid phase 10-day amphipod survival tests using Rhepoxynius abronius or Eohaustorius estuarius and conducted using standardized methods (USEPA 1994). Overall, 74% of the data were from tests using E. estuarius. The proportion of tests per species varied regionally, with E. estuarius tests comprising 90% and 60% of the data in the northern California and southern California data sets, respectively. Toxicity data were further screened to ensure mean negative control survival was ≥90% and overlying water ammonia concentrations (initial and final, if available) were less than species-specific criteria (USEPA 1994). Sediment grain size was not used as a toxicity data screening criterion. Screening steps to select chemistry data for analysis included a review of the data quality assessment from the study authors, use of comparable extraction/digestion methods, and measurement of a minimum suite of contaminants that included multiple metals and PAHs.
Standardized sums of PAHs, dichlorodiphenyltrichloroethane (DDTs), PCBs, and chlordanes were calculated using a consistent methodology for all samples. Low molecular weight PAHs (LMW PAH) were calculated as the sum of acenaphthene, anthracene, biphenyl, naphthalene, 2,6-dimethylnaphthalene, fluorene, 1-methylnaphthalene, 2-methylnaphthalene, 1-methylphenanthrene, and phenanthrene. High molecular weight PAHs (HMW PAH) were calculated as the sum of benzo[a]anthracene, benzo[a]pyrene, benzo[e]pyrene, chrysene, dibenz[a,h]anthracene, fluoranthene, perylene, and pyrene. Total PAHs was the sum of LMW PAH and HMW PAH values. Total PCBs was calculated from the sum of congeners 8, 18, 28, 44, 52, 66, 101, 105, 110, 118, 128, 138, 153, 180, 187, and 195. The congener list was a subset of that used by the NOAA Status and Trends Program; the sum was multiplied by a correction factor of 1.72 to approximate the value obtained using the larger NOAA list. Total DDTs represented the sum of p,p′-DDT, o,p′-DDT, p,p′-DDE, o,p′-DDE, p,p′-DDD, and o,p′-DDD. Total chlordane was the sum of α-chlordane (cis-chlordane), oxychlordane, trans-chlordane, trans-nonachlor, and γ-chlordane.
Data were estimated for values reported as below reporting limits based on multiple regression imputation, taking advantage of covariation among the many chemical and sediment variables. Imputation produces less bias than conventional approaches for interpreting nondetect data, such as substituting zero or 50% of the reporting limit (Helsel 2005). SAS PROC MI (SAS Institute, Cary, NC) was used to impute values in a sequential stepwise fashion by contaminant type. Metal data were estimated first, followed in order by pesticides, PAHs, and PCBs. The groups of data variables were imputed stepwise because SAS PROC MI could not compute all imputations in a single step. The stepwise procedure also allowed for better control of the data variables used in the imputations for each chemical group. Estimated values were constrained to always be less than the study reporting limit. The imputation method was also used to estimate total organic carbon (TOC) for the purposes of calculating the SQGQ1 and Consensus quotients. Estimated values were not used in calculations for any other analytes missing in the data sets, except when needed to calculate standardized sums of PAHs, PCBs, or pesticides. For example, a value for phenanthrene was estimated for a sample that contained data for other PAHs so that the standardized method could be used to calculate the PAH sums, but the estimated phenanthrene value was not used individually to calculate summary SQG values for that sample.
The standardized data set was divided into 2 groups to facilitate investigation of the influence of regional differences in chemical contamination on SQG performance: northern California embayments north of Point Conception and southern California embayments south of Point Conception. Each regional data set was further divided into 2 portions: a calibration subset used for index development and threshold calibration, and an independent validation subset used for the analysis of SQG performance. Approximately one-third of the data were used for validation. The validation samples were selected by first grouping the data into 1 of 8 subregions based on latitude to ensure even spatial representation. The samples within each subregion were then ranked by the mean ERM quotient (mERMQ) and one-third of the samples were systematically sampled from throughout the mERMQ distribution. Additional validation data were obtained from recent monitoring studies that were not included in the initial data compilation effort. The north and south validation data sets contained 146 and 249 samples, respectively.
National SQGs
The ERM guideline values are based on the analysis of marine chemistry and biological effects data from throughout North America (Long et al. 1995). These SQGs use results from a wide range of biological effects measures, including acute and sublethal sediment toxicity tests of field sediments, spiked sediment experiments, benthic community assessments, fish pathology, and mechanistic models of sediment toxicity. In general, the chemical concentrations associated with adverse effects for each study were compiled and sorted in ascending order, with the ERM representing the median concentration of the data distribution. The index used to represent the ERM approach in the present study was the mean ERM quotient (mERMQ) developed by Long et al. (2000), which was calculated by dividing each chemical concentration by its respective ERM and averaging the individual quotients. A subset of 28 ERM values was used to calculate the mERMQ (Table 1), which was the same as that used in previous mERMQ performance studies (Long et al. 2000).
The SQGQ1 approach is a composite of chemical guidelines from other approaches that were selected to provide an improved ability to predict toxicity to amphipods using California data (Fairey et al. 2001). These values are a combination of consensus values for PAHs and PCBs (Swartz 1999; MacDonald et al. 2000), ERMs, and PELs (probable effects level) (MacDonald et al. 1996). The index used to represent the SQGQ1 guidelines in the present study was the mean SQGQ1 quotient, which was calculated by dividing each chemical concentration by its respective SQG (Table 1) and averaging the individual quotients.
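The mean quotient calculation described for the ERM and SQGQ1 indices can be sketched as follows. This is a minimal illustration, not the authors' code: the four ERM values are the metal entries in the ERM column of Table 1, the sample concentrations are hypothetical, and restricting the average to chemicals present in both the sample and the guideline set is an assumption about how missing analytes are handled.

```python
from statistics import mean

# ERM values (mg/kg dry weight) for four metals, from the ERM column of Table 1.
ERM = {"Cd": 9.6, "Cu": 270.0, "Pb": 218.0, "Zn": 410.0}


def mean_quotient(sample, guidelines):
    """Average of concentration / guideline over chemicals found in both mappings.

    Assumption: chemicals missing from the sample are skipped rather than
    treated as zero.
    """
    shared = [chem for chem in guidelines if chem in sample]
    if not shared:
        raise ValueError("sample shares no chemicals with the guideline set")
    return mean(sample[chem] / guidelines[chem] for chem in shared)


# Hypothetical sample concentrations (mg/kg dry weight), for illustration only.
sample = {"Cd": 4.8, "Cu": 135.0, "Pb": 109.0, "Zn": 205.0}

if __name__ == "__main__":
    # Each metal is at half its ERM, so every individual quotient is 0.5.
    print(mean_quotient(sample, ERM))
```

The same function would serve for the SQGQ1 or Consensus quotients by passing a different guideline mapping.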
Table 1. Chemical values for individual sediment quality guidelines used for data analyses
Chemical Units ERM CA ERM SoCA ERM NorCA ERM SQGQ1 Consensus
As mg/kg 70.0 19.2 19.1 55.0
Cd mg/kg 9.6 1.0 1.2 0.6 4.2 5.9
Cr mg/kg 370.0 154.0 110 291.0 224.9
Cu mg/kg 270.0 151.0 208 91.2 270 225.0
Pb mg/kg 218.0 87.4 94.5 56.4 112.2 222.3
Hg mg/kg 0.71 0.8 0.8 0.7 0.6
Ni mg/kg 51.6 83.5 42 67.6
Ag mg/kg 3.7 0.9 1.1 0.4 1.8 3.4
Zn mg/kg 410.0 332.5 406.9 214.5 410.0 357.1
2-Methylnaphthalene mg/kg 670.0 22.2 23.6 20.2
Acenaphthene mg/kg 500.0 23.0 24.5 19.0
Acenaphthylene mg/kg 640.0 26.0 47 19.8
Anthracene mg/kg 1100.0 130.0 215.5 60.8
Benzo[a]anthracene mg/kg 1600.0 356.6 540 169.5
Benzo[a]pyrene mg/kg 1600.0 405.5 630 225.3
Chrysene mg/kg 2800.0 577.0 739.9 239.0
Dibenz[a,h]anthracene mg/kg 260.0 94.4 130 23.4
Dieldrin mg/kg 8.0 2.0 2 0.8 8.0 7.0
Fluoranthene mg/kg 5100.0 432.3 723 410.9
Fluorene mg/kg 540.0 30.7 46.2 NA
Naphthalene mg/kg 2100.0 34.4 33.4 42.5
p,p′-DDE mg/kg 25.9 38.3 3.8
Phenanthrene mg/kg 1500.0 267.5 275.9 310.6
Pyrene mg/kg 2600.0 534.8 1,000 480.0
Chlordane, total mg/kg 17.2 23.1 4.0 6.0
DDTs, total mg/kg 46.1 49.3 60 13.1 25.4

PAH, total mg/kg 1800.0ᵃ 1800.0ᵃ
PCB, total mg/kg 180.0 111.5 125.4 21.3 400.0 0.47
Tributyltin mg/kg 202.0 308 30.0
CA LRM = California logistic regression model; SoCA LRM = southern California LRM; NorCA LRM = northern California LRM; p,p′-DDE = 1-chloro-4-[2,2-dichloro-1-(4-chlorophenyl)ethenyl]benzene; DDTs = sum of p,p′-DDT, o,p′-DDT, p,p′-DDE, o,p′-DDE, p,p′-DDD, o,p′-DDD; PAH = polycyclic aromatic hydrocarbons;
PCB = polychlorinated biphenyls.
Values for the effects range median (ERM) were taken from Long et al. (1995); CA ERM, SoCA ERM, and NorCA ERM indicate California-specific ERM values for the
entire state, southern California, and northern California, respectively. Mean sediment quality guideline quotient 1 (SQGQ1) values taken from Fairey et al. (2001).
Consensus midpoint effect concentration values taken from Swartz (1999), MacDonald et al. (2000), and Vidal and Bay (2005). Concentrations are on a dry
weight basis except where noted.
ᵃ Organic carbon basis (mg/g).
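Table 1 includes guideline values for summed quantities such as total PCBs. As a minimal sketch of the standardized total PCB sum described in the Data section (the 16 listed congeners, multiplied by the correction factor of 1.72), the calculation could look like the following. The function name and the sample values are hypothetical, and raising an error on a missing congener is an assumption, not a documented rule of the study.

```python
# Congener numbers listed in the Data section for the total PCB sum.
PCB_CONGENERS = (8, 18, 28, 44, 52, 66, 101, 105, 110, 118, 128, 138, 153, 180, 187, 195)

# Factor applied to approximate the value from the larger NOAA congener list.
CORRECTION_FACTOR = 1.72


def total_pcb(congener_conc):
    """Sum the listed congeners and apply the correction factor.

    Assumption: every listed congener must be present; a missing one raises
    KeyError instead of being silently treated as zero.
    """
    return CORRECTION_FACTOR * sum(congener_conc[c] for c in PCB_CONGENERS)


# Hypothetical sample with every congener at 1.0 (same units for all congeners).
sample = {c: 1.0 for c in PCB_CONGENERS}

if __name__ == "__main__":
    print(total_pcb(sample))
```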
Consensus SQGs are chemical values based on the integration of multiple SQG approaches in an effort to obtain guidelines with greater validity. The integration method and types of SQGs used vary, but in general the consensus SQG represents either the arithmetic or geometric mean of at least 3 different SQGs having a similar intended application (e.g., to predict probable biological effects). The Consensus SQG values for PAHs and PCBs were midrange effect concentrations obtained from Swartz (1999) and MacDonald et al. (2000), respectively. Consensus values for DDTs, dieldrin, As, Cd, Cr, Cu, Pb, Hg, Ni, Ag, and Zn were obtained from Vidal and Bay (2005). The index used to represent the Consensus SQGs in the present study was the mean Consensus quotient, which was calculated by dividing each chemical concentration by its respective Consensus SQG (Table 1) and averaging the individual quotients.
The LRM approach uses a suite of regression models to relate chemical concentration to the probability of sediment toxicity. Chemical-specific models were developed using logistic regression analysis of a large database of marine amphipod survival data from field studies throughout North America (Field et al. 1999, 2002). The logistic regression model is described by the following equation:
$$p = \frac{\exp[b_0 + b_1(x)]}{1 + \exp[b_0 + b_1(x)]},$$
where $p$ = probability of observing a toxic effect, $b_0$ = intercept parameter, $b_1$ = slope parameter, and $x$ = log concentration of the chemical.
Separate logistic regression models for 18 contaminants were used in this study; the models were selected from a larger collection of models developed previously, based on a combination of occurrence of the chemical in the California data set and a low rate of false positives for predicting toxicity when applied to the calibration data set. The model parameters used for each chemical were obtained from Field et al. (2002). The maximum probability of observing a toxic effect (Pmax) across all 18 LRM models was used as the index of overall contamination for the National LRM approach. As a point of comparison analogous to the chemical concentrations used for the ERM, SQGQ1, and Consensus approaches, we calculated the concentration corresponding to a 50% probability of toxicity (T50) for each chemical model (Table 2).
Regional SQGs
Regionally calibrated versions were developed for 2 of the national SQG approaches: ERM and LRM. Regional versions were not developed for the other national SQG approaches (SQGQ1 and Consensus) because these approaches are based on the inclusion of SQG values from other sources and cannot be easily recalibrated with new data. Three versions of each regional SQG approach were developed: a statewide version that was calibrated to data from throughout California (CA ERM or CA LRM), and 2 region-specific versions. The region-specific versions were calibrated separately for the northern California (NorCA ERM or NorCA LRM) and southern California (SoCA ERM or SoCA LRM) data sets.
For the CA ERM variations, local calibration involved calculation of new individual chemical ERM values using methods adapted from Long et al. (1995). The data were screened to select toxic samples (<80% control adjusted amphipod survival) with chemical concentrations >2× the median concentration of nontoxic samples. A separate screening process was used for each chemical. After screening, the effects data were sorted in ascending order and the median concentration of each chemical was selected as the region-specific ERM value. ERM values were calculated for all chemicals having >10 records in the screened data set. This resulted in calculating CA ERM and SoCA ERM values for 28 chemicals, and NorCA ERM values for 26 chemicals (Table 1). This method deviated from the method of Long et al. (2006) in using only data from amphipod toxicity tests and calculating medians based on the distribution of all data, rather than selected values from each study.
California LRMs for individual chemicals were developed for the statewide and regional California data sets using the methods described in USEPA (2005b). These models were applied to the California calibration data using <80% control adjusted amphipod survival as the definition of a toxic sample. The specific models included in the CA LRM, SoCA LRM, and NorCA LRM approaches were selected from a library of candidate models that included national models, as well as models derived using the California data sets. The selected models were chosen based on the suitability of fit with the observed probability of toxicity (Table 2). Models with high false positive rates were not included.
Threshold development
Evaluating the indices with respect to categorical classification accuracy requires identification of category thresholds for each SQG index. Such thresholds are generally unavailable for these SQG approaches or vary in the method of development. The thresholds used in this study were developed for each SQG approach using a consistent methodology so that differences in performance would reflect inherent differences among approaches, rather than variations in how thresholds were assigned.
Three thresholds, defining 4 ranges of SQG index results, were established for each SQG approach. Each SQG index range corresponded to 1 of 4 categories of toxicological response that were based on classification systems used in other studies (Long et al. 2006). The toxicity categories were specific to each test species and were based on analyses of the minimum significant difference and magnitude of response (percent of negative control survival) to California samples (Greenstein et al. this issue). The categories for E. estuarius were: Nontoxic (≥90% survival), Low Toxicity (82%–89%), Moderate Toxicity (59%–81%) and High Toxicity (<59%). The categories for R. abronius were: Nontoxic (>90% survival), Low Toxicity (83%–89%), Moderate Toxicity (70%–82%) and High Toxicity (<70%).
SQG-specific thresholds, T1, T2, and T3, were selected through a bootstrap optimization procedure that maximized weighted agreement between SQG classifications and the 4 levels of toxicity. The weights were selected according to the linear weighting scheme of Cicchetti and Allison (1971) and were intended to give partial credit to those SQG classifications that were close to the observed toxicity level. For example, suppose SQG scores ranged from 1 to 4, corresponding to the 4 increasing levels of toxicity (e.g., from Nontoxic to High Toxicity). If a particular sample yielded an SQG score = 1 and the reported toxicity test result was classified as Nontoxic, the classifications agreed and were assigned the highest weight. A slightly lower yet relatively high weight (i.e., more partial credit) was given when the disagreement was relatively small (e.g., SQG score = 2 and test result = Nontoxic). Less weight was given to an SQG classification when disagreement was relatively large (e.g., SQG score = 3 and test result = Nontoxic). Weights for the weighted agreement statistic and a description of its calculation are given in the Supplemental Data.
The optimization of thresholds with respect to weighted agreement was repeated (i.e., bootstrapped) on 50 subsam-
Table 2. Logistic regression parameters for the regional and national models compared in this study
LRM CA LRM SoCA LRM NorCA LRM

Chemical Units B0 B1 T50 B0 B1 T50 B0 B1 T50 B0 B1 T50
Cadmium mg/kg 0.3 2.5 1.4 0.3 3.2 0.8 0.3 3.2 0.8 1.5 3.4 0.4
Copper mg/kg 5.6 2.6 145.0 6.8 2.8 268.0 6.6 3.8 51.0
Lead mg/kg 5.5 2.8 94.0 4.7 2.8 46.0 8.6 4.8 62.0
Mercury mg/kg 0.1 2.7 1.1 1.7 3.1 0.3
Nickel mg/kg 8.5 5.7 30.0
Zinc mg/kg 8.0 3.3 245.0 5.1 2.4 132.0 10.0 4.2 234.0 13.8 6.9 100.0
1-Methylnaphthalene mg/kg 4.1 2.1 94.0
1-Methylphenanthrene mg/kg 3.6 1.8 112.0
2,6-Dimethylnaphthalene mg/kg 4.1 1.9 133.0
2-Methylnaphthalene mg/kg 3.8 1.8 128.0
Acenaphthene mg/kg 3.6 1.8 116.0
Acenaphthylene mg/kg 3.0 1.4 140.0
Benzo[a]pyrene mg/kg 2.3 1.2 80.0
Benzo[b]fluoranthene mg/kg 4.5 1.5 1107.0 4.6 2.3 90.0
Biphenyl mg/kg 4.1 2.2 73.0
alpha-Chlordane mg/kg 3.4 4.5 5.8 3.4 4.5 5.8
gamma-Chlordane mg/kg 3.6 4.2 7.4
Chrysene mg/kg 2.5 1.3 95.0
Dieldrin mg/kg 1.2 2.6 2.9 1.8 2.6 5.1 1.2 4.3 2.0
Fluoranthene mg/kg 4.5 1.5 1034.0
Fluorene mg/kg 3.7 1.8 114.0
HMW PAH mg/kg 8.2 2.0 12506.0 8.2 2.0 12506.0 4.3 1.5 785.2
LMW PAH mg/kg 6.8 1.9 4127.0 6.8 1.9 4127.0 3.4 1.5 185.2
Naphthalene mg/kg 3.8 1.6 217.0
trans-Nonachlor mg/kg 4.3 5.3 6.3 4.3 5.3 6.3
o,p’-DDD mg/kg 2.0 3.3 4.1 1.1 2.0 0.3
p,p’-DDD mg/kg 1.9 1.5 19.0 1.8 2.0 7.6 0.8 2.5 2.0
p,p’-DDT mg/kg 3.6 3.3 12.0 1.5 1.6 8.1 0.6 3.3 1.5
Phenanthrene mg/kg 4.5 1.7 455.0
DDTs, total mg/kg 1.3 2.8 3.0
PCB, total mg/kg 3.5 1.4 368.0 4.4 1.5 945.0 4.4 1.5 945.0 4.4 1.5 945.0
CA LRM = California logistic regression model; SoCA LRM = southern California LRM; NorCA LRM = northern California LRM; HMW PAH = high molecular weight polycyclic aromatic hydrocarbons; LMW PAH = low molecular weight polycyclic aromatic hydrocarbons; o,p′-DDD = 1-chloro-2-[2,2-dichloro-1-(4-chlorophenyl)ethyl]benzene; p,p′-DDD = 1-chloro-4-[2-chloro-1-(4-chlorophenyl)ethenyl]benzene; p,p′-DDT = 1-chloro-4-[2,2,2-trichloro-1-(4-chlorophenyl)ethyl]benzene; DDTs = sum of p,p′-DDT, o,p′-DDT, p,p′-DDE, o,p′-DDE, p,p′-DDD, and o,p′-DDD; PCB = polychlorinated biphenyl.
Values for the national logistic regression model (LRM) were taken from Field et al. (2002); CA LRM, SoCA LRM, and NorCA LRM indicate California-specific LRM values for the entire state, southern California, and northern California, respectively. B0 = intercept; B1 = slope; T50 is the calculated concentration corresponding to a toxicity probability of 0.5. Concentrations are on a dry weight basis.
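The relationship among the logistic model, T50, and Pmax described in the text can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: the text says only "log concentration", so base-10 logarithms are assumed here; the function names are hypothetical; and the (b0, b1) pairs are made-up values chosen for easy arithmetic, not parameters from Table 2.

```python
import math


def p_toxic(conc, b0, b1):
    """Probability of toxicity from the logistic model at concentration conc (> 0).

    Assumption: x = log10(conc). The form 1 / (1 + exp(-z)) is algebraically
    the same as exp(z) / (1 + exp(z)).
    """
    z = b0 + b1 * math.log10(conc)
    return 1.0 / (1.0 + math.exp(-z))


def t50(b0, b1):
    """Concentration at which p = 0.5, i.e. where b0 + b1 * x = 0."""
    return 10.0 ** (-b0 / b1)


def p_max(sample, models):
    """Largest predicted probability over chemicals that have both a model and a value."""
    return max(p_toxic(sample[c], *models[c]) for c in models if c in sample)


# Hypothetical (b0, b1) pairs and concentrations, for illustration only.
models = {"chem_A": (-6.0, 3.0), "chem_B": (-2.0, 2.0)}
sample = {"chem_A": 100.0, "chem_B": 1.0}

if __name__ == "__main__":
    print(t50(*models["chem_A"]))   # concentration giving p = 0.5 for chem_A
    print(p_max(sample, models))    # index of overall contamination for the sample
```

Taking the maximum rather than the mean means a single chemical with a high predicted probability determines the index, which is how Pmax is described in the text.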
plings of the calibration data. With each subsampling, 40 representatives from each of the 4 toxicity categories were selected randomly without replacement from the larger data set. This step was necessary due to the greater prevalence of nontoxic samples in the calibration data set. Using the full data set, potential exists for the lowest threshold to be set artificially high simply to increase the number of samples classified into the lowest category, consistent with the majority. However, these thresholds would not be useful for detecting low to moderate levels of toxicity in those water bodies that did not exhibit the same preponderance of nontoxic samples as that found in our calibration data set. By selecting our subsamples uniformly across the 4 categories, we could select (and evaluate) thresholds relative to their ability to discriminate among the 4 toxicity categories equally, without preference for 1 category over the others. In addition, subsampling 50 times ensured that nearly every sample was included in the analysis at least once.
A set of 3 optimal thresholds was determined for each bootstrap sample by comparing weighted agreement statistics for a large set of candidate thresholds and then choosing the set of 3 thresholds that yielded the largest weighted agreement. Candidates consisted of all ordered permutations of 3 threshold values, taken at 5% increments of the SQG’s range. Weighted values for thresholds between the 5% increments were linearly interpolated. To ensure convergence of optimization and so that optimal thresholds were not too close to one another, distances between individual thresholds within each set were constrained to be no less than 10% of the chemical range. Taking the median of each optimal threshold across all 50 subsamples gave the final set of SQG-specific threshold values for the Low (T1), Moderate (T2), and High (T3) categories.

Evaluation of SQG performance
SQG performance was evaluated by quantifying the strength of association between sediment chemistry and toxicity in terms of both correlation and categorical classification accuracy. Correlation was measured as the nonparametric Spearman’s correlation coefficient between the SQG index value (i.e., mean quotient or Pmax) and percent amphipod mortality (100 minus control-adjusted survival). Analyses of categorical classification accuracy were based on the frequency with which the SQG index category (determined by applying the thresholds derived from the calibration data set) correctly predicted the measured toxicity response category. All analyses were conducted using an independent validation data set that was not used for threshold development. Two measures of classification accuracy were calculated: percent agreement and weighted kappa. Percent agreement is the percentage of samples that are correctly classified, calculated as A = (Nc/Nt) × 100, where A = percent agreement, Nc = number of samples correctly classified, and Nt = total number of samples.
The weighted kappa statistic (Cohen 1960, 1968) is also a measure of agreement between the SQG predictions and toxicity, but differs in that a correction for chance is applied and partial credit is given according to the magnitude of disagreement. Kappa weights were based on the linear weighting scheme of Cicchetti and Allison (1971); a weight of 1 was assigned to cases of perfect agreement and weights of 1/3, 1/6, and 0 assigned to disagreements of 1, 2, or 3 toxicity categories, respectively (see Supplemental Data). SAS PROC FREQ (SAS Institute) was used to calculate the weighted kappa (Stokes et al. 2000). Weighted kappa values range between −1 and 1, where 1 is 100% agreement. Weighted kappa values above 0 indicate that the SQG is performing better than is expected by chance alone; a weighted kappa of 0 implies no improvement over chance classification; weighted kappa values below 0 indicate classification accuracy lower than that expected by chance.
A bootstrap resampling approach similar to that used for threshold development was also used in calculation of the correlation, percent agreement, and weighted kappa values. The reported correlation and classification accuracy values are the median of 50 resamples. The approach having the highest median values for both correlation and classification accuracy was selected as the best performing SQG. Those medians that fell below the 10th percentile of the distribution having the highest median performance (i.e., correlation, percent agreement, and weighted kappa) were deemed statistically different. Medians above the 10th percentile were characterized as statistically similar. Correlation results were given greater weight when the rankings were variable among the performance measures to minimize the influence of threshold selection.
Bootstrapping addressed 3 important issues. First, bootstrapping was used to create data subsets with a uniform distribution of toxicity and thus eliminate prevalence bias due to the relatively high proportion of nontoxic samples in the validation data set. SQG accuracy then was assessed with respect to all 4 categories equally, without preference to a single category. Without correction for prevalence, less sensitive SQGs (those that tend to classify samples in the lowest toxicity category) or SQGs with a stronger correlation with nontoxic or low toxicity samples will tend to perform better than other SQGs, simply because there are more nontoxic samples to evaluate. In addition, the correction for chance in the weighted kappa statistic may impose an unfair penalty for greater agreement in the lower categories due to the skewness of the toxicity distribution. For a more thorough examination of the effect of prevalence on performance statistics, see Mouton et al. (2010), Feinstein and Cicchetti (1990), and Lantz and Nebenzahl (1996). Second, bootstrapping provided a more robust performance evaluation because SQGs were evaluated across multiple subsamplings, where the relative contribution of contaminants varied within each of the toxicity categories. Taking the median as a measure of performance removed the influence of spurious results or outliers. Finally, bootstrapping allowed for statistical comparisons to be made among the SQGs.

RESULTS
Different patterns of sediment contamination were apparent between the northern and southern California data sets (Table 3), reflecting different anthropogenic inputs and geochemistry. Median concentrations of most PAH compounds, Cr, and Ni were greater in the north, whereas the south data set contained higher concentrations of chlordane, Cu, DDTs, PCBs, and Zn. The southern California data set usually contained the highest concentrations of each contaminant, which may reflect the larger south data set. An exception was the presence of higher Cr and Ni concentrations in the north data set, which was likely due to higher naturally occurring concentrations of these elements in northern California soils.
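Returning to the threshold-development procedure in the methods (balanced subsampling followed by a search over ordered threshold triples), the steps can be sketched roughly as follows. This is an illustrative reading rather than the authors' implementation: the function names, the use of the mean linear agreement weight as the "weighted agreement" statistic, the interpretation of the 10% spacing rule as two grid steps, and the omission of the interpolation between 5% increments are all assumptions.

```python
import random
import statistics
from itertools import combinations


def classify(value, thresholds):
    """Return category 0-3: the number of thresholds at or below the value."""
    return sum(value >= t for t in thresholds)


def weighted_agreement(pairs, thresholds, k=4):
    """Mean agreement weight, assuming linear weights 1 - |pred - obs| / (k - 1)."""
    total = 0.0
    for value, observed in pairs:
        total += 1.0 - abs(classify(value, thresholds) - observed) / (k - 1)
    return total / len(pairs)


def best_thresholds(pairs, step=0.05, min_gap=0.10):
    """Grid-search ordered threshold triples at `step` increments of the index range."""
    values = [v for v, _ in pairs]
    lo, hi = min(values), max(values)
    n_steps = round(1 / step)              # 20 increments when step = 0.05
    grid = [lo + i * (hi - lo) / n_steps for i in range(1, n_steps)]
    gap = round(min_gap * n_steps)         # minimum spacing, in grid steps
    best, best_score = None, -1.0
    for i, j, m in combinations(range(len(grid)), 3):
        if j - i < gap or m - j < gap:
            continue                       # thresholds too close together
        candidate = (grid[i], grid[j], grid[m])
        score = weighted_agreement(pairs, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


def derive_thresholds(data, n_per_cat=40, n_boot=50, seed=0):
    """Median of the best (T1, T2, T3) over balanced subsamples.

    data: list of (index_value, toxicity_category) with categories 0-3.
    """
    rng = random.Random(seed)
    by_cat = {}
    for value, cat in data:
        by_cat.setdefault(cat, []).append((value, cat))
    runs = []
    for _ in range(n_boot):
        sub = []
        for members in by_cat.values():
            # Sample without replacement, equally from every category.
            sub.extend(rng.sample(members, min(n_per_cat, len(members))))
        runs.append(best_thresholds(sub))
    return tuple(statistics.median(run[i] for run in runs) for i in range(3))
```

Ties between candidate triples are resolved here by keeping the first one found, which is an arbitrary choice made for the sketch.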
Table 3. Cumulative distribution of sediment chemistry data for the California samples used in the analyses
Northern California Southern California

Chemical Units N 50th Percentile 90th Percentile N 50th Percentile 90th Percentile
2-Methylnaphthalene mg/kg 367 10.6 27.2 713 9.6 49.1
Acenaphthene mg/kg 407 6.0 21.2 674 5.1 46.0
Acenaphthylene mg/kg 398 8.2 24.3 671 6.2 79.0
Anthracene mg/kg 422 20.2 91.1 771 18.0 370.0
As mg/kg 393 8.5 12.9 828 8.6 17.3
Benz[a]anthracene mg/kg 427 63.8 189.0 838 44.9 720.0
Benzo[a]pyrene mg/kg 430 95.7 289.0 845 65.9 1100.0
Cd mg/kg 420 0.2 0.4 850 0.4 1.4
Chlordanes, total mg/kg 404 0.8 3.3 816 7.1 34.3
Cr mg/kg 329 122.0 245.0 851 56.0 95.0
Chrysene mg/kg 427 72.0 229.0 847 64.0 1090.0
Cu mg/kg 405 40.1 65.5 851 76.5 252.0
DDTs, total mg/kg 404 3.6 12.4 816 21.4 112.0
Dibenz[a,h]anthracene mg/kg 412 12.1 32.5 787 19.1 230.0
Dieldrin mg/kg 368 0.2 0.9 297 1.0 3.4
Fluoranthene mg/kg 425 151.0 423.0 849 89.9 1320.0
Fluorene mg/kg 414 9.3 34.4 708 6.9 77.5
Pb mg/kg 409 21.2 37.8 851 35.9 101.0
Hg mg/kg 430 0.3 0.4 843 0.2 0.9
Naphthalene mg/kg 365 20.9 51.2 733 9.4 44.3
Ni mg/kg 399 84.0 114.6 838 20.7 36.6
PAHs, total mg/kg 431 945.0 2492.0 851 619.0 8573.0
PCB, total mg/kg 351 7.9 32.0 851 24.8 196.2
Phenanthrene mg/kg 392 75.4 242.0 815 39.8 429.0
Pyrene mg/kg 427 190.0 520.0 850 102.0 1500.0
Ag mg/kg 418 0.2 0.5 839 0.4 1.4
Tributyltin mg/kg 122 3.7 70.0 306 108.0 619.0
Zn mg/kg 409 110.0 164.0 851 180.0 369.0
DDTs = sum of p,p′-DDT, o,p′-DDT, p,p′-DDE, o,p′-DDE, p,p′-DDD, and o,p′-DDD; PAH = polycyclic aromatic hydrocarbons; PCB = polychlorinated biphenyl.
There was a similar range and distribution of sediment toxicity in the northern and southern California data sets (Figure 2). The distribution of the data was skewed toward low toxicity; approximately 60% of the samples in each region had greater than 80% survival and fewer than 10% had less than 40% survival. The similarity in distribution between regions despite differences in the relative proportion of data from the 2 amphipod species suggests that both species were responding similarly to the sediment characteristics.

Figure 2. Distribution of sediment toxicity data (10-day amphipod survival) for the California samples used in the analysis. [Bar chart: x-axis, Percent Survival in 10% bins from 90-100 to 0-10; y-axis, Percent of Samples; series, Northern California and Southern California.]

There were large differences in the number of chemicals and their threshold concentrations included in the different SQG indices (Tables 1 and 2). The number of chemicals varied from 9 for the SQGQ1 to 28 for the CA ERM. Individual chemical concentrations for the ERM, SQGQ1, and Consensus SQGs were similar because these values were often derived from similar sources. There were often large differences in individual chemical concentrations between the national and region-specific versions of the ERM. This was especially evident for PAH compounds, where the national ERM values were 1 to 2 orders of magnitude greater than the CA ERMs (Table 1).
The categorization thresholds for the SQGs varied geographically (e.g., statewide, north, south). The highest thresholds were usually obtained for southern California data, but the differences were typically small (Table 4). The SQGQ1 was an exception, having nearly a 3-fold difference between thresholds derived using northern and southern California data. The thresholds relate to the toxicity categories as follows: Nontoxic = below the low threshold; Low Toxicity = low threshold to below the moderate threshold; Moderate Toxicity = moderate threshold to below the high threshold; High Toxicity = at or above the high threshold.

Table 4. Thresholds used for evaluations of SQG index classification accuracy
Low Threshold Moderate Threshold High Threshold
SQG Approach Index North South State North South State North South State
National ERM Mean Quotient 0.08 0.06 0.07 0.15 0.12 0.13 0.29 0.38 0.33
National LRM Maximum Probability 0.17 0.23 0.230 0.26 0.44 0.35 0.50 0.61 0.55
Consensus Mean Quotient 0.15 0.14 0.14 0.23 0.26 0.25 0.51 0.60 0.55
SQGQ1 Mean Quotient 0.06 0.16 0.160 0.11 0.34 0.19 0.33 0.80 0.52
CA LRM Maximum Probability 0.25 0.42 0.34 0.42 0.58 0.250 0.62 0.72 0.67
CA ERM Mean Quotient 0.15 0.14 0.15 0.23 0.25 0.24 0.68 1.28 0.93
ERM = effects range median; LRM = logistic regression model; SQGQ1 = sediment quality guideline quotient 1; CA LRM = California LRM; CA ERM = California ERM.

Each of the statewide-calibrated SQG approaches correlated significantly with amphipod survival when applied to statewide validation data, though correlations were generally low in magnitude. Spearman correlation coefficients ranged from 0.35 to 0.16 (Table 5), with the CA LRM having the highest correlation. Correlations generally increased when the indices were evaluated using the separate north and south data sets, and the CA LRM performed best in both regions (Table 6).
The CA LRM (Table 5) also performed best with respect to classification accuracy when the indices were applied to the statewide data set, predicting the correct category of toxicity 37% of the time. Very little improvement in classification accuracy was obtained using the CA ERM approach, relative to the national ERM approach. Although both measures of classification accuracy ranked the SQG approaches similarly, the weighted kappa statistic provided a greater degree of discrimination among approaches than did percent agreement.
The percent agreement results for all of the SQG indices investigated in this study were low, relative to the theoretical maximum value of 100%. Each index showed a greater percent agreement over the 4 categories of toxicity than that expected by chance, but the net improvement was small (e.g., 31%–37% versus 25% by chance). When the SQG indices and statewide thresholds were evaluated relative to the regional data sets, the CA LRM was the only approach with consistently high classification accuracy and correlations for both northern and southern California, relative to the performances of the other SQGs (Table 6). The CA ERM also had relatively high classification accuracy for northern California data, whereas the ERM and Consensus indices had relatively high classification accuracy for southern California. The positive values for weighted kappa indicate a moderate improvement over chance classification.
Developing thresholds on a regional basis had little effect on performance of the national SQG approaches. Percent agreement scores across indices were almost identical between thresholds developed using statewide and regional data sets (Table 6). However, classification accuracy (weighted kappa) was improved for the worst performing SQG approaches, such as SQGQ1 in the south and national LRM in the north.
The development of region-calibrated SQGs also resulted in relatively little improvement in performance, compared to statewide-calibrated versions. The greatest improvement in performance was obtained with the northern California calibrated version of the LRM (NorCA LRM), which had 33% classification agreement versus 27% for the CA LRM (Table 6). No improvement in performance was obtained with the other region-calibrated SQGs (SoCA LRM, NorCA ERM, and SoCA ERM).

DISCUSSION
Although the Pmax calculated using the CA LRM was the best-performing SQG index, there was relatively little difference in performance among many of the indices. This differs from the findings of Vidal and Bay (2005) and probably results from using thresholds that were selected using a consistent methodology and calibration data set. The standardized thresholds allowed each SQG approach to be evaluated on a similar basis, so that differences in performance could be compared without the confounding effect of differences in threshold selection.
Two of the SQG approaches were recalibrated using California data, which had mixed effects. For the CA LRM, there was a substantive improvement in performance, but performance of the mean quotients based on the CA ERM was comparable to that of the national mERMQ. This may have resulted from differences in the SQG calibration process. The CA ERMs consisted entirely of new values that were derived from the California data set. All available CA ERMs were used in the quotient calculations, regardless of their reliability for predicting toxicity. In contrast, predictive ability (relative to the national LRM models) was taken into account when selecting the set of models used for the CA LRM. A similar selection process was not used for the CA ERM because of differences in derivation methodology compared to the national ERMs, which were based on multiple types of toxicity tests and other biological response values (Long et al. 1995).
The improved performance of the CA LRM may also have been due to differences in the composition, magnitude, and bioavailability of sediment contamination in the California data, relative to the data used for national LRM development.

Table 5. Nonparametric Spearman correlation (r) and classification accuracy (weighted kappa) of statewide SQG approaches with amphipod mortality
Region Approach Weighted Kappa % Agreement r
State CA LRM 0.23 37 0.35
State National ERM 0.17 32 0.25
State Consensus 0.17 31 0.25
State National LRM 0.15 35 0.22
State CA ERM 0.17 33 0.20
State SQGQ1 0.12 32 0.16
ERM = effects range median; LRM = logistic regression model; SQGQ1 = sediment quality guideline quotient 1; CA LRM = California LRM; CA ERM = California ERM. Values are the median of the bootstrapped analyses. Shaded cells indicate values that are statistically similar (within the 90th percentile) to the highest value. Analyses were conducted on the combined data for the north and south validation data sets and used thresholds developed using the statewide data set.
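The two classification measures reported in the performance tables, percent agreement and weighted kappa, can be sketched as below. This is a minimal illustration and not the SAS PROC FREQ calculation used in the study. The default weights follow the standard linear formula 1 - |i - j|/(k - 1), which is an assumption of the sketch; a different weight matrix can be passed in through the hypothetical `weights` argument. Categories are assumed to be coded 0 to 3.

```python
def linear_weights(k):
    """k x k agreement weights: 1 on the diagonal, falling linearly to 0."""
    return [[1.0 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]


def percent_agreement(predicted, observed):
    """A = (Nc / Nt) * 100, the percentage of samples classified correctly."""
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def weighted_kappa(predicted, observed, k=4, weights=None):
    """Chance-corrected agreement with partial credit for near misses.

    Undefined (division by zero) when expected agreement is exactly 1,
    e.g. when every sample falls in a single category.
    """
    w = weights if weights is not None else linear_weights(k)
    n = len(observed)
    # Joint proportions: rows = predicted category, columns = observed category.
    joint = [[0.0] * k for _ in range(k)]
    for p, o in zip(predicted, observed):
        joint[p][o] += 1.0 / n
    row = [sum(joint[i]) for i in range(k)]
    col = [sum(joint[i][j] for i in range(k)) for j in range(k)]
    p_obs = sum(w[i][j] * joint[i][j] for i in range(k) for j in range(k))
    p_exp = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    return (p_obs - p_exp) / (1.0 - p_exp)
```

Perfect agreement gives a weighted kappa of 1, and predictions that systematically reverse the observed categories give a negative value, matching the interpretation of the statistic given in the methods.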
Table 6. Classification accuracy (weighted kappa) and Spearman correlation (r) of SQG approaches applied to data from each region
separately
Northern California Southern California

Approach Weighted kappa % Agreement r Weighted kappa % Agreement r
Statewide thresholds
CA LRM 0.20 38 0.39 0.25 35 0.42
National ERM 0.12 27 0.31 0.21 38 0.28
Consensus 0.12 28 0.23 0.22 36 0.31
National LRM 0.11 35 0.18 0.18 34 0.33
CA ERM 0.21 33 0.22 0.15 34 0.18
SQGQ1 0.13 35 0.25 0.10 28 0.26
Region-specific thresholds
CA LRM 0.16 27 0.39 0.28 40 0.42
National ERM 0.17 30 0.31 0.22 38 0.28
Consensus 0.15 29 0.23 0.25 39 0.31
National LRM 0.20 33 0.15 0.22 36 0.33
CA ERM 0.21 33 0.22 0.13 33 0.18
SQGQ1 0.21 33 0.25 0.18 33 0.26
Nor/SoCA LRM 0.21 33 0.27 0.22 36 0.37
Nor/SoCA ERM 0.20 35 0.22 0.18 35 0.18
ERM = effects range median; LRM = logistic regression model; SQGQ1 = sediment quality guideline quotient 1; CA LRM = California LRM; CA ERM = California ERM; Nor/SoCA LRM = northern or southern California LRM; Nor/SoCA ERM = northern or southern California ERM. Values are the median of the bootstrapped analyses. Shaded cells indicate values that are statistically similar (within the 90th percentile) to the highest value. Analyses were conducted separately using thresholds developed with statewide and region-specific data sets.
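The "statistically similar" designation in the table notes rests on the rule described in the methods: an approach is treated as similar to the best performer when its median bootstrapped value is not below the 10th percentile of the best performer's bootstrap distribution. A rough sketch of that comparison follows. The function names, the linear-interpolation percentile, and the treatment of a median exactly equal to the cutoff as similar are assumptions of this illustration, not details given in the article.

```python
import statistics


def percentile(values, q):
    """Percentile (q in 0-100) by linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def similar_to_best(results, q=10.0):
    """Flag approaches whose median is at or above the best approach's q-th percentile.

    results: {approach_name: list of bootstrapped values of one statistic}
    Returns (name of best approach, {approach_name: True if similar}).
    """
    medians = {name: statistics.median(vals) for name, vals in results.items()}
    best = max(medians, key=medians.get)
    cutoff = percentile(results[best], q)
    return best, {name: med >= cutoff for name, med in medians.items()}


# Made-up bootstrap values, for illustration only.
example = {
    "A": [0.20 + 0.01 * i for i in range(11)],
    "B": [0.22] * 11,
    "C": [0.10] * 11,
}
```

In this made-up example, approach A has the highest median, B falls above A's 10th percentile and is flagged as similar, and C falls below it.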
Regional differences in contamination and geochemistry have been identified as important factors affecting the predictive accuracy of SQGs (Long et al. 2000; Wenning et al. 2005). Because the values used in empirical SQG approaches are derived from chemistry-toxicity relationships in the calibration data set, regionally calibrated approaches would be expected to have greater predictive accuracy.
The regional SQG results suggest that further improvement in SQG performance could be obtained through further site-specific normalization or the use of mechanistic SQGs. However, normalization of the organics data to TOC and metals data to a reference element (Fe) and use of US Environmental Protection Agency (USEPA) equilibrium partitioning sediment benchmarks were evaluated in preliminary phases of this study and did not result in any improvement in correlation or classification accuracy.
Use of thresholds calibrated to the north and south subregions produced only small increases in performance relative to the statewide thresholds. The relatively small differences in regional performance are probably related to the heterogeneous nature of sediment contamination. Although there are differences in overall pattern and magnitude of contamination in the northern and southern California data sets, contamination patterns within each region are highly diverse due to the presence of multiple water bodies and diverse contaminant sources.
Regional thresholds for the SQGQ1 differed more than for the other SQG approaches, with higher thresholds for the south data set. The cause for this difference was not determined, but it may have been related to the relatively small set of contaminants used in calculating the SQGQ1 index. Data for only 9 contaminants were used in the SQGQ1, whereas the other approaches used 13 to 28 contaminants. Use of fewer components in the SQGQ1 may have made this index more sensitive to variations in regional contamination patterns. However, the apparent greater sensitivity of the SQGQ1 did not result in a higher level of performance relative to the other SQGs.
The limited improvement in classification accuracy obtained with regional calibration suggests that further calibration and normalization efforts are likely to have limited success in improving the association of empirical SQG indices with biological effects. Interlaboratory variation in the chemistry or toxicity analyses may have reduced the SQG performance values as data were compiled from multiple laboratories and over many years. A formal intercalibration was not possible for this study, but the effects of interlaboratory variation are expected to be relatively small as most of the data were compiled from regional monitoring programs that employ robust QA/QC procedures for both chemistry and toxicity data. SQG index performance values were also likely reduced by the analysis of bootstrapped data subsets having a uniform range of toxicity, which reduced the proportion of nontoxic samples that most indices have greater success in predicting.
Because the performance difference among SQG indices was small, characteristics such as history of use, ease of application, types of chemicals included in the constituent array, and feasibility for revision should be considered when selecting the SQG approach to be used. For instance, the Consensus and SQGQ1 approaches incorporate fewer chemicals than the other approaches and it is difficult to add new contaminants of concern because these SQGs are dependent on the availability of values from other sources. Local calibration is also not feasible for these approaches for the same reason.
The best performing index, CA LRM, is highly amenable to revision as demonstrated by this study. However, LRM approaches are also the most difficult to apply and interpret because a complex set of regressions must be used to determine probabilities of toxicity, rather than comparing chemistry data to a simple table of SQG values. These difficulties can be overcome by incorporating the regression calculations into spreadsheets or other data analysis tools and establishing thresholds for interpreting the Pmax values.
The low levels of correlation and agreement attained in this study represent the maximum likely to be attained when empirical SQG approaches are applied to sediments with the low to moderate levels of contamination characteristic of California bays and estuaries. A higher level of performance might be obtained in regions having higher sediment contamination levels (Long et al. 2006). The SQG values and thresholds developed in this study should not be applied to other regions without validation, as they have been optimized to match California contamination patterns. It is recommended that similar calibration efforts, especially threshold optimization, be conducted before applying SQGs in other regions to maximize SQG index performance.
The high uncertainty associated with the indices underscores their limited usefulness to represent sediment quality when used without supporting lines of evidence (e.g., toxicity and biological assessment). These indices are also ineffective for identifying the cause of sediment toxicity, as they are not based on chemical-specific concentration–response relationships. These limitations of SQGs are well known and are addressed in most sediment quality assessment frameworks by using these approaches in combination with biological effects measures in a multiple lines of evidence approach (Wenning et al. 2005), such as that recently adopted by the state of California (SWRCB 2008).
Substantial improvement in performance beyond that described here will require fundamental changes to both SQG components and conceptual approach. For example, all of the SQG approaches in common use are based on an outdated list of priority and legacy pollutants (e.g., PCBs, trace metals, PAHs, chlorinated pesticides) that does not include current use pesticides. These pesticides, including pyrethroids (e.g., bifenthrin) and organophosphates (e.g., chlorpyrifos), have widespread occurrence in coastal watersheds, bays, and estuaries (Delgado-Moreno et al. 2011; Lao et al. 2012). Pyrethroids in particular have been identified as a dominant cause of sediment toxicity in California streams and estuaries (Bay et al. 2011; Holmes et al. 2008). Although the current list of SQG contaminants is effective for characterizing potential exposure to unmeasured chemicals having similar sources, this assumption may not be valid for current use pesticides and other compounds having different sources and input history. Similarly, recent research indicates that measurement of PAHs may not adequately represent the potential for toxicity from oils (Mount et al. 2009).
Development of SQG approaches that account for changes in contaminant bioavailability is also needed to improve the interpretation of sediment contamination. The bulk chemistry measurements used in current empirical approaches do not address bioavailability and thus are unable to accurately
depict changes in organism exposure resulting from geochemical factors. Progress has been made in developing mechanistic SQG approaches based on equilibrium partitioning theory (USEPA 2005a, 2008), but additional research is needed to develop approaches that perform well for sediment quality assessment under a wide range of conditions. In recent years, new methods for evaluating the bioavailability of sediment contaminants based on passive sampling devices or measures of rapidly desorbing contaminant pools have been developed that show promise for application in sediment quality assessment (Maruya et al. this issue). Incorporation of this technology into sediment quality assessment frameworks, either as a replacement for existing SQG approaches or as an additional line of evidence, holds promise for strengthening the available tools for interpreting the significance of sediment contamination.

SUPPLEMENTAL DATA
Calculation of weighted agreement and weighted kappa statistic.

Acknowledgment: The authors thank Chris Beegan from the California Water Resources Control Board, and Mike Connor and Bruce Thompson of the San Francisco Estuary Institute for their suggestions on the design of this study. Peggy Myre of Exa Data and Mapping compiled and standardized the data sets. Jeff Brown, Diana Young, and Darrin Greenstein assisted with data compilation and statistical analysis. The authors also thank Peter Landrum, Ed Long, Todd Bridges, Tom Gries, Rob Burgess, and Bob Van Dolah for their thoughtful review of the ideas contained within the document. Work on this project was funded by the California State Water Resources Control Board under agreement 01-274-250-0.

REFERENCES
Barrick R, Becker S, Brown L, Beller H, Pastorok R. 1988. Sediment Quality Values refinement: 1988 update and evaluation of Puget Sound AET, Volume 1. Bellevue, WA: PTI Environmental Services. 177 p.
Bay SM, Greenstein DJ, Maruya KA, Lao W. 2011. Toxicity identification evaluation of sediment (sediment TIE) in Ballona Creek Estuary. Technical Report 634. Costa Mesa, CA: Southern California Coastal Water Research Project.
Bay SM, Weisberg SB. 2012. A framework for interpreting sediment quality triad data. Integr Environ Assess Manag 8:589–596.
Cicchetti DV, Allison T. 1971. A new procedure for assessing reliability of scoring EEG sleep recordings. Am J EEG Technol 11:101–109.
Cohen J. 1960. A coefficient of agreement for nominal scales. Educ Psychol Meas 20:37–46.
Cohen J. 1968. Weighted Kappa nominal scale agreement with provision for scale disagreement or partial credit. Psychol Bull 70:213–220.
Delgado-Moreno L, Lin K, Velga-Nascimento R, Gan J. 2011. Occurrence and toxicity of three classes of insecticides in water and sediment in two southern California coastal watersheds. J Agric Food Chem 59:9448–9456.
Fairey R, Long ER, Roberts CA, Anderson BS, Phillips BM, Hunt JW, Puckett HR, Wilson CJ. 2001. An evaluation of methods for calculating mean sediment quality guideline quotients as indicators of contamination and acute toxicity to amphipods by chemical mixtures. Environ Toxicol Chem 20:2276–2286.
Feinstein AR, Cicchetti DV. 1990. High agreement but low kappa: I. The problems of two paradoxes. J Clin Epidemiol 43:6:543–549.
Field LJ, MacDonald DD, Norton SB, Ingersoll CG, Severn CG, Smorong D, Lindskoog R. 2002. Predicting amphipod toxicity from sediments using Logistic Regression Models. Environ Toxicol Chem 21:1993–2005.
Field LJ, MacDonald D, Norton SB, Severn CG, Ingersoll CG. 1999. Evaluating sediment chemistry and toxicity data using logistic regression modeling. Environ Toxicol Chem 18:1311–1322.
Greenstein DJ, Bay SM. 2012. Selection of methods for assessing sediment toxicity in California bays and estuaries. Integr Environ Assess Manag 8:625–637.
Helsel D. 2005. More than obvious: Better methods for interpreting nondetect data. Environ Sci Technol 39:419A–423A.
Holmes RW, Anderson BS, Phillips BM, Hunt JW, Crane DB, Mekebri A, Connor V. 2008. Statewide investigation of the role of pyrethroid pesticides in sediment toxicity in California’s urban waterways. Environ Sci Technol 42:7003–7009.
Lantz CA, Nebenzahl E. 1996. Behavior and interpretation of the kappa statistic: Resolution of the two paradoxes. J Clin Epidemiol 49:431–434.
Lao W, Tiefenthaler L, Greenstein DJ, Maruya KA, Bay SM, Ritter K, Schiff K. 2012. Pyrethroids in Southern California coastal sediments. Environ Toxicol Chem 31:1649–1656.
Long ER, Field JE, MacDonald DD. 1998. Predicting toxicity in marine sediments with numerical sediment quality guidelines. Environ Toxicol Chem 17:714–727.
Long ER, Ingersoll CG, MacDonald DD. 2006. Calculation and uses of mean sediment quality guideline quotients: A critical review. Environ Sci Technol 40:1726–1736.
Long ER, MacDonald DD, Severn CG, Hong CB. 2000. Classifying the probabilities of acute toxicity in marine sediments with empirically derived sediment quality guidelines. Environ Toxicol Chem 19:2598–2601.
Long ER, MacDonald DD, Smith SL, Calder FD. 1995. Incidence of adverse biological effects within ranges of chemical concentrations in marine and estuarine sediments. Environ Manage 19:81–97.
MacDonald DD, Carr RS, Calder FD, Long ER, Ingersoll CG. 1996. Development and evaluation of sediment quality guidelines for Florida coastal waters. Ecotoxicology 5:253–278.
MacDonald DD, Di Pinto LM, Field LJ, Ingersoll CG, Long ER, Swartz RC. 2000. Development and evaluation of consensus-based sediment effect concentrations for polychlorinated biphenyls (PCB). Environ Toxicol Chem 19:1403–1413.
Maruya KA, Landrum PF, Burgess RM, Shine JP. 2012. Incorporating contaminant bioavailability into sediment quality assessment frameworks. Integr Environ Assess Manag 8:659–673.
Mount DR, Heinis LJ, Highland TL, Hockett JR, Hoff DJ, Jenson CT, Norberg-King TJ. 2009. Are PAHs the right metric for assessing toxicity related to oils, tars, creosote, and similar contaminants in sediments? In 5th International Conference on Remediation of Contaminated Sediments, February 2–5, 2009, Jacksonville, FL.
Mouton AM, DeBaets B, Goethals PLM. 2010. Ecological relevance of performance criteria for species distribution models. Ecol Modell 221:1995–2002.
O’Connor TP, Daskalakis KD, Hyl JL, Paul JF, Summers JK. 1998. Comparisons of sediment toxicity with predictions based on chemical guidelines. Environ Toxicol Chem 17:468–471.
[SWRCB] State Water Resources Control Board. 2008. Water quality control plan for enclosed bays and estuaries. Part I: Sediment quality. Sacramento, CA: State Water Resources Control Board.
Stokes ME, Davis CS, Koch GC. 2000. Categorical data analysis using the SAS system. 2nd ed. Cary (NC): SAS Institute.
Swartz RC. 1999. Consensus sediment quality guidelines for PAH mixtures. Environ Toxicol Chem 18:780–787.
[USEPA] United States Environmental Protection Agency. 1994. Methods for assessing the toxicity of sediment-associated contaminants with estuarine and marine amphipods. Washington DC: USEPA. EPA 600-R94-025.
[USEPA] United States Environmental Protection Agency. 2003. Procedures for the derivation of equilibrium partitioning sediment benchmarks (ESBs) for the protection of benthic organisms: PAH mixtures. Washington DC: USEPA. EPA-600-R-02-013.
[USEPA] United States Environmental Protection Agency. 2005a. Procedures for the derivation of equilibrium partitioning sediment benchmarks (ESBs) for the protection of benthic organisms: Metal mixtures (cadmium, Cu, Pb, Ni, Ag, and Zn). Washington DC: USEPA. EPA-600-R-02-011.
[USEPA] United States Environmental Protection Agency. 2005b. Predicting toxicity to amphipods from sediment chemistry (Final Report). Washington DC: USEPA. EPA/600/R-04/030.
[USEPA] United States Environmental Protection Agency. 2008. Procedures for the derivation of equilibrium partitioning sediment benchmarks (ESBs) for the protection of benthic organisms: Compendium of Tier 2 values for nonionic organics. Washington DC: USEPA. EPA-600-R-02-016.
Vidal DE, Bay SM. 2005. Comparative sediment guideline performance for predicting sediment toxicity in southern California, USA. Environ Toxicol Chem 24:3173–3182.
Wenning RJ, Batley GE, Ingersoll CG, Moore DW. (editors). 2005. Use of sediment quality guidelines (SQGs) and related tools for the assessment of contaminated sediments. Pensacola (FL): Society of Environmental Toxicology and Chemistry.
