Dataset Identification:
Resource Abstract:
- description: This data release contains the input-data files and R scripts associated with the analysis presented in [citation
of manuscript]. The spatial extent of the data is the contiguous U.S. The input-data files include one comma-separated value
(csv) file of county-level data, and one csv file of city-level data. The county-level csv (county_data.csv) contains data
for 3,109 counties. These data include two measures of water use, descriptive information about each county, three grouping
variables (climate region, urban class, and economic dependency), and 18 explanatory variables: proportion of population
growth from 2000-2010, fraction of withdrawals from surface water, average daily water yield, mean annual maximum temperature
from 1970-2010, 2005-2010 maximum temperature departure from the 40-year maximum, mean annual precipitation from 1970-2010,
2005-2010 mean precipitation departure from the 40-year mean, Gini income disparity index, percent of county population with
at least some college education, Cook Partisan Voting Index, housing density, median household income, average number of people
per household, median age of structures, percent of renters, percent of single family homes, percent apartments, and a numeric
version of urban class. The city-level csv (city_data.csv) contains data for 83 cities. These data include descriptive information
for each city, water-use measures, one grouping variable (climate region), and 6 explanatory variables: type of water bill
(increasing block rate, decreasing block rate, or uniform), average price of water bill, number of requirement-oriented water
conservation policies, number of rebate-oriented water conservation policies, aridity index, and regional price parity. The
R scripts construct fixed-effects and Bayesian hierarchical regression models. The primary difference between these models
relates to how they handle possible clustering in the observations that define unique water-use settings. Fixed-effects models
address possible clustering in one of two ways. In a "fully pooled" fixed-effects model, any clustering by group
is ignored, and a single, fixed estimate of the coefficient for each covariate is developed using all of the observations.
Conversely, in an unpooled fixed-effects model, separate coefficient estimates are developed using only the observations in
each group. A hierarchical model provides a compromise between these two extremes. Hierarchical models extend single-level
regression to data with a nested structure, whereby the model parameters vary at different levels in the model, including
a lower level that describes the actual data and an upper level that influences the values taken by parameters in the lower
level. The county-level models were compared using the Watanabe-Akaike information criterion (WAIC), which is derived from the
log pointwise predictive density of the models and can be shown to approximate out-of-sample predictive performance. All script
files are intended to be used with R statistical software (R Core Team (2017). R: A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org) and Stan probabilistic
modeling software (Stan Development Team. 2017. RStan: the R interface to Stan. R package version 2.16.2. http://mc-stan.org).
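The contrast between fully pooled, unpooled, and hierarchical estimates described in the abstract can be illustrated with a minimal sketch. This is not the release's R or Stan code. It is a Python illustration of an intercept-only normal model in which the within-group and between-group standard deviations are assumed known, and the grand mean is plugged in for the upper-level mean. The function name, group labels, and all numbers are hypothetical.

```python
import random
from statistics import fmean


def pooled_estimates(groups, sigma_y, sigma_alpha):
    """Return (complete, unpooled, partial) estimates of each group's mean.

    groups      : dict mapping a group label to a list of observations
    sigma_y     : within-group standard deviation (assumed known, hypothetical)
    sigma_alpha : between-group standard deviation (assumed known, hypothetical)
    """
    all_obs = [y for ys in groups.values() for y in ys]
    grand_mean = fmean(all_obs)  # fully pooled: grouping is ignored

    complete, unpooled, partial = {}, {}, {}
    for label, ys in groups.items():
        group_mean = fmean(ys)  # unpooled: only this group's observations
        # Precision weights of the normal-normal model with known variances.
        w_data = len(ys) / sigma_y ** 2
        w_prior = 1.0 / sigma_alpha ** 2
        complete[label] = grand_mean
        unpooled[label] = group_mean
        # Partial pooling: precision-weighted compromise between the two.
        partial[label] = (w_data * group_mean + w_prior * grand_mean) / (
            w_data + w_prior
        )
    return complete, unpooled, partial


# Synthetic data only; these are not values from county_data.csv.
rng = random.Random(0)
true_means = {"region_a": 80.0, "region_b": 100.0, "region_c": 120.0}
sizes = {"region_a": 50, "region_b": 10, "region_c": 3}
demo_groups = {
    g: [rng.gauss(true_means[g], 15.0) for _ in range(sizes[g])]
    for g in true_means
}
demo = pooled_estimates(demo_groups, sigma_y=15.0, sigma_alpha=20.0)
for g in demo_groups:
    print(g, [round(est[g], 1) for est in demo])
```

In this sketch, groups with few observations are pulled more strongly toward the grand mean, which is the compromise behavior the abstract attributes to hierarchical models. A full Bayesian hierarchical regression would also estimate the variances and covariate coefficients rather than fixing them.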
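The WAIC comparison mentioned in the abstract can be sketched in the same spirit. The example below assumes a matrix of pointwise log-likelihood values, one row per posterior draw and one column per observation, and uses one common form of the criterion: the log pointwise predictive density minus a penalty equal to the summed posterior variance of the pointwise log-likelihood, reported on the deviance scale. The function name and input layout are assumptions for illustration, not the interface of the release's scripts.

```python
import math
from statistics import variance


def waic(log_lik):
    """Compute (WAIC, lppd, p_waic) from pointwise log-likelihoods.

    log_lik[s][i] is log p(y_i | theta_s) for posterior draw s and
    observation i. At least two draws are required for the variance term.
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        # Log of the posterior-mean likelihood, via log-sum-exp for stability.
        m = max(column)
        lppd += m + math.log(sum(math.exp(v - m) for v in column) / n_draws)
        # Effective-parameter penalty: sample variance across draws.
        p_waic += variance(column)
    return -2.0 * (lppd - p_waic), lppd, p_waic


# Hypothetical log-likelihood values for 3 draws and 2 observations.
example = [[-1.2, -0.8], [-1.0, -1.1], [-1.4, -0.9]]
value, lppd, p_waic = waic(example)
print(round(value, 3), round(lppd, 3), round(p_waic, 3))
```

On this scale, lower values indicate better estimated out-of-sample predictive performance, so candidate models fitted to the same observations are ranked by comparing their WAIC values.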
Citation
- Title 2010 County and City-Level Water-Use Data and Associated Explanatory Variables.
-
- creation Date
2018-05-20T00:02:59.575956
Resource language:
Processing environment:
Digital Transfer Options
-
- Linkage for online resource
-
- name Dublin Core references URL
- URL: https://doi.org/10.5066/F72Z14FR
- protocol WWW:LINK-1.0-http--link
- link function information
- Description URL provided in Dublin Core references element.
Metadata data stamp:
2018-08-06T23:15:09Z
Resource Maintenance Information
- maintenance or update frequency:
- notes: This metadata record was generated by an XSLT transformation from a Dublin Core (dc) metadata record; transform by Stephen M. Richard, based
on a transform by Damian Ulbricht. Run on 2018-08-06T23:15:09Z
Metadata contact
-
pointOfContact
- organisation Name
CINERGI Metadata catalog
-
- Contact information
-
-
- Address
-
- electronic Mail Address cinergi@sdsc.edu
Metadata language
eng
Metadata character set encoding:
utf8
Metadata standard for this record:
ISO 19139 Geographic Information - Metadata - Implementation Specification
standard version:
2007
Metadata record identifier:
urn:dciso:metadataabout:79744d55-6eb4-4f7d-bd74-1fb1fa69b663
Metadata record format is ISO19139 XML (MD_Metadata)