Dataset Identification:
Resource Abstract:
- <p>The USDA Agricultural Research Service (ARS) recently established <a href="https://www.ars.usda.gov/office-of-international-research-programs/scinet/"
title="ARS high performance computing and high speed research network">SCINet</a>, which consists of a shared high-performance
computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet
users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their
active computing. SCINet is not designed to hold data beyond active research phases. At the same time, the National Agricultural
Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and
professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling.</p><p>The
ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data
storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies.
The SCINet Web-enabled Databases Working Group helped develop the survey, which is the basis for an internal report. While
the report was for internal use, the survey and resulting data may be generally useful and are being released publicly.</p><p>From
October 24 to November 8, 2016, we administered a 17-question survey (Appendix A) by emailing a SurveyMonkey link to all ARS
Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed
the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based
on their unit's practices or their management preferences, whether to delegate response to a data management expert in their
unit, to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.</p><p>Larger
storage ranges cover vastly different amounts of data, so the implications could differ significantly depending on whether the
true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," the 47
respondents who indicated they had either more than 10 TB and up to 100 TB, or over 100 TB, of total current data (Q5). All other respondents are
called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses
to estimate likely responses for those who did not respond.</p><p>We defined active data as data that would be used within
the next six months. All other data would be considered inactive, or archival.</p><p>To calculate per-person storage needs,
we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in
a group response. For Big Data users, we used the actual reported values or estimated likely values.</p>
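The per-person calculation described in the abstract can be illustrated with a minimal Python sketch. The function name `per_person_storage_tb`, the use of terabytes as the unit, and the example figures are hypothetical and are not taken from the survey data. The sketch also assumes that Big Data values (actual or estimated) are divided by G in the same way as range-based values, which the abstract does not state explicitly.

```python
def per_person_storage_tb(total_tb: float, group_size: int = 1) -> float:
    """Return storage per person, in TB.

    total_tb:   high end of the reported range (Small Data users), or the
                actual or estimated value (Big Data users; dividing these
                by G is an assumption of this sketch).
    group_size: G, the number of individuals covered by the response;
                1 for an individual response.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return total_tb / group_size


# Hypothetical responses, for illustration only (not survey data).
individual = per_person_storage_tb(10.0)           # one person, range high end of 10 TB
group = per_person_storage_tb(10.0, group_size=4)  # same range, group response with G = 4
```

Using the high end of each range gives an upper-bound estimate per person, which suits capacity planning better than a midpoint would.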
Citation
- Title Current and projected research data storage needs of Agricultural Research Service researchers in 2016
-
- revision Date
2017-03-16
Resource language:
en
Constraints on resource usage:
-
- Constraints
-
- Use limitation statement:
- public
point of contact
-
publisher
- individual Name Parr, Cynthia (cynthia.parr@ars.usda.gov)
- organisation Name
USDA NAL
-
- Contact information
-
-
- Address
-
- electronic Mail Address
Metadata data stamp:
2017-03-16
Metadata contact
-
publisher
Metadata scope code
dataset
Metadata standard for this record:
ISO 19115:2003 - Geographic information - Metadata
standard version:
ISO 19115:2003
Metadata record identifier:
88482aa1-9369-4b36-a6dd-a6f6b3b24306
Metadata record format is ISO19139 XML (MD_Metadata)