Repository

5 Ways to Support Diversity in Data Science

Type:

Literature

This week, I'm joining 18,000 colleagues and potential collaborators at the Grace Hopper Celebration of Women in Computing, the world’s largest gathering of women in technology. In addition to technical talks and workshops like "IoT for Social Good" and "Mission Critical Computing Systems for Space Flight," taking center stage are sessions addressing a critical lack of women in STEM fields, including “Why Has Tech Failed at Building Diverse Workforces?” and “Strategically Developing and Retaining Women in Leadership.”

social impact

Tags:
Author(s):
Meredith Lee

A web-based spatial framework for quantifying stormwater reduction opportunities and water resource benefits

Type:

Model

The inherent spatial and temporal variability of urban hydrology creates challenges for measuring long-term benefits of managing stormwater as a resource and prioritizing actions. A holistic quantification and accounting of effective stormwater volume reductions through infiltration, disconnection, storage, etc. can be used to robustly infer downstream multi-benefits such as groundwater banking, base flow restoration, climate change mitigation, and stream habitat enhancement. Existing stormwater management planning and tracking tools generally underutilize readily available public data sets that can help identify spatial patterns and detect the signal of long-term change above the noise of climate variations. 2NFORM has created a technically robust yet computationally simple stormwater model to estimate and track urban stormwater and pollutant load reductions resulting from BMP implementation, directly demonstrating compliance with a number of MS4 NPDES Permit requirements. The stormwater Tool to Estimate Load Reductions (TELR) uses publicly available data to help local and regional managers make better stormwater management decisions. With TELR, freely available hydrography, rainfall, soils, land-use, impervious cover, impaired waterbodies, and local parcel-assessors’ layers are used to characterize urban catchments using a standardized methodology, and widely tested USDA algorithms are employed to estimate infiltration and runoff. TELR outputs provide simple, urban catchment-based spatial accounting of the benefits of structural and non-structural BMPs for distributed rainfall infiltration. In addition to directly demonstrating regulatory compliance, TELR is a framework for organizing urban water resource data that directly aligns with the State SWRP guidance to move toward a more integrated water management approach.
TELR and supporting data collection tools are spatially-based, allowing municipalities to evaluate and communicate how stormwater improvements may directly lead to downstream water resource and ecological improvements. TELR provides a science-based approach to stormwater planning and tracking with outputs that are amenable to analysis at the regional and state levels for transparent assessment of priorities and progress over time.
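The entry does not say which USDA algorithms TELR uses. As a minimal sketch only, the block below assumes the widely used USDA (SCS/NRCS) curve number equation for estimating runoff depth from rainfall. The function names and the curve number values in the usage note are hypothetical and are not taken from TELR.

```python
def runoff_depth_in(rain_in: float, curve_number: float) -> float:
    """Runoff depth in inches from the SCS curve number equation.

    Assumption: the standard initial abstraction Ia = 0.2 * S is used.
    """
    if not 0 < curve_number <= 100:
        raise ValueError("curve_number must be in (0, 100]")
    s = 1000.0 / curve_number - 10.0  # potential maximum retention, inches
    ia = 0.2 * s                      # initial abstraction, inches
    if rain_in <= ia:
        return 0.0                    # all rainfall is retained or infiltrated
    return (rain_in - ia) ** 2 / (rain_in - ia + s)


def runoff_reduction_in(rain_in: float, cn_before: float, cn_after: float) -> float:
    """Runoff reduction (inches) when a BMP lowers a catchment's curve number."""
    return runoff_depth_in(rain_in, cn_before) - runoff_depth_in(rain_in, cn_after)
```

For example, runoff_reduction_in(2.0, 90, 80) compares runoff from a 2-inch storm before and after a hypothetical BMP that lowers the catchment curve number from 90 to 80.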

water data challenge

Tags:
Author(s):

AI Ethics Is Not a Panacea

Type:

Literature

From machine learning (ML) and computer vision to robotics and natural language processing, the application of data science and artificial intelligence (AI) is expected to transform health care (Celi et al. 2019). While the rapid development of technological capabilities offers paths toward new discoveries and large-scale analysis, numerous critical ethical issues have been identified, spanning privacy, data protection, transparency and explainability, responsibility, and bias.

education; community engagement

Tags:
Author(s):
Meredith Lee

Advancing Data Science Research: An Integrated Analysis

Type:

Literature

The National Science Foundation (NSF)’s Harnessing the Data Revolution (HDR) Big Idea is a visionary, national-scale activity to enable new modes of data-driven discovery, allowing fundamentally new questions to be asked and answered in science and engineering frontiers, generating new knowledge and understanding, and accelerating discovery and innovation. As part of this initiative, NSF seeks to identify what is needed to advance a robust ecosystem of data science research. Currently, the regional Big Data Hubs work to identify areas of collaboration and opportunities for supporting data science research, gathering input from the broader data science community. Knology, a nonprofit research organization, was selected by the Northeast Hub to handle data collection and analysis across a survey, online discussion, and conference with current HDR PIs and other NSF-identified stakeholders, and then synthesize findings in a public report.

AI; social impact; machine learning

Tags:
Author(s):
Rebecca Joy Norlander

Aggregating Municipal and State Open Data for Water Quality Investigations

Type:

Model

Water quality monitoring required by municipal separate storm sewer system (MS4) permits in California generally focuses on determining the status and trends of conditions in receiving waters as they relate to designated beneficial uses. Results provide a pulse for the watershed. Just like a human pulse, this information highlights whether an issue exists, but using it to find the source of the stressor is much more complex. This study presents a method for aggregating state and municipal data sources to assist with source identification. Specifically, it uses open data from Orange County Public Works (OCPW) and the California Storm Water Multiple Action and Report Tracking System (SMARTS) to identify possible dischargers responsible for exceedances of user-defined thresholds within Orange County’s watersheds. The method is applied here to historical data because real-time data sources are lacking, but applying it to real-time sources would enable quick reaction to degradations in water quality.
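The aggregation idea can be sketched as a two-step lookup: flag monitoring results above a user-defined threshold, then list the permitted facilities in the same watershed as candidate sources. This is an illustration only. The field names (station, watershed, analyte, result, name) are hypothetical and do not reflect the actual OCPW or SMARTS schemas, and the study's own matching logic may differ.

```python
from collections import defaultdict


def flag_exceedances(samples, thresholds):
    """Return the samples whose result exceeds the threshold for their analyte."""
    return [
        s for s in samples
        if s["analyte"] in thresholds and s["result"] > thresholds[s["analyte"]]
    ]


def candidate_dischargers(exceedances, facilities):
    """Map each (station, analyte) exceedance to facilities in the same watershed."""
    by_watershed = defaultdict(list)
    for facility in facilities:
        by_watershed[facility["watershed"]].append(facility["name"])
    return {
        (e["station"], e["analyte"]): sorted(by_watershed.get(e["watershed"], []))
        for e in exceedances
    }
```

Sharing a watershed is only a coarse screen: it narrows the list of possible dischargers but does not establish responsibility.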

water data challenge

Tags:
Author(s):

All-Hub Data and CyberInfrastructure Working Group Playlist

Type:

Playlist

The All-Hub Data and CyberInfrastructure Working Group was relaunched as an All-Hub working group in collaboration with all four Big Data Hubs. The Data Sharing and CyberInfrastructure Working Group works to: (1) map existing data cyberinfrastructure collaborations, services, and resources; (2) identify Hub member and Spoke project data and cyberinfrastructure requirements and opportunities for new collaborations and matchmaking of needs to existing technologies; (3) build collaborations and partnerships to facilitate federated data sharing, computing, and analysis across institutions and partners, leveraging the activities of the Open Storage Network, XSEDE, the National Data Service (NDS), and others; (4) develop a testbed and demonstrate integration of iRODS, NDS Labs, XSEDE, and Discovery Environment; and (5) secure funding to meet these objectives.

cyberinfrastructure; data management; open data

Tags:
Author(s):

An Automated Water Resources Tracking System: Near Real-Time Decision Support for Water and Wetland Managers

Type:

Model

Competition for water is likely to intensify as California is projected to experience continued increases in demand due to population growth, more arid growing conditions, and reduced or modified water supply due to climate change. As water resources become increasingly limited, water use needs to be optimized across many competing demands while also promoting multiple benefits. Though sophisticated water optimization models can be useful for tracking water volume allocations, how the results translate into habitat availability for wildlife and ecosystem services for people is not known. A framework is needed to better understand the spatial distribution of water in near real-time so that managers can adapt to changing conditions on the landscape and maximize the value of the water used. We are integrating remote sensing of satellite data, classification modeling, bioinformatics, optimization, and ecological analyses to develop an automated near real-time water resources tracking and decision-support system for the Central Valley of California. The system provides information on open surface water every 16 days and distinguishes between wetland types and flooded agriculture. Data are made freely available online for download 3-6 days post acquisition as well as through online summary and map visualization applications. Data are also summarized specifically for wetland wildlife habitats within federal and state management areas. These data will be used by water and wetland managers to enhance landscape-scale coordination of limited water supplies for wildlife, particularly during drought. In our complete vision for this system, water managers will be able to get near real-time and forecasted recommendations for where to put water on the landscape to achieve multiple wildlife habitat targets while also providing ecosystem services (e.g. groundwater recharge).
Our innovative system has applications for water management in the Central Valley to support people, places, and wildlife and is already being used for understanding the factors that drive variation in the distribution and abundance of water resources at multiple spatial and temporal scales. Specifically data generated as part of this system are being applied to assess the impact of the most recent drought in California, to understand the effect of disease vector control on water distribution, to quantify the groundwater recharge potential of current surface water management for wildlife, to develop an avian influenza risk map, and to identify where to put water and when on agricultural lands to benefit migratory birds.

water data challenge

Tags:
Author(s):

Automated Water Resource Sustainability Management

Type:

Model

Without sufficient clean water, the California (and US) economy will suffer. Growing populations and an inability or unwillingness to encourage the public to alter dietary habits linked to water supplies will increase the demand for new technologies. In a white paper entitled “Enhancing the Vision for Managing California’s Environmental Information” (http://deltacouncil.ca.gov/docs/enhancing-vision-managing-california-s-environmentalinformation), experts in environmental management acknowledge that more efficient tools are needed to reduce the life-cycle costs for managing water resources. Sensors have been cited as a key solution to rising assessment costs. However, data collection alone will not suffice, as the data will need to be appropriately collected, remotely transmitted, stored, processed, visualized and managed in a manner that allows for rapid automated responses to troubling issues. To address the challenges described above, our team developed our Water Sustainability Platform (US Patent 8,892,221, Issued in Oz and NZ). More specifically, we merged classical hydraulic theory, Game Theory, automated sensing, and processing and response to instantly determine when unsustainable conditions arise. The platform automatically reports when groundwater extraction rates would result in basin overdraft, stream depletion or seawater intrusion. This system directly answers resource management questions that are not readily determined via any other method, as it calculates and reports the maximum sustainable extraction rate for every well in a hydrogeologic network and compares this to the measured extraction rates within the system. The platform is designed to monitor and automatically respond to ever-changing groundwater levels, stream flow rates, groundwater extractions, and hydrogeologic conditions and is ideal for permitting and planning purposes (e.g., determining whether a proposed well installation could potentially impair resources and habitat). 
In essence, integration of our Water Sustainability Platform can significantly reduce life-cycle costs for managing water resources while ensuring that key supply and environmental objectives are being met.

water data challenge

Tags:
Author(s):

Bay Area Urban Water System Barcode

Type:

Model

The Bay Area Urban Water System Barcode is a visualization of the numerous agencies that own, operate, and manage systems that provide Bay Area cities with water and wastewater services. In 2016 the Association of Bay Area Governments (ABAG) required a resource to understand which stakeholders did what in the nine-county region. The Barcode as it is now allows a user to see which agencies are responsible for various phases of water and wastewater service for each individual city. Some cities in the region have identical patterns of service, but there are many unique patterns of water and wastewater service in the Bay Area, with municipalities, special districts, and private utilities responsible for varying roles depending on the city. There are many actions that can be taken to address the current and future droughts. To be a drought-resilient region, ABAG believes all of our collective resources must be tapped. Understanding who does what will help make it clear who would do what across each of the 101 cities in the Bay Area, as well as highlight who might need to be brought into a collaborative to achieve an outcome. We hope this is only version one of the Barcode. We would like to turn the Barcode into a Sankey diagram as shown on slide five of the Powerpoint. This will require substantial data mining of UWMPs or better reporting in DWR’s UWMP “Data Exports” tables.

water data challenge

Tags:
Author(s):

Better Data Quality for Better Water Equity: Transforming EAR data

Type:

Model

Every year, the State of California gathers data about water usage and production from all of its public water systems via Electronic Annual Reports (EARs). For instance, consider Dataset #8 in the Recommended Challenge Datasets: this dataset lists information taken from the EAR for each public water system (PWS) for each month of the year, including water production/delivery, water quality, and rates. In principle, this dataset is a treasure trove of knowledge for government employees seeking to study, optimize, and understand water distribution in California. Despite the effort expended to gather this data, it rarely gets used in analyses. Even though the dataset contains valuable insight about water use, its clunky format makes even simple analysis questions (e.g. how does water production change in a particular PWS over time?) difficult to ask. Furthermore, the low data quality (e.g. missing fields) presents further obstacles. This repository includes Jupyter Python notebooks to address both the format and quality problems in the EAR dataset. We intend the users of this repository to be state and local government employees (or perhaps simply interested citizens) who wish to understand how, when, and where water is used in California.
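The entry does not describe the EAR file layout, so the following is only a sketch of the kind of reshaping such notebooks might perform. It assumes a hypothetical wide table with one row per PWS and one column per month, converts it to one row per PWS per month, and marks missing or unparseable values as None so they can be counted rather than silently dropped. The column names and the function name are illustrative.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def to_long(rows, id_field="pwsid", months=MONTHS):
    """Reshape wide monthly rows (e.g. from csv.DictReader) into long records."""
    long_rows = []
    for row in rows:
        for month_number, column in enumerate(months, start=1):
            raw = str(row.get(column, "")).strip()
            try:
                value = float(raw)
            except ValueError:
                value = None  # missing or unparseable field
            long_rows.append(
                {"pwsid": row[id_field], "month": month_number, "produced": value}
            )
    return long_rows
```

With the data in long form, a question such as how production in one PWS changes over time becomes a simple filter on pwsid followed by a sort on month.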

water data challenge

Tags:
Author(s):

Blue Conduit

Type:

Model

Using AI to reduce uncertainty of lead service line replacement

water data challenge

Tags:
Author(s):

Boise State University Microbiome Hub

Type:

Project

The purpose of the Microbiome Hub at Boise State University is to develop resources to make microbiome pipelines, including community profiling via 16S (QIIME2) and metagenomic shotgun sequencing (Shogun), readily accessible to researchers here and at our collaborating institutions. To achieve this, we will develop resources for researchers to upskill in using the high-performance computing environment (e.g., BORAH at Boise State University), complete microbiome pipelines, and analyze outputs. While these are being developed primarily to analyze datasets sequenced at UC San Diego (see list below), we anticipate that this pipeline can be readily transferred to ongoing projects at Boise State University (e.g., Bittleston Lab) and University of Wyoming (e.g., Beck Lab).

microbiomes; data management; data analysis

Tags:
Author(s):

Building Tribal Capacity with Water Research Partnerships Workshop

Type:

Event

The New Mexico Water Resources Research Institute (NM WRRI) invites tribal leaders and water resource personnel, water researchers, and students—both tribal and non-tribal affiliated—to participate in a free day-long workshop on Wednesday, May 19 to further understand pressing tribal water issues and help to foster future research collaborations that will help build the capacity of tribes, nations, and pueblos within New Mexico.

water; tribal engagement; community engagement; research collaboration

Tags:
Author(s):

BurnPro3D

Type:

Project

A century of suppressing wildfires has created a dangerous accumulation of flammable vegetation on landscapes, contributing to megafires that risk human life and destroy ecosystems. Prescribed burns can dramatically reduce the risk of large fires that are uncontrollable by decreasing this buildup of fuels. BurnPro3D is a science-driven, decision-support platform to help the fire management community understand risks and tradeoffs quickly and accurately when planning and conducting prescribed burns.

fire; data management; open data

Tags:
Author(s):

CA H2Open

Type:

Model

In April 2015, Governor Brown mandated a 25 percent statewide reduction in water use by urban water suppliers across the state (relative to 2013 levels), with differentiated conservation targets for utilities with varying levels of baseline per capita usage (Executive Order B-29-15). The more than 400 public water agencies affected by the regulation were also required to report monthly progress towards the conservation goal to the State Water Board. This application uses the reported data to visualize how different water utilities have responded to this mandate. In addition to displaying a summary of water use relative to the conservation target for each district, we also calculate the electricity savings associated with the reduced demand on water infrastructure services using estimates of average energy intensity per hydrologic region. We then convert the electricity savings into avoided greenhouse gas (GHG) emissions based on the emissions factor specific to each water utility's regional electricity provider. Users can view the total water, energy, and GHG savings aggregated at the state level, as well as for each of the individual water utilities. One of the significant findings is that the electricity savings associated with the observed achievements in water conservation are roughly equivalent to the total electricity savings estimated for all of the energy IOU efficiency programs for the period from July through September 2015 (the period for which data was available for both initiatives). In addition, the water conservation-related GHG savings over the same period represent the equivalent of taking about 50,000 cars off the road for a year. Beyond the water conservation data, we prepared a few summary tables of the most common water quality testing and stormwater violation data for each utility. This data demonstrates how additional data could be integrated into the application to enrich the summary reports for each utility.
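The chain of conversions described above (water saved, then electricity saved, then avoided GHG emissions) can be sketched as follows. The units, function name, and parameter names are assumptions made for illustration, and the application's actual energy intensities and emissions factors are not given in this entry.

```python
def conservation_savings(baseline_mg, actual_mg, kwh_per_mg, kg_co2e_per_kwh):
    """Estimate water, electricity, and GHG savings for one utility.

    baseline_mg, actual_mg: water use in million gallons (assumed unit)
    kwh_per_mg: average energy intensity for the utility's hydrologic region
    kg_co2e_per_kwh: emissions factor of the regional electricity provider
    """
    water_saved_mg = baseline_mg - actual_mg
    electricity_saved_kwh = water_saved_mg * kwh_per_mg
    ghg_avoided_kg = electricity_saved_kwh * kg_co2e_per_kwh
    return {
        "water_saved_mg": water_saved_mg,
        "percent_reduction": 100.0 * water_saved_mg / baseline_mg,
        "electricity_saved_kwh": electricity_saved_kwh,
        "ghg_avoided_kg_co2e": ghg_avoided_kg,
    }
```

State-level totals would then be the sum of each field across utilities, except for percent_reduction, which has to be recomputed from the summed baseline and actual volumes.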

water data challenge

Tags:
Author(s):

CA Water Data Challenge Project Repository

Type:

Project; Repository

This site is a repository of projects that have been submitted to the California Water Data Challenge in previous years.

water; data competition; data analysis

Tags:
Author(s):

CASGEM Data Mining and Visualization Tools and Technologies for SMART WELLFIELDS™

Type:

Model

Problem Statements
• Groundwater Science is a “black box” to most people, because we cannot “see” under the ground.
• The general public, ranchers, growers, regulators, and public and private water utilities cannot be expected to understand groundwater flow dynamics and well hydraulics.
• The United States has a limited ability to use groundwater science for aquifer resource management because, while we have some excellent groundwater level data, we generally do not have pumping data that are contemporaneous. Hydrogeology can be very quantitative if you know the water level and corresponding pumping rate.
Solutions Statement
Our firms have been developing groundwater science hardware and software tools for over 20 years. We are not software companies – we are hydrogeologists who had a vision of better science through computing. The Internet of Things (IoT) is changing people’s perspective on what is possible. We have invented a turn-key process to collect and store large, continuous data sets (AQUIMETRICS™). We have developed algorithms to use either historic data or real-time/continuous data feeds to create groundwater contour maps on-the-fly and to assess well hydraulics (AQUILYTICS™). Using our telemetry (AQUIMETRICS™), Geographic Information System (GIS), database (EPIPHINY®), and coding technologies (AQUILYTICS™), we can visualize the quantity of groundwater, where it is coming from, where it is going, and how to optimize groundwater pumping to meet an objective. Wellfield operators also know how to save water, save electricity, and save maintenance costs on their wells.
Our vision is to facilitate SMART WELLFIELDS™ - a network of wells where flow rate and water level information are continually analyzed to automatically turn pumps up and down (and on and off) to better manage an aquifer resource.
Entry Statement
Locational data and groundwater level data for the last 10 years obtained from CASGEM were easily imported into our EPIPHINY® platform for instantaneous analysis and to export to ArcGIS®, ArcScene®, and to link with our AQUILYTICS™ platform. Data trends and groundwater contour maps are easy to prepare for “snapshot” or time series analysis. With relatively little effort, our tools could be customized to become open source on behalf of any state and provide groundwater scientists, state and county employees, wellfield operators, and the general public sophisticated groundwater software tools. We are prepared to perform a demonstration project using all of our technologies and patent to create California’s first SMART WELLFIELDS™.

water data challenge

Tags:
Author(s):

COVID Info Commons

Type:

Project

The COVID Information Commons (CIC) is an open website to facilitate knowledge-sharing and collaboration across various COVID research efforts, funded by the NSF Convergence Accelerator and the NSF Technology, Innovation and Partnerships Directorate. The CIC serves as an open resource for researchers, students, and decision-makers from academia, government, not-for-profits and industry to identify collaboration opportunities, to leverage each other's research findings, and to accelerate the most promising research to mitigate the broad societal impacts of the COVID-19 pandemic.

community engagement; education; social impact

Tags:
Author(s):

COVID-19 Public Sentiment Insights and Machine Learning for Tweets Classification

Type:

Literature

There is a compelling and pressing need to better understand the temporal dynamics of public sentiment towards COVID-19 vaccines in the US on a national and state-wise level for facilitating appropriate public policy applications. Our analysis of social media data from early February and late March 2021 shows that, despite the overall strength of positive sentiment and despite the increasing numbers of Americans being fully vaccinated, negative sentiment towards COVID-19 vaccines still persists among segments of people who are hesitant towards the vaccine. In this study, we perform sentiment analytics on vaccine tweets, monitor changes in public sentiment over time, contrast vaccination sentiment scores with actual vaccination data from the US CDC and the Household Pulse Survey (HPS), explore the influence of maturity of Twitter user-accounts and generate geographic mapping of tweet sentiments. We observe that fear sentiment remained unchanged in populous states, whereas trust sentiment declined slightly in these same states. Changes in sentiments were more notable among less populous states in the central sections of the US. Furthermore, we leverage the emotion polarity based Public Sentiment Scenarios (PSS) framework, which was developed for COVID-19 sentiment analytics, to systematically posit implications for public policy processes with the aim of improving the positioning, messaging, and administration of vaccines. These insights are expected to contribute to policies that can expedite the vaccination program and move the nation closer to the cherished herd immunity goal.

open data; modeling; data management

Tags:
Author(s):
Yana Samuel

CaDC Efficiency Explorer

Type:

Model

The Governor’s May 16 Executive Order (B-37-16) calls for the development of water use targets customized to the unique conditions of each urban water agency as part of a new, permanent efficiency framework. The CaDC Efficiency Explorer is an interactive dashboard that supports the California water community in analyzing the impact of those new standards through an easy-to-use scenario explorer tool. The video in the attached powerpoint demonstrates the Efficiency Explorer’s functionality at an inter-agency level and at an intra-agency level.

water data challenge

Tags:
Author(s):

CaDC Gittes Water Data Dictionary and Explorer

Type:

Model

Charles Fishman, in a New York Times op-ed, “Water is Broken. Data can Fix it,” made the case that analytics and systems borrowed from the technology industry can come to the aid of water conservation efforts. Many others, including the California Council of Science and Technology, the Western Governors Association, and the Delta Stewardship Council, have called for improvements in our water data systems. AB 1755, or The Open and Transparent Water Data Act, is recent California legislation that calls for the development of an integrated water data platform to “help water managers operate California’s water system more effectively and help water users make informed decisions based on water availability and allocation”. In addition to catering to the water manager community, the Dodd bill also calls for the intended water data platform to “promote openness and interoperability of water data by making information accessible, discoverable, and usable by the public to foster entrepreneurship, innovation, and scientific discovery.” These are indeed lofty policy goals that require a coalition of the willing to come together, build common ground, and work towards the future of water management California needs. In short, there is a key need to develop tools to create and sustain shared understanding amongst various stakeholders. The GITTES Water Data Dictionary and Water Data Explorer is a bottom-up effort that attempts to cater to these needs and ensure that the focus of any digital effort is on the delivery of the vision laid out in AB 1755. To that end, this submission is a first step towards creating “protocols for data sharing, documentation, quality control, public access, and promotion of open-source platforms and decision support tools related to water data.” The Water Data Dictionary is a simple cataloging effort that carefully indexes all relevant water data sources into a single schema.
It was developed with agile development methodology in mind, iterating continuously towards improvement. [Image: Methodology] We adopt a concept called the Common Operational Picture to develop the Water Data Explorer, a data visualization that is powered by the Water Data Dictionary. The Common Operational Picture, or COP, is grounded in years of decision-science research as a best practice and effective tool often used in crisis and emergency management settings. The COP facilitates continuous situational awareness through the management of disparate and heterogeneous data across the various stakeholders, end-users, and end-beneficiaries. [Image: A Common Operational Picture visualization for CA Water Data] We believe that the combination of data practices, co-creation efforts, and decision-making put into practice here will help to fully realize and sustain the Integrated Water Data Platform that is called for in the Dodd Bill. This initial project uses Tableau technology and so is not open source. It falls under the decision support category.

water data challenge

Tags:
Author(s):

CaDC Rate Comparison tool

Type:

Model

With declining water sales, water managers have lost over $675 million of revenue in the current drought. The CaDC rate comparison tool provides a quick way to see the implications that a rate shift or a drought surcharge would have on revenue and typical customer bills. The tool quickly illustrates the impact of a shift before a utility hires a rate consultant and goes through a full, labor intensive Prop 218 process. You can see a video demo of the tool in action here: https://youtu.be/mYv-OOBGJ28

water data challenge

Tags:
Author(s):

Cal State LA SEEDS Scholars

Type:

Project

The Social Equity Engagement geo-Data Scholars (SEEDS) Program is open to any students who participated in the past year(s) Spring Big Data courses that focused on projects using the LA GeoHub and working with local non-profit organizations. The intention of the SEEDS Program is to provide paid summer internships for Cal State LA students to work with non-profits and leverage their knowledge of the City of Los Angeles’ open-data portal. By the end of the program, we expect SEEDS to be well informed about the importance of big data and the challenges facing non-profit organizations and local citizens in big data literacy. We hope that this internship helps students promote civic engagement by gaining a strong sense of responsibility for democratizing big data and ensuring that data is shared, collaborative, and open to the broader community.

cloud computing; data management; cyberinfrastructure

Tags:
Author(s):

California Actual Evapotranspiration (CalETa) Mapping Program

Type:

Model

Formation Environmental LLC developed a statewide actual evapotranspiration (ETa) dataset to support water resource planning and management efforts in California. CalETa provides an unprecedented daily, 30-meter spatial resolution, statewide ETa dataset (currently available from 2010 through present). This dataset is the result of a comprehensive image analysis framework that relies on publicly available satellite earth observation data, local meteorological data, and open source algorithms. The core of the framework utilizes the Surface Energy Balance System (SEBS) algorithm. The peer-reviewed SEBS model provides a detailed parameterization for the estimation of surface heat fluxes, producing consistent ETa estimates over a wide range of land use types. The framework, consisting of multiple components, provides a robust, economical, and efficient means to estimate ETa. Validation studies performed using on-the-ground measurements of ETa show an excellent relationship between measured and modeled data. CalETa fundamentally improves water resource management at every scale, with applications that include: (i) drought and water conservation planning, (ii) groundwater banking, (iii) groundwater sustainability planning and management, (iv) surface and groundwater modeling, (v) on-farm water management, (vi) water transfer planning and implementation, (vii) native, riparian and invasive plant community monitoring, and (viii) irrigation performance and land use planning. There is a wide range of applications for spatial ETa information, and the use of CalETa is transforming how water planning decisions are made throughout California. Examples of how this dataset is used include:
1. To enhance the capability and improve water balance computation, the Department of Water Resources (DWR) is using this dataset in the California Central Valley Groundwater-Surface Water Simulation Model (C2VSim).
2. DWR is using this dataset for monitoring, evaluating, and implementing the Sustainable Groundwater Management Act (SGMA) program.
3. DWR was tasked by the Governor’s office to respond to several drought-related questions. This dataset was fundamental to informing statewide water planning efforts and developing water conservation objectives and metrics.
4. The Nature Conservancy is evaluating the use of this dataset for long-term monitoring and impact assessment of restored and managed riparian habitat.
5. This dataset is used to identify watersheds with high tree mortality as a result of drought and bark beetle infestation. The findings correspond with the region identified by the U.S. Forest Service.
6. The California Rice Commission used this dataset to quantify water transfers and rice fallowing in Sacramento Valley.
7. Water districts are using this dataset to implement, monitor, and quantify the effectiveness of fallowing and on-farm efficiency conservation programs.
8. CalETa is used to support private companies in development of on-farm water conservation and irrigation scheduling tools.
9. Water quality coalitions in the southern San Joaquin Valley are using this dataset as part of their Management Practices Evaluation Program, supporting efforts to protect groundwater quality by quantifying reduction in nitrate leaching.
10. The State Water Resources Control Board (SWRCB) is using this dataset to understand the water use of cannabis and prioritize its regulatory efforts.

water data challenge

Tags:
Author(s):

California Drought Dashboard

Type:

Model

During drought conditions, some of California’s native fish and other freshwater species do not get enough water to survive. In order to provide more water for fish and other species, The Nature Conservancy and other organizations can pay water users to keep the water in the streams and reservoirs rather than diverting it. Stream conditions vary across the state and over time, making it difficult to know where best to work. The California Drought Dashboard is an open source decision support tool that uses real-time open data on stream flow conditions to show the streams most stressed by drought. It also provides a Water for Fish tool that allows users to zero in on the streams that 1) have historically low flow conditions, 2) have the most fish and other freshwater species, and 3) have the best enabling conditions to purchase water for fish. With this tool, conservation organizations and water managers can determine the best locations to invest time and money to help make California’s native fish and other freshwater species more resilient to drought. The Drought Dashboard is based on the concept that native fish and other freshwater species are adapted to the historical flow conditions in a stream or river. If flow conditions drop substantially below historical average conditions, this is likely to cause stress to the native species. We are using U.S. Geological Survey stream flow data for over 200 stream gages in the state with long historical records to estimate whether the current weekly average flow is above or below normal for this time of year. Using Tableau Public’s data visualization capabilities, we present this information in a way that gives the user an overview of conditions across the state, as well as the ability to zero in on an individual stream.
We then combine this with information we compiled as part of a research report that indicates key strategies to focus on to get water to fish, and which streams contain the highest diversity of fish and other freshwater species. All of the data used in the analysis are open data and can be downloaded freely, directly from the Drought Dashboard.
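
The comparison of current weekly flow against historical conditions can be illustrated with a minimal Python sketch. This is not the Dashboard's actual code: the function name `flow_status`, the 25% tolerance band, the "near normal" category, and the sample flow values are all assumptions made for illustration.

```python
from statistics import mean

def flow_status(current_weekly_flow, historical_flows_same_week, tolerance=0.25):
    """Classify the current weekly average flow against the historical
    mean for the same week of the year.

    `tolerance` is a hypothetical band around the historical mean; the
    source only says flows are judged "above or below normal".
    """
    normal = mean(historical_flows_same_week)
    ratio = current_weekly_flow / normal
    if ratio < 1 - tolerance:
        return "below normal", ratio
    if ratio > 1 + tolerance:
        return "above normal", ratio
    return "near normal", ratio

# Made-up weekly average flows for one gage, same calendar week in past years
history = [120.0, 95.0, 110.0, 130.0, 105.0]
status, ratio = flow_status(60.0, history)
print(status, round(ratio, 2))
```

A gage whose ratio falls well below 1 would be a candidate for closer attention; a real analysis would more likely use percentiles of the long-term record than a fixed band.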

water data challenge

Tags:
Author(s):

California Safe Drinking Water

Type:

Model

[Project website](http://water.openoakland.org/)

water data challenge

Tags:
Author(s):

Call for transparency of COVID-19 models

Type:

Literature

A hallmark of science is the open exchange of knowledge. At this time of crisis, it is more important than ever for scientists around the world to openly share their knowledge, expertise, tools, and technology. Scientific models are critical tools for anticipating, predicting, and responding to complex biological, social, and environmental crises, including pandemics. They are essential for guiding regional and national governments in designing health, social, and economic policies to manage the spread of disease and lessen its impacts. However, presenting modeling results alone is not enough. Scientists must also openly share their model code so that the results can be replicated and evaluated.

modeling; open data; data management; research collaboration

Tags:
Author(s):
Marina Alberti

Central Valley Well Vulnerability

Type:

Model

Studying wells in the Central Valley in California. Additional resources: - [Project webpage](https://richpauloo.github.io/flexdash.html) - [Repository](https://github.com/richpauloo/cawdc)

water data challenge

Tags:
Author(s):

CloudBank

Type:

Project

The University of California, San Diego (UCSD)'s San Diego Supercomputer Center (SDSC) and Information Technology Services (ITS) Division, the University of Washington (UW)'s eScience Institute, and the University of California, Berkeley (UCB)'s Division of Data Science have developed and operate CloudBank, a cloud access entity that helps the computer science community access and use public clouds for research and education by delivering a set of managed services designed to simplify access to public clouds. Driven by the profound potential of the public cloud and the associated complexity in using it, CloudBank serves as an integrated service provider to the research community through a comprehensive set of user-facing and business operations functions. These services span the spectrum from novice to advanced cloud users, including front line user support, cloud solution consulting, training, and assistance in preparing proposals that include cloud resources. CloudBank provides innovative financial engineering options that give researchers more flexible cloud terms tailored for their needs and contribute to the sustainability of CloudBank operations. CloudBank helps NSF by bundling multiple small requests that come directly to NSF into a bulk request to cloud providers, dis-incentivizing more costly direct connections. Through this aggregation and innovative financial contract types, CloudBank passes along savings to researchers that would otherwise be unavailable to them.

COVID-19; health; community engagement; research collaboration

Tags:
Author(s):

CoMSES Network

Type:

Project; Repository

Welcome! CoMSES Net, the Network for Computational Modeling in Social and Ecological Sciences, is an open community of researchers, educators, and professionals with a common goal - improving the way we develop, share, use, and re-use agent based and computational models for the study of social and ecological systems. We develop and maintain the CoMSES Model Library, a digital repository that supports discovery and good practices for software citation, digital preservation, reproducibility, and reuse. We encourage you to join CoMSES Net and add your models to the archive.

COVID-19; sentiment analysis; social media

Tags:
Author(s):

CoVista: A Unified View on Privacy Sensitive Mobile Contact Tracing Effort

Type:

Literature

Governments around the world have become increasingly frustrated with tech giants dictating public health policy. The software created by Apple and Google enables individuals to track their own potential exposure through collated exposure notifications. However, the same software prohibits location tracking, denying key information needed by public health officials for robust contact tracing. This information is needed to treat and isolate COVID-19 positive people, identify transmission hotspots, and protect against continued spread of infection. In this article, we present two simple ideas, the lighthouse and the covid-commons, that address the needs of public health authorities while preserving the privacy-sensitive goals of the Apple and Google exposure notification protocols.

COVID-19; data security; health

Tags:
Author(s):
Nathan Pemberton

Coping with the California Drought: 2014-2016

Type:

Model

Our submission contains interactive visualizations that allow the user to explore how effectively California water utilities conserved water over the past 3 years, a critical period of historical drought. By visualizing and exploring the relationships between water conservation in 2014, 2015, and 2016 in almost 400 utilities from around the state, we provide an interesting perspective of the ongoing drought. For these visualizations, we define "conservation" and "water savings" as the difference in water production between a timeframe of interest in a given supplier's service area and its respective water production in 2013. This exploratory analysis includes statewide data to show overall water use trends, as well as data for each of the 10 hydrologic regions in California and individual utilities in those regions. The comparison between conservation efforts in 2014, 2015, and 2016 also provides insight on the responsiveness of different utilities to different incentives: a voluntary call for conservation in 2014, a state mandate in 2015, and the replacement of the state mandate by adjusted self-certified goals in 2016. Several key lessons can be drawn from these visualizations: (1) The reporting requirements put in place during the drought created an even platform for water utilities to keep track of important data, which in turn allows tools like this one to identify water use and conservation trends, drivers, and opportunities for enhanced water management at a variety of different scales. Further standardized tracking and reporting methods could facilitate the use of data for decision-making; (2) Water utilities collectively achieved significant water savings in the period between 2014 and 2016. While policies and regulations seem to have been significant drivers of water conservation throughout the state (e.g. 
higher water savings during the state mandate or in response to local watering restrictions), these visualizations show that water use and conservation are very site-dependent and utility-specific; (3) Many uncertainties remain about the human-water dynamics that made water savings possible between 2014 and 2016. A better understanding of local population behaviors towards water use, and responsiveness to different conservation incentives, could help water planners and managers tailor their conservation campaigns more effectively in the future, not only during drought, but also as a long-term water reliability strategy.
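
The definition of "water savings" used above (water production in a period of interest compared with the same supplier's production in 2013) can be sketched as follows. This is a minimal illustration, not the submission's code: the function name, the sign convention (positive means less water produced than in 2013), the percentage form, and the production figures are assumptions.

```python
def water_savings(baseline_2013, production):
    """Return (volume saved, percent saved) relative to the 2013 baseline.

    Positive values mean the supplier produced less water than in 2013.
    The sign convention and the percent form are assumptions.
    """
    saved = baseline_2013 - production
    percent = 100.0 * saved / baseline_2013
    return saved, percent

# Made-up annual production for one hypothetical supplier (any volume unit)
production_by_year = {2013: 1000.0, 2014: 950.0, 2015: 760.0, 2016: 800.0}
baseline = production_by_year[2013]

savings = {
    year: water_savings(baseline, volume)
    for year, volume in production_by_year.items()
    if year != 2013
}
for year, (saved, percent) in sorted(savings.items()):
    print(year, saved, f"{percent:.1f}%")
```

Computing the same quantity per supplier and per year is what allows the voluntary (2014), mandated (2015), and self-certified (2016) periods to be compared side by side.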

water data challenge

Tags:
Author(s):

DAMMS: Dam Assessment Mapping and Safety System

Type:

Model

Communicating dam safety through visualizations, building support for change. Additional resources: - [Project webpage](https://dam-safety.github.io/damss)

water data challenge

Tags:
Author(s):

Dashboard on Efficiency and Bioassessment in Water Quality Enforcement

Type:

Model

Our entry is a dashboard that demonstrates the efficiency of enforcement actions relative to stream health at the county level. The dashboard allows a user to select a contaminant (or suite of contaminants) and a county in California to chart MCL exceedances by contaminant and county. The dashboard also presents CSCI stream condition scores for the selected County and the total dollar assessment for all enforcement actions in the county. Next, the dashboard calculates the percent of “intact” stream condition in the County as the average CSCI score divided by 0.92 (the intact threshold). Finally, the dashboard calculates the cost in dollars per CSCI unit as a measure of efficiency of enforcement relative to stream condition. Try selecting multiple chemicals. Compare different counties. Explore the MCL exceedances to see which constituents are the biggest problems. Enjoy!
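
The two calculations described above can be written out as a short Python sketch. It is illustrative only: the function name and sample values are hypothetical, and dividing the total dollar assessment by the county's average CSCI score is an assumed reading of "cost in dollars per CSCI unit".

```python
INTACT_THRESHOLD = 0.92  # CSCI score treated as "intact" in the entry

def county_efficiency(csci_scores, total_enforcement_dollars):
    """Return (average CSCI, percent intact, dollars per CSCI unit).

    Assumption: "dollars per CSCI unit" is the total assessment divided
    by the county's average CSCI score.
    """
    average_csci = sum(csci_scores) / len(csci_scores)
    percent_intact = 100.0 * average_csci / INTACT_THRESHOLD
    dollars_per_csci_unit = total_enforcement_dollars / average_csci
    return average_csci, percent_intact, dollars_per_csci_unit

# Made-up scores and enforcement total for a hypothetical county
avg, pct, cost = county_efficiency([0.69, 0.46, 0.92], 138000.0)
print(round(avg, 2), round(pct, 1), round(cost))
```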

water data challenge

Tags:
Author(s):

Data Science Education SIG - Should we create data science degree programs

Type:

Presentation

Is your institution considering starting a data science degree program? Come hear three ADSA community members discuss how their organization decided to create data science degree programs and how these programs took the shape they did. We'll be hearing from Sarah Stone (University of Washington), Ajay Anand (University of Rochester), and HV Jagadish (University of Michigan).

education; design

Tags:
Author(s):
Sarah Stone

Data Science Training and Collaboration: Online workshop on best practices in teaching data science to students and researchers

Type:

Event

Online workshop on best practices in teaching data science to students and researchers

education; community engagement

Tags:
Author(s):

DataOne

Type:

Repository

DataONE is a community driven program providing access to data across multiple member repositories, supporting enhanced search and discovery of Earth and environmental data. DataONE promotes best practices in data management through responsive educational resources and materials. We envision researchers, educators, and the public using DataONE to better understand and conserve life on earth and the environment that sustains it.

data management; open data; research collaboration

Tags:
Author(s):

Demographic and Water Data Demonstration

Type:

Model

We set out to examine how water quality relates to the demographics of the surrounding areas. In particular, we considered whether the reported water quality and toxicity metrics correlate with the average income levels or ethnic makeup of a region. We examined the records from the CEDEN Water Quality and Toxicity databases from August 1, 2012 to July 31, 2015. The CEDEN data was combined with the 2010 Census and income data by zip code. Each substance or metric in the dataset was tested to see how well it correlated with five variables for that particular zip code: mean income; median income; percentage of the population that is White; percentage of the population that is Hispanic; and percentage of the population that is African American. Analytes such as Chlordane, Methoxychlor, Endrin Ketone and PBDE 140 showed very strong negative correlations with mean and median income as well as with the percentage of the population that is White. Meanwhile, they showed strong positive correlations with the percentages that are Black or Hispanic. This suggests that these substances are found at higher levels in lower income and/or predominantly Black and/or Hispanic neighborhoods.
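
A per-zip-code correlation of this kind can be sketched in a few lines of Python. The entry does not say which correlation measure was used, so the Pearson coefficient here is an assumption, and the analyte and income values are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical values, one entry per zip code
analyte_level = [4.0, 3.0, 2.0, 1.0]        # measured concentration
median_income = [30.0, 45.0, 60.0, 90.0]    # thousands of dollars

r = pearson(analyte_level, median_income)
print(round(r, 3))
```

A strongly negative coefficient, as in this toy data, would correspond to the pattern reported above: higher levels of the substance in lower-income zip codes. Correlation alone does not establish the cause of that pattern.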

water data challenge

Tags:
Author(s):

Domestic well vulnerability to drought duration and unsustainable groundwater management in California's Central Valley

Type:

Literature

Millions of Californians access drinking water via domestic wells, which are vulnerable to drought and unsustainable groundwater management. Groundwater overdraft and the possibility of longer drought duration under climate change threaten domestic well reliability, yet we lack tools to assess the impact of such events. Here, we leverage 943 469 well completion reports and 20 years of groundwater elevation data to develop a spatially-explicit domestic well failure model covering California's Central Valley. Our model successfully reproduces the spatial distribution of observed domestic well failures during the severe 2012–2016 drought (n = 2027). Next, the impact of longer drought duration (5–8 years) on domestic well failure is evaluated, indicating that if the 2012–2016 drought had continued into a 6 to 8 year long drought, a total of 4037–5460 to 6538–8056 wells would fail. The same drought duration scenarios with an intervening wet winter in 2017 lead to an average of 498 and 738 fewer well failures. Additionally, we map vulnerable wells at high failure risk and find that they align with clusters of predicted well failures. Lastly, we evaluate how the timing and implementation of different projected groundwater management regimes impact groundwater levels and thus domestic well failure. When historic overdraft persists until 2040, domestic well failures range from 5966 to 10 466 (depending on the historic period considered). When sustainability is achieved progressively between 2020 and 2040, well failures range from 3677 to 6943, and from 1516 to 2513 when groundwater is not allowed to decline after 2020.

water; drought; climate change

Tags:
Author(s):
R A Pauloo

Drinking Water Contaminant Classification

Type:

Model


water data challenge

Tags:
Author(s):

Drinking water vulnerability tool

Type:

Model

A web-based application, https://drinkingwatertool.communitywatercenter.org/, that puts key water data at the fingertips of the community, so members can explore, understand, and advocate for the safety of their water supplies, with a focus on wells and community water systems that serve populations less than 10,000.

water data challenge

Tags:
Author(s):

Drought Resilience of California’s Power Supply

Type:

Model

The challenge facing us was how to evaluate and present the statewide impact of the drought and water supply curtailments on the ability of the disparate fleet of California power plants to generate reliable electricity. The Governor’s Office recognized that the State Water Resources Control Board's curtailment of water supplies from the Delta and dropping groundwater levels could impact availability of water supplies. In order to meet this challenge, Energy Commission staff developed a data-driven, results-oriented approach to evaluate if and where power plants could be impacted and what could be done to mitigate impacts to the California power supply. To achieve this goal, we summarized water supply and water use data for all Energy Commission jurisdictional power plants and other large power plants (100 total), which rely on various water supplies for operation. Tables and reports were produced from the data, which included information on the following:
• The type of water supply (surface water, groundwater, or recycled water) and water supplier for each power plant.
• The average water use of each power plant.
• The energy generation capacity and average annual capacity factor (a descriptor of annual electricity output).
We summarized these key attributes and presented them on a map to add the spatial dimension of water and power distribution. Layered on this data, we added DWR’s GIS data showing land subsidence from groundwater withdrawal. These became the tools we used for informed decision making regarding drought emergency actions. An unexpected result of this effort was the demonstration that the power supply had significant drought resilience already built in, which would allow focused efforts in just several specific areas. Initial development of these tools was cumbersome because of the methods used to manage data from the Energy Commission Quarterly Energy and Fuels Report program (QFER). Once management saw the results of the data compilation, they embraced further use and analysis for environmental performance and energy policy reports. To further enhance data acquisition and quality, staff is also revising regulations governing data reporting from power plant operators that will allow us to better analyze water supply reliability and drought resilience.

water data challenge

Tags:
Author(s):

Dynamic Surface Water Monitoring (DSWM)

Type:

Model

Our team has developed a proof-of-concept project that is (1) fully open source and (2) intended to assist with decision support, data sharing, and information communication tools. The objective of this proposal is to provide a framework for implementing a same-time monitoring network that offers numerous advantages for water accountability. With the baseline network design discussed, collaborating agencies could plug and play working theories for optimal water resources management practices, with the ability to validate accuracy to a fine degree. In our proof of concept, we evaluate the Shasta Dam station, approximately 10 miles of river length north of the Keswick Reservoir station. In order to compare the discrepancies between stations, we used the following two data sources:
• California Data Exchange Center (CDEC) – US Bureau of Reclamation, Department of Water Resources, and Water Quality
• National Water Quality Monitoring Council (NWQMC) – USGS NWIS and BioData, EPA STORET, and USDA-ARS STEWARDS through a single search interface
Below is an outline of the technical report used for our standalone document submission. We have also attached the requested PowerPoint and a working Excel spreadsheet that has an output less powerful, but relatively similar to R-S

water data challenge

Tags:
Author(s):

EMP Water Quality Conditions Report (Water Quality Conditions in the Sacramento San Joaquin Bay-Delta and Suisun Bay)

Type:

Model

The Bay-Delta Monitoring and Analysis Section within DWR’s Division of Environmental Services is part of the Environmental Monitoring Program (EMP) that monitors water quality, benthic macroinvertebrates, and phytoplankton at discrete stations in the Sacramento-San Joaquin Delta, Suisun Bay, and San Pablo Bay. The monitoring conducted by the EMP is mandated by State Water Resources Control Board (SWRCB) Water Rights Decision 1641 for operation of the State Water Project. The EMP was established in 1970, and static “hardcopy” Water Quality Conditions reports have been published containing data from 1970 to 2011. This project, developed by 34 North, was initiated to make the mandated data more accessible and usable in a web-based Interactive Water Quality Conditions Report Portal. The SWRCB’s D-1641 mandates DWR to: 1) conduct a comprehensive environmental monitoring program to determine DWR’s compliance with water quality standards and 2) annually report on the data collected. Development of a web-based Interactive Water Quality Conditions Report was initiated by DWR and the State and Federal Water Contractors (SWFCA), as part of the SWRCB “My Water Quality Portals”. This new web-based report is intended to replace the publication of the hard-copy version of the WQC report and will improve compliance with the requirements for the D-1641 Monitoring Report. DWR’s last WQC report for 2011 was published in 2012, and this new format will provide the following improvements:
• Real-time reporting and updates available to the public in an easy-to-use web interface.
• Reduced staff time required to develop these reports, allowing more time for data analysis and problem solving.
• All EMP data made available and useful for scientists and managers to respond to recent court decisions, biological opinions, and mandates for drought responses.
• Trend analysis.
• Program data made available via web services for other water quality programs and scientific efforts.

water data challenge

Tags:
Author(s):

Endora

Type:

Model

Endora is an AI cognitive system that aims to create a symbiosis between humans, machines, and nature, to monitor climate change in real time and track the monitored parameters. Endora reads your questions in natural language and follows information from multiple indicators, dispersed datasets, files, research documents, and the internet to make simulations, inductions, analyses, and deductions.

water data challenge

Tags:
Author(s):

EquiTensors: Learning Fair Integrations of Heterogeneous Urban Data

Type:

Literature

Neural methods are state-of-the-art for urban prediction problems such as transportation resource demand, accident risk, crowd mobility, and public safety. Model performance can be improved by integrating exogenous features from open data repositories (e.g., weather, housing prices, traffic, etc.), but these uncurated sources are often too noisy, incomplete, and biased to use directly. We propose to learn integrated representations, called EquiTensors, from heterogeneous datasets that can be reused across a variety of tasks. We align datasets to a consistent spatio-temporal domain, then describe an unsupervised model based on convolutional denoising autoencoders to learn shared representations. We extend this core integrative model with adaptive weighting to prevent certain datasets from dominating the signal. To combat discriminatory bias, we use adversarial learning to remove correlations with a sensitive attribute (e.g., race or income). Experiments with 23 input datasets and 4 real applications show that EquiTensors could help mitigate the effects of the sensitive information embodied in the biased data. Meanwhile, applications using EquiTensors outperform models that ignore exogenous features and are competitive with "oracle" models that use hand-selected datasets.

AI; machine learning; social impact

Tags:
Author(s):
An Yan

Exploring climate change through RainSphere

Type:

Model

A new user-friendly climate data exploration tool called RainSphere (http://rainsphere.eng.uci.edu/), developed by the Center for Hydrometeorology and Remote Sensing (CHRS), is designed for use in a range of scientific studies and applications, as well as to aid in the education of the general public and to promote independent inquiry and discovery in climate studies. Over 33 years of retrospective global precipitation estimates have been provided via the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks - Climate Data Record (PERSIANN-CDR) for visualization on RainSphere. In addition, projections of future precipitation from several carbon emission scenarios from the Intergovernmental Panel on Climate Change (IPCC) Coupled Model Intercomparison Project, Phase 5 (CMIP5) are also available for visualization with the CHRS RainSphere interface. Users are able to easily customize their investigations of historical precipitation estimates and future projections through automatically generated analysis products including time series, spatial plots, and basic trend analysis. Easy-to-use browsing capabilities allow the global data to swiftly be divided into regions of interest by country, political division (e.g. province/state), continental basin, major basin, tributary basin, watershed or a highly localized, searchable location. The tool automatically generates summarized reports with information about the selected area and also allows users to download any data and/or statistics extracted after browsing. CHRS RainSphere allows the data to speak for itself in a way that is easily understandable by the public in order to increase the number of informed participants in the conversation about climate and climate variability. RainSphere can be used to explore rainfall trends across spatial scales over California, such as the severe drought in 2013.
Lastly, our strategy for providing rainfall information to various user communities is to make it as painless as possible and not to leave users with the task of learning how to use the site by trial and error. For this reason, a six-minute video tutorial has been prepared and made available at https://www.youtube.com/watch?v=eI2-f88iGlY

water data challenge

Tags:
Author(s):

FAIR Digital Objects to Establish a Global and Interoperable Data Space

Type:

Event

We propose a double session at SciDataCon 2021 focused on the challenges of organising future research data spaces (domains of data in which partners can interchange data in a secure and sovereign manner and in which data integration is done when necessary) and beyond, and on how FAIR Digital Objects can help address these challenges.

digital objects; open data; data management; research collaboration

Tags:
Author(s):

Feeling Positive About Reopening? New Normal Scenarios From COVID-19 US Reopen Sentiment Analytics

Type:

Literature

The Coronavirus pandemic has created complex challenges and adverse circumstances. This research identifies public sentiment amidst problematic socioeconomic consequences of the lockdown, and explores four ensuing potential scenarios associated with public sentiment. The severity and brutality of COVID-19 have led to the development of extreme feelings, and emotional and mental healthcare challenges. This research focuses on emotional consequences: the presence of extreme fear, confusion and volatile sentiments, mixed with trust and anticipation. It is necessary to gauge dominant public sentiment trends for effective decisions and policies. This study analyzes public sentiment using Twitter data, time-aligned to the COVID-19 reopening debate, to identify dominant sentiment trends associated with the push to reopen the economy. The present research uses textual analytics methodologies to analyze public sentiment support for two potential divergent scenarios, an early opening and a delayed opening, and the consequences of each. It concludes, on the basis of textual data analytics, including textual data visualization and statistical validation, that tweet data from American Twitter users shows more positive than negative sentiment support for reopening the US economy. This research develops a novel sentiment-polarity-based public sentiment scenarios (PSS) framework, which will remain useful for future crisis analysis, well beyond COVID-19. With additional validation, this research stream could present valuable time-sensitive opportunities for state governments, the federal government, corporations and societal leaders to guide local and regional communities, and the nation, into a successful new normal future.

COVID-19; sentiment analysis; social media

Tags:
Author(s):
Yana Samuel

Get the lead out!

Type:

Model

Lead Levels in San Francisco Schools. San Francisco Unified School District tested its schools' drinking water for lead last year. This map shows the highest lead levels detected at each school based on multiple sample sites and dates. There is no safe level of lead, especially for children. Even small amounts can lower IQ. Damage from lead exposure is irreversible. Take action with CALPIRG's "Get The Lead Out: Back to School Toolkit." Additional resources: - [Project webpage](https://lobenichou.github.io/waterChallenge/) - [2019 Project](https://waterdatachallenge.github.io/project/get-the-lead-out-2019/)

water data challenge

Tags:
Author(s):

Guidance for Trustworthy Data Management in Science Projects

Type:

Literature

In April and May of 2020, the Trustworthy Data Working Group conducted a survey of the scientific community about data security concerns and practices. 111 participants completed the survey from a wide range of positions and roles within their organizations and projects, respectively. The working group analyzed the survey results with an eye for patterns, themes, correlations, and aggregates and produced a report in June 2020 detailing the process, survey methodology, and their analysis.

data security; survey

Tags:
Author(s):
Vahi, Karan

Hawaiʻi Climate Data Portal

Type:

Repository

The overarching goal of the HCDP is to provide streamlined access to high-quality, reliable climate data and information for the State of Hawai‘i. This includes the production of both near-real-time monthly rainfall and daily temperature maps and a user-friendly tool to visualize and download them. Easy access to high-quality climate data, information and products through the HCDP allows researchers to focus more time on their analyses and less time on data collection and processing. It also provides the broader community with access to information that would otherwise be inaccessible due to technical limitations. Finally, centralizing data and information helps to create a more holistic environment for environmental stewardship in Hawai‘i.

data management; climate change; data analysis; research collaboration

Tags:
Author(s):

How to Make Your Little Data Big by Being FAIR

Type:

Presentation

Connecting and tagging your digital information can make it reusable to others. Your research can have a greater impact and you might find new partnerships to further the science. How to do this? By implementing FAIR–a set of guiding principles to make data Findable, Accessible, Interoperable and Reusable. Building upon the SLC webinar Big Data: What Is It and What Does It Mean to Me? (April 2020), four speakers have been recruited to explain and translate how to use and apply FAIR principles across disciplines. An interactive discussion session is planned to close out the webinar to address your questions. The intent is to continue the conversation in a Breakout Session at the Future Tox V meeting scheduled for November 8 and 9, 2021.

data analysis; open data; data management

Tags:
Author(s):
Michelle Heacock

How to deliver translational data-science benefits to science and society

Type:

Literature

The translational aspects of data science — the analysis of big data — promise to benefit individuals, science and society. They stand to open up new lines of enquiry in computer science, statistics, ethics, data governance, cognitive psychology, organizational behaviour, information science, sociology and behavioural economics. With an overflowing treasure chest of big data, the time is ripe to tackle the crucial questions that can help translational data science to realize its potential.

open data; data management

Tags:
Author(s):
Vandana Janeja

Hows My Water?

Type:

Model


water data challenge

Tags:
Author(s):

Identifying high-risk communities

Type:

Model

Identifying communities with a high risk of shortage using a multifactor vulnerability score

water data challenge

Tags:
Author(s):

Identifying the Central Figure of a Scientific Paper

Type:

Literature

Publishers are increasingly using graphical abstracts to facilitate scientific search, especially across disciplinary boundaries. They are presented on various media, easily shared and information rich. However, only a small fraction of scientific publications are equipped with graphical abstracts. What can we do with the vast majority of papers with no selected graphical abstract? In this paper, we first hypothesize that scientific papers actually include a "central figure" that serves as a graphical abstract. These figures convey the key results and provide a visual identity for the paper. Using survey data collected from 6,263 authors regarding 8,353 papers over 15 years, we find that over 87% of papers are considered to contain a central figure, and that these central figures are primarily used to summarize important results, explain the key methods, or provide additional discussion. We then train a model to automatically recognize the central figure, achieving top-3 accuracy of 78% and exact match accuracy of 34%. We find that the primary boost in accuracy comes from figure captions that resemble the abstract. We make all our data and results publicly available at https://github.com/viziometrics/centraul_figure. Our goal is to automate central figure identification to improve search engine performance and to help scientists connect ideas across the literature.
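The two metrics quoted above, exact match and top-3 accuracy, can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function name `top_k_accuracy`, the figure identifiers, and the toy rankings are all hypothetical, and it assumes each paper has one labeled central figure and a model-produced ranking of its figures.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    labels: Sequence[str],
    k: int,
) -> float:
    """Fraction of papers whose labeled central figure is among the top-k ranked figures."""
    if len(ranked_predictions) != len(labels):
        raise ValueError("need exactly one ranking per labeled paper")
    if not labels:
        raise ValueError("need at least one labeled paper")
    hits = sum(
        1
        for ranked, label in zip(ranked_predictions, labels)
        if label in ranked[:k]
    )
    return hits / len(labels)


# Hypothetical toy data: one ranked list of figure ids per paper (best guess first).
ranked = [
    ["fig2", "fig1", "fig3"],
    ["fig1", "fig4", "fig2"],
    ["fig3", "fig2", "fig1"],
    ["fig5", "fig1", "fig2", "fig3"],
]
# Hypothetical author-selected central figure for each paper.
labels = ["fig2", "fig2", "fig1", "fig4"]

exact_match = top_k_accuracy(ranked, labels, k=1)  # only the first guess counts
top3 = top_k_accuracy(ranked, labels, k=3)         # any of the first three guesses counts
print(f"exact match: {exact_match:.2f}, top-3: {top3:.2f}")
```

Exact match is simply the k=1 case, which is why it is always less than or equal to the top-3 figure.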

design; modeling

Tags:
Author(s):
Po-Shen Lee

It's Time for Data Ethics Conversations at Your Dinner Table

Type:

Literature

With 2.5 quintillion records of data created every day, people are being defined by how they travel, surf the Internet, eat, and live their lives. We are in the midst of a “data revolution,” where individuals and organizations can store and analyze massive amounts of information. Leveraging data can allow for surprising discoveries and innovations with the power to fundamentally alter society: from applying machine learning to cancer research to harnessing data to create “smart” cities, data science efforts are increasingly surfacing new insights ‒ and new questions.

data analysis; social impact

Tags:
Author(s):
Meredith Lee

Jata Water Solution

Type:

Model

Empowering Community Residents with One Integrated Information Platform and Two-Way Data Communication

water data challenge

Tags:
Author(s):

Just as Special - Colorado Foster Care Resources

Type:

Repository

We are building a vital bridge to address the foster care crisis by connecting families to the resources they need to thrive during their foster care journey. This database was created in partnership with Cobbled Streets.

social impact; community engagement

Tags:
Author(s):

Let's make it count virtual summit

Type:

Event

Let's Make It Count is a data science education initiative launched by the National Science Foundation West Big Data Innovation Hub at SXSW 2019, timed with the 2020 Census and in partnership with the U.S. Census Bureau's Statistics in Schools program. This material is based upon work supported by the National Science Foundation under Grants 1916573, 1916481, 1915774, as part of a national network of Regional Big Data Innovation Hubs.

education; census

Tags:
Author(s):

MIT COVID-19 Datathon

Type:

Project

The MIT COVID-19 Datathon is a week-long virtual event where teams of data scientists, clinicians, public health professionals and other subject matter experts come together to develop meaningful insights leveraging existing datasets to influence policy and decision making in the public and private sector.

COVID-19; data analysis; data competition

Tags:
Author(s):

Microsoft Cortana Intelligence

Type:

Model

Our project submission is a proof-of-concept entry that leverages Microsoft Cortana Intelligence to explore the themes and questions that have been presented for this challenge. My intent is to show how Cortana Intelligence can provide CA Water Board internal and external staff the tools to (1) ingest and explore data in a data lake, (2) create features and models using machine learning, and (3) deploy, consume, and operationalize models with web services and reports.

water data challenge

Tags:
Author(s):

NSF Big Data Innovation Hub Panel 1 - 2022 National Workshop on Data Science Education

Type:

Presentation

Recorded panel at 2022 National Workshop on Data Science Education

open data; data management; cyberinfrastructure

Tags:
Author(s):
Rajeev Bukralia

NSF Big Data Innovation Hub Panel 2 - 2022 National Workshop on Data Science Education

Type:

Presentation

Recorded panel at 2022 National Workshop on Data Science Education

metabolomics; data management; open data; research collaboration

Tags:
Author(s):
Renata Rawlings-Goss

National Metabolomics Data Repository

Type:

Repository

The National Institutes of Health (NIH) Common Fund Metabolomics Program was developed with the goal of increasing national capacity in metabolomics by supporting the development of next-generation technologies, providing training and mentoring opportunities, increasing the inventory and availability of high-quality reference standards, and promoting data sharing and collaboration. In support of this effort, the Metabolomics Common Fund's National Metabolomics Data Repository (NMDR), housed at the San Diego Supercomputer Center (SDSC), University of California, San Diego, has developed the Metabolomics Workbench. The Metabolomics Workbench serves as a national and international repository for metabolomics data and metadata and provides analysis tools and access to metabolite standards, protocols, tutorials, training, and more.

education; design; community engagement

Tags:
Author(s):

National and International Trends in Research Storage at Scale

Type:

Literature

This is the second in a series of concept papers outlining the function and role of the OSN in the research infrastructure landscape. Three national and international projects that serve research storage at scale are examined: EOSC-Nordic, which brings together research institutions, research infrastructure providers and policy makers in the Nordic and Baltic region; HIFIS, which manages private cloud services for the Helmholtz Association in Germany; and FABRIC, which is deploying its own network of compute and storage resources across the US, Asia and Europe.

education; design; community engagement

Tags:
Author(s):
Uwe Jandt

Native Fisheries (Baydeltalive.com: Native Fisheries Monitoring in the Sacramento San Joaquin Delta)

Type:

Model

The fisheries monitoring dashboards were created to facilitate monitoring that informs real-time water operations. The fisheries data is an aggregate of local, state, and federal efforts to monitor fish migration within and upstream of the Delta. The fisheries monitoring data is coupled with real-time hydrodynamic, real-time water quality and forecasted meteorological conditions. Data in this dashboard is assembled according to Reasonable and Prudent Alternatives (RPAs) as outlined by Biological Opinions directed by NMFS NOAA and USFWS. The RPAs mandate monitoring of state and federally threatened and endangered fish species (in addition to other species) and related environmental conditions. The dashboards are a decision support tool for the following groups:
• Delta Operations for Salmon and Sturgeon (DOSS) is an interagency technical advisory team that closely monitors conditions to balance Delta Operations with the protection of listed fish species. Through the Biological Opinions, DOSS has a series of Triggers and Indices that signal fish migration and indicate a need for changes to Delta Operations.
• The Delta Conditions Team (DCT) is composed of local, state, and federal government representatives; water contractors; and NGOs. The DCT meets weekly to review current conditions (fish, turbidity, flow, weather, temperature, and exports).
• The Smelt Working Group (SWG) focuses on conditions related to delta and longfin smelt. The team reviews conditions and advises changes in Delta Operations to reduce smelt mortality. A key indicator for smelt migration is an increase in turbidity in the Delta.
• General Water Operations.

water data challenge

Tags:
Author(s):

Open Storage Network

Type:

Project

The Open Storage Network (OSN) supports science and scholarly research that requires data storage and transfer at scale, by simplifying and accelerating access to data that is in active use by ongoing research projects. The OSN places particular emphasis on large data sets (hundreds of terabytes) that are often difficult to share, and long tail data sets that are often difficult to find and access. Deployment of the OSN is a response to the increasing importance of storage as the third component of national cyberinfrastructure, complementing investments in computing and networks. While other uses may emerge over time, the OSN is intended initially to serve two principal needs: (1) facilitate smooth flow of large data sets between data and computing resources such as instruments, synthetic data projects, campus data centers, national supercomputing centers, and cloud providers; and (2) make it easy to expose long tail data sets to the entire scientific community. The OSN is a functionally and administratively coherent federation of storage systems, referred to as Pods, that reside at independent sites. The OSN design leverages well-defined standards and APIs that accommodate local variation while ensuring uniform global behavior. This approach is intended to enable scaling to hundreds of pods with aggregate raw capacity of hundreds of petabytes.

cloud computing; data management; open data; research collaboration

Tags:
Author(s):

Open Storage Network - OSN Outcomes Update

Type:

Presentation

A review of the OSN project, key features, and ways to leverage it in your research.

cyberinfrastructure; data management; open data

Tags:
Author(s):
Melissa Cragin

Open Storage Network - Research Drivers and Capabilities

Type:

Presentation

What interfaces and capabilities are needed alongside storage fabric to enable data sharing? This seminar will focus on domain examples from partners using the OSN – from the earth sciences to metro science.

cyberinfrastructure; data management; open data

Tags:
Author(s):
Wolfgang Gerlach

Open Storage Network Retrospective & the Future of Distributed Storage for eInfrastructure

Type:

Literature

The OSN is a boon to computing on the network, particularly for the growing needs of mid-scale researchers who, while focusing on collaboration and open data, also need to acquire cyberinfrastructure for research without necessarily having the budget for data-intensive infrastructure. OSN pods provide a solution that helps to fill a gap in high-performance and big data applications, building in performance, flexibility, and cost effectiveness that are competitive with, or exceed, current solutions. The OSN is a viable data storage and sharing solution for research groups that span multiple institutions and cannot span institutional firewalls and security domains. Research projects will increasingly need an option like the OSN to support transdisciplinary and at-scale science.

open data; data management; cyberinfrastructure

Tags:
Author(s):
Melissa Cragin

Open Storage Network: national data storage cyberinfrastructure for the 21st century

Type:

Poster


cyberinfrastructure; data management; open data

Tags:
Author(s):
Melissa Cragin

Open Water Rate Specification (OWRS)

Type:

Model

Additional resources: - [Project webpage](https://github.com/California-Data-Collaborative)

water data challenge

Tags:
Author(s):

Oregon State Watershed-Riparian System Project

Type:

Project

The long-term goal of this project is to enhance production and ecological resilience in rangeland watershed-riparian systems by providing science-based information aimed to improve land management practices and to inform policy related to water quality and water quantity issues influencing land use-environment relationships.

ecological resilience; rangeland; water

Tags:
Author(s):

PFAS Analysis and Intervention

Type:

Model

Bringing awareness to and analyzing a newly researched class of chemicals, PFAS.

water data challenge

Tags:
Author(s):

PFAS– We Eat It, We Drink It, We Breathe It

Type:

Model

[Project Webpage](https://meldataaa.shinyapps.io/PFAS_Analysis_and_Intervention/)

water data challenge

Tags:
Author(s):

Pandemic Vulnerability Index of US Cities: A Hybrid Knowledge-based and Data-driven Approach

Type:

Literature

Cities become mission-critical zones during pandemics and it is vital to develop a better understanding of the factors that are associated with infection levels. The COVID-19 pandemic has impacted many cities severely; however, there is significant variance in its impact across cities. Pandemic infection levels are associated with inherent features of cities (e.g., population size, density, mobility patterns, socioeconomic condition, and health environment), which need to be better understood. Intuitively, the infection levels are expected to be higher in big urban agglomerations, but the measurable influence of a specific urban feature is unclear. The present study examines 41 variables and their potential influence on COVID-19 cases and fatalities. The study uses a multi-method approach to study the influence of variables, classified as demographic, socioeconomic, mobility and connectivity, urban form and density, and health and environment dimensions. This study develops an index dubbed the PVI-CI for classifying the pandemic vulnerability levels of cities, grouping them into five vulnerability classes, from very high to very low. Furthermore, clustering and outlier analysis provides insights on the spatial clustering of cities with high and low vulnerability scores. This study provides strategic insights into levels of influence of key variables upon the spread of infections as well as fatalities, along with an objective ranking for the vulnerability of cities. Thus it provides critical wisdom needed for urban healthcare policy and resource management. The pandemic vulnerability index calculation method and the process present a blueprint for the development of similar indices for cities in other countries, leading to a better understanding and improved pandemic management for urban areas and post-pandemic urban planning across the world.

cities; social impact; COVID-19; modeling

Tags:
Author(s):
Md. Mokhlesur Rahman

Pittsburgh DataWorks DataJam

Type:

Project

The DataJam is an academic competition for high school students and afterschool programs, like the Boys & Girls Clubs, which focuses on teaching about the use of big data to answer a research question. The program is set up in such a way that students usually work in teams of 5-7 students to formulate a research question, find publicly available data sets, analyze their data, make data visualizations, and present their findings to a panel of judges. Students learn skills pertaining to the scientific method, data analysis, and how to give scientific presentations.

education; community engagement; data analysis

Tags:
Author(s):

Prediction and Visualization of Reported Units

Type:

Model

Saving staff time by building a model that identifies Electronic Annual Reports (EARs) with errors. Can we learn more about contributing factors through geographic visualization? Additional resources: - [Project webpage](https://github.com/ozzysChiefDataScientist/water)

water data challenge

Tags:
Author(s):

Preliminary Analysis of Surface Water Toxicity Dataset

Type:

Model

I found out about this data challenge rather late and just decided to take a stab at it for fun. The data set was fairly large, so my goal was initially to try to provide some sort of graphical representations of the data, by project / region / organism, etc. In the end due to time constraints, none of that happened, so the script only computes averages and standard deviations for the various analytes and organisms. The full dataset or a subset of the data is read in as a CSV file, but must first be modified to remove commas. I did this by just doing a search and replace in Excel, replacing commas with semi-colons, as I noticed many cells contained commas that would interfere with the way I split the cells in Python. There may be a workaround to this using a CSV package in Python, but I didn’t have the time to look into that. The script creates a mixture of classes and dictionaries, organizing data in several layers: by project, station name, organism name, and analyte. Averages and standard deviations are computed for each project by organism name and analyte, essentially combining the data from each station. These averages and standard deviations are written to a CSV results file and provide a very broad overview of the results of each project. Average and standard deviation data is also available on a more in-depth level by station name. Some projects involved the collection of samples with an initial and final reading appearing as separate entries; I did not have time to configure the script to treat these entries differently. I also did very little data validation in terms of ignoring bad data points (I saw some -88 values) and did not attempt to fix mis-parsed data (I have a project in my results named “7:30:00”). The script has several functions that compute date-time-values for entries, which could potentially be useful for chronologically sorting and plotting data, if a better method doesn’t exist already.
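The CSV workaround the author mentions does exist in Python's standard library: the `csv` module reads quoted cells that contain commas, so the search-and-replace step in Excel is not needed when such cells are quoted in the file. The sketch below is a hedged illustration, not the author's script: the column names, the sample rows, and the rule that negative results such as -88 are skipped as placeholder codes are all assumptions.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical sample; the real dataset's column names may differ.
SAMPLE = '''Project,StationName,OrganismName,Analyte,Result
"Delta Study, Phase 1",ST-01,Ceriodaphnia dubia,Survival,90
"Delta Study, Phase 1",ST-02,Ceriodaphnia dubia,Survival,80
"Delta Study, Phase 1",ST-01,Ceriodaphnia dubia,Survival,-88
"Delta Study, Phase 1",ST-02,Hyalella azteca,Survival,100
'''


def summarize(csv_file):
    """Mean and sample standard deviation per (project, organism, analyte)."""
    groups = defaultdict(list)
    # DictReader handles quoted cells, so "Delta Study, Phase 1" stays one field.
    for row in csv.DictReader(csv_file):
        value = float(row["Result"])
        if value < 0:
            # Assumption: negative results (e.g. -88) are placeholder codes.
            continue
        key = (row["Project"], row["OrganismName"], row["Analyte"])
        groups[key].append(value)
    summary = {}
    for key, values in groups.items():
        mean = statistics.mean(values)
        # A sample standard deviation needs at least two values.
        stdev = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[key] = (mean, stdev)
    return summary


summary = summarize(io.StringIO(SAMPLE))
for key, (mean, stdev) in summary.items():
    print(key, round(mean, 2), round(stdev, 2))
```

For a file on disk, `open(path, newline="")` can be passed in place of the `io.StringIO` object. Grouping by station as well would only require adding `row["StationName"]` to the key.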

water data challenge

Tags:
Author(s):

Providing Context to a Proposed Shasta Dam Expansion

Type:

Model

California’s recent drought has placed unprecedented demands on our freshwater resources, renewing enthusiasm for surface water infrastructure investments such as raising dams to capture more water in wet years. Using historical data of storage levels and inflows of Shasta Dam, our group wanted to estimate how successful the Bureau of Reclamation’s expansion of Shasta Dam would be. We graphed past patterns of data and modeled potential future scenarios. Our entry is in the categories of data visualization and insights and Decision Support, Data Sharing and Information Communication Tools.

water data challenge

Tags:
Author(s):

Public Perceptions of COVID-19 Vaccines: Policy Implications from US Spatiotemporal Sentiment Analytics

Type:

Literature

There is a compelling and pressing need to better understand the temporal dynamics of public sentiment towards COVID-19 vaccines in the US on a national and state-wise level for facilitating appropriate public policy applications. Our analysis of social media data from early February and late March 2021 shows that, despite the overall strength of positive sentiment and despite the increasing numbers of Americans being fully vaccinated, negative sentiment towards COVID-19 vaccines still persists among segments of people who are hesitant towards the vaccine. In this study, we perform sentiment analytics on vaccine tweets, monitor changes in public sentiment over time, contrast vaccination sentiment scores with actual vaccination data from the US CDC and the Household Pulse Survey (HPS), explore the influence of maturity of Twitter user-accounts and generate geographic mapping of tweet sentiments. We observe that fear sentiment remained unchanged in populous states, whereas trust sentiment declined slightly in these same states. Changes in sentiments were more notable among less populous states in the central sections of the US. Furthermore, we leverage the emotion polarity based Public Sentiment Scenarios (PSS) framework, which was developed for COVID-19 sentiment analytics, to systematically posit implications for public policy processes with the aim of improving the positioning, messaging, and administration of vaccines. These insights are expected to contribute to policies that can expedite the vaccination program and move the nation closer to the cherished herd immunity goal.

COVID-19; sentiment analysis; social media

Tags:
Author(s):
Md. Amjad Hossain

Purify

Type:

Model

Clean Water Discrimination: The Native American Water Crisis

water data challenge

Tags:
Author(s):

RRoCCET21

Type:

Event

Recordings from the RRoCCET21 Conference

open data; data management; cyberinfrastructure

Tags:
Author(s):

Research Drivers and Capabilities

Type:

Literature

This is the first of a series of concept papers outlining the function and role of the OSN in the research infrastructure landscape. Here we address three cases: the Terra Fusion project, which has produced a massive dataset by fusing data from multiple instruments; the SAGE project, which is working to streamline data flowing from Internet of Things (IoT) devices; and a team of water and hazards researchers who have developed a new access point for hurricane data that will facilitate new research.

cloud computing; cyberinfrastructure

Tags:
Author(s):
Melissa Cragin

Resilient Community Toolkit for Drought Response

Type:

Model

How might we help empower and activate communities to respond to the California drought? The *Resilient Community Toolkit for Drought Response* is a solution that makes it easier for all stakeholders to leverage resources that are too often dispersed.

water data challenge

Tags:
Author(s):

Sacramento River Watershed Data Portal

Type:

Model

The Sacramento River is the largest river and watershed system in California. This 27,000-square-mile basin drains the eastern slope of Shasta, the western slopes of the Cascades and the northern portion of the Sierra Nevada and carries 31% of the state’s total surface runoff. The Sacramento River Basin provides drinking water for millions, supplies farmers with water for California’s agriculture and is the lifeblood for hundreds of wildlife species including four runs of Chinook salmon. To help successfully manage this dynamic and diverse system, 34 North was tasked with a major data aggregation and software development effort in order to help inform stakeholders with accurate and timely information. The SRWP data platform is used for key management issues including:
• Collaborative resource management
• Salmon/steelhead passage and habitat
• Wild trout/native fish
• Forest health/fuels management
• Aquatic and riparian habitat
• Water quality
• Water supply
• Flood management
• Open space and land conservation
• Erosion and natural stream function
• Invasive species
• Temperature management

water data challenge

Tags:
Author(s):

Salinity Management (Managing Salinity in the Delta)

Type:

Model

The Bay-Delta Monitoring and Analysis Section within DWR’s Division of Environmental Services is part of the Environmental Monitoring Program (EMP) that monitors water quality, benthic macroinvertebrates, and phytoplankton at discrete stations in the Sacramento-San Joaquin Delta, Suisun Bay, and San Pablo Bay. The monitoring conducted by the EMP is mandated by State Water Resources Control Board (SWRCB) Water Rights Decision 1641 (D-1641) for operation of the State Water Project. The EMP was established in 1970, and static “hardcopy” Water Quality Conditions (WQC) reports have been published containing data from 1970 to 2011. This project, developed by 34 North, was initiated to make the mandated data more accessible and usable in a web-based Interactive Water Quality Conditions Report Portal. The SWRCB’s D-1641 mandates DWR to: 1) conduct a comprehensive environmental monitoring program to determine DWR’s compliance with water quality standards and 2) annually report on the data collected. Development of a web-based Interactive Water Quality Conditions Report was initiated by DWR and the State and Federal Water Contractors (SWFCA), as part of the SWRCB “My Water Quality Portals”. This new web-based report is intended to replace the publication of the hard-copy version of the WQC report and will improve compliance requirements for the D-1641 Monitoring Report. DWR’s last WQC report for 2011 was published in 2012, and this new format will provide the following improvements:
• Real-time reporting and updates available to the public in an easy-to-use web interface.
• Reduce staff time required to develop these reports and allow more time for data analysis and problem solving.
• Make all EMP data available and useful for scientists and managers to respond to recent court decisions, biological opinions, and mandates for drought responses.
• Trend analysis.
• Program data will be made available via web services for other water quality programs and scientific efforts.

water data challenge

Tags:
Author(s):

Scientific Data Security Concerns and Practices: A survey of the community by the Trustworthy Data Working Group

Type:

Literature

In April and May of 2020, the Trustworthy Data Working Group conducted a survey of scientific data security concerns and practices in the scientific community. This report provides an analysis of the survey results.

data security; survey

Tags:
Author(s):
Karan Vahi

Smart Water Analytics

Type:

Model

Smart Water Analytics is an industry-leading, innovative and advanced technology solution that combines open-source software with Accenture’s knowledge and experience in the water industry. The Smart Water solution helps not only states, cities, municipalities and counties in North America but also leading water suppliers across the globe to conserve water and provide safe drinking water. Our solution focuses on groundwater, and some of its salient features are listed below.
Technology:
- Built on cloud technology and open-source software such as Hadoop, Spark, Sqoop, and MySQL
- Provides analytics capabilities to look at data in either dashboard or map form
- Uses the R programming language for comparison analysis of two or more counties by chemical usage and number of wells
Business:
- Demonstrates the capability to answer three (3) use cases in a very intuitive manner
- Use Case 1: Analyze trends of specific chemicals used in wells at the state level, with a deep dive at the well level
- Use Case 2: Identify counties in which the number of wells has increased
- Use Case 3: Attribute the increase in wells to an increase in population or to weather conditions

Tags:
Author(s):

Socioeconomic factors analysis for COVID-19 US reopening sentiment with Twitter and census data

Type:

Literature

Investigating and classifying the sentiments of social media users (e.g., positive, negative) towards an item, situation, or system is very popular among researchers. However, they rarely discuss the underlying socioeconomic factors associated with such sentiments. This study attempts to explore the factors associated with people's positive and negative sentiments about reopening the economy in the United States (US) amidst the COVID-19 global crisis. It takes into consideration the situational uncertainties (i.e., changes in work and travel patterns due to lockdown policies), economic downturn and associated trauma, and emotional factors such as depression. To understand people's sentiment about reopening the economy, Twitter data was collected, representing the 50 states of the US and Washington, D.C., the capital city of the US. State-wide socioeconomic characteristics of the people (e.g., education, income, family size, and employment status), built environment data (e.g., population density), and the number of COVID-19 related cases were collected and integrated with Twitter data to perform the analysis. A binary logit model was used to identify the factors that influence people toward a positive or negative sentiment. The results from the logit model demonstrate that family households, people with low education levels, people in the labor force, low-income people, and people with higher house rent are more interested in reopening the economy. In contrast, households with a high number of family members and high income are less interested in reopening the economy. The accuracy of the model is reasonable (i.e., the model can correctly classify 56.18% of the sentiments). The Pearson chi-squared test indicates that this model has high goodness-of-fit. This study provides clear insights for public and corporate policymakers on potential areas to allocate resources, and directional guidance on potential policy options they can undertake to improve socioeconomic conditions and to mitigate the impact of the pandemic in the current situation and in the future as well.

modeling; COVID-19; census; sentiment analysis; social media

Tags:
Author(s):
Xue Jun Li

Source Water Time Machine

Type:

Model

[Project website](https://jjspector.shinyapps.io/source_water_time_machine/)

water data challenge

Tags:
Author(s):

Spotting the Drought

Type:

Model

[Project Website](https://californiadrought.shinyapps.io/WaterWells/)

water data challenge

Tags:
Author(s):

Statewide Interactive Map of Lead in Drinking Water

Type:

Model

Lead Levels in San Francisco Schools: San Francisco Unified School District tested its schools' drinking water for lead last year. This map shows the highest lead levels detected at each school based on multiple sample sites and dates. There is no safe level of lead, especially for children. Even small amounts can lower IQ. Damage from lead exposure is irreversible. Take action with CALPIRG's "[Get The Lead Out: Back to School Toolkit](https://calpirg.org/resources/caf/get-lead-out-back-school-toolkit)." Additional resources: - [2018 Project](https://waterdatacollaborative.github.io/project/get-the-lead-out/) - [Project webpage](https://geosurge.github.io/get-the-lead-out-map/)

water data challenge

Tags:
Author(s):

Stream Monitor App

Type:

Model

Our project is the Stream Monitor App, which allows people to “subscribe” to specific rivers and receive alerts when the river is forecasted to exceed a high- or low-flow threshold. The app makes use of the 10-day river forecasts coming from the National Weather Service’s new National Water Model as its main data source. When users open the app, they are presented with a map of all the nation’s rivers, colored according to whether they are currently above or below their average flow for the present month. When users subscribe to a river, they can set various alert levels, and the app checks the National Water Model every hour, triggering an alert if the flow is forecasted to cross one of those levels at any point in the next ten days. This app allows concerned citizens to monitor the streams they care about and understand the flow thresholds that define their hydrology. For example, if a user observes that a given stream has become too low for salmon to pass, they can use the app to check what flow rate corresponds to that threshold. They can check if the conditions are expected to persist or ameliorate, and set an alert so that the next time the river is forecasted to drop this low, they will receive 10 days' warning. Water managers can use the app to examine forecasted water availability for withdrawals, dam operators can use it to forecast inflows and manage releases, and recreational water users can plan their next kayaking or fishing trip for when conditions are best.
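The alert logic described above (check a forecast against user-set high and low flow levels and report the first crossing) might be sketched in Python as follows. This is only an illustration under stated assumptions, not the app's code: the `Alert` class, the `check_forecast` function, the flow values, and the daily time step are all hypothetical, and the sketch does not query the National Water Model.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Alert:
    kind: str         # "high" or "low"
    threshold: float  # user-set flow level
    hours_ahead: int  # lead time until the first forecasted crossing
    flow: float       # forecasted flow at that time


def check_forecast(
    flows: Sequence[float],
    low: Optional[float] = None,
    high: Optional[float] = None,
    step_hours: int = 1,
) -> List[Alert]:
    """Return one alert per threshold, at the first forecast step that crosses it."""
    alerts: List[Alert] = []
    if high is not None:
        for i, flow in enumerate(flows):
            if flow > high:
                alerts.append(Alert("high", high, i * step_hours, flow))
                break
    if low is not None:
        for i, flow in enumerate(flows):
            if flow < low:
                alerts.append(Alert("low", low, i * step_hours, flow))
                break
    return alerts


# Hypothetical forecast: one flow value per day (units are arbitrary here).
forecast = [120.0, 95.0, 60.0, 45.0, 70.0, 210.0]
alerts = check_forecast(forecast, low=50.0, high=200.0, step_hours=24)
for alert in alerts:
    print(f"{alert.kind}-flow alert: {alert.flow} expected in {alert.hours_ahead} h")
```

Reporting only the first crossing per threshold keeps a subscriber from receiving one message per forecast step; a real service would also need to remember which alerts it has already sent between hourly checks.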

water data challenge

Tags:
Author(s):

SuAVE: Survey Analysis via Visual Exploration

Type:

Project

SuAVE (Survey Analysis via Visual Exploration) is a new online platform for visual exploratory analysis of surveys and image collections. It integrates visual, statistical and cartographic analyses and lets users annotate and share images and distribution patterns. It also provides a gateway into advanced data science and machine learning tools by integrating with R and Jupyter notebooks.

survey; data analysis

Tags:
Author(s):

Surface Moisture Monitoring in Real-Time and Large Scales

Type:

Model

Our innovation is a real-time data collection method for field-scale (and larger) sensing of soil moisture, snow mass, and vegetation that may be used to ground-truth, calibrate, complement, and augment the current methods of estimating surface water data at these scales -- critical inputs for water markets and implementation of the SGMA, including the soon-to-be-unveiled OpenET (evapotranspiration) platform, which uses estimates from satellite imagery, not proximal measures.

water data challenge

Tags:
Author(s):

Sustainable Floodplain Habitat Finder

Type:

Model

Chinook salmon in California's Central Valley are struggling to survive. During their epic migration from the ocean into the tributary streams draining the Central Valley, salmon contend with depleted streamflows, migration barriers, predators, degraded habitat, lack of nutrients, and a host of other challenges. Baby salmon are especially vulnerable during their long migration from the spawning grounds downstream to the ocean. Additional floodplain habitat may be the key to restoring dwindling salmon populations by providing food, cover from predators, and higher rates of survival. But restoring floodplain habitat requires a careful balance of current and future streamflow conditions, groundwater basin conditions, and, of course, baby salmon migration patterns. Our entry is an open source combination of data visualization and decision support tools for water resources and fishery managers who must constantly make difficult decisions about how to allocate streamflows to meet a wide range of human and ecosystem needs. Our tool is an R Shiny application that incorporates the Leaflet map service, Plotly charting tools, the National Oceanic and Atmospheric Administration River Forecast web service, and R-based statistical evaluation interfaces to evaluate, in a real-time data-driven way, the relative potential for floodplain habitat creation at a given site.

water data challenge

Tags:
Author(s):

Tableau: Water Acidity

Type:

Model

Our entry will consist of three different visualizations created with Tableau. The visualizations will be accessible through the internet on our company server, and the presentation will emphasize the capabilities Tableau has to offer. With data found on the Open Green Gov website and other public websites, we will show how Tableau can illustrate water pH levels found in California. We will also show how water is becoming more acidic over time using data from Hawaiian oceans and explain some implications this could have for our planet in the future.

water data challenge

Tags:
Author(s):

Team Athena Intelligence

Type:

Model

Our submission is a non-open-source Data Visualization and Insight entry that combines data from dozens of agencies, drawn from various public sources, to visualize the connections between water sources (reservoirs, dams), water distribution (aqueducts, canals, rivers & streams) and water utilization (hydroelectric facilities). In the dashboard, the data can be "pivoted" by clicking on various spatially displayed data. Click on a hydroelectric facility and the specific reservoirs, canals and streams associated with the facility are displayed, along with month-to-month data on the storage status of the associated reservoirs.

water data challenge

Tags:
Author(s):

The California Water Planning Information Exchange (Water PIE)

Type:

Model

Based on Data Basin technology, the Conservation Biology Institute (CBI) recently completed a web-based data sharing and map visualization pilot called the Water Planning Information Exchange (or Water PIE) for the California Department of Water Resources. This web mapping platform was constructed around four primary goals: improve spatial data accessibility; allow for integration of datasets from disparate places; make the system easy to use; and support various forms of collaboration. The pilot project demonstrated that the platform supports time series data; for example, reporting data from stationary locations such as groundwater well readings. While the platform contains several novel technical approaches to online mapping and data sharing, its most distinctive strengths pertain to features that support a variety of social interactions. Working from personal accounts, users manage content in their own private workspaces that they can return to at any time. Users can query spatial datasets, load their own datasets, build new maps from the large library of authoritative data, download content, and share content with whomever they wish. They can easily create public or private working groups to address any topic of interest, and numerous platform features (e.g., commenting on maps) allow for effective collaboration.

water data challenge

Tags:
Author(s):

The Dams of California

Type:

Model

The Dams of California is a data visualization web tool that gives the viewer an interactive overview of the more than 2,000 non-Federal dams in California. The viewer can select from the menus to see the number of dams in a particular county, which of the 8 dam types are present, their storage capacity, which dams are located in which county, the type of each dam, and who the owner is. Other information, such as the year built and the main tributary, is also included, providing a complete picture of the various dams in California.

water data challenge

Tags:
Author(s):

The Data Science Corps: Making a Difference by Connecting Experts to Projects

Type:

Literature

A Report on the December 2017 Data Science Corps Workshop

education; design

Tags:
Author(s):
Vandana Janeja

The Data for Good Growth Map

Type:

Literature

From July 2020–April 2021, a network of Data for Good program organizers and those doing related work from 17 universities, including nine active Data for Good programs and four in development, met regularly to share their experiences and discuss practices. These Data for Good organizers also participated in a survey that collected detailed information about their programs. For more information about contributors, see the various Appendices and the List of Contributors (p. 62). Aware that many university scholars are considering starting a Data for Good Program to meet the high demand for applied data science education in their own communities, we decided to share what we had learned together. With support from the West Big Data Innovation Hub, a team from the University of Washington’s eScience Institute distilled the insights generated through group discussions and survey results to produce a series of “growth maps.” Each growth map highlights key decision points to consider when designing a Data for Good program. By elaborating on these high-level decision points, we hope to assist “seedling” programs interested in charting their own plan for growth.

education; design

Tags:
Author(s):
Sarah Stone

The Innovation Playbook for Local Government

Type:

Presentation

Technological forces are moving at an unprecedented pace, impacting everything from healthcare to mobility. How can local governments establish effective command and control structures to manage the onslaught in a manner that works best for their constituents? Xavier Hughes, CTIO of ICMA and former CIO at the U.S. Department of Labor, will show how innovation is as much a playbook as it is a platform to propel success and prosperity across people, process, policy, data, and technology ventures.

social impact; policy; data management

Tags:
Author(s):
Xavier Hughes

The Open Storage Network: Distributed Storage Cyberinfrastructure for Data-Driven Science

Type:

Poster


cyberinfrastructure; data management; open data

Tags:
Author(s):
Melissa Cragin

The Road for Recovery: Aligning COVID-19 efforts and building a more resilient future

Type:

Literature

Our society currently faces the most profound and deeply disruptive public health crisis in modern history. As communities across the world grapple with the COVID-19 pandemic, scientific advances spanning biochemistry and epidemiology to manufacturing and data engineering offer hope—and a spectrum of guidance is unfolding in an effort to respond to monumental shifts in our daily lives. The rising demand for data and the emerging efforts to responsibly collect, share, and analyze information across traditional boundaries play a vital role in our next steps.

COVID-19; modeling; data management

Tags:
Author(s):
Meredith Lee

The Urban Drool Tool

Type:

Model

The main goal of this proof of concept project submittal would be to integrate water use data (Moulton Niguel Water District) with flow and water quality data collected at stormdrain outfalls (OC Public Works) to help prioritize catchments where a condition of unnatural water balance and flow regime may exist as a result of anthropogenic sources during dry weather. It is anticipated that integration of this data will also help inform selection of effective strategies, which will be especially helpful for south Orange County (south OC), where unnatural water balance/flow regime has been identified as a high priority water quality condition within the water quality improvement plan (WQIP) that is currently under development for the south OC watershed management area. This entry is in the Open Source / Data Visualizations and Insights category and would be used as a Decision Support, Data Sharing and Information Communication Tool.

Tags:
Author(s):

The Watershed Analyst

Type:

Model

The Watershed Analyst is an online graphical interface that makes a complex and powerful dataset useful for water and biodiversity resource managers. The Watershed Analyst enables improved understanding of changes in climate and hydrologic patterns in watersheds for purposes such as: evaluating threats to water supply and biodiversity, planning for future extremes (drought and flood events), and prioritizing watershed restoration action. With the Watershed Analyst users can compare modeled future impacts of climate change to historic patterns for a chosen watershed and across watersheds, and create dynamic graphs and summaries to use in reports and projects. Graphical outputs include: 1) an annual Time Series Graph that allows exploration of annual patterns as well as longer-term trends; 2) a dynamic “Water Balance Diagram” that presents all components of the water balance to deepen understanding of watershed function on an annual basis or broader time scales; and 3) monthly plots of individual variables for direct comparisons of historical and projected future conditions. The graphs and data can be downloaded for further custom analyses or aggregation into larger watersheds by advanced users. The Watershed Analyst is currently in beta release focusing on the San Francisco Bay Area, with the plan to refine and expand the tool for hydrologic California. The underlying data set is the California Basin Characterization Model (BCM), a hydrologic model developed by the USGS California Water Science Center that calculates water balance on a monthly time step using precipitation, temperature, topography, soils, and bedrock geology. 270m resolution outputs include the components of a watershed's water balance: surface runoff, groundwater recharge, actual evapotranspiration, soil storage, and climatic water deficit. 18 climate futures are available, chosen to represent the full range of responses from more than 100 climate model and emission scenario combinations. 
The BCM is being used by numerous researchers and managers across California as a basis for evaluating the impacts of projected change and as an input to biogeographic modeling, including state agencies such as California Department of Water Resources and California Department of Fish and Wildlife, and regional resource managers such as Sonoma County Water Agency and US Fish & Wildlife Service Refuges. For more information on the BCM, see http://climate.calcommons.org/bcm.

water data challenge

Tags:
Author(s):

The WeTap App

Type:

Model

[Project Webpage](http://wetap.org/)

water data challenge

Tags:
Author(s):

Transboundary Groundwater Resilience

Type:

Project

This AccelNet design-track project creates a new international network of networks that connects U.S. and international networks of hydrology, social science, data science, and systems science to establish a novel transboundary groundwater resilience research approach. These new linkages will enable the infrastructure for transboundary groundwater resilience research and deliver conceptual advances that the next generation of water scientist leaders can build on in the future. The novel approach posits that identifying key data and system drivers would make significant progress towards producing research that catalyzes transformative change for transboundary groundwater systems.

water; international collaboration; research collaboration

Tags:
Author(s):

UW Data Science for Social Good

Type:

Project

The eScience Institute empowers researchers and students in all fields to answer fundamental questions through the use of large, complex, and noisy data. As the hub of data-intensive discovery on campus, we lead a community of innovators in the techniques, technologies, and best practices of data science and the fields that depend on them.

education; social impact; community engagement

Tags:
Author(s):

Using Smart Drones With Artificial Intelligence To Keep Our Water Safe

Type:

Model

Automated water quality monitoring, proactive alerts, and forecasting must be both efficient and cost-effective; drones and artificial intelligence can do this work tirelessly and continuously, 24/7.

water data challenge

Tags:
Author(s):

Visualization and Analysis for Water Resources Time-Series Data

Type:

Model

Today’s technology facilitates rapid data collection and the ability to store copious amounts of data. This data, which includes multi-variate attributes and can span many years, contains a wealth of information, but transforming this vast amount of data into actionable results can be problematic. Most 2D and 3D visualization techniques are adequate for basic analysis but fall short when one needs to analyze both conspicuous and subtle data patterns over time. These expansive datasets can be visualized by harnessing the power of the unique raster time-series visualization method. This method reveals trends across the entire temporal spectrum, from seconds to weeks to years. By viewing water resource datasets through this new lens, a more thorough understanding of historic, current and future water resource challenges is achieved. The following project demonstrates the benefits of the raster time-series visualization approach.

water data challenge

Tags:
Author(s):

Visualizing Exceedance of Contaminants

Type:

Model

Visualizing exceedance levels of water system contaminants. Additional resources: - [Project webpage](https://aaronhans.github.io/water-challenge/html/index.html)

water data challenge

Tags:
Author(s):

WILLOW AGEP Alliance

Type:

Project

The WILLOW Alliance for Graduate Education and the Professoriate initially collaborated with University of Montana in Missoula, Salish Kootenai College in Pablo, Montana, and Sitting Bull College in Fort Yates, North Dakota, to develop, implement, and study a model for the professional success of faculty and instructional staff in science, technology, engineering, and mathematics (STEM) who are enrolled in, and/or descendants of, Native American tribes. The current WILLOW Alliance project with UM and SKC is funded by the National Science Foundation and aims to increase the success of Native American STEM faculty and advance knowledge about issues impacting their career progression in STEM fields.

data analysis; water

Tags:
Author(s):

Walker Basin Hydro Mapper

Type:

Project

The Walker Basin Hydro Mapper provides a basin-wide perspective of real-time streamflow and stage as well as lake and reservoir storage capacity for the Walker River Basin in Nevada and California. This tool was developed to create a common operating picture for water users in the Walker Basin and to help monitor changes to instream flows associated with the Walker Basin Restoration Program.

COVID-19; data management; social impact

Tags:
Author(s):

Water Quality Portal for California

Type:

Model

[Project webpage](https://caccr.github.io/)

water data challenge

Tags:
Author(s):

Water Rate Structures– Do They Make Cents?

Type:

Model

[Project Website](https://ianartmor.github.io/ca_owrs/)

water data challenge

Tags:
Author(s):

Water Resource Visualization and Decision Support Tool Proof of Concept Design

Type:

Model

We designed and prototyped an integrated water data modeling, visualization, and decision support tool as an example of how such tools can:

• Enhance understanding of current and projected state-wide water demand in various scenarios
• Identify possible water conservation and supply tactics to minimize risks to the agricultural industry
• Improve stakeholder engagement and communication
• Support water policy planning and decision making

While our prototype demonstrates effective dashboard functionality and design using placeholder data, we’ve also outlined related datasets and a rapid prototyping approach that could be used for further development.

water data challenge

Tags:
Author(s):

Water Risk Monetizer

Type:

Model

Demand for water is quickly outpacing supply, and demand for water, food and energy continues to increase. To be sustainable, businesses will need to find ways to operate, and grow, while dramatically reducing reliance on natural resources. The Water Risk Monetizer provides actionable information to help water users understand the impact of water scarcity on their business and quantify those risks in financial terms to inform decisions that enable growth. The Water Risk Monetizer provides directional guidance in the form of a risk-adjusted cost of water, a monetary estimate of the full value of water at a facility level based on what water would cost if supply and demand were accurately reflected. The Water Risk Monetizer also examines the potential revenue at risk to a facility, estimating the amount and likelihood of the revenue that could potentially be lost at a facility due to the impact of water scarcity on operations. Applicable to any industry, the Water Risk Monetizer builds the business case for water stewardship. The tool can be used by business, agricultural and public water users to assess risk, enhance enterprise strategy or improve sustainable performance at a site level. Providing water users with an understanding of the full value of water will support investments to reduce water use and improve water reliability throughout California. This entry is in the non-open source / decision support, data sharing and information communication tools category.

water data challenge

Tags:
Author(s):

Water Scarcity Clock

Type:

Model

[Project Website](https://www.worldwater.io/)

water data challenge

Tags:
Author(s):

Water Supply App

Type:

Model

The project title is Water Supply App. It uses a JSON object created from the State Water Resources Control Board's Water Rights data (http://www.waterboards.ca.gov/waterrights/water_issues/programs/ewrims/index.shtml), USGS StreamStats data, and USGS 1:1,000,000-scale streams to facilitate the creation of a water supply report. The latter two datasets are accessed via USGS APIs (streamstats and tracer). Although the base mapping service is provided through Esri's JavaScript API, no Esri services that require payment were used. The app combines the steps that staff in the Division of Water Rights take to create a water supply report, which is part of the water availability analysis performed for new appropriative water right proposals. The user selects the proposed point of diversion and USGS Tracer finds the downstream flowline to the Pacific Ocean. The user then selects which existing water rights near the flowline to include in the analysis. The app delineates a watershed at each existing water right diversion point and generates a table that shows two pieces of information the user would otherwise have to calculate in Excel: the area of each watershed, and the sum of all existing water right diversions within that watershed. We think this has the potential to save staff significant time in calculating water supply, especially if riparian rights' statements of diversion and use could be included in the SWRCB dataset. This demo version filters water rights for active appropriative rights, but future versions could offer more filter functionality, such as the diversion season. Also, this demo is strictly a front-end project, but ideally the server would send the finished table to the user for download so that the rest of the water availability analysis can be completed.
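The table-building step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the app's code: the app itself is a front-end project, and the field names (`area`, `diversion`, `within`) and input shapes here are hypothetical. Watershed delineation and the USGS API calls are assumed to have already happened, so each water right simply lists the delineated watersheds it falls within.

```python
def water_supply_table(watersheds, rights):
    """Build one row per delineated watershed.

    watersheds: dict mapping a watershed id to its area (hypothetical shape).
    rights: list of dicts, each with a "diversion" amount and a "within" set
            of watershed ids that contain the right's diversion point.
    """
    rows = []
    for ws_id, area in watersheds.items():
        # Sum the diversions of every existing right located inside this watershed.
        total = sum(r["diversion"] for r in rights if ws_id in r["within"])
        rows.append({"watershed": ws_id, "area": area, "total_diversion": total})
    return rows


# Illustrative values only. Watershed "B" is downstream of "A", so it contains
# every right that "A" contains plus one more.
watersheds = {"A": 12.5, "B": 40.0}
rights = [
    {"id": "right-1", "diversion": 10.0, "within": {"A", "B"}},
    {"id": "right-2", "diversion": 5.0, "within": {"A", "B"}},
    {"id": "right-3", "diversion": 2.5, "within": {"B"}},
]
table = water_supply_table(watersheds, rights)
```

Because the watersheds are nested along the flowline, a right can count toward several rows, which is why each right carries a set of containing watersheds in this sketch.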

water data challenge

Tags:
Author(s):

Water Vulnerability in New Mexico

Type:

Model

Visualizations that quantify and identify the degree to which communities are vulnerable to threats to their drinking water supply, and intersectionality of systemic socio-economic environmental disadvantages.

water data challenge

Tags:
Author(s):

Water: Towards True Affordability

Type:

Model

Communities that are economically disadvantaged are most vulnerable to high water prices and to safety and availability issues. This project combines information from the Public Water Services (PWS), US Census data and water utility rates to study the affordability of drinking water in California. To fill the gaps in the existing data and to engage and empower the communities most affected, a proof-of-concept implementation for a survey is included. Additional resources: - [Project webpage](https://cawc-towardstrueaffordability.netlify.com/)

water data challenge

Tags:
Author(s):

WaterBudget

Type:

Model

Water is a scarce and restricted commodity and it has to be managed as such. Inspired by personal finance applications such as Mint and Level, and powered by machine learning techniques, we have created a proof-of-concept application, WaterBudget, to assist water districts in managing and planning their water usage and conservation plans. By analyzing the historical water usage patterns in each district, WaterBudget predicts total monthly and annual water usage and compares it to the set target. The water usage prediction allows water districts to be proactive and implement the water conservation measures necessary to reach their conservation targets. In the prototype we've developed, the targets set by the State (2015) were used; in practice, however, they could also be set by local administrations. In addition to budgeting and planning purposes, this application allows residents to easily follow their respective water suppliers' performance in water conservation.

water data challenge

Tags:
Author(s):

We Need a (Responsible!) Data Science Rapid Response Network

Type:

Literature

During the COVID-19 pandemic, for better or worse, we are learning much about ourselves as a society. Sabina Leonelli (“Data Science in Times of Pan(dem)ic,” this issue) provides a broad, cogent, and thought-provoking reflection on much of what we have learned to date from a data science perspective. Her chosen device for doing so—through imaginaries of data use—is particularly effective (and refreshingly unfamiliar to most data scientists) in balancing a sense of both what has been and what might be. The imaginaries include a welcome framing of the pitfalls of two broad areas of data science contributions during the pandemic: population surveillance and predictive modeling. Rather than similarly focusing on “the ways in which the data science contributions to the pandemic response are imagined and projected into the future,” here we center and emphasize the question of appropriate mechanisms for facilitating, or ideally even optimizing, such contributions.

professional development; tribal engagement; education

Tags:
Author(s):
Meredith Lee

WeedsViz

Type:

Model

Delta Region Aquatic Invasive Plant Control Visualization [Project Website](https://junjunjd.github.io/WeedsViz/)

water data challenge

Tags:
Author(s):

Well-Known

Type:

Model

We combine census data with groundwater quality data to provide a high-resolution map that helps answer the question "how do social, economic, and employment factors impact groundwater quality?".

water data challenge

Tags:
Author(s):

West Valley College California Water Data Challenge

Type:

Model

It is our hope this helps tell the water story in some small way. This project highlights the following areas: groundwater well data; evapotranspiration; precipitation; reservoir storage; critical habitat loss due to drought; steelhead and stream flow.

water data challenge

Tags:
Author(s):

Where’s My Water? : Predicting Renewable Freshwater Resource & Filling in Crucial Data Gaps

Type:

Model

Develop a prediction system for and forensically reconstruct statewide renewable freshwater supply data, to help manage water risk in a timely manner - a system that is usable, dynamic in space & time, publicly-available-data-driven, reproducible, and time and resource efficient.

water data challenge

Tags:
Author(s):

Women in Data Science Datathon

Type:

Project

The WiDS Datathon is an initiative to provide a platform for data science enthusiasts to learn, apply and hone their data science skills through the social impact challenges presented to them. Participants are trained and mentored by partners, ambassadors, and data enthusiasts.

social impact; education; data competition

Tags:
Author(s):

Workflowhub

Type:

Project

WorkflowHub is a registry for describing, sharing and publishing scientific computational workflows. The registry supports any workflow in its native repository. WorkflowHub aims to facilitate discovery and re-use of workflows in an accessible and interoperable way. This is achieved through extensive use of open standards and tools, including Common Workflow Language (CWL), RO-Crate, BioSchemas and TRS, in accordance with the FAIR principles.

data management; open data; research collaboration; workflows

Tags:
Author(s):

gspdrywells.com - a decision support tool to estimate well failure in critically overdrafted groundwater basins

Type:

Model

gspdrywells.com is an open source, data visualization web platform for understanding domestic well failure in the San Joaquin Valley across a range of groundwater level management scenarios, including minimum thresholds (MTs) set forth in groundwater sustainability plans (GSPs).

water data challenge

Tags:
Author(s):

iRain app for real-time rainfall observations from satellites and crowdsourcing

Type:

Model

iRain provides access to real-time global high-resolution (~4km) satellite precipitation products from the PERSIANN-CCS (Precipitation Estimation from Remotely Sensed Information using the Artificial Neural Networks - Cloud Classification System), which has been developed by the research team at CHRS. The building block of iRain rests on the satellite precipitation estimates generated by the PERSIANN algorithm which has been under development for over two decades. More detailed information about PERSIANN-CCS can be found at http://chrs.web.uci.edu. iRain allows users to visualize real-time global satellite precipitation observations and track extreme precipitation events globally. It is especially useful for monitoring extreme events related to atmospheric rivers affecting California. Users can also use the crowdsourcing functionality of the app to report their local rainfall information to supplement our data.

water data challenge

Tags:
Author(s):

map.waterauditca.org

Type:

Model

This web map (under active development) combines information about NID-regulated reservoirs with post-API processed data from USGS stream gauges (future features are discussed). This will allow our organization (and potentially regulators) to monitor reservoir operators' compliance with FGC 5937. The application is a web-based version of desktop software already used by the organization to advance ecological conservation efforts. Weaknesses of existing stream flow measurement infrastructure are illustrated, and the application addresses these issues by designating gauges as "active" only if data is reported. The significance of accurate stream flow measurements to ecological conservation is discussed. This is a simple data visualization that provides insight into water resource monitoring in the State, and it will be used by our organization to support decisions regarding future litigation. Future data sharing objectives are discussed, but largely concern filling information gaps identified through the application or publishing previously unpublished public records. The datasets implemented in this Beta are the National Inventory of Dams (Army Corps of Engineers) and the USGS Instantaneous Values REST Web Service. (Datasets which informed development of this work but are not implemented include: NOAA's Essential Fish Habitat GIS Shapefiles, USGS's Hydrologic Unit Maps, the California Data Exchange Center, and the California DFG's BIOS View, among others.)

water data challenge

Tags:
Author(s):

sdwisard - an open source query tool for SDWIS

Type:

Model

The sdwisard R package is an easy-to-use, open-source query tool for California drinking water quality data in the SDWIS database that enables reproducible analysis.

water data challenge

Tags:
Author(s):

The West Big Data Innovation Hub is supported by the National Science Foundation through awards #1916573, 1916481, and 1915774. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. Learn more about the NSF Big Data Hubs community here.
