
2023 National Workshop on Data Science Education
The sixth annual National Workshop on Data Science Education will take place virtually and in person at UC Berkeley from June 20-23, 2023. UC Berkeley's Division of Computing, Data Science, and Society is leading the event with support from Microsoft and the West Hub. The workshop is for educators at all levels who are interested in data science education.

The conference will be held in person at UC Berkeley with an option to join online. In-person attendance will be limited to 75 people, and priority will be given to speakers and panelists. The online component will remain free and open to all. Registration has closed for in-person attendance, but registration for virtual attendance remains open.
A diverse range of academic institutions from around the nation will be represented at the workshop, including four-year universities and community colleges. Last year, educators from over 100 institutions came together and shared insights on creating a cohesive data science educational ecosystem for undergraduate students. The 2022 workshop featured two panels hosted by the Hubs. This year, the West Hub has organized two panels. Both will be held on the morning of June 22.
WEST HUB PANELS
Project-Based Experiential Learning about Data Science
This panel will provide an in-depth look at project-based experiential learning programs running at the University of Pittsburgh and UC San Diego (DataJam), UC Berkeley (Data Science Discovery Program), and the University of Washington (Data Science for Social Good).
DataJam: A mentored data science learning activity and competition that runs throughout each academic year
Judy Cameron, University of Pittsburgh
DataJam is a data science learning activity and competition that runs throughout each academic year to introduce young people to data science and to encourage and engage them in it. To date, DataJam has focused on high school-age youth, but plans are underway to expand to community colleges. DataJam is coordinated by Pittsburgh DataWorks, an educational 501(c)(3), and was started in 2013 in Pittsburgh, PA. With support from the NSF Northeast and West Big Data Innovation Hubs, it expanded nationally in 2021. Several factors contribute to the popularity of this program and its potential for widespread dissemination. First, the youth themselves choose the topic of their project, so they can learn data science through a project that most interests them and their community. Second, university students from across the country are trained as DataJam mentors and are available by videoconference to mentor teams. Third, a large repository of resources and a centralized website with information about the DataJam are freely available at pghdataworks.org, along with a monthly newsletter about the DataJam that keeps all participants up to date and coordinated nationally. DataJam mentors receive formal mentoring training, which has been expanded to cover how to work in diverse communities, including low-income, urban and rural communities, immigrant communities, Native American reservations and the unhoused community. Since its inception, DataJam has been supported by businesses and industry partners, who provide financial support and whose data scientists serve as advisors and judges at the annual finale, when all DataJam projects are presented online. Students benefit from first-hand knowledge of how impactful data science is in a wide variety of fields, and businesses benefit from attracting the attention of youth with strong interests in data science.
Perspectives on What it Takes to Institute a National Program Locally
Salvatore Ferraro, Caldwell University
To develop a New Jersey hub for the DataJam, efforts have been undertaken to train mentors, recruit schools and develop business partnerships in the state. Caldwell University became involved in training mentors and now offers a DataJam mentor course that coordinates with the original University of Pittsburgh course, removing the need to develop a new curriculum. Caldwell University also runs an annual STEM Teacher conference for K-12 teachers, and DataJam has been advertised through this mechanism. The healthcare and pharmaceutical industries are well represented in New Jersey, and several strategies have been used to interest them in supporting DataJam.
Discovery Program
Anthony Suen, UC Berkeley
The Data Science Discovery Program incubates and accelerates data science research by connecting UC Berkeley students to high-impact academic, government, non-profit, and industry projects across the globe. Founded in 2015, Discovery has incubated many collaborative projects between passionate practitioners and highly trained students, involving more than 2,000 student researchers and 800+ research projects. Projects have tackled everything from climate change and social justice to public health and digital humanities. You can check out many of the projects on our project page. The Discovery Program provides projects with technical support from its Data Science Discovery Consultant Program along with cloud computing resources. The Discovery Program allows students to engage in project research as early as their freshman year and continue through their entire undergraduate experience.
Data Science for Social Good
Sarah Stone, University of Washington
Launched in 2015, the University of Washington’s Data Science for Social Good (DSSG) summer research and education program partners Student Fellows with Data Scientists from the eScience Institute and Project Leads from academia, government, and the private sector to find data-intensive solutions to pressing societal challenges. Previous projects have involved applying methods such as machine learning to socially imperative topics including public health, homelessness, disaster response and transportation. Keystones of the DSSG program include project-based discussions and training around data science ethics, human-centered design and stakeholder analysis, and partner collaboration. DSSG programs can effectively impact social good, develop productive cross-sector relationships, and provide “real world” data science training for students from diverse disciplinary backgrounds.

Building a National Pipeline for Project-Based Experiential Data Science
The second panel will describe our plan to leverage the strengths of the existing projects, detailed in the first panel, to create a connected data science learning pathway. This pathway will offer an opportunity for students to engage in experiential data science learning as they progress from high school through graduate school, with the goal of preparing them for a successful career built on a solid understanding of data science and data literacy.
Exploring Opportunities for a Data-Driven Workforce Pipeline
Ashley Atkins, West Big Data Innovation Hub
This presentation will discuss the importance of data-driven experiential learning for students within the context of workforce development. These opportunities equip students not only with data-driven skills but also with complementary experience in navigating data ethics and translational data communication. Additionally, the presentation will explore the possibility of a multi-institutional pipeline that would create new pathways for data-driven workforce development to meet pressing local and national needs. Pilot efforts are underway at UC Berkeley, UC San Diego, and the University of Washington.
DataJam: A National Model with Local Hubs
Catherine Cramer, San Diego Supercomputer Center, UC San Diego
Judy Cameron, Pittsburgh DataWorks
DataJam started as a local data science learning activity and competition in Western Pennsylvania, but during the COVID-19 pandemic it expanded nationally, responding to the interest at high schools across the country in providing enrichment activities in data science for their students. The expansion was feasible because all of the resources for DataJam were available on a centralized website, because communication with teams was by email, and because mentoring of teams was easily transitioned to an online platform. However, it soon became clear that it would be ideal to train mentors and develop business partners in relatively close proximity to teams, and for this a structure of a national program with local hubs was developed. To date, hubs have been developed in Southern California and New Jersey.
Integrating Real World Data Science at Community Colleges, UCs and Beyond
Anthony Suen, UC Berkeley
The Discovery Program at UC Berkeley will be a catalyst to expand workforce training to community colleges across the State of California. The Discovery Program will work with UW and UCSD by scaling the DataJam model for community college students, creating translational opportunities between the Discovery Program and students in regional institutions, and setting up a project pipeline to and from DSSG. Already, we have examples of graduate mentors who have contributed to the pipeline, from supporting transfer students in data science, to supporting long-term research projects with the Discovery Program, and finally to supporting advanced Data Science for Social Good projects over the summer.
DSSG Expansion: Increasing the Project Pipeline
Sarah Stone, University of Washington
In collaboration with the West Hub, we convened a Data for Good Organizers Network consisting of leaders from programs similar to the UW Data Science for Social Good program. This network produced a “Growth Map” white paper for other organizations interested in developing these types of university-hosted summer programs. Each of our programs experiences a huge level of interest from students wanting to use their data science skills on impactful projects. We are interested in working with new partners to develop sister programs. These programs need projects that are at a level of maturity where multiple students can engage full-time for 10-14 weeks to move the work forward. The pipeline model will allow for movement of both projects and students from one program to the next, i.e., the network of universities running Data Science for Social Good programs can increase their pipeline of well-designed social impact projects by sourcing projects from programs like Discovery. DSSG projects also have the potential to continue long-term development by cycling the project through Discovery for the academic year.
NSF Perspective on Project-Based Experiential Learning about Data Science Programs
Jennifer Noll, National Science Foundation (NSF)
This presentation will share insights from an analysis of data science education awards across NSF programs. Trends in the portfolio will be shared that provide insight into the directions the field of data science education research has taken. The presentation will discuss examples of NSF-funded projects that highlight project-based learning, mentoring, career pathways, and broadening participation through engagement with communities that are underserved in STEM, as well as how these projects are situated within the larger landscape of data science education projects. Through a better understanding of current trends and potential gaps in the portfolio, the discussion will also pinpoint opportunities to grow the community, scale up successful projects, and identify potential new directions for the field.

PANELISTS
