Guest Blogpost By: Andreas Prlić, Peter W. Rose, Eric Deutsch, Jennifer Dougherty, Gustavo Glusman

On February 9-10, we convened the genevariation3d.org/ workshop in Seattle, funded by an NSF Big Data Spokes planning grant: “Increasing collaborations in proteogenomics applications of genetic data”. The goal of the workshop was to explore the state of the field connecting genetic variation and 3D protein structure, and to bring together some of the key researchers working on interpreting genetic variation data. The workshop consisted of a mix of talks, discussion sessions, and breakout groups. Twenty-five speakers each provided a short (15-minute) summary of their research.
One Theme, Diverse Participants and Diverse Research Areas
The 45 participants were an international group of scientists. Geographically, most were from the United States, but several participants came all the way to the West Coast from the United Kingdom, Denmark, and Switzerland. The talks connected the workshop theme to diverse topics such as cattle genetics, RNA sequencing, Big Data technologies, how precision medicine can help with specific diseases, and finally, cancer research.
Emerging Topics
Several topics arose repeatedly in the talks:
Connecting variation to 3D structure is of particular interest for cancer research. About one third of the talks were on topics related to cancer.
Several groups are developing visualization software that can show genetic variation mapped onto protein sequences and 3D structures. However, the FAIR principles (Findable, Accessible, Interoperable, Reusable) are currently not widely followed. There is clearly potential for better data and code sharing, and perhaps for agreeing on common Application Programming Interfaces (APIs).
It is not yet well understood how alternative transcript splicing relates to changes in protein structure. Experimental knowledge is lacking, and the data that relate genomic information to alternative transcripts are improving only slowly. For example, not all alternative transcripts are expressed at the same frequency, and the short read lengths of some next-generation sequencing methods make it challenging to precisely identify an observed splice variant.
Open Challenges
During one of the breakout groups, we discussed open challenges for the field:
One challenge is how to build tools that are accessible to, and useful for, people who work with genetic testing data, so that they can derive meaningful knowledge from it. For example, if a tumor board discusses 8-15 patients during one meeting, currently only about 1-3 of these patients have actionable variants. A key success for our Spoke would be if our tools and data made it possible to provide more actionable knowledge to genetic counselors and medical doctors.
As mentioned above, splice variants remain a challenge. Our Spoke could enable new insights into this topic by making it easier to combine mass spectrometry results with genome sequencing results. This, in turn, could make it easier to use experimentally verified peptides to confirm splice-junction sites.
It is sometimes difficult to distinguish variants that are “real” from those that are just technical noise.
Data sharing is another area of technical challenge: data standards and standardized annotation pipelines are needed to facilitate data exchange and integration.