Bern Data Science Day established

Data science research is on the rise at the University of Bern. For an initial stocktaking of ongoing projects in various faculties, the first Data Science Day was recently held. Following the great success of the pilot edition, the event is now to be held annually.

(May 2021)

In March, the Center for Artificial Intelligence in Medicine (CAIM) was officially opened, and in April the Bern Data Science Day (BDSD) was held for the first time, a virtual meeting attended by close to 200 researchers from the fields of data science, machine learning and artificial intelligence.

"Data-intensive and simulation-driven research has enormous potential and is taking hold of more and more disciplines," says BDSD co-initiator Prof. Dr. Christiane Tretter from the Institute of Mathematics. "The rapid developments in the three 'Sciences of Data Science' - mathematics, statistics and computer science - and the activities of the Science IT Support, ScITS, at the Phil.-nat Faculty have clearly shown that there is a need for a scientific forum for these topics at the university." This was the reason for creating the Bern Data Science Day together with Raphael Sznitman (ARTORG) from the Faculty of Medicine and the coordinator of ScITS, Sigve Haug.

"With more data and better computation, data analysis increasingly drives life-changing decisions," said Tamara Broderick of the Massachusetts Institute of Technology (MIT) in her keynote speech at the event. More and more disciplines at the University of Bern are discovering data science, including biology, chemistry, pharmacy, medicine, space exploration, economics and the social sciences and humanities – all of them represented at the first Bern Data Science Day.

"We were overwhelmed by the response," says Mauricio Reyes, Head of Medical Image Analysis (MIA) at the ARTORG Center, of the number of participants, many of them with AI projects on medical challenges that are the basis for CAIM. MIA postdoc John Anderson Garcia Henao adds: "This was a great event to get to know colleagues and potential partners, share knowledge and get input on your research in the first project stages. I think we can learn a lot from each other, we can extrapolate methodologies and inspire each other. This kind of event really recharges the energies."

"The goal of the Bern Data Science Day was to bring together previously separate scientific worlds across faculty boundaries," says Christiane Tretter. "What I enjoyed most was the diversity of what is already being done in research in this area at the University of Bern, and the enthusiasm of the mainly young researchers among the 170 or so participants - this aspect sometimes gets a bit short shrift in everyday university life!"

Coronavirus pandemic, dialect research, transportation

Data science studies abstract structures, regardless of whether they originate in physics, biology or medicine. It is therefore important that data experts work embedded in their respective application fields. In this way, data science can provide solutions to questions from a wide range of areas, such as which patient is likely to develop "long covid", how Swiss German dialects change over time, or how goods get from A to B in the most resource-efficient way. Here are three research examples from the University of Bern.

SNF Covid-19 Project: AI-multi-omics-based Prognostic Stratification of COVID-19 Patients in Acute and Chronic State (Insel Gruppe and University of Bern)

“During the current pandemic, physicians have to make important therapy decisions for several patients quickly. But the virus causes different symptoms in each patient. In this project with the Radiology Department of the Inselspital we want to answer two questions: How severe is the current infection? And how likely is a patient to suffer chronic lung damage? This is not an easy engineering task, because as data scientists we need a lot of data to train robust algorithms. But as the disease is so new, there isn’t a lot of data available yet. This is why we will be working together with Yale University and the University of Parma to obtain around 2400 COVID cases. We will apply a multi-omics approach combining medical imaging with lab and clinical data to yield an individualized risk assessment. We use natural language processing to parse the lab data into a mathematical representation, which will then be combined with radiomic features to classify lung lesions using artificial intelligence.”

John Anderson Garcia Henao, Postdoc Medical Image Analysis lab, ARTORG Center

SNF Eccellenza Project: Language Variation and Change in German-speaking Switzerland: 1950 vs. 2020 (University of Bern)

“My role in the project is that of the data scientist. Our challenge was to reduce the number of survey sites from which to study Swiss German dialects from over 600 to just 125. But we needed to make sure these 125 were representative of how the dialects evolved over the past 70 years. For this, we took a digitized subset of the original database, categorized variants and then calculated linguistic distance matrices instead of the previously employed geographical distances. After that we were able to apply clustering procedures and propose candidate survey sites. We took more than 100 linguistic items from the original questionnaire and represented their differences in a multidimensional space between all surveyed locations. Using Partitioning Around Medoids (PAM), we could make sure that the resulting central location in any given cluster was objective. Only then did we add a qualitative linguistic check and a socio-geographic check to see if the proposed center was justified despite the 70 years that have elapsed since the original study.”

Péter Jeszenszky, Postdoc Center for the Study of Language and Society
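
The site-selection idea described in this quote can be illustrated with a small sketch. It is not the project's actual pipeline: the six sites, their variant codes, the distance definition (share of items that differ) and the naive starting medoids are all assumptions made for illustration, and the search is a simplified version of the swap phase of PAM.

```python
def linguistic_distance(a, b):
    """Share of questionnaire items on which two sites use different variants."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def total_cost(dist, medoids):
    """Sum of every site's distance to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Simplified PAM swap search on a precomputed distance matrix.

    Starts from the first k sites (full PAM uses a dedicated BUILD step)
    and keeps swapping a medoid for a non-medoid while the cost decreases.
    """
    n = len(dist)
    medoids = list(range(k))
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return sorted(medoids), best


# Hypothetical toy data: six sites, four items, variants coded as letters.
sites = {
    "site_A": ["a", "a", "b", "a"],
    "site_B": ["a", "a", "b", "b"],
    "site_C": ["a", "b", "b", "a"],
    "site_D": ["c", "c", "d", "d"],
    "site_E": ["c", "c", "d", "c"],
    "site_F": ["c", "d", "d", "d"],
}
names = list(sites)
dist = [[linguistic_distance(sites[a], sites[b]) for b in names] for a in names]

medoids, cost = pam(dist, k=2)
representatives = [names[m] for m in medoids]
print(representatives, cost)  # ['site_A', 'site_D'] 1.0
```

Because a medoid is always one of the actual survey sites, each cluster center is a real location that can then be subjected to the qualitative checks mentioned in the quote.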

Optimal Transport Distance in a Machine Learning Context

“I have a scientific background in mathematics. Through my current work in a multidisciplinary environment, I became familiar with machine learning (ML), a domain that has a multitude of connections with mathematics. I found that Optimal Transport Theory (OTT), a tool that I worked with in my PhD, is a promising approach to improving the performance of neural networks. The first question in this field aimed to calculate the optimal transport routes for military goods from warehouses to different battlefields during the French Revolution. The theory that has grown out of it since then has proved valuable in several real-life applications in economics, physics, biology, meteorology, image processing and optics. Because many phenomena in these domains happen in an optimal way (minimizing effort or maximizing benefit), OTT can be used to describe them in the abstract language of mathematics. ML algorithms often yield probability distributions, which can be more or less accurate. If you want to measure how good your ML model is or how far your prediction is from the desired value, you can apply the optimal transport distance. Although there are other tools to quantify the proximity of probability distributions, the optimal transport distance has a certain stability that others lack. So there is a good chance that in specific neural networks the optimal transport distance can perform better than the alternatives.”

Kinga Sipos, mathematician at the Science IT Support, Institute of Mathematics
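
One way to see what such a distance measures is a one-dimensional toy comparison. The sketch below is an illustration under stated assumptions, not code from the project: the histograms are made up, the grid is assumed to have unit spacing, and total variation is picked here merely as one example of an alternative measure. On a shared one-dimensional grid, the optimal transport (Wasserstein-1) distance equals the summed absolute difference of the cumulative distributions.

```python
def wasserstein_1d(p, q):
    """Wasserstein-1 distance between two histograms on the same unit-spaced grid.

    In one dimension this is the sum of absolute differences between
    the two cumulative distribution functions.
    """
    distance, cdf_gap = 0.0, 0.0
    for p_i, q_i in zip(p, q):
        cdf_gap += p_i - q_i
        distance += abs(cdf_gap)
    return distance


def total_variation(p, q):
    """Total variation distance: half the summed absolute difference of the bins."""
    return 0.5 * sum(abs(p_i - q_i) for p_i, q_i in zip(p, q))


# Hypothetical example: the desired value sits in bin 0;
# one prediction misses by one bin, the other by four bins.
target = [1.0, 0.0, 0.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0, 0.0, 0.0]
far = [0.0, 0.0, 0.0, 0.0, 1.0]

print(wasserstein_1d(target, near), wasserstein_1d(target, far))    # 1.0 4.0
print(total_variation(target, near), total_variation(target, far))  # 1.0 1.0
```

In this toy case the optimal transport distance grows with how far the prediction has to be moved, while total variation rates both misses as equally bad.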

Euclidean vs. Wasserstein interpolation. Kinga Sipos’ project is situated in an interdisciplinary domain. Such projects can have great potential as researchers open up to new domains. Moving out of classic scientific paradigms, they can observe and integrate new aspects into their main expertise, potentially even shaping new scientific fields. (© Kinga Sipos, Institute of Mathematics, University of Bern)