Beyond Finite Samples: Trustworthy Machine Learning Methods to Understand the Genetic Basis of Phenotyped Subtypes of Alzheimer’s disease

Date(s) - 02/07/2022
3:00 pm - 4:00 pm

Virtual via Zoom & projected in Communicore, C1-004

Haohan Wang, Ph.D., Language Technologies Institute, School of Computer Science, Carnegie Mellon University

Abstract: The development of machine learning techniques offers a new opportunity to analyze complex, structured neuroscience data at large scale, both to unveil the pathology of neurodegenerative disorders and to suggest potential preventive and therapeutic strategies. However, a plain application of machine learning methods, especially the black-box deep learning techniques developed in recent years, may yield findings that look plausible but stem from the model learning spurious features or confounding factors, such as aging or batch effects. It is therefore of great importance to develop machine learning tools that incorporate the knowledge of neuroscientists and geneticists, account for data heterogeneity, and counter confounding factors that arise from the idiosyncrasies of a dataset.

In this talk, I will introduce a principled view that leads to robust machine learning methods: methods that learn the underlying structure of the data while remaining minimally influenced by confounding signals that arise from the idiosyncrasies of collecting finite samples. This view leads to two concrete methods. The first leverages neural networks to phenotype subtypes of Alzheimer’s disease from MRI, aiming to learn representations that generalize across data collections. The second enables linear methods to pinpoint the genetic factors of the phenotyped Alzheimer’s disease subtypes while countering the influence of confounding factors such as population stratification.

Bio: Haohan Wang obtained his Ph.D. from the Language Technologies Institute (LTI), School of Computer Science at Carnegie Mellon University, where he worked with Professor Eric P. Xing. His research focuses on trustworthy machine learning and computational biology, with applications devoted to understanding the genetic factors of Alzheimer’s disease. This work is supported by technical activities including statistical analysis and the development of deep learning methods, with a particular focus on analyzing data with methods that are minimally influenced by spurious signals such as confounding factors. He was recognized as the Next Generation in Biomedicine by the Broad Institute of MIT and Harvard for his contributions to dealing with confounding factors in deep learning.