R. Clifton Bailey Statistics Seminar Series
Transfer Learning with Applications in Genomics
Hongzhe Li
Perelman Professor
Department of Biostatistics, Epidemiology and Informatics
University of Pennsylvania
Date: Friday, February 16, 2024
Time: 11:00 A.M. – 12:00 P.M. Eastern Time
Location: Nguyen Engineering Building, Room 1109
The seminar talk is also live-streamed. Please register here to receive the link.
Abstract
This talk considers estimation and prediction in high-dimensional linear regression models for transfer learning, using samples from the target model as well as auxiliary samples from different but possibly related models. When the set of "informative" auxiliary samples is known, an estimator and a predictor are proposed and their optimality is established. The optimal rates of convergence for prediction and estimation are faster than the corresponding rates obtained without the auxiliary samples. This implies that knowledge from the informative auxiliary samples can be transferred to improve learning performance on the target problem. When sample informativeness is unknown, a data-driven transfer learning procedure called Trans-Lasso is proposed, and its robustness to non-informative auxiliary samples and its efficiency in knowledge transfer are established. A related method, Trans-CLIME, is developed for estimation and inference in high-dimensional Gaussian graphical models with transfer learning. Several applications in genomics will be presented, including prediction of gene expression using the GTEx data and polygenic risk score prediction using GWAS data. It is shown that Trans-Lasso and Trans-CLIME improve gene expression prediction in a target tissue by incorporating data from multiple other tissues as auxiliary samples.
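The abstract does not spell out the estimator for the known-informative-set case. Purely as an illustration, the Python sketch below assumes a "pool, then correct" construction. A first Lasso fit on the pooled target and auxiliary samples gives a shared estimate. A second Lasso on the target sample alone fits a sparse correction around it. The function names `lasso_cd` and `transfer_lasso` and the tuning parameters `lam_w` and `lam_delta` are hypothetical. The Lasso itself is a plain coordinate-descent solver written with NumPy. This is not presented as the speaker's implementation. It omits tuning-parameter selection and the data-driven step that Trans-Lasso uses when informativeness is unknown.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, offset=None, n_iter=500, tol=1e-8):
    """Minimize (1/(2n)) * ||y - offset - X b||^2 + lam * ||b||_1
    by cyclic coordinate descent (offset defaults to zero)."""
    n, p = X.shape
    base = 0.0 if offset is None else offset
    resid = y - base  # residual y - offset - X b, with b = 0 at the start
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # (1/n) * ||X_j||^2 for each column
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never enters the model
            # (1/n) X_j^T (partial residual with coordinate j added back)
            rho = X[:, j] @ resid / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            step = new_bj - b[j]
            if step != 0.0:
                resid -= X[:, j] * step  # keep the residual in sync with b
                b[j] = new_bj
                max_change = max(max_change, abs(step))
        if max_change < tol:
            break  # coordinates have stopped moving
    return b


def transfer_lasso(X0, y0, X_aux, y_aux, lam_w, lam_delta):
    """Hypothetical two-step transfer estimator for a KNOWN informative set.

    Step 1: Lasso on the pooled target + auxiliary samples gives w_hat.
    Step 2: Lasso on the target sample alone fits a sparse correction
            delta_hat, holding X0 @ w_hat fixed as an offset.
    """
    X_pool = np.vstack([X0, *X_aux])
    y_pool = np.concatenate([y0, *y_aux])
    w_hat = lasso_cd(X_pool, y_pool, lam_w)
    delta_hat = lasso_cd(X0, y0, lam_delta, offset=X0 @ w_hat)
    return w_hat + delta_hat
```

For example, `transfer_lasso(X0, y0, [X1, X2], [y1, y2], lam_w, lam_delta)` treats two auxiliary tissues as informative. Pooling first lets the large auxiliary samples stabilize the shared estimate. The second, target-only step lets the target model deviate from it in a small number of coefficients.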
About the Speaker
Dr. Hongzhe Li is Perelman Professor of Biostatistics, Epidemiology and Informatics at the Perelman School of Medicine at the University of Pennsylvania. He is Vice Chair for Research, Director of the Center of Statistics in Big Data, and former Chair of the Graduate Program in Biostatistics at Penn. He is also a Professor of Statistics and Data Science at the Wharton School. Dr. Li has been elected a Fellow of the American Statistical Association (ASA), a Fellow of the Institute of Mathematical Statistics (IMS), and a Fellow of the American Association for the Advancement of Science (AAAS). Dr. Li served on the Board of Scientific Counselors of the National Cancer Institute of NIH and regularly serves on NIH study sections. He served as Chair of the Section on Statistics in Genomics and Genetics of the ASA and as Co-Editor-in-Chief of Statistics in Biosciences. Dr. Li’s research focuses on developing statistical and computational methods for the analysis of large-scale genetic, genomic, and metagenomic data, and on the theory of high-dimensional statistics. He has trained over 50 PhD students and postdoctoral fellows.
Event Organizers
Nicholas Rios
David Kepplinger