Statistical Seminar
Speaker
Zilin Li 李子林
School of Mathematics and Statistics, Northeast Normal University
Organizer
Yunan Wu 吴宇楠 (YMSC)
Time
Mon., 14:00-15:00, Nov. 10, 2025
Venue
C548, Shuangqing Complex Building A
All-in-One Toolkit for Biobank-Scale Whole-Genome Sequencing Data Management and Analysis
Biobank-scale whole-genome sequencing (WGS) studies are increasingly pivotal in unraveling the genetic bases of diverse health outcomes. However, the sheer volume and complexity of these datasets present significant challenges for data management and analysis. We propose vcf2agds, an all-in-one toolkit that efficiently converts WGS data from Variant Call Format (VCF) to the annotated Genomic Data Structure (aGDS) format, significantly reducing data size while supporting seamless integration of genomic and functional data for comprehensive genetic analyses. Additionally, STAARpipeline, equipped with the aGDS files, enables scalable, comprehensive, and functionally informed WGS analysis, facilitating the detection of common and rare coding and noncoding phenotype-genotype associations. We applied STAARpipeline to analyze Alzheimer's disease (AD) in 459,216 samples from the UK Biobank. All analyses scale well in computation time and memory. We discovered several potentially new significant associations with AD. As WGS datasets continue to expand in size and complexity, our proposed tools will be increasingly useful for unlocking the full potential of genomic research.
About the speaker
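Professor Zilin Li received his bachelor's and doctoral degrees from the Department of Mathematical Sciences, where he studied under Professor Xihong Lin, a member of both the U.S. National Academy of Sciences and the U.S. National Academy of Medicine. His main research interests are statistical methods and theory for high-dimensional data and statistical genetics. He previously served as an Assistant Professor in the Department of Biostatistics and Health Data Science at the Indiana University School of Medicine, and as a postdoctoral fellow, research associate, and research scientist in the Department of Biostatistics at Harvard University. He is currently a Professor in the School of Mathematics and Statistics at Northeast Normal University. In 2023 he was elected an Elected Member of the International Statistical Institute. His research has been published, as first or corresponding author, in international journals including the Journal of the American Statistical Association, Nature Methods, and Nature Genetics.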