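Title: Toward Gene Language Model via Multi-omics Data
Time: 2024-12-04 10:00-11:00
Speaker: Associate Professor Sun Xiaobo (Zhongnan University of Economics and Law)
Venue: Lecture Hall, 8th Floor, Lei Jun Science and Technology Building (Room 806)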
Abstract: The shift from hypothesis-driven biomedical research to data-driven, AI-led approaches is a hallmark of the current era. It is propelled by rapid advances in AI technologies and by the growing availability of multi-omics data that characterize cellular states at various molecular levels. Just as natural languages describe the physical world, genes can be seen as the "words" of life's genetic language. Inspired by the success of language models in advancing Artificial General Intelligence (AGI), gene language models hold great potential for enhancing AGI-powered biomedical research. However, gene language models are still in their early stages, primarily because their foundations remain underdeveloped, particularly the learning of distributed gene representations. Challenges in gene representation learning include effectively capturing the functional and relational semantics of genes, integrating multimodal data, incorporating spatial gene information, addressing gene polysemy, and applying gene representations to downstream tasks. To address these challenges, we have developed novel computational methods that identify biologically significant genomic "contexts" from single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST) data. These methods enable the learning of contextualized, spatially informed gene representations enriched with functional and relational semantics. We validate the biological relevance of the identified genomic contexts and gene representations using datasets from both healthy and diseased conditions. Furthermore, we demonstrate the versatility of these gene representations in downstream applications such as identifying disease- or trait-associated genes, inferring gene-gene interactions and regulatory networks, imputing missing genes, and discovering gene expression patterns.