"Be fortunate, kind, and grateful"

Ruifeng Hu (Ph.D.)

I am a Computational Biologist and an AI/ML technology enthusiast.
Let's connect and explore new possibilities in AI-driven bioinformatics!
🔬🤖✨

More About Me

I am an experienced Computational Biologist with a passion for AI/ML technologies.

Hello there!

I'm Ruifeng Hu, Ph.D., a computational biologist specializing in bioinformatics, multi-omics data analysis and machine learning. Currently, I work as a Research Scientist at Yale School of Medicine, applying advanced statistical and AI-driven approaches to decipher complex biological data for target discovery, precision medicine, and disease modeling.

I actively contribute to open-source projects and have developed various bioinformatics tools and databases. My research focuses on leveraging AI and data-driven methods to advance biomedical discoveries.

I thrive in collaborative environments, working closely with wet lab scientists and other researchers to bridge the gap between computational and experimental biology. I am always exploring cutting-edge AI applications in bioinformatics!

I have rich experience in:

  • Multi-omics Data Analysis: NGS, Single-cell & bulk RNA-seq, spatial transcriptomics, CRISPR screens, WGS, Genomics/Genetics.
  • Machine Learning & Deep Learning: Regression, clustering, generative models (VAE, LSTM, Transformer).
  • Pipeline, Cloud & HPC Computing: Nextflow, Airflow, Git, Docker, AWS, GCP, high-performance computing.
  • Bioinformatics Pipelines & Web Development: Automated workflows and interactive genomic data platforms.

I've got some skills. [Non-exhaustive]

Bioinformatics skills

  • Multi-omics data analysis
  • Bulk & scRNAseq, scATACseq
  • Spatial transcriptomics/Visium/MERFISH
  • WGS
  • WG CRISPR screen data
  • Functional (Epi)genomics
  • Computational Genomics
  • ML/DL
  • Statistical modeling
  • Analysis pipelines
  • Programming expertise
  • Bioinformatics tool & database development
  • Web & API development
  • Data visualization & dashboarding
  • Biological insights interpretation and presentation

Programming languages

  • Python
  • R
  • C
  • C++
  • Java
  • Go
  • PHP
  • JavaScript
  • HTML/CSS
  • Shell Scripts

Framework & DevOps

  • React
  • Express
  • FastAPI
  • Flask
  • Django
  • MySQL
  • PostgreSQL
  • MongoDB
  • GitHub
  • Docker

Data Ops & ML Ops

  • Nextflow
  • Airflow
  • AWS/GCP
  • PyTorch
  • Keras
  • TensorFlow
  • scikit-learn
  • Pandas / NumPy / SciPy
  • Tidyverse / dplyr / tidyr / ggplot2
  • CNN / RNN / VAE / Transformer / LSTM / Attention models

Others

  • Cross-functional communication and collaboration
  • Mentoring
  • Project leadership

Some of My Projects

ImmuneApp

An AI-driven tool for HLA-I epitope prediction and interpretable clinical immunopeptidome analysis.

#Python
#Immunopeptidome
#HLA-I epitope
#BiLSTM
#Attention layer
#HTML/JS/CSS
Visit → Read Publication →

VisiumST Images Crop

VisiumST Dot Frame Detection and Cropping

#Python
#Visium Spatial Transcriptomics
#Image processing
#Data visualization
Visit → Single cell / Spatial transcriptomics data processing →

BrainDataPortal

A brain omics data analysis and visualization platform covering scRNAseq, scATACseq, ChIPseq, spatial transcriptomics, WGS, and other omics data.

#React
#Vite
#FastAPI
#PostgreSQL
View Code →

Dr.VAEN

Drug response prediction using Variational Autoencoder-based Elastic Net models.

#Python
#Drug response
#VAE:Variational Autoencoder
#ElasticNet regression
#HTML/JS/CSS
Visit → Read Publication →

MitoX

Exploring mitochondrial heteroplasmy and gene expression from single-cell sequencing assays.

#Single cell
#Mitochondrial
#Python
View Code →

scRNA_Nextflow

Nextflow pipeline for single-cell RNAseq data analysis

#Nextflow
#Single cell
#scRNAseq
View Code →

PDTrans

Predicts future UPDRS III trends in Parkinson's Disease using a Transformer model.

#Python
#Transformer
#Parkinson's Disease
#AI
View Code →

CSDAV

CRISPR Screen Data Analysis and Visualization.

#R
#R Shiny
#WG CRISPR Screen
View Code →

My Work Experience

October 2024 - Present

Yale School of Medicine, New Haven, CT

Research Scientist, Research rank faculty

  • In summary: I work as an advanced member of a research group, focusing on several projects in support of the PI. My responsibilities combine project management, research execution, data and pipeline management, and the training and supervision of research staff and trainees. I also lead or assist in preparing papers for publication and conference presentations, and support the preparation of grant applications and reports to funding agencies.
  • Developing automated, efficient internal data analysis pipelines for standard NGS datasets, including genomics, transcriptomics, epigenomics, and proteomics, at the bulk, single-cell, or spatial level. This includes setting up cross-cohort data QC, data harmonization, and result reporting; maintaining the code via GitHub/Docker with well-documented READMEs; and developing or adapting novel algorithms, statistical models, and workflows to support customized data analysis. Nextflow and Airflow are the main tools used.
  • Creating an exploratory data portal platform for data deposition and data management, with intuitive visualization and querying of relationships across data modalities; providing strategic solutions to accelerate storing and reading spatial transcriptomics (ST) and single-cell (SC) data to and from the database.
  • Communicating with other parties in projects (including the wet lab, sequencing core, study coordinators, other data scientists, and PIs). Creating standards and SOPs for data management, data delivery, QC metrics, and reporting.
  • Other research activities: assisting the PI in providing basic and advanced bioinformatics training to junior team members; establishing the lab GitHub with curated core compute pipelines and maintaining the bioinformatics resources; managing the lab server and HPC storage space and developing a strategic plan to meet the lab's computational needs; drafting research plans, writing research SOPs, and preparing grant applications and renewals; staying current with advancements in the field to incorporate emerging technologies.

June 2023 - June 2024

Bristol Myers Squibb, Cambridge, MA

Principal Scientist, Computational Reverse Translation

  • Led a project using, customizing, and training genomics foundation models (e.g., Geneformer, scGPT) to decipher genomics data and map patient samples to reductionist models.
  • Analyzed multi-omics data (e.g., bulk/scRNAseq, spatial transcriptomics) to identify potential drug targets and biomarkers in PDAC and NSCLC patient and model samples.
  • Contributed to the development of a spatial data portal framework that helps biologists browse and visualize analysis results.
  • Analyzed whole-genome CRISPR screen data to find vulnerability genes of cancer cell lines under specific conditions.
  • Collaborated with the MoCR TRC (Mechanisms of Cancer Resistance Thematic Research Center) teams on lung cancer R&D.

March 2021 - June 2023

Harvard Medical School - BWH, Boston, MA

Senior Research Associate

  • Led a project to discover and replicate differentially expressed genes from RNAseq data analysis to find potential diagnostic biomarkers in the PPMI and PDBP cohorts.
  • Built classification models to predict Parkinson’s Disease status using multi-omics data (transcriptomics, genetics, and clinical data).
  • Investigated advanced machine learning and deep learning models (Autoencoder, DNN, LSTM, Transformer) for Parkinson’s Disease status prediction and UPDRS regression.
  • Analyzed RNAseq data from a time-series experimental design to find potential diagnostic biomarkers for PD patients at different stages of disease development.
  • Built a scRNAseq data analysis pipeline using Nextflow and analyzed scRNAseq data from PD organoid samples.
  • Others: running jobs on HPC with LSF and Slurm; internal Linux server management.

October 2017 - March 2021

School of Biomedical Informatics, UT Health Science Center, Houston, TX

Postdoctoral Research Fellow

  • Developed, implemented, and maintained next-generation sequencing (NGS) data processing pipelines and modules that perform standard data analysis in an automated fashion, so that researchers and clinicians can explore the results (TCGA cancer RNAseq/mutation data analysis, survival analysis, etc.).
  • Developed novel computational methods and solutions for biomedical problems using machine learning and deep learning models (drug response prediction, HLA-antigen prediction, mutation effects, etc.).
  • Constructed biomedical knowledgebases and web servers for research communities (mutation-drug response database, cell-type enrichment analysis web server, etc.).
  • Implemented sequence-based models for exploring biological insights (finding sequence motifs, etc.).
  • Mentored and trained junior investigators in the Center for Precision Health.
  • Others: collaborated with wet labs and clinicians, providing technical expertise and leadership in study design, sample and data collection, pipeline development, data analysis, results interpretation, manuscript writing, and grant proposal preparation.

My Education

2012 - 2017

Ph.D. Bioinformatics & Computational Biology

Peking Union Medical College & Chinese Academy of Medical Sciences, Tsinghua University

  • National Scholarship for Graduate Students, from Chinese Ministry of Education and Ministry of Finance (CAMS&PUMC) (the highest scholarship for graduate students) - 2016
  • First Prize Scholarship for Academic Excellence (CAMS&PUMC) - 2014, 2015
  • Outstanding Graduate Students of PUMC - 2014, 2015, 2016

2008 - 2012

B.Eng. Computer Science and Technology

College of Computer Science and Technology, Nanjing Forestry University

  • National Scholarship for Undergraduate Students, from Chinese Ministry of Education and Ministry of Finance (NJFU) (the highest scholarship for undergraduate students) - 2009
  • The First Prize Scholarship of Nanjing Forestry University - 2010, 2011
  • Outstanding Undergraduates of NJFU - 2012
  • Merit Student of Nanjing Forestry University, China (For three consecutive years) - 2009, 2010, 2011