Chengshuo Dai

Walking through the mud, gazing at the stars.

"Who doesn't yearn to ride the crest of the coming tide?"

About Me

I’m a self-taught AI/LLM engineer in the making, currently transitioning into the field by studying generative AI and building real-world LLM applications.

My academic background spans Information Management, Finance, and Biostatistics, but a growing fascination with LLMs led me to pivot toward AI engineering. I began this journey in 2025, and since then I’ve focused on studying how modern large language model systems work.

Through self-study and hands-on projects, I’ve been exploring the end-to-end LLM stack, including an engineer’s intuition for the underlying math and statistics: from model fundamentals (architectures, pretraining, fine-tuning, RLHF) to inference optimization and downstream applications such as RAG pipelines, agentic workflows, and AI-powered search systems.

I believe that in the AI era, the most valuable asset is not any single background but curiosity, discipline, and the ability to learn fast and adapt. As a self-driven builder, I’m deeply curious about how new AI technologies work, and I enjoy exploring them from first principles and turning them into working systems and practical products.

Here’s a quote that deeply inspires me:
Background defines the past, but problems define the future.

In the age of Software 3.0, there are no fixed backgrounds, only problems waiting to be solved.

Education

Yale University

2025.08 - 2027.05

Biostatistics Data Science

Advanced coursework in statistical methods, machine learning, data analysis, and computational biology. Research focus on NLP and knowledge graphs in genomics.

Gerstein Lab Research, NLP & Genomics

Capital University of Economics and Business

2021.09 - 2025.06

Information Management and Information Systems (Finance)

Comprehensive coursework in Java, Python, Database Systems, Web Design, and System Design. Strong foundation in information systems, data management, and software development.

Academic Excellence Scholarship (2022-2023, 2023-2024), Merit Student (2022-2023, 2023-2024)

Experience

ETH | Machine Learning Engineer Intern (Newark, CA)

2025.09 - 2025.12

Agentic Multimodal Search System

Built a proof-of-concept (POC) agentic multimodal search system to improve search coverage and relevance, with support for hybrid retrieval and temporal queries. Developed a multimedia indexing pipeline and an LLM-based query understanding module for NL-to-DSL parsing, and fine-tuned domain-specific LLMs to optimize retrieval and server-side performance.

Yale Gerstein Lab | Research Assistant

2025.09 - 2025.11

BioGraphRAG Biomedical Retrieval System

Built a GraphRAG-style biomedical mechanism retrieval system with DAG reasoning constraints to reduce semantic drift. Designed a comprehensive retrieval pipeline integrating Neo4j multi-hop search, semantic index construction, and FAISS reranking to generate traceable mechanism explanations.

Skills

Languages

Python, R, TypeScript, SQL, Java

Frameworks/Tools

PyTorch, LangChain, LLaMA Factory, Elasticsearch, Docker, FastAPI, Linux, Git, AWS

Projects

Multi-Session Academic Research Agent

2025.10 - 2025.11

A context-aware QA system with conversation memory management built on LangChain and DeepSeek. Features a scalable microservices architecture with FastAPI, SQLAlchemy ORM, and Docker containerization, supporting streaming responses and multi-user sessions.

LangChain, DeepSeek, FastAPI, TypeScript, SQLAlchemy, Docker

RAG-based PDF Knowledge Chatbot

2025.10 - 2025.11

An end-to-end RAG-based document QA system enabling natural language interaction with unstructured PDFs. Implements a complete LLM pipeline including document parsing, embedding generation, and FAISS-based semantic retrieval using Python, OpenAI API, and Streamlit.

Python, LangChain, OpenAI API, Streamlit, FAISS, RAG

DCS Blog

Motto

Always do meaningful things.

My Blog

Some Personal Thoughts on Vibe Coding

Understanding Agentic RAG Systems

GraphRAG in Biomedical Applications
