Chengshuo Dai
chengshuo.dai23@gmail.com
"Walking through the mud, gazing at the stars."
"Who doesn't yearn to ride the crest of the coming tide?"
I’m a self-taught AI/LLM engineer in progress, currently transitioning into the field by studying GenAI and building real-world LLM apps.
My academic background spans Information Management, Finance, and Biostatistics, but a growing fascination with LLMs led me to pivot toward AI engineering. I began this journey in 2025, and since then I’ve focused on studying how modern large language model systems work.
Through self-study and hands-on projects, I’ve been exploring the end-to-end LLM stack, including the engineering intuition for the math and statistics behind it: from model fundamentals (architectures, pretraining, fine-tuning, RLHF) to inference optimization and downstream applications such as RAG pipelines, agentic workflows, and AI-powered search systems.
I believe that in the AI era, the most valuable asset is not any single background but curiosity, discipline, and the ability to learn fast and adapt. As a self-driven builder, I’m deeply curious about how new AI technologies work, and I enjoy exploring them from first principles and turning them into working systems and practical products.
Here’s a quote that deeply inspires me:
Background defines the past, but problems define the future.

In the age of Software 3.0, there are no fixed backgrounds, only problems waiting to be solved.
Biostatistics Data Science
Advanced coursework in statistical methods, machine learning, data analysis, and computational biology. Research focus on NLP and knowledge graphs in genomics.
Information Management and Information Systems (Finance)
Comprehensive coursework in Java, Python, Database Systems, Web Design, and System Design. Strong foundation in information systems, data management, and software development.
Agentic Multimodal Search System
Built an Agentic Multimodal Search System POC to improve search coverage and relevance, supporting hybrid retrieval and temporal queries. Developed a multimedia indexing pipeline and an LLM query understanding module for NL-to-DSL parsing, fine-tuning domain-specific LLMs to optimize retrieval and server-side performance.
BioGraphRAG Biomedical Retrieval System
Built a GraphRAG-style biomedical mechanism retrieval system with DAG reasoning constraints to reduce semantic drift. Designed a comprehensive retrieval pipeline integrating Neo4j multi-hop search, semantic index construction, and FAISS reranking to generate traceable mechanism explanations.
Python, R, PyTorch, TypeScript, SQL, Java
LangChain, LLaMA Factory, Elasticsearch, Docker, FastAPI, Linux, Git, AWS
A context-aware QA system with conversation memory management built on LangChain and DeepSeek. Features a scalable microservices architecture with FastAPI, SQLAlchemy ORM, and Docker containerization, supporting streaming responses and multi-user sessions.
An end-to-end RAG-based document QA system enabling natural language interaction with unstructured PDFs. Implements a complete LLM pipeline including document parsing, embedding generation, and FAISS-based semantic retrieval using Python, OpenAI API, and Streamlit.
People hear "vibe coding" and assume it means: I don't know how to code, so I just ask an LLM to do it.
A deep dive into how agentic workflows are transforming traditional Retrieval-Augmented Generation, making systems more robust and context-aware.
Exploring the intersection of knowledge graphs and LLMs for biomedical mechanism retrieval, and how DAG constraints reduce semantic drift.
Always do the meaningful things.