I am an undergraduate at Shanghai Jiao Tong University majoring in Artificial Intelligence; I enrolled in Fall 2023. My research interests span agents, multimodal learning, and large language models, with a current focus on agents and multimodal research. I have gained hands-on experience through coursework and research projects, and I welcome opportunities to exchange ideas and collaborate with others who share these interests.

🔬 My Research

Research Interests: Agents, Multimodal Learning, Large Language Models, AI for Games
Current Focus: Agents & Multimodal Learning

🎖 Honors and Awards

2024, 2025 Zhiyuan Honors Scholarship
Awarded to top students majoring in science.
2024, 2025 Undergraduate Excellence Scholarship
Awarded to students with outstanding comprehensive evaluation rankings.

📖 Education

2023.09 – Present

Shanghai Jiao Tong University

BEng Artificial Intelligence · Zhiyuan Honors Program

💻 Internships

2024.07 – 2024.09

Summer Research Internship

SJTU – Artificial Intelligence Institute, DeepVision Lab

Explored Computer Vision fundamentals.
Learned to read research papers and reproduced basic CV algorithms.

2025.03 – 2025.12

Research Internship

SJTU – Artificial Intelligence Institute, DeepVision Lab

Conducted research on multimodal large language models.
Participated in training a GUI recognition model for a future GUI agent (a collaborative project with Huawei). Responsible for data synthesis, annotation, and cleaning.
Conducted research on visual token pruning for multimodal models.

2025.12 – Present

Research Internship

SJTU & SII – GAIR Lab

First author and project leader of AcademiClaw, a benchmark for OpenClaw.
Participant in davinci-magihuman.
Built downstream applications based on davinci-magihuman, including digital human meetings and digital human live streaming. [Public soon]
Participated in the development of WeSay (a speech recognition app). [Private repo]

📝 Publications

AcademiClaw: When Students Set Challenges for AI Agents

Junjie Yu, Pengrui Lu, Weiye Si, et al. — arXiv preprint, 2026.

The first academic-level benchmark for OpenClaw sourced directly from undergraduate students.
80 complex, long-horizon tasks across 25+ domains; best frontier model achieves only ~55% pass rate.
Contributes to evaluating academic capabilities and advancing the OpenClaw community.

Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

arXiv preprint, 2026.

First open-source model to achieve joint audio-video generation with a pure single-stream Transformer.
80% win rate over Ovi 1.1 and 60.9% over LTX 2.3 in human evaluation, reaching open-source SOTA.
Generates a 5-second clip in ~2s on a single H100; supports 6+ languages.

📂 Projects

AcademiClaw

The first academic-level benchmark for OpenClaw sourced directly from undergraduate students.

Python · Public

Paper-RAG

An intent-aware paper recommendation and research assistant powered by RAG.

Python · Public

GUI-Project

Data preparation for training a GUI recognition model for a future GUI agent.

Python · Public

ViT-on-Image-Classification

Vision Transformer (ViT) for image classification, especially on small-scale datasets such as CIFAR-10.

Jupyter Notebook · Public
