I am an undergraduate student majoring in Artificial Intelligence at Shanghai Jiao Tong University (SJTU), where I enrolled in Fall 2023. My research interests center on AI agents, multimodal models, and large language models (LLMs), as well as AI for games, and my current focus is on agent and multimodal research. Through coursework and research projects I have gained hands-on experience in these areas, and I am eager to explore them further. I welcome opportunities to exchange ideas and collaborate with peers who share similar interests.

🔬 My Research

  • Research Interests: Agents, Multimodal Models, Large Language Models, AI for Games
  • Current Focus: Agents & Multimodal Models

🎖 Honors and Awards

  • 2024, 2025 Zhiyuan Honors Scholarship
    Awarded to top students majoring in science.
  • 2024, 2025 Undergraduate Excellence Scholarship
    Awarded to students with outstanding comprehensive evaluation rankings.

📖 Education

2023.09 – Present
BEng in Artificial Intelligence · Zhiyuan Honors Program
Shanghai Jiao Tong University

💻 Internships

2024.07 – 2024.09
Summer Research Internship
SJTU – Artificial Intelligence Institute, DeepVision Lab
  • Explored computer vision (CV) fundamentals.
  • Learned to read research papers and reproduced basic CV algorithms.
2025.03 – 2025.12
Research Internship
SJTU – Artificial Intelligence Institute, DeepVision Lab
  • Conducted research on multimodal large language models (MLLMs).
  • Participated in training a GUI recognition model for future GUI agents (a collaborative project with Huawei); responsible for data synthesis, annotation, and cleaning.
  • Conducted research on visual token pruning for multimodal models.
2025.12 – Present
Research Internship
SJTU & SII – GAIR Lab
  • First author and project lead of AcademiClaw, a benchmark for OpenClaw.
  • Participant in davinci-magihuman.
  • Built downstream applications on top of davinci-magihuman, including digital-human meetings and digital-human live streaming. [Coming soon]
  • Participated in the development of WeSay (a speech recognition app). [Private repo]

📝 Publications

AcademiClaw: When Students Set Challenges for AI Agents

Junjie Yu, Pengrui Lu, Weiye Si, et al. — arXiv preprint, 2026.
  • The first academic-level benchmark for OpenClaw, with tasks sourced directly from undergraduate students.
  • 80 complex, long-horizon tasks across 25+ domains; the best frontier model achieves only a ~55% pass rate.
  • Supports the evaluation of agents' academic capabilities and helps advance the OpenClaw community.

Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

arXiv preprint, 2026.
  • First open-source model to achieve joint audio-video generation with a pure single-stream Transformer.
  • Achieves an 80% win rate against Ovi 1.1 and 60.9% against LTX 2.3 in human evaluation, reaching open-source state of the art (SOTA).
  • Generates a 5-second clip in ~2 s on a single H100 GPU; supports 6+ languages.

📂 Projects

AcademiClaw

The first academic-level benchmark for OpenClaw, with tasks sourced directly from undergraduate students.

Python · Public

Paper-RAG

An intent-aware paper recommendation and research assistant powered by retrieval-augmented generation (RAG).

Python · Public

GUI-Project

Data preparation for training a GUI recognition model for future GUI agents.

Python · Public

ViT-on-Image-Classification

Vision Transformer (ViT) for image classification, especially on small-scale datasets such as CIFAR-10.

Jupyter Notebook · Public