Zhuofeng Li*$^{1}$, Dongfu Jiang*$^{2}$, Xueguang Ma*$^{2}$, Haoxiang Zhang$^{3}$, Yuyu Zhang$^{4}$, Kai Zou$^{5}$, Ping Nie$^{2}$, Jianwen Xie$^{6}$, Yu Zhang†$^{1}$, Wenhu Chen†$^{2}$
$^{1}$Texas A&M University $^{2}$University of Waterloo $^{3}$UC San Diego $^{4}$Verdent AI $^{5}$NetMind AI $^{6}$Lambda
**\***: Equal Contribution; **†**: Corresponding Authors
February 2026
<aside>
🌟
TL;DR
- With GPT-OSS-120B plus an offline corpus and retriever, you can synthesize long-horizon deep research trajectories of 100+ turns with no search or scraper API costs at training time.
- SFT on deep research trajectories significantly boosts performance: training Nemotron-3-Nano-30B-A3B-Base on our synthesized trajectories raises BrowseComp-Plus accuracy to 54.8%.
- An offline corpus eliminates API dependency: API-based synthesis is expensive ($0.001-0.01 per query), rate-limited, and non-deterministic. Our offline setup has no per-query cost, no rate limits, and is fully reproducible.
- Process over outcome: even trajectories without correct final answers teach models how to search effectively.
- We release everything: search environment, deep research trajectories, trained models, and code.
👨💻 Github, 🤗 HF Model, 🤗 HF Dataset, 📈 Wandb Logs, 🔎 Eval Logs
</aside>

1. Open Source Gaps in Deep Research Agents
Deep research agents (systems that perform iterative search, evidence gathering, and multi-step reasoning) have become a key frontier for LLM capabilities. Commercial systems like Perplexity Deep Research, OpenAI Deep Research, and Gemini Deep Research demonstrate impressive performance, but each open-source alternative omits at least one key component: weights, trajectories, code, or a reproducible search environment.
| Work | Weights | Trajectories | Code | Environment |
| --- | --- | --- | --- | --- |
| Search-R1 [1] | ✅ | ❌ | ✅ | ✅ (Wikipedia) |
| WebShaper [2] | ❌ | ❌ | ✅ | ❌ (API-based) |
| MiroMind [3] | ✅ | ✅ | ✅ | ❌ (API-based) |
| Ours | ✅ | ✅ | ✅ | ✅ |

Key gaps in existing work:
- No long-horizon trajectories: Search-R1 searches Wikipedia for only 2-5 turns per question, far short of real deep research.
- No offline environment: Most existing works rely on live search APIs, which makes reproduction expensive and non-deterministic.
- Incomplete releases: No single work releases weights + trajectories + code + environment together
This post provides a fully open pipeline for synthesizing long-horizon deep research trajectories with 100+ turns, releasing all components for the community.
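
To make the "offline environment" idea concrete, here is a minimal sketch of a search tool backed by a local corpus instead of a live API. It is only an illustration under stated assumptions: the toy corpus, the TF-IDF-style scoring, and the names `CORPUS`, `build_index`, and `search` are all hypothetical and are not the released environment's retriever or API. The point it shows is that the same query always returns the same ranked documents and incurs no per-query charge.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for a real offline document collection.
CORPUS = {
    "doc1": "deep research agents perform iterative search and evidence gathering",
    "doc2": "wikipedia question answering usually needs only a few search turns",
    "doc3": "an offline corpus and retriever make search free and reproducible",
}


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems use better ones)."""
    return text.lower().split()


def build_index(corpus):
    """Precompute per-document term counts and corpus-wide document frequencies."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in corpus.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    return term_counts, doc_freq


def search(query, term_counts, doc_freq, k=2):
    """Return the ids of the top-k documents by a simple TF-IDF-style score."""
    n_docs = len(term_counts)
    scores = {}
    for doc_id, counts in term_counts.items():
        score = 0.0
        for term in tokenize(query):
            if term in counts:
                idf = math.log(1 + n_docs / doc_freq[term])
                score += counts[term] * idf
        if score > 0:
            scores[doc_id] = score
    # Sort by descending score, then by doc id, so ties break deterministically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked[:k]]


TERM_COUNTS, DOC_FREQ = build_index(CORPUS)
print(search("offline retriever", TERM_COUNTS, DOC_FREQ))
```

Because the index is built once from a fixed corpus and ties are broken by document id, repeated calls with the same query return identical results, which is the property a live search API cannot guarantee.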