Data-Juicer Agents: Towards Agentic Data Processing#

A Suite of Agents for Agentic Data Processing. Built on Data-Juicer (DJ) and AgentScope.

简体中文 | English

🏗️ Overview Doc • ⚡️ Quick Start Doc • >_ CLI Doc • 🔧 Tools Doc • 🎯 Roadmap

News#

  • 🚀 [2026-03-11] Major refactor and upgrade of data_juicer_agents completed.

    • The project architecture and CLI/session capabilities were comprehensively redesigned for better maintainability and extensibility.

    • 🏗️ Overview | ⚡️ Quick Start | >_ CLI Doc | 🔧 Tools | 🎯 Roadmap

    • Try processing data by chatting with the agent!

  • 🚀 [2026-01-15] Q&A Copilot has been deployed on the official Data-Juicer Doc Site, DingTalk, and Discord. Feel free to ask Juicer anything related to the Data-Juicer ecosystem!

Roadmap#

The long-term vision of DJ-Agents is to enable a development-free data processing lifecycle, allowing developers to focus on what to do rather than how to do it.

To achieve this vision, we are tackling two fundamental challenges:

  • Agents: How to design and build powerful agents specialized in data processing

  • Services & Tools: How to package these agents into ready-to-use, out-of-the-box products

We iterate continuously in both directions, and the roadmap may evolve as our understanding and capabilities improve.


Agents#

  • Data-Juicer Data Processing Agent (DJ Process Agent) & Data-Juicer Code Development Agent (DJ Dev Agent)

  • We have stopped building scenario-specific data processing agents and are instead building data processing tools for general-purpose agents. On top of these tools, we:

    • Hard-orchestrate these tools into capabilities, exposed as the djx CLI

    • Soft-orchestrate them through prompts, packaged as skills

    • Rely on agent self-orchestration to support conversational data processing
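The difference between hard and soft orchestration can be illustrated with a minimal sketch. Everything here is hypothetical: the tool functions `drop_empty` and `dedup_exact`, the `TOOLS` registry, and the prompt wording are assumptions made for illustration and are not the actual DJ-Agents tools, the djx CLI, or its skill format. Hard orchestration fixes the tool order in code, while soft orchestration only describes the tools in a prompt and leaves the ordering to the agent.

```python
from typing import Callable, Dict, List

Sample = Dict[str, str]
Tool = Callable[[List[Sample]], List[Sample]]


def drop_empty(samples: List[Sample]) -> List[Sample]:
    """Hypothetical tool: remove samples whose text is blank."""
    return [s for s in samples if s["text"].strip()]


def dedup_exact(samples: List[Sample]) -> List[Sample]:
    """Hypothetical tool: keep the first occurrence of each text."""
    seen = set()
    kept = []
    for s in samples:
        if s["text"] not in seen:
            seen.add(s["text"])
            kept.append(s)
    return kept


# Hypothetical registry shared by both orchestration styles.
TOOLS: Dict[str, Tool] = {"drop_empty": drop_empty, "dedup_exact": dedup_exact}


def hard_orchestrate(samples: List[Sample]) -> List[Sample]:
    """Hard orchestration: the tool order is fixed in code."""
    for name in ("drop_empty", "dedup_exact"):
        samples = TOOLS[name](samples)
    return samples


def skill_prompt() -> str:
    """Soft orchestration: list the tools in a prompt; the agent picks the order."""
    lines = ["You can call these data processing tools:"]
    for name, fn in TOOLS.items():
        lines.append(f"- {name}: {fn.__doc__}")
    return "\n".join(lines)


data = [{"text": "hello"}, {"text": "  "}, {"text": "hello"}, {"text": "world"}]
cleaned = hard_orchestrate(data)
prompt = skill_prompt()
```

In this sketch, `hard_orchestrate` plays the role of a CLI-style capability and `skill_prompt` the role of a skill. Agent self-orchestration would correspond to handing the same tools to a general-purpose agent with no fixed pipeline or skill prompt at all.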

Services & Tools#

  • Q&A Copilot: a Q&A assistant for the Data-Juicer ecosystem

  • InteRecipe: interactive data recipe construction through natural language

    • [2026-03-11]: the current ./interactive_recipe only shows workflow-based examples. The dj-agents CLI entry is already built and supports interactive data-recipe construction through natural language in the TUI. We are developing a frontend tool (studio) on top of this foundation as the next upgrade.


Priority Items#

  • DJ Skills: use prompt-based soft orchestration to package tools into skills for general-purpose agents.

  • InteRecipe Studio: support interactive data recipe construction through natural language, with multi-dimensional data and result views.

  • Plan Tool: extend support for fuller Data-Juicer capability coverage, DJ Hub recipe matching, and more.

  • Dev Tool: stabilization testing and optimization

Long-term Directions#

  • Continue building tools and skills for broader data-processing scenarios, enabling wider and more flexible applications.

    • RAG

    • Embodied Intelligence

    • Data Lakehouse architectures

Common Issues#

Q: How do I get a DashScope API key?

A: Visit the official DashScope website, register an account, and apply for an API key.
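Once you have a key, a common pattern is to keep it in an environment variable rather than in code. The sketch below assumes the variable is named `DASHSCOPE_API_KEY`; that name and the helper `get_dashscope_api_key` are assumptions for illustration, so check the Quick Start Doc for the configuration this project actually expects.

```python
import os
from typing import Mapping, Optional


def get_dashscope_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or raise a clear error.

    The variable name DASHSCOPE_API_KEY is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DASHSCOPE_API_KEY is not set; export it before starting the agent."
        )
    return key
```

Failing early with an explicit message makes a missing key easier to diagnose than an authentication error raised later by a model call.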