Data-Juicer Agents: Towards Agentic Data Processing#

A Suite of Agents for Agentic Data Processing. Built on Data-Juicer (DJ) and AgentScope.


News#

🚀[2026-01-15] The Q&A Copilot Juicer has been deployed on the official Data-Juicer documentation site! Feel free to ask Juicer anything about the Data-Juicer ecosystem. Check 📃 Deploy-ready code | 🎬 More demos | 🎯 Dev Roadmap.

Overview#

This repo maintains a suite of agents that enable users to interact with Data-Juicer’s powerful data processing capabilities through natural language.

  • In the Data-Juicer ecosystem, Data-Juicer Agents (DJ-Agents) form the interface layer, connecting users to the Data-Juicer infrastructure and toolkit for building data-centric applications.

  • Unlike traditional API- or CLI-based interaction, DJ-Agents leverage agent-based interaction, tool use, and extensibility to enable non-expert users to access Data-Juicer’s data-processing capabilities through intuitive natural-language interactions.

  • The long-term goal of DJ-Agents is to enable a development-free data processing lifecycle, allowing developers to focus on what to do rather than how to do it.

The Data-Juicer Agents family currently contains the following members:

  • Data-Juicer Q&A Agent (DJ Q&A Agent)

  • Data-Juicer Data Processing Agent (DJ Process Agent) [Beta version]

  • Data-Juicer Code Development Agent (DJ Dev Agent) [Beta version]

Data-Juicer Agents uses a multi-agent routing architecture that dispatches each request to the appropriate agent. Check agent info for more details.
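To make the routing idea concrete, here is a minimal sketch of a router that picks one of three agents for a request. It is an illustration only, not the DJ-Agents implementation: the `Agent` class, the `route` function, the agent names, and the keyword lists are all hypothetical, and a real router would more likely rely on an LLM than on keyword matching.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Agent:
    """Hypothetical agent: a name, trigger keywords, and a request handler."""
    name: str
    keywords: List[str]
    handle: Callable[[str], str]


def route(request: str, agents: List[Agent], default: Agent) -> Agent:
    """Return the agent with the most keyword hits, or `default` if none match."""
    text = request.lower()
    best, best_score = default, 0
    for agent in agents:
        score = sum(1 for kw in agent.keywords if kw in text)
        if score > best_score:
            best, best_score = agent, score
    return best


# Illustrative stand-ins for the three family members listed above.
qa_agent = Agent("dj_qa", ["what", "why", "explain"],
                 lambda r: f"[Q&A] {r}")
process_agent = Agent("dj_process", ["filter", "deduplicate", "clean"],
                      lambda r: f"[Process] {r}")
dev_agent = Agent("dj_dev", ["operator", "implement", "develop"],
                  lambda r: f"[Dev] {r}")
AGENTS = [qa_agent, process_agent, dev_agent]

if __name__ == "__main__":
    for req in ["Deduplicate and filter my dataset",
                "Implement a new operator for language detection"]:
        agent = route(req, AGENTS, default=qa_agent)
        print(agent.name, "->", agent.handle(req))
```

In this sketch, requests that match no keyword fall back to the Q&A agent, which seems a reasonable default for open-ended questions.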

Quick Start#

Online Services#

  • The Q&A Copilot Juicer has been deployed on the official Data-Juicer documentation site! Feel free to ask Juicer anything about the Data-Juicer ecosystem.

More online agentic services are being planned and developed—check out our Roadmap and join us!

Local Deployment#

To launch DJ-Agents locally, follow the deployment document.

If you encounter any issues, check the common issues below or ask our Q&A copilot Juicer on the documentation site.

Roadmap#

The long-term vision of DJ-Agents is to enable a development-free data processing lifecycle, allowing developers to focus on what to do rather than how to do it.

To achieve this vision, we are tackling two fundamental challenges:

  • Agent Level: How to design and build powerful agents specialized in data processing

  • Service Level: How to package these agents into ready-to-use, out-of-the-box products

We iterate continuously in both directions, and the roadmap may change as our understanding and capabilities improve.

Below is the current development checklist.


Agents#

  • Data-Juicer Q&A Agent (DJ Q&A Agent)
    Answers Data-Juicer–related questions from both existing and potential users.

    • Implemented

    • [2026-01-15]: The current DJ Q&A Agent demonstrates strong performance in our internal evaluations and is considered production-ready.

  • Data-Juicer Data Processing Agent (DJ Process Agent)
    Automatically invokes Data-Juicer tools to fulfill data processing requests.

    • In progress

    • [2026-01-15]: The current DJ Process Agent is in beta. We are actively benchmarking and optimizing its capabilities.

  • Data-Juicer Code Development Agent (DJ Dev Agent)
    Automatically develops new data processing operators based on user requirements.

    • In progress

    • [2026-01-15]: The current DJ Dev Agent is in beta. Capability evaluation and optimization are ongoing.


Services#

  • Q&A Copilot — Juicer

    • Overall service

    • [2026-01-15]: Juicer is currently available on the documentation site. We are working on deployments for community platforms.

      • Documentation Website

      • DingTalk Group

      • Discord Server

  • Interactive Data Analysis Studio (In Development)

    • [2026-01-15]: A demo is available. The current version primarily relies on predefined workflows. We are working on integrating agent-based intelligence.

  • MCP Service

    • Planned


Future Directions#

  • Workflows as Skills
    Data-Juicer Hub hosts a growing collection of data processing recipes and workflows contributed by the Data-Juicer community.

    As data processing demands expand into new scenarios—such as RAG, Embodied Intelligence, and Data Lakehouse architectures—we plan to incorporate existing and newly developed workflows into DJ-Agents as reusable skills, enabling broader and more flexible data processing applications.

Common Issues#

Q: How do I get a DashScope API key?
A: Visit the official DashScope website, register an account, and apply for an API key.

Q: Why does operator retrieval fail?
A: Check your network connection and API key configuration, or try switching to vector retrieval mode.

Q: How do I debug custom operators?
A: Make sure the Data-Juicer path is configured correctly, and check the example code provided by the code development agent.

Q: What should I do if the MCP service connection fails?
A: Check that the MCP server is running and that the URL in the configuration file is correct.

Q: Error: requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http://localhost:3000/trpc/pushMessage
A: Agents handle data via file references (paths) rather than direct uploads. Check whether any non-text files were submitted.
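Two of the checks mentioned above, API key configuration and MCP server reachability, can be scripted before launching anything. The following is a minimal sketch using only the Python standard library. It assumes the key is read from the `DASHSCOPE_API_KEY` environment variable and uses a hypothetical MCP URL; substitute whatever your configuration file actually contains. A successful TCP connection shows only that something is listening on that host and port, not that it is a healthy MCP server.

```python
import os
import socket
from typing import Mapping, Tuple
from urllib.parse import urlparse


def has_api_key(env: Mapping[str, str], name: str = "DASHSCOPE_API_KEY") -> bool:
    """Return True if the (assumed) API key variable is set and non-blank."""
    return bool(env.get(name, "").strip())


def host_and_port(url: str) -> Tuple[str, int]:
    """Extract host and port from a URL, defaulting to 443 (https) or 80."""
    parsed = urlparse(url)
    if not parsed.hostname:
        raise ValueError(f"URL has no host: {url!r}")
    default_port = 443 if parsed.scheme == "https" else 80
    return parsed.hostname, parsed.port or default_port


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical URL for illustration; use the one from your config file.
    mcp_url = "http://localhost:8080/sse"
    print("API key configured:", has_api_key(os.environ))
    print("MCP server reachable:", is_reachable(mcp_url))
```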

Optimization Recommendations#

  • For large-scale data processing, use Data-Juicer’s distributed mode.

  • Set the batch size to balance memory usage against processing speed: larger batches reduce per-batch overhead but hold more samples in memory at once.

  • For more advanced data processing features (such as data synthesis and Data-Model Co-Development), refer to the Data-Juicer documentation.
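The batch-size recommendation can be illustrated with a generic sketch. This is plain Python, not the Data-Juicer API: `iter_batches` and `process_in_batches` are hypothetical helpers that show how a single `batch_size` parameter controls how many samples are held in memory per processing call.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_in_batches(
    items: Iterable[T],
    fn: Callable[[List[T]], List[R]],
    batch_size: int,
) -> List[R]:
    """Apply `fn` to each batch and concatenate the per-batch results."""
    results: List[R] = []
    for batch in iter_batches(items, batch_size):
        results.extend(fn(batch))
    return results


if __name__ == "__main__":
    texts = ["  hello ", "world  ", "  data-juicer  "]
    # Only `batch_size` samples are materialized per call to the function.
    cleaned = process_in_batches(texts, lambda b: [t.strip() for t in b], batch_size=2)
    print(cleaned)
```

Because the input is consumed lazily, peak memory in this sketch scales with `batch_size` rather than with the dataset size, while the number of calls to `fn` shrinks as `batch_size` grows.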