Data-Juicer Agents: Towards Agentic Data Processing#
A Suite of Agents for Agentic Data Processing. Built on Data-Juicer (DJ) and AgentScope.
News#
🚀[2026-01-15] Q&A Copilot Juicer has been deployed on the official documentation site of Data-Juicer! Feel free to ask Juicer anything related to the Data-Juicer ecosystem. Check 📃 Deploy-ready code | 🎬 More demos | 🎯 Dev Roadmap.
Overview#
This repo maintains a suite of agents that enable users to interact with Data-Juicer’s powerful data processing capabilities through natural language.
In the Data-Juicer ecosystem, Data-Juicer Agents (DJ-Agents) form the interface layer, connecting users to the Data-Juicer infrastructure and toolkit for building data-centric applications.
Unlike traditional API- or CLI-based interaction, DJ-Agents leverage agent-based interaction, tool use, and extensibility to enable non-expert users to access Data-Juicer’s data-processing capabilities through intuitive natural-language interactions.
The long-term goal of DJ-Agents is to enable a development-free data processing lifecycle, allowing developers to focus on what to do rather than how to do it.
The Data-Juicer Agents family currently contains the following members:
Data-Juicer Q&A Agent (DJ Q&A Agent)
Data-Juicer Data Processing Agent (DJ Process Agent) [Beta version]
Data-Juicer Code Development Agent (DJ Dev Agent) [Beta version]
Data-Juicer Agents uses a multi-agent routing architecture that dispatches each request to the appropriate agent. Check agent info for more details.
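To make the routing idea concrete, here is a minimal sketch of a request router. It is not the DJ-Agents implementation: the keyword matching, the `Route` class, the agent identifiers, and the stub handlers are all hypothetical and chosen only for illustration. A real router would more plausibly let a language model choose the target agent.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Route:
    """One routable agent: a name, trigger keywords, and a handler."""
    name: str
    keywords: List[str]
    handler: Callable[[str], str]


def make_stub(agent_name: str) -> Callable[[str], str]:
    # Stand-in for a real agent; it only reports which agent was chosen.
    def handle(request: str) -> str:
        return f"[{agent_name}] handling: {request}"
    return handle


# Hypothetical routing table. The names mirror the agent list above;
# the keywords are illustrative assumptions, not project configuration.
ROUTES = [
    Route("dj_process_agent",
          ["filter", "deduplicate", "clean", "process"],
          make_stub("dj_process_agent")),
    Route("dj_dev_agent",
          ["operator", "implement", "develop"],
          make_stub("dj_dev_agent")),
]
# Requests that match no keyword fall back to the Q&A agent.
DEFAULT_ROUTE = Route("dj_qa_agent", [], make_stub("dj_qa_agent"))


def route(request: str) -> Route:
    """Pick the route with the most keyword hits; fall back to Q&A."""
    text = request.lower()
    best, best_hits = DEFAULT_ROUTE, 0
    for candidate in ROUTES:
        hits = sum(1 for kw in candidate.keywords if kw in text)
        if hits > best_hits:
            best, best_hits = candidate, hits
    return best
```

Calling `route("Please deduplicate and filter my dataset").handler(...)` would hand the request to the processing stub, while a general question with no matching keyword goes to the Q&A stub.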
Quick Start#
Online Services#
Q&A Copilot Juicer has been deployed on the official documentation site of Data-Juicer! Feel free to ask Juicer anything related to the Data-Juicer ecosystem.
More online agentic services are planned and under development. Check out our Roadmap and join us!
Local Deployment#
Follow the document to launch DJ-Agents locally.
If you encounter any issues, check common issues or ask our Q&A copilot Juicer on the documentation site.
Roadmap#
The long-term vision of DJ-Agents is to enable a development-free data processing lifecycle, allowing developers to focus on what to do rather than how to do it.
To achieve this vision, we are tackling two fundamental challenges:
Agent Level: How to design and build powerful agents specialized in data processing
Service Level: How to package these agents into ready-to-use, out-of-the-box products
We iterate continuously in both directions, and the roadmap may change as our understanding and capabilities improve.
Below is the current development checklist.
Agents#
Data-Juicer Q&A Agent (DJ Q&A Agent)
Answers Data-Juicer–related questions from both existing and potential users. (Implemented)
[2026-01-15]: The current DJ Q&A Agent demonstrates strong performance in our internal evaluations and is considered production-ready.
Data-Juicer Data Processing Agent (DJ Process Agent)
Automatically invokes Data-Juicer tools to fulfill data processing requests. (In progress)
[2026-01-15]: The current DJ Process Agent is in beta. We are actively benchmarking and optimizing its capabilities.
Data-Juicer Code Development Agent (DJ Dev Agent)
Automatically develops new data processing operators based on user requirements. (In progress)
[2026-01-15]: The current DJ Dev Agent is in beta. Capability evaluation and optimization are ongoing.
Services#
Q&A Copilot — Juicer
Overall service
[2026-01-15]: Juicer is currently available on the documentation site. We are working on deployments for community platforms.
Documentation Website
DingTalk Group
Discord Server
Interactive Data Analysis Studio (In Development)
[2026-01-15]: A demo is available. The current version primarily relies on predefined workflows. We are working on integrating agent-based intelligence.
MCP Service
Planned
Future Directions#
Workflows as Skills
Data-Juicer Hub hosts a growing collection of data processing recipes and workflows contributed by the Data-Juicer community. As data processing demands expand into new scenarios (such as RAG, Embodied Intelligence, and Data Lakehouse architectures), we plan to incorporate existing and newly developed workflows into DJ-Agents as reusable skills, enabling broader and more flexible data processing applications.
Common Issues#
Q: How do I get a DashScope API key? A: Visit the official DashScope website, register an account, and apply for an API key.
Q: Why does operator retrieval fail? A: Check your network connection and API key configuration, or try switching to vector retrieval mode.
Q: How do I debug custom operators? A: Make sure the Data-Juicer path is configured correctly, and check the example code provided by the code development agent.
Q: What should I do if the MCP service connection fails? A: Check that the MCP server is running and that the URL in the configuration file is correct.
Q: I get the error `requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http://localhost:3000/trpc/pushMessage`. What is wrong? A: Agents handle data via file references (paths) rather than direct uploads. Please confirm whether any non-text files were submitted.
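Two of the checks above can be scripted. The sketch below is a generic diagnostic aid, not part of DJ-Agents: `is_endpoint_reachable` and `looks_like_text_file` are hypothetical helper names, and the text check is only a heuristic (UTF-8 decodable, no NUL bytes). The first helper tests whether anything is listening at a configured URL, such as an MCP server address. The second flags files that are probably not plain text before their paths are handed to an agent.

```python
import codecs
import socket
from urllib.parse import urlparse


def is_endpoint_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    host = parsed.hostname
    if host is None:
        return False
    # Fall back to the scheme's default port when the URL omits one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def looks_like_text_file(path: str, sample_size: int = 4096) -> bool:
    """Heuristic: the first bytes decode as UTF-8 and contain no NUL byte."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    if b"\x00" in sample:
        return False
    try:
        # The incremental decoder tolerates a multi-byte character
        # that was cut off at the end of the sample.
        codecs.getincrementaldecoder("utf-8")().decode(sample)
    except UnicodeDecodeError:
        return False
    return True
```

A successful TCP connection only shows that a process is listening on that port. It does not confirm that the process is the expected MCP server or that the URL path is correct.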
Optimization Recommendations#
For large-scale data processing, use Data-Juicer’s distributed mode.
Set the batch size to balance memory usage against processing speed.
For more advanced data processing features (synthesis, Data-Model Co-Development), refer to the Data-Juicer documentation.
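The batch-size trade-off can be illustrated with a small, library-independent sketch. It does not use Data-Juicer's API: `batched` and `process_in_batches` are hypothetical helpers written for this example. Only one batch is held in memory at a time, so a larger `batch_size` means fewer calls to the processing function but a higher peak memory footprint.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_in_batches(
    items: Iterable[T],
    fn: Callable[[List[T]], List[R]],
    batch_size: int,
) -> Iterator[R]:
    """Apply fn to one batch at a time and stream the results."""
    for batch in batched(items, batch_size):
        # Only the current batch and its results are alive here.
        yield from fn(batch)


if __name__ == "__main__":
    texts = ["  alpha ", "beta  ", " gamma", "delta"]
    cleaned = process_in_batches(
        texts, lambda batch: [t.strip() for t in batch], batch_size=2
    )
    print(list(cleaned))
```

In practice a reasonable starting point is to grow the batch size until memory use approaches the available budget, then back off.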