Quick Start
1. Prerequisites
Python >=3.10,<3.13
Data-Juicer runtime (py-data-juicer)
A DashScope or OpenAI-compatible API key
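The Python version constraint can be checked before installing. The following is a minimal sketch, not part of the project: py_version_ok is a hypothetical helper name, and the snippet assumes the interpreter is available as python3 on PATH.

```shell
# Hypothetical helper: succeed if a "major.minor[.patch]" version
# satisfies the documented range >=3.10,<3.13.
py_version_ok() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # drop any patch component
    [ "$major" -eq 3 ] && [ "$minor" -ge 10 ] && [ "$minor" -lt 13 ]
}

# Usage: check the interpreter on PATH, if there is one.
if command -v python3 >/dev/null 2>&1; then
    found=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    if py_version_ok "$found"; then
        echo "python3 $found is in the supported range"
    else
        echo "python3 $found is outside >=3.10,<3.13"
    fi
fi
```

If the interpreter on PATH is out of range, uv can usually be pointed at a different one when creating the virtual environment.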
2. Install
cd ./data-juicer-agents
uv venv .venv
source .venv/bin/activate
uv pip install -e .
3. Configure model access
export DASHSCOPE_API_KEY="<your_key>"
# or:
# export MODELSCOPE_API_TOKEN="<your_key>"
# Optional overrides
export DJA_OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
export DJA_SESSION_MODEL="qwen3-max-2026-01-23"
export DJA_PLANNER_MODEL="qwen3-max-2026-01-23"
export DJA_MODEL_FALLBACKS="qwen-max,qwen-plus"
export DJA_LLM_THINKING="true"
4. Minimal CLI path
Optional inspection step:
djx retrieve "remove duplicate text records" \
--dataset ./data/demo-dataset.jsonl \
--top-k 8
Generate a plan:
djx plan "deduplicate and clean text for RAG" \
--dataset ./data/demo-dataset.jsonl \
--export ./data/demo-dataset-processed.jsonl \
--output ./data/demo-plan.yaml
Apply the saved plan:
djx apply --plan ./data/demo-plan.yaml --yes
Dry-run without executing dj-process:
djx apply --plan ./data/demo-plan.yaml --yes --dry-run
Notes:
djx plan already performs internal operator retrieval before building the final plan.
djx retrieve is still useful for inspection and debugging.
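The plan, dry-run, and apply steps above can be chained so that a later step runs only if the earlier ones succeed. This is a rough sketch under two assumptions: that djx exits with a non-zero status on failure, and that a DJX variable (hypothetical, not something the tool reads) is acceptable as a way to preview the sequence with DJX=echo. Only flags shown in this section are used.

```shell
# DJX is a hypothetical override; set DJX=echo to print the commands
# instead of running them.
DJX=${DJX:-djx}

# plan_then_apply INTENT DATASET EXPORT PLAN
# Generates a plan, dry-runs it, then applies it. Each step runs only if
# the previous one exited with status 0 (assumed to mean success).
plan_then_apply() {
    "$DJX" plan "$1" --dataset "$2" --export "$3" --output "$4" &&
    "$DJX" apply --plan "$4" --yes --dry-run &&
    "$DJX" apply --plan "$4" --yes
}

# Example (not executed here):
# plan_then_apply "deduplicate and clean text for RAG" \
#     ./data/demo-dataset.jsonl \
#     ./data/demo-dataset-processed.jsonl \
#     ./data/demo-plan.yaml
```

Running the dry-run before the real apply keeps a broken plan from touching the export path, at the cost of one extra invocation.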
5. Session mode (dj-agents)
Default TUI:
dj-agents --dataset ./data/demo-dataset.jsonl --export ./data/demo-dataset-processed.jsonl
Plain terminal mode:
dj-agents --ui plain --dataset ./data/demo-dataset.jsonl --export ./data/demo-dataset-processed.jsonl
AgentScope Studio mode:
as_studio
dj-agents --ui as_studio --studio-url http://localhost:3000 --dataset ./data/demo-dataset.jsonl --export ./data/demo-dataset-processed.jsonl
Notes:
dj-agents requires LLM access.
In session mode, press Ctrl+C to interrupt the current turn and Ctrl+D to exit.
In as_studio mode, start AgentScope Studio separately before launching dj-agents.
The session agent usually plans with inspect_dataset -> retrieve_operators -> build_dataset_spec -> build_process_spec -> build_system_spec -> assemble_plan -> plan_validate -> plan_save.
6. Basic sanity checks
djx --help
djx retrieve "filter long text" --dataset ./data/demo-dataset.jsonl --json
djx plan "filter long text" --dataset ./data/demo-dataset.jsonl --export ./data/out.jsonl --verbose
djx apply --plan ./data/demo-plan.yaml --yes --dry-run
dj-agents --help
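If either --help call fails with "command not found", the virtual environment is probably not active or the install step did not complete. A small sketch for checking that both entry points are on PATH follows; missing_cmds is a hypothetical helper, not something shipped with the project.

```shell
# Hypothetical helper: print each named command that is not on PATH.
# Returns non-zero if at least one is missing.
missing_cmds() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            rc=1
        fi
    done
    return "$rc"
}

# Usage: report missing entry points without aborting the shell.
missing=$(missing_cmds djx dj-agents) || echo "not on PATH:" $missing
```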
7. Troubleshooting
If planning or session startup fails with API/model errors, verify:
DASHSCOPE_API_KEY or MODELSCOPE_API_TOKEN
DJA_OPENAI_BASE_URL
DJA_SESSION_MODEL and DJA_PLANNER_MODEL
DJA_MODEL_FALLBACKS when you expect model fallback
DJA_LLM_THINKING if your provider rejects the thinking flag
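One way to review these variables in the current shell without echoing secrets is sketched below. report_var is a hypothetical helper; it only reports whether each variable is set and makes no claim about how the tools resolve or prioritize them.

```shell
# Hypothetical helper: report whether a variable is set, without
# printing its value (useful for API keys).
report_var() {
    eval "value=\${$1:-}"   # indirect lookup of the variable named in $1
    if [ -n "$value" ]; then
        echo "$1: set"
    else
        echo "$1: unset or empty"
    fi
}

for name in DASHSCOPE_API_KEY MODELSCOPE_API_TOKEN DJA_OPENAI_BASE_URL \
            DJA_SESSION_MODEL DJA_PLANNER_MODEL DJA_MODEL_FALLBACKS \
            DJA_LLM_THINKING; do
    report_var "$name"
done
```

A variable that was assigned without export shows as set here but is not visible to djx or dj-agents, so exported values are worth confirming separately with env.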