Tools Architecture
This document describes the current tool-layer architecture inside data_juicer_agents.
1. Design Goal
The tool layer is the stable atomic capability surface inside data_juicer_agents.
It serves three consumers:
CLI and command surfaces
the AgentScope-backed dj-agents session
future external skill packaging
The key rules are:
tool definitions are runtime-agnostic and explicit-input/output
higher layers must not rely on hidden session defaults or tool-internal state fallback
runtime adapters may change transport/schema presentation, but not tool semantics
2. Core Tool Contracts
Core contracts live in:
data_juicer_agents/core/tool/contracts.py
data_juicer_agents/core/tool/registry.py
data_juicer_agents/core/tool/catalog.py
They define:
ToolSpec
ToolContext
ToolResult
ToolRegistry
Responsibilities:
describe what a tool is
define explicit input and output schemas
register built-in tool specs
avoid direct dependency on AgentScope, TUI, session state, or CLI rendering
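The responsibilities above can be illustrated with a minimal sketch. Only the four class names come from this document; every field, method, and signature below is an assumption made for illustration, and the real definitions in contracts.py and registry.py may look quite different.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class ToolContext:
    """Explicit per-call context (hypothetical field)."""
    working_dir: str = "."


@dataclass
class ToolResult:
    """Structured outcome of one tool call (hypothetical fields)."""
    ok: bool
    data: Dict[str, Any] = field(default_factory=dict)
    error: str = ""


@dataclass(frozen=True)
class ToolSpec:
    """Describes what a tool is, without importing any runtime."""
    name: str
    description: str
    input_schema: Dict[str, Any]
    output_schema: Dict[str, Any]
    handler: Callable[[ToolContext, Dict[str, Any]], ToolResult]


class ToolRegistry:
    """Holds tool specs by name and rejects duplicate names."""

    def __init__(self) -> None:
        self._specs: Dict[str, ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        if spec.name in self._specs:
            raise ValueError(f"duplicate tool name: {spec.name}")
        self._specs[spec.name] = spec

    def get(self, name: str) -> ToolSpec:
        return self._specs[name]

    def names(self) -> List[str]:
        return list(self._specs)


# Usage: a hypothetical "echo" tool whose inputs and outputs are fully explicit.
echo_spec = ToolSpec(
    name="echo",
    description="Return the given text unchanged.",
    input_schema={"type": "object", "properties": {"text": {"type": "string"}}},
    output_schema={"type": "object", "properties": {"text": {"type": "string"}}},
    handler=lambda ctx, args: ToolResult(ok=True, data={"text": args["text"]}),
)
```

Nothing in the sketch refers to AgentScope, the TUI, session state, or CLI rendering, which is the property the contract layer is meant to preserve.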
3. Tool Groups
data_juicer_agents/tools/ is organized by tool group.
Each group publishes TOOL_SPECS through registry.py.
Concrete tools usually live under per-tool subdirectories with:
input.py: input model
logic.py: reusable implementation
tool.py: ToolSpec binding
Package-level __init__.py files re-export stable helpers, and some groups keep shared models or validators in sibling modules such as _shared/.
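As a rough illustration of the per-tool layout, the sketch below collapses the three files and the group-level registry into one listing. The count_rows tool is hypothetical and does not exist in the package, the input model uses a standard-library dataclass rather than whatever the project actually uses, and a plain dict stands in for the real ToolSpec binding.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


# --- input.py: input model (hypothetical) ---
@dataclass(frozen=True)
class CountRowsInput:
    rows: List[Dict[str, Any]]
    required_key: str


# --- logic.py: reusable implementation with no runtime imports ---
def count_rows(params: CountRowsInput) -> Dict[str, int]:
    """Count all rows and the rows that carry the required key."""
    total = len(params.rows)
    with_key = sum(1 for row in params.rows if params.required_key in row)
    return {"total": total, "with_key": with_key}


# --- tool.py: binds the input model and the logic under a tool name ---
# A plain dict stands in for the real ToolSpec here.
COUNT_ROWS_TOOL = {
    "name": "count_rows",
    "input_model": CountRowsInput,
    "run": count_rows,
}

# --- registry.py (group level): the group publishes TOOL_SPECS ---
TOOL_SPECS = [COUNT_ROWS_TOOL]
```

Keeping the logic in its own module means it can be imported and tested without going through any tool binding.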
tools/context
Files:
context/registry.py
context/inspect_dataset/{input.py,logic.py,tool.py}
Main responsibilities:
dataset inspection
dataset schema probing
tools/retrieve
Files:
retrieve/registry.py
retrieve/retrieve_operators/{input.py,logic.py,tool.py}
retrieve/retrieve_operators/{backend.py,operator_registry.py,catalog.py}
Main responsibilities:
operator retrieval entrypoints for the main package
canonical operator-name resolution
installed-operator lookup
tools/plan
Files:
plan/registry.py
plan/<tool_name>/{input.py,logic.py,tool.py}
plan/_shared/*.py
Main responsibilities:
staged dataset/process/system specs and the final plan model
deterministic planner core
plan validation
explicit plan assembly and persistence helpers
tools/apply
Files:
apply/registry.py
apply/apply_recipe/{input.py,logic.py,tool.py}
Main responsibilities:
recipe materialization
plan execution
structured execution results
tools/dev
Files:
dev/registry.py
dev/develop_operator/{input.py,logic.py,tool.py,scaffold.py}
Main responsibilities:
custom operator scaffold generation
optional smoke-check
tools/files
Files:
files/registry.py
files/{view_text_file,write_text_file,insert_text_file}/...
Main responsibilities:
read / write / insert text file helpers
tools/process
Files:
process/registry.py
process/{execute_shell_command,execute_python_code}/...
Main responsibilities:
shell execution
python snippet execution
4. Runtime Adapters
Runtime-specific adaptation is not placed in the tool groups.
AgentScope adapter
data_juicer_agents/adapters/agentscope/tools.py
data_juicer_agents/adapters/agentscope/schema_utils.py
Responsibilities:
convert ToolSpec into AgentScope-compatible callable/schema
normalize JSON schema so agent-facing tool calls stay shallow and explicit
map ToolResult into AgentScope responses
apply generic argument preview truncation
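Two of these responsibilities, schema normalization and argument preview truncation, can be sketched without touching AgentScope itself. The function names, the choice to inline local "$defs" references, and the 40-character limit are all assumptions for illustration; the actual behavior of schema_utils.py and tools.py is not described in this document.

```python
import json
from typing import Any, Dict, Optional


def inline_refs(schema: Any, defs: Optional[Dict[str, Any]] = None) -> Any:
    """Replace local "#/$defs/..." references with the schema they point to.

    Assumes the schema contains no recursive references.
    """
    if defs is None:
        defs = schema.get("$defs", {}) if isinstance(schema, dict) else {}
    if isinstance(schema, dict):
        ref = schema.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/$defs/"):
            return inline_refs(defs[ref[len("#/$defs/"):]], defs)
        # Drop the "$defs" table itself once its entries have been inlined.
        return {k: inline_refs(v, defs) for k, v in schema.items() if k != "$defs"}
    if isinstance(schema, list):
        return [inline_refs(item, defs) for item in schema]
    return schema


def preview_args(args: Dict[str, Any], limit: int = 40) -> str:
    """Render call arguments as JSON, cut to `limit` characters for display."""
    text = json.dumps(args, sort_keys=True, default=str)
    return text if len(text) <= limit else text[:limit] + "..."
```

Both helpers only change how a tool is presented to the agent. The arguments passed to the tool itself are untouched, in line with the rule that adapters may alter transport and schema presentation but not tool semantics.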
Session runtime / toolkit
data_juicer_agents/capabilities/session/toolkit.py
data_juicer_agents/capabilities/session/runtime.py
Responsibilities:
create the session runtime
emit tool lifecycle events for TUI/CLI observation
choose which registered tools are exposed to DJSessionAgent
keep session memory observational only; tool semantics remain explicit
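The lifecycle-event responsibility can be pictured as a thin wrapper around each tool call. This is only a sketch: run_with_events, the event dictionaries, and the "tool_start" / "tool_end" type names are hypothetical and not taken from runtime.py.

```python
from typing import Any, Callable, Dict

Event = Dict[str, Any]


def run_with_events(
    name: str,
    tool: Callable[..., Any],
    args: Dict[str, Any],
    emit: Callable[[Event], None],
) -> Any:
    """Call a tool with explicit arguments and report start/end to an observer."""
    emit({"type": "tool_start", "tool": name, "args": dict(args)})
    try:
        result = tool(**args)
    except Exception as exc:
        emit({"type": "tool_end", "tool": name, "ok": False, "error": str(exc)})
        raise
    emit({"type": "tool_end", "tool": name, "ok": True})
    return result


# Usage: the observer only records events; it never feeds values back into the call.
events = []
total = run_with_events("add", lambda a, b: a + b, {"a": 1, "b": 2}, events.append)
```

Because the observer receives copies of the arguments and has no return channel, whatever it stores stays observational, matching the last responsibility listed above.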
5. Default Registry and Session Toolkit
Built-in tool registration is assembled through:
data_juicer_agents/core/tool/catalog.py
That catalog discovers tool groups under data_juicer_agents/tools/ and loads each group’s TOOL_SPECS (currently via registry.py in every built-in group). It feeds them into:
build_default_tool_registry()
The session toolkit currently uses the default registry directly and orders tools by functional group priority. It does not depend on session tags embedded in tool definitions.
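The discovery and ordering steps described above might look roughly like the following. This is a sketch built on the standard-library pkgutil and importlib modules; discover_tool_specs and order_by_group are hypothetical names, and the real catalog.py and build_default_tool_registry() may work differently.

```python
import importlib
import pkgutil
from types import ModuleType
from typing import Any, Dict, List, Sequence


def discover_tool_specs(package: ModuleType) -> Dict[str, List[Any]]:
    """Collect TOOL_SPECS from the registry module of each subpackage."""
    found: Dict[str, List[Any]] = {}
    for info in pkgutil.iter_modules(package.__path__):
        if not info.ispkg:
            continue
        module_name = f"{package.__name__}.{info.name}.registry"
        try:
            registry = importlib.import_module(module_name)
        except ModuleNotFoundError as exc:
            # Skip groups without a registry module, but surface any other
            # missing import raised from inside an existing registry.
            if exc.name != module_name:
                raise
            continue
        found[info.name] = list(getattr(registry, "TOOL_SPECS", []))
    return found


def order_by_group(found: Dict[str, List[Any]], priority: Sequence[str]) -> List[Any]:
    """Flatten specs group by group; unlisted groups follow in name order."""
    rank = {group: index for index, group in enumerate(priority)}
    groups = sorted(found, key=lambda g: (rank.get(g, len(rank)), g))
    return [spec for group in groups for spec in found[group]]
```

Ordering here depends only on the group a spec was discovered in, so the tool definitions themselves need no session-specific tags.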
6. Current Session Tool Set
The default registry currently exposes these tools to the session runtime:
inspect_dataset
retrieve_operators
build_dataset_spec
build_process_spec
build_system_spec
validate_dataset_spec
validate_process_spec
validate_system_spec
assemble_plan
plan_validate
plan_save
apply_recipe
develop_operator
view_text_file
write_text_file
insert_text_file
execute_shell_command
execute_python_code
These tools stay generic. Session orchestration must call them with explicit arguments based on prior tool outputs.
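A small sketch of what "explicit arguments based on prior tool outputs" means in practice. The two functions are hypothetical stand-ins, not the real inspect_dataset or build_dataset_spec tools, whose signatures are not given here.

```python
from typing import Any, Dict, List


def inspect_rows(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Stand-in for an inspection tool: report the keys seen in the rows."""
    keys = sorted({key for row in rows for key in row})
    return {"num_rows": len(rows), "keys": keys}


def build_spec(text_key: str, available_keys: List[str]) -> Dict[str, Any]:
    """Stand-in for a spec builder: every input arrives as an argument."""
    if text_key not in available_keys:
        return {"ok": False, "error": f"unknown key: {text_key}"}
    return {"ok": True, "spec": {"text_key": text_key}}


def orchestrate(rows: List[Dict[str, Any]], text_key: str) -> Dict[str, Any]:
    """Forward the first tool's output to the second tool by hand."""
    inspection = inspect_rows(rows)
    # Nothing is read from shared session state: the builder sees only
    # what the orchestrator passes in.
    return build_spec(text_key=text_key, available_keys=inspection["keys"])
```

The second tool never looks up "the last inspected dataset" on its own, so the same call produces the same result from the CLI, from a session, or from a packaged skill.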
7. Boundary Summary
core/tool/* defines tool contracts, discovery, and registry
tools/<group>/* defines atomic tools only
adapters/agentscope/* adapts tools to AgentScope transport/schema
capabilities/session/* orchestrates tools conversationally without changing tool semantics
This is the internal shape that future atomic CLI and skill packaging should build on.