data_juicer_agents.tools.context

Context-oriented tools.

class data_juicer_agents.tools.context.InspectDatasetInput(*, dataset_path: str, sample_size: Annotated[int, Ge(ge=1)] = 20)

Bases: BaseModel

dataset_path: str
sample_size: int
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
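The signature above constrains sample_size to be at least 1 and defaults it to 20. The following is a minimal sketch of that validation behavior, assuming pydantic v2 is installed. InspectDatasetInputSketch and try_build are hypothetical stand-ins written for illustration, not the library's own class, and the dataset path is a placeholder.

```python
from typing import Annotated, Optional

from pydantic import BaseModel, Field, ValidationError


class InspectDatasetInputSketch(BaseModel):
    """Hypothetical stand-in mirroring the documented fields; not the real class."""

    dataset_path: str
    # Mirrors the documented constraint: an integer >= 1, defaulting to 20.
    sample_size: Annotated[int, Field(ge=1)] = 20


def try_build(**kwargs) -> Optional[InspectDatasetInputSketch]:
    """Return the validated model, or None if validation fails."""
    try:
        return InspectDatasetInputSketch(**kwargs)
    except ValidationError:
        return None


# Placeholder path; nothing is read from disk by the model itself.
ok = try_build(dataset_path="data/demo.jsonl")
bad = try_build(dataset_path="data/demo.jsonl", sample_size=0)

print(ok.sample_size)
print(bad)
```

Rejecting a non-positive sample_size at the input model keeps the check out of the tool body, so the tool can assume it always receives a usable sample size.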

class data_juicer_agents.tools.context.ListSystemConfigInput(*, filter_prefix: str | None = None, include_descriptions: bool = True)

Bases: BaseModel

Input for list_system_config.

This tool lists the complete system configuration from Data-Juicer, including all available parameters, their types, default values, and descriptions. Use this before build_system_spec to discover available configuration options.

filter_prefix: str | None
include_descriptions: bool
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

data_juicer_agents.tools.context.inspect_dataset_schema(dataset_path: str, sample_size: int = 20) → Dict[str, Any]

Inspect a small sample of a dataset and infer keys/modality for planning.
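The source does not document how the inspection is performed or what the returned dictionary contains. As a rough illustration of the idea only, the sketch below reads up to sample_size records from a JSONL file, collects the keys it sees, and guesses modalities from key names. The function inspect_jsonl_sample, the MODALITY_HINTS mapping, the JSONL input format, and the shape of the result are all assumptions made for this example.

```python
import json
import os
import tempfile
from typing import Any, Dict

# Assumed, illustrative mapping from key names to a modality label.
MODALITY_HINTS = {"text": "text", "images": "image", "audios": "audio", "videos": "video"}


def inspect_jsonl_sample(dataset_path: str, sample_size: int = 20) -> Dict[str, Any]:
    """Hypothetical stand-in: read up to `sample_size` JSONL records and summarize keys."""
    key_counts: Dict[str, int] = {}
    sampled = 0
    with open(dataset_path, "r", encoding="utf-8") as handle:
        for line in handle:
            if sampled >= sample_size:
                break
            line = line.strip()
            if not line:
                continue  # skip blank lines without counting them
            record = json.loads(line)
            sampled += 1
            for key in record:
                key_counts[key] = key_counts.get(key, 0) + 1
    # Guess modalities purely from key names present in the sample.
    modalities = sorted({MODALITY_HINTS[k] for k in key_counts if k in MODALITY_HINTS})
    return {"sampled": sampled, "keys": sorted(key_counts), "modalities": modalities}


def demo() -> Dict[str, Any]:
    """Write three made-up records to a temporary file and inspect the first two."""
    rows = [
        {"text": "hello", "images": ["a.png"]},
        {"text": "world", "meta": {"lang": "en"}},
        {"text": "!"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.jsonl")
        with open(path, "w", encoding="utf-8") as handle:
            for row in rows:
                handle.write(json.dumps(row) + "\n")
        return inspect_jsonl_sample(path, sample_size=2)


summary = demo()
print(summary)
```

Sampling only the first few records keeps the inspection cheap on large datasets, at the cost of missing keys that appear only later in the file.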

data_juicer_agents.tools.context.list_system_config(*, filter_prefix: str | None = None, include_descriptions: bool = True) → Dict[str, Any]

List system configuration from Data-Juicer.

This function lists all available system configuration parameters from Data-Juicer, including their types, default values, and descriptions.

Parameters:
  • filter_prefix – Optional filter to show only parameters matching this prefix

  • include_descriptions – Whether to include parameter descriptions

Returns:

Dict containing configuration information and available parameters
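The two parameters can be pictured as a prefix filter plus an optional field drop. The sketch below shows one plausible reading of that behavior on a made-up parameter table. The function list_params_sketch, the EXAMPLE_PARAMS entries, and the "count"/"parameters" result shape are hypothetical and do not reflect Data-Juicer's actual configuration or the real return value.

```python
from typing import Any, Dict, Optional

# Made-up parameter table for illustration only; not Data-Juicer's real configuration.
EXAMPLE_PARAMS: Dict[str, Dict[str, Any]] = {
    "io_input": {"type": "str", "default": None, "description": "Where to read data from."},
    "io_output": {"type": "str", "default": None, "description": "Where to write results."},
    "run_workers": {"type": "int", "default": 4, "description": "Number of worker processes."},
}


def list_params_sketch(
    *, filter_prefix: Optional[str] = None, include_descriptions: bool = True
) -> Dict[str, Any]:
    """Hypothetical stand-in: filter parameters by name prefix, optionally dropping descriptions."""
    selected: Dict[str, Dict[str, Any]] = {}
    for name, info in EXAMPLE_PARAMS.items():
        if filter_prefix is not None and not name.startswith(filter_prefix):
            continue
        entry = dict(info)  # copy so the source table is never mutated
        if not include_descriptions:
            entry.pop("description", None)
        selected[name] = entry
    return {"count": len(selected), "parameters": selected}


result = list_params_sketch(filter_prefix="io_", include_descriptions=False)
print(sorted(result["parameters"]))
```

Passing include_descriptions=False trims the payload when only names, types, and defaults are needed, which can matter when the listing is fed back to an agent with a limited context budget.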