data_juicer_agents.tools.context.inspect_dataset.logic module

Lightweight dataset probing utilities for planning-time schema inference.

data_juicer_agents.tools.context.inspect_dataset.logic.inspect_dataset_schema(dataset_source=None, sample_size: int = 20) → Dict[str, Any]

Inspect a small sample of a dataset and infer its keys and modality for use during planning.
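This page does not document how keys and modality are inferred. As a rough illustration only, the following sketch reads at most `sample_size` in-memory records, collects the union of their keys, and guesses a modality from key names. The helper `infer_schema_from_records`, the key-name table, and the returned fields are hypothetical assumptions, not the actual behavior of `inspect_dataset_schema`.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List

# Hypothetical key-name heuristics; the real module's rules are not documented here.
_MODALITY_KEYS = {
    "image": ("image", "images"),
    "audio": ("audio", "audios"),
    "video": ("video", "videos"),
}


def infer_schema_from_records(
    records: Iterable[Dict[str, Any]], sample_size: int = 20
) -> Dict[str, Any]:
    """Guess keys and modality from at most `sample_size` records (illustrative only)."""
    # islice stops after sample_size items, so large or lazy inputs are not fully read.
    sample: List[Dict[str, Any]] = list(islice(records, sample_size))
    keys = sorted({k for rec in sample for k in rec})

    # Collect every modality whose conventional key name appears in the sample.
    found = [m for m, names in _MODALITY_KEYS.items() if any(k in names for k in keys)]
    if "text" in keys:
        found.insert(0, "text")

    if not found:
        modality = "unknown"
    elif len(found) == 1:
        modality = found[0]
    else:
        modality = "multimodal"
    return {"ok": True, "sample_size": len(sample), "keys": keys, "modality": modality}
```

Sampling a fixed prefix keeps the probe cheap at planning time, at the cost of missing keys that only appear later in the dataset.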

Accepts a DatasetSource object that encapsulates the dataset path and configuration. When dataset_source is None, the function returns a descriptive error dictionary instead of raising an exception.
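The `None` guard described above can be illustrated with a self-contained sketch. `SimpleSource` is a hypothetical stand-in for `DatasetSource` (whose real fields are not listed on this page), and `probe_jsonl` is a hypothetical helper that assumes a local JSON Lines file; it parses only the first `sample_size` non-empty lines.

```python
import json
from dataclasses import dataclass, field
from itertools import islice
from typing import Any, Dict, Optional


@dataclass
class SimpleSource:
    """Hypothetical stand-in for DatasetSource: a path plus free-form config."""
    path: str
    config: Dict[str, Any] = field(default_factory=dict)


def probe_jsonl(source: Optional[SimpleSource] = None, sample_size: int = 20) -> Dict[str, Any]:
    # Mirror the documented contract: a missing source yields an error dict, not an exception.
    if source is None:
        return {"ok": False, "error": "dataset_source is required"}

    records = []
    with open(source.path, encoding="utf-8") as fh:
        # Skip blank lines and stop after sample_size records instead of reading the whole file.
        for line in islice((ln for ln in fh if ln.strip()), sample_size):
            records.append(json.loads(line))

    keys = sorted({k for rec in records for k in rec})
    return {"ok": True, "path": source.path, "sample_size": len(records), "keys": keys}
```

Returning an error dictionary rather than raising lets a calling agent surface the problem in its plan instead of aborting, which is presumably the motivation for the documented behavior.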