data_juicer_agents.core.dj_agent_hooks module

DataJuicer Agent Hooks

Hook functions for cleaning and processing agent outputs.

data_juicer_agents.core.dj_agent_hooks.clean_log(log_content)

Clean log content:

1. Extract configuration information (remove table lines)
2. Remove all progress bars
3. Remove duplicate lines
4. Remove data_juicer.ops:timing_context lines
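The four cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the actual body of clean_log: the function name clean_log_sketch and the patterns used to detect table lines and progress bars (tqdm-style "NN%|" markers) are assumptions made for the example.

```python
import re

# Assumed patterns; the real clean_log may detect these lines differently.
TABLE_LINE = re.compile(r"^\s*[|+╒╞╘╤╧╪│├└┌]")   # lines that look like table borders/rows
PROGRESS_BAR = re.compile(r"\d{1,3}%\|")          # tqdm-style progress bar marker
TIMING_MARKER = "data_juicer.ops:timing_context"  # marker named in the docstring


def clean_log_sketch(log_content: str) -> str:
    """Hypothetical re-implementation of the four documented cleaning steps."""
    seen = set()
    kept = []
    for line in log_content.splitlines():
        if TABLE_LINE.match(line):       # step 1: drop table lines
            continue
        if PROGRESS_BAR.search(line):    # step 2: drop progress bars
            continue
        if TIMING_MARKER in line:        # step 4: drop timing_context lines
            continue
        if line in seen:                 # step 3: drop duplicates, keep first occurrence
            continue
        seen.add(line)
        kept.append(line)
    return "\n".join(kept)


sample = "\n".join([
    "INFO loading config",
    "| key | value |",
    " 50%|#####     | 5/10 [00:01<00:01]",
    "INFO loading config",
    "DEBUG data_juicer.ops:timing_context - took 1s",
    "INFO done",
])
cleaned = clean_log_sketch(sample)
```

Deduplicating with a set while appending to a list keeps the first occurrence of each line in its original position, which matters for logs where order carries meaning.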

async data_juicer_agents.core.dj_agent_hooks.dj_agent_post_acting_clean_content(self: ReActAgent, *args: Any, **kwargs: Any) → None

Hook function for cleaning messy shell command output after an action. It is specifically designed to clean DataJuicer processing logs and other shell outputs.

This hook will:

1. Extract configuration information (remove table lines)
2. Remove all progress bars
3. Remove duplicate lines
4. Remove data_juicer.ops:timing_context lines
5. Keep only one Downloading line, replace others with ellipsis
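Step 5 is the one behavior listed here that clean_log does not mention. One possible reading of it is sketched below. The helper name collapse_downloading, the substring test on "Downloading", and the choice to emit a single "..." for all later Downloading lines are assumptions for illustration; the hook itself may handle this differently.

```python
from typing import List


def collapse_downloading(lines: List[str]) -> List[str]:
    """Hypothetical helper: keep the first 'Downloading' line, then one ellipsis."""
    out: List[str] = []
    seen_download = False
    ellipsis_added = False
    for line in lines:
        if "Downloading" in line:
            if not seen_download:
                # First Downloading line is kept verbatim.
                seen_download = True
                out.append(line)
            elif not ellipsis_added:
                # All later Downloading lines collapse into a single ellipsis.
                ellipsis_added = True
                out.append("...")
            continue
        out.append(line)
    return out


collapsed = collapse_downloading([
    "start",
    "Downloading shard 1",
    "Downloading shard 2",
    "Downloading shard 3",
    "end",
])
```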

data_juicer_agents.core.dj_agent_hooks.register_dj_agent_hooks(agent)

Register cleaning hooks for a DataJuicer agent.

Parameters:

agent – ReActAgent instance to register hooks for
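To illustrate the general pattern of registering an async post-acting hook on an agent, here is a self-contained sketch. Everything in it is hypothetical: StubAgent stands in for ReActAgent, add_post_acting_hook is an invented registration method, and strip_blank_lines_hook is a toy cleaner. It only mirrors the documented hook shape (self, *args, **kwargs) returning None and does not reflect how register_dj_agent_hooks is actually implemented.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

Hook = Callable[..., Awaitable[None]]


class StubAgent:
    """Hypothetical stand-in for a ReActAgent; not the real class."""

    def __init__(self) -> None:
        self.hooks: Dict[str, Hook] = {}
        self.last_output = ""

    def add_post_acting_hook(self, name: str, hook: Hook) -> None:
        # Invented registration method, used only for this sketch.
        self.hooks[name] = hook

    async def act(self, raw_output: str) -> None:
        # Store the raw output, then let each registered hook clean it in place.
        self.last_output = raw_output
        for hook in self.hooks.values():
            await hook(self)


async def strip_blank_lines_hook(self: StubAgent, *args: Any, **kwargs: Any) -> None:
    """Toy cleaner with the same call shape as the documented hook."""
    self.last_output = "\n".join(
        line for line in self.last_output.splitlines() if line.strip()
    )


def register_hooks_sketch(agent: StubAgent) -> None:
    """Hypothetical analogue of register_dj_agent_hooks for the stub agent."""
    agent.add_post_acting_hook("clean_content", strip_blank_lines_hook)


agent = StubAgent()
register_hooks_sketch(agent)
asyncio.run(agent.act("a\n\nb\n"))
```

Registering hooks by name, as the stub does, makes it possible to replace or remove a single cleaner later without touching the others.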