data_juicer.utils.job.monitor module

DataJuicer Job Progress Monitor

A utility for monitoring and displaying progress information for DataJuicer jobs. It shows partition status, operation progress, checkpoints, and overall job metrics.

class data_juicer.utils.job.monitor.JobProgressMonitor(job_id: str, base_dir: str = 'outputs/partition-checkpoint-eventlog')

Bases: object

Monitor and display progress for DataJuicer jobs.

__init__(job_id: str, base_dir: str = 'outputs/partition-checkpoint-eventlog')

Initialize the job progress monitor.

Parameters:
  • job_id – The job ID to monitor

  • base_dir – Base directory containing job outputs

display_progress(detailed: bool = False)

Display job progress information.

get_progress_data() -> Dict[str, Any]

Get progress data as a dictionary for programmatic use.
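
Because the progress data comes back as a plain dictionary, it can be saved for later inspection. The sketch below is one possible way to do that, not part of the module: `save_progress_snapshot` is a hypothetical helper that takes any zero-argument callable returning a dict, so a bound `get_progress_data` method can be passed in directly. The keys of the dictionary are not documented here, so the helper makes no assumption about them.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict


def save_progress_snapshot(
    fetch: Callable[[], Dict[str, Any]], path: str
) -> Dict[str, Any]:
    """Call `fetch` once and write its result to `path` as JSON.

    `fetch` is any zero-argument callable returning a dict, for example
    a bound `JobProgressMonitor.get_progress_data` method. This helper
    is hypothetical and not part of data_juicer.
    """
    data = fetch()
    # default=str stops the dump from failing on values that are not
    # JSON-native (timestamps, paths); this is an assumption about what
    # the dictionary may contain.
    text = json.dumps(data, indent=2, default=str)
    Path(path).write_text(text, encoding="utf-8")
    return data


# Intended use, assuming data_juicer is installed and the job exists:
#   from data_juicer.utils.job.monitor import JobProgressMonitor
#   monitor = JobProgressMonitor("20250728_233517_510abf")
#   save_progress_snapshot(monitor.get_progress_data, "progress.json")
```

Passing the method instead of the monitor object keeps the helper usable with any other dict-producing callable.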

data_juicer.utils.job.monitor.show_job_progress(job_id: str, base_dir: str = 'outputs/partition-checkpoint-eventlog', detailed: bool = False) -> Dict[str, Any]

Convenience function that shows job progress and returns the progress data.

Parameters:
  • job_id – The job ID to monitor

  • base_dir – Base directory containing job outputs

  • detailed – Whether to show detailed operation information

Returns:

Dictionary containing all progress data

Example

>>> show_job_progress("20250728_233517_510abf")
>>> show_job_progress("20250728_233517_510abf", detailed=True)
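
Since the function returns the progress dictionary, it can be called repeatedly to follow a running job. The following is a minimal sketch of such a loop, not something the module provides: `poll_progress` and its parameters are hypothetical, and the completion check is left to the caller because the structure of the returned dictionary is not specified here.

```python
import time
from typing import Any, Callable, Dict


def poll_progress(
    fetch: Callable[[], Dict[str, Any]],
    is_done: Callable[[Dict[str, Any]], bool],
    interval_s: float = 5.0,
    max_polls: int = 120,
) -> Dict[str, Any]:
    """Call `fetch` until `is_done` accepts its result or polls run out.

    Returns the last dictionary that was fetched. All names here are
    hypothetical; only `fetch` touches the real monitoring API.
    """
    last: Dict[str, Any] = {}
    for attempt in range(max_polls):
        last = fetch()
        if is_done(last):
            break
        # Do not sleep after the final attempt.
        if attempt < max_polls - 1:
            time.sleep(interval_s)
    return last


# Intended use, assuming data_juicer is installed. The completion test
# is a placeholder to replace once the dictionary layout is known:
#   from data_juicer.utils.job.monitor import show_job_progress
#   poll_progress(
#       lambda: show_job_progress("20250728_233517_510abf"),
#       is_done=lambda data: False,
#       max_polls=10,
#   )
```

The `max_polls` bound ensures the loop ends even if the completion check never succeeds.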

data_juicer.utils.job.monitor.main()

Main entry point for the job progress monitor.