data_juicer.utils.job.monitor module

DataJuicer Job Progress Monitor

A utility to monitor and display progress information for DataJuicer jobs. Shows partition status, operation progress, checkpoints, and overall job metrics.

class data_juicer.utils.job.monitor.JobProgressMonitor(job_id: str, base_dir: str = 'outputs/partition-checkpoint-eventlog')

Bases: object

Monitor and display progress for DataJuicer jobs.

__init__(job_id: str, base_dir: str = 'outputs/partition-checkpoint-eventlog')

Initialize the job progress monitor.

Parameters:
  • job_id -- The job ID to monitor

  • base_dir -- Base directory containing job outputs

display_progress(detailed: bool = False)

Display job progress information.

get_progress_data() -> Dict[str, Any]

Get progress data as a dictionary for programmatic use.

data_juicer.utils.job.monitor.show_job_progress(job_id: str, base_dir: str = 'outputs/partition-checkpoint-eventlog', detailed: bool = False) -> Dict[str, Any]

Utility function to show job progress.

Parameters:
  • job_id -- The job ID to monitor

  • base_dir -- Base directory containing job outputs

  • detailed -- Whether to show detailed operation information

Returns:

Dictionary containing all progress data
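Because the progress data comes back as a plain dictionary, a caller can poll it until some condition holds. The sketch below is one possible way to do that, not part of this module: `poll_progress` is a hypothetical helper, and the completion check is supplied by the caller because the dictionary's keys are not documented here.

```python
import time
from typing import Any, Callable, Dict, Optional


def poll_progress(
    fetch: Callable[[], Dict[str, Any]],
    is_done: Callable[[Dict[str, Any]], bool],
    interval_s: float = 5.0,
    max_polls: int = 120,
) -> Optional[Dict[str, Any]]:
    """Call fetch() until is_done(data) is true or max_polls is reached.

    Returns the first snapshot accepted by is_done, or None if the
    polling budget runs out first.
    """
    for attempt in range(max_polls):
        data = fetch()
        if is_done(data):
            return data
        # Skip the sleep after the final attempt.
        if attempt < max_polls - 1:
            time.sleep(interval_s)
    return None


if __name__ == "__main__":
    # Hypothetical stand-in for a real progress source; the "done" key
    # is invented for illustration only.
    snapshots = iter([{"done": False}, {"done": False}, {"done": True}])
    result = poll_progress(
        lambda: next(snapshots),
        lambda d: d["done"],
        interval_s=0.0,
    )
    print(result)
```

In practice `fetch` could be something like `lambda: show_job_progress(job_id)` or a bound `monitor.get_progress_data`, with `is_done` written against whatever keys the returned dictionary actually contains.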

Examples

>>> show_job_progress("20250728_233517_510abf")
>>> show_job_progress("20250728_233517_510abf", detailed=True)

data_juicer.utils.job.monitor.main()

Main entry point for the job progress monitor.