data_juicer.analysis.column_wise_analysis module

data_juicer.analysis.column_wise_analysis.get_row_col(total_num, factor=2)

Given the total number of stats figures, get the “best” number of rows and columns. This function is needed when all stats figures are stored in one image.

Parameters:
  • total_num – total number of stats figures

  • factor – number of sub-figure types in each figure. The default is 2, meaning each stats figure has a histogram and a box plot

Returns:

“best” number of rows and columns, and the grid list
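The idea of picking a near-square grid can be illustrated with a minimal sketch. It is not the library's implementation: the helper name `sketch_row_col` and the scoring rule (minimize the gap between the number of rows and the number of sub-figure columns) are assumptions made for illustration only.

```python
import math


def sketch_row_col(total_num, factor=2):
    """Hypothetical layout helper, NOT data_juicer's get_row_col.

    Each stat occupies `factor` sub-figures side by side, so a row that
    holds `stats_per_row` stats is `stats_per_row * factor` sub-figures wide.
    """
    if total_num < 1 or factor < 1:
        raise ValueError("total_num and factor must be positive")

    best = None
    for stats_per_row in range(1, total_num + 1):
        rows = math.ceil(total_num / stats_per_row)
        sub_cols = stats_per_row * factor
        # Assumed objective: keep the grid as close to square as possible.
        score = abs(rows - sub_cols)
        if best is None or score < best[0]:
            best = (score, rows, stats_per_row)

    _, rows, stats_per_row = best
    # Grid position (row, column) of each stat, filled row by row.
    grids = [(i // stats_per_row, i % stats_per_row) for i in range(total_num)]
    return rows, stats_per_row, grids


rows, cols, grids = sketch_row_col(8, factor=2)
print(rows, cols, grids[:3])
```

With 8 stats and 2 sub-figures per stat, this sketch settles on 4 rows of 2 stats each, which gives a 4 x 4 arrangement of sub-figures.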

class data_juicer.analysis.column_wise_analysis.ColumnWiseAnalysis(dataset, output_path, overall_result=None, save_stats_in_one_file=True)

Bases: object

Apply analysis on each column of stats respectively.

__init__(dataset, output_path, overall_result=None, save_stats_in_one_file=True)

Initialization method

Parameters:
  • dataset – the dataset to be analyzed

  • output_path – path to store the analysis results

  • overall_result – optional precomputed overall stats result

  • save_stats_in_one_file – whether to save the analysis figures of all stats into one image file

analyze(show_percentiles=False, show=False, skip_export=False)

Apply analysis and draw the analysis figure for stats.

Parameters:
  • show_percentiles – whether to show the percentile lines in each sub-figure. If true, several red lines mark the quantiles of the stats distribution

  • show – whether to display the figure in a single window after drawing

  • skip_export – whether to skip saving the results to disk

Returns:
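To make the percentile lines of show_percentiles concrete, the sketch below computes the positions at which such lines would sit for a small sample. It uses only the standard library; the helper name `percentile_marks` and the choice of the "inclusive" interpolation method are assumptions, and the library may compute its percentiles differently.

```python
import statistics


def percentile_marks(values, n=4):
    """Hypothetical helper: return the n-1 cut points splitting `values`
    into n equally populated groups (quartiles when n=4)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # "inclusive" treats the sample minimum and maximum as the 0th and
    # 100th percentiles and interpolates linearly in between.
    return statistics.quantiles(values, n=n, method="inclusive")


marks = percentile_marks([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(marks)  # [3.0, 5.0, 7.0]
```

Each returned value corresponds to one vertical line on a histogram (or one horizontal line on a box plot) of the same data.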

draw_hist(ax, data, save_path, percentiles=None, show=False)

Draw the histogram for the data.

Parameters:
  • ax – the axes to draw on

  • data – data to draw

  • save_path – the path to save the histogram figure

  • percentiles – the overall analysis result of the data, including percentile information

  • show – whether to display the figure in a single window after drawing

Returns:
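As a rough picture of what a histogram summarizes, the following sketch counts values per equal-width bin in plain Python. The helper name `bin_counts` and the default of 5 bins are assumptions for illustration; they say nothing about the binning draw_hist actually uses.

```python
def bin_counts(values, num_bins=5):
    """Hypothetical helper: count how many values fall into each of
    `num_bins` equal-width bins spanning min(values)..max(values)."""
    if not values:
        raise ValueError("values must not be empty")
    if num_bins < 1:
        raise ValueError("num_bins must be positive")

    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical: place everything in the first bin.
        return [len(values)] + [0] * (num_bins - 1)

    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value belongs to the last bin, not to a new one.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts


print(bin_counts([0, 1, 2, 3, 4, 5, 6, 7, 8, 10]))  # [2, 2, 2, 2, 2]
```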

draw_box(ax, data, save_path, percentiles=None, show=False)

Draw the box plot for the data.

Parameters:
  • ax – the axes to draw on

  • data – data to draw

  • save_path – the path to save the box plot figure

  • percentiles – the overall analysis result of the data, including percentile information

  • show – whether to display the figure in a single window after drawing

Returns:

draw_wordcloud(ax, data, save_path, show=False)