data_juicer.analysis.column_wise_analysis module
- data_juicer.analysis.column_wise_analysis.get_row_col(total_num, factor=2)
Given the total number of stats figures, get the "best" number of rows and columns. This function is used when all stats figures are stored in one image.
- Parameters:
total_num: Total number of stats figures
factor: Number of sub-figure types in each stat figure. Defaults to 2, meaning each stat figure contains a histogram and a box plot.
- Returns:
"best" number of rows and columns, and the grid list
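The page does not say how the "best" layout is chosen. One plausible approach, sketched below under stated assumptions, places each stat figure in `factor` adjacent columns of a single row and picks how many stat figures go in each row so the overall grid is close to square. This is not data_juicer's implementation; the name `sketch_row_col` and the meaning of each grid entry (row index and first column index of one stat figure) are hypothetical.

```python
import math


def sketch_row_col(total_num: int, factor: int = 2):
    """Hypothetical layout helper; NOT data_juicer's get_row_col.

    Places total_num stat figures, each made of `factor` side-by-side
    sub-figures, on a grid that is roughly square.
    """
    # With k stat figures per row: rows ~= total_num / k and cols = k * factor,
    # so rows ~= cols when k ~= sqrt(total_num / factor).
    per_row = max(1, math.ceil(math.sqrt(total_num / factor)))
    n_cols = per_row * factor
    n_rows = math.ceil(total_num / per_row)
    # Hypothetical grid entry: (row index, first column index) of one stat figure.
    grid = [(i // per_row, (i % per_row) * factor) for i in range(total_num)]
    return n_rows, n_cols, grid


rows, cols, grid = sketch_row_col(5)
print(rows, cols)  # 3 4
print(grid)        # [(0, 0), (0, 2), (1, 0), (1, 2), (2, 0)]
```

Keeping the column count a multiple of `factor` keeps each stat's histogram and box plot side by side in the same row.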
- class data_juicer.analysis.column_wise_analysis.ColumnWiseAnalysis(dataset, output_path, overall_result=None, save_stats_in_one_file=True)
Bases: object
Apply analysis to each column of stats separately.
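As an illustration of analyzing each column of stats separately, the sketch below assumes per-sample stats are plain dicts mapping a stat name to a number. The helper `split_stat_columns` and the stat names are hypothetical, not taken from data_juicer; the sketch only regroups the records into one list per stat so that each column can be analyzed on its own.

```python
from collections import defaultdict


def split_stat_columns(records):
    """Hypothetical helper: regroup per-sample stat dicts into per-stat columns."""
    columns = defaultdict(list)
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return dict(columns)


# Illustrative per-sample stats (made-up names and values).
samples = [
    {"text_len": 120, "alnum_ratio": 0.91},
    {"text_len": 45, "alnum_ratio": 0.78},
    {"text_len": 300, "alnum_ratio": 0.88},
]
columns = split_stat_columns(samples)
for name, values in columns.items():
    # Each column is handled independently; here it is only summarized by min/max.
    print(name, min(values), max(values))
```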
- __init__(dataset, output_path, overall_result=None, save_stats_in_one_file=True)
Initialization method
- Parameters:
dataset: the dataset to be analyzed
output_path: path to store the analysis results
overall_result: optional precomputed overall stats result
save_stats_in_one_file: whether to save the analysis figures of all stats into one image file
- analyze(show_percentiles=False, show=False, skip_export=False)
Apply analysis and draw the analysis figures for the stats.
- Parameters:
show_percentiles: whether to show percentile lines in each sub-figure. If true, several red lines indicate the quantiles of the stat distributions.
show: whether to show the figures in a single window after drawing
skip_export: whether to skip saving the results to disk
- Returns:
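For `analyze`, the `show_percentiles` option draws lines at quantiles of each stat's distribution. Which quantiles are marked is not stated here; assuming quartiles, the cut points can be computed with the standard-library `statistics.quantiles`, as in this minimal sketch (the sample values are made up).

```python
import statistics

# Made-up values for one stat column.
values = [3, 7, 8, 5, 12, 14, 21, 13, 18]

# n=4 returns the three quartile cut points (25th, 50th, 75th percentiles).
# "inclusive" interpolates linearly between sorted values, which matches
# NumPy's default percentile method.
cuts = statistics.quantiles(values, n=4, method="inclusive")
print(cuts)  # [7.0, 12.0, 14.0]
```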
- draw_hist(ax, data, save_path, percentiles=None, show=False)
Draw the histogram for the data.
- Parameters:
ax: the axes to draw on
data: data to draw
save_path: the path to save the histogram figure
percentiles: the overall analysis result of the data, including percentile information
show: whether to show the figure in a single window after drawing
- Returns:
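For `draw_hist`, assuming `ax` is a Matplotlib `Axes`, the histogram groups the values into bins. The stdlib-only sketch below shows equal-width binning of the kind `Axes.hist` performs by default (every bin half-open except the last, which includes the maximum). The function `histogram_counts` is hypothetical and not part of data_juicer.

```python
def histogram_counts(data, bins=5):
    """Hypothetical helper: count values in equal-width bins over [min, max]."""
    lo, hi = min(data), max(data)
    # Fall back to a width of 1.0 when all values are equal (zero range).
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for x in data:
        # Clamp the index so the maximum value lands in the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


counts, edges = histogram_counts([1, 2, 2, 3, 3, 3, 4, 4, 5, 10], bins=3)
print(counts)  # [6, 3, 1]
print(edges)   # [1.0, 4.0, 7.0, 10.0]
```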
- draw_box(ax, data, save_path, percentiles=None, show=False)
Draw the box plot for the data.
- Parameters:
ax: the axes to draw on
data: data to draw
save_path: the path to save the box plot figure
percentiles: the overall analysis result of the data, including percentile information
show: whether to show the figure in a single window after drawing
- Returns:
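For `draw_box`, a standard box plot shows the quartiles, the median, and whiskers. Assuming Tukey-style whiskers at 1.5 times the interquartile range (Matplotlib's default `whis=1.5`), the displayed quantities could be computed as in this hypothetical helper `box_stats`. It works from the raw values and does not use the precomputed `percentiles` argument that `draw_box` accepts.

```python
import statistics


def box_stats(data, whis=1.5):
    """Hypothetical helper: quantities a Tukey-style box plot displays."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers reach the most extreme data points within the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Points beyond the fences are drawn individually as outliers.
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }


print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```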