data_juicer.analysis.measure module#
- class data_juicer.analysis.measure.Measure[source]#
Bases: object
Base class for distribution measures.
- name = 'base'#
- class data_juicer.analysis.measure.KLDivMeasure[source]#
Bases: Measure
Measures Kullback-Leibler divergence.
- name = 'kl_divergence'#
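For reference, the quantity this class computes can be sketched with scipy. This is a minimal illustration of KL divergence between two discrete distributions (the probability values are made up), not necessarily the exact calling convention of KLDivMeasure:

```python
import numpy as np
from scipy.stats import entropy

# Two discrete probability distributions over the same support.
p = np.array([0.4, 0.4, 0.2])
q = np.array([0.3, 0.5, 0.2])

# With a second argument, scipy.stats.entropy returns the
# Kullback-Leibler divergence D_KL(p || q) = sum_i p_i * log(p_i / q_i).
kl = entropy(p, q)
print(kl)  # ~0.026 nats
```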
- class data_juicer.analysis.measure.JSDivMeasure[source]#
Bases: Measure
Measures Jensen-Shannon divergence.
- name = 'js_divergence'#
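As a sketch of the underlying quantity (again not necessarily the class's own API), scipy exposes the Jensen-Shannon distance, whose square is the divergence:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

p = np.array([0.4, 0.4, 0.2])
q = np.array([0.3, 0.5, 0.2])

# jensenshannon() returns the JS *distance*, i.e. the square root of
# the divergence; squaring it recovers the divergence itself.
js_div = jensenshannon(p, q) ** 2
```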
- class data_juicer.analysis.measure.CrossEntropyMeasure[source]#
Bases: Measure
Measures cross-entropy.
- name = 'cross_entropy'#
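The underlying quantity, sketched directly with numpy on made-up distributions:

```python
import numpy as np

p = np.array([0.4, 0.4, 0.2])  # reference distribution
q = np.array([0.3, 0.5, 0.2])  # estimated distribution

# Cross-entropy H(p, q) = -sum_i p_i * log(q_i),
# which equals H(p) + D_KL(p || q).
ce = -np.sum(p * np.log(q))
```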
- class data_juicer.analysis.measure.EntropyMeasure[source]#
Bases: Measure
Measures entropy.
- name = 'entropy'#
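A one-line sketch of the quantity with scipy, on a made-up distribution:

```python
from scipy.stats import entropy

p = [0.4, 0.4, 0.2]

# Shannon entropy H(p) = -sum_i p_i * log(p_i), in nats by default.
h = entropy(p)
```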
- class data_juicer.analysis.measure.RelatedTTestMeasure[source]#
Bases: Measure
Runs a t-test for two related distributions on their histograms over the same bins.
Ref: https://en.wikipedia.org/wiki/Student%27s_t-test
For continuous features or distributions, the input can be a list of dataset stats. For discrete features or distributions, the input can be a list of tags or categories.
- name = 't-test'#
- measure(p, q)[source]#
- Parameters:
p – the first feature or distribution (stats/tags/categories)
q – the second feature or distribution (stats/tags/categories)
- Returns:
the t-test results object ([scipy.stats._result_classes.TtestResult](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats._result_classes.TtestResult.html#scipy.stats._result_classes.TtestResult))
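A minimal sketch of the underlying test with scipy.stats.ttest_rel, assuming both inputs have already been binned into histograms over identical bins (the counts below are hypothetical):

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical per-bin counts for the same feature in two datasets,
# computed over identical bins.
hist_p = np.array([12, 30, 45, 30, 13])
hist_q = np.array([10, 28, 50, 29, 11])

# Paired (related) t-test over the matched bins; returns a
# scipy TtestResult with .statistic and .pvalue.
result = ttest_rel(hist_p, hist_q)
print(result.statistic, result.pvalue)
```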