data_juicer.analysis.measure module
- class data_juicer.analysis.measure.Measure
Bases: object
Base class for distribution measures.
- name = 'base'
- class data_juicer.analysis.measure.KLDivMeasure
Bases: Measure
Measures the Kullback-Leibler (KL) divergence between two distributions.
- name = 'kl_divergence'
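For two discrete distributions P and Q over the same categories, the KL divergence is D_KL(P || Q) = sum_i p_i * log(p_i / q_i). The following is a minimal pure-Python sketch, assuming both inputs are already normalized probability vectors of equal length and using the natural logarithm (nats). `kl_divergence` is a hypothetical helper for illustration, not the code behind `KLDivMeasure`.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats.

    Assumes p and q are probability vectors over the same categories.
    Terms with p_i == 0 contribute 0 by convention; a term with
    p_i > 0 and q_i == 0 makes the divergence infinite.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total
```

Note that the divergence is not symmetric: swapping the arguments generally changes the result, which is why the order of P and Q matters when comparing two datasets.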
- class data_juicer.analysis.measure.JSDivMeasure
Bases: Measure
Measures the Jensen-Shannon (JS) divergence between two distributions.
- name = 'js_divergence'
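The JS divergence symmetrizes KL by comparing each distribution with their mixture M = (P + Q) / 2: JS(P, Q) = 0.5 * D_KL(P || M) + 0.5 * D_KL(Q || M). A minimal sketch, assuming normalized probability vectors of equal length and natural logarithms; `js_divergence` and `_kl` are hypothetical helpers, not data_juicer's implementation.

```python
import math


def _kl(p, q):
    # Only p_i > 0 terms contribute. Here q is always the mixture M,
    # so q_i >= 0.5 * p_i > 0 wherever p_i > 0 and no term diverges.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """JS(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M = (P + Q) / 2."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)
```

Unlike KL, this value is symmetric in P and Q and always finite; in nats it lies between 0 and ln 2, reaching ln 2 for distributions with disjoint support.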
- class data_juicer.analysis.measure.CrossEntropyMeasure
Bases: Measure
Measures the cross-entropy between two distributions.
- name = 'cross_entropy'
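Cross-entropy is H(P, Q) = -sum_i p_i * log(q_i), which decomposes as H(P) + D_KL(P || Q). A minimal sketch, assuming normalized probability vectors of equal length and natural logarithms; `cross_entropy` is a hypothetical helper, not the code behind `CrossEntropyMeasure`.

```python
import math


def cross_entropy(p, q):
    """H(P, Q) = -sum_i p_i * log(q_i), in nats.

    Assumes p and q are probability vectors over the same categories.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 * log(q_i) is taken as 0
        if qi == 0:
            return math.inf  # P puts mass where Q has none
        total -= pi * math.log(qi)
    return total
```

Because D_KL(P || Q) >= 0, the cross-entropy is never smaller than the entropy of P, with equality exactly when Q equals P.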
- class data_juicer.analysis.measure.EntropyMeasure
Bases: Measure
Measures the entropy of a distribution.
- name = 'entropy'
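Shannon entropy is H(P) = -sum_i p_i * log(p_i). A one-function sketch, assuming a normalized probability vector and natural logarithms; `entropy` is a hypothetical helper, not data_juicer's implementation.

```python
import math


def entropy(p):
    """H(P) = -sum_i p_i * log(p_i), in nats; zero-probability terms are skipped."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)
```

The entropy is 0 for a distribution concentrated on one category and reaches its maximum, ln n, for the uniform distribution over n categories.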
- class data_juicer.analysis.measure.RelatedTTestMeasure
Bases: Measure
Performs a t-test on two related distributions, comparing their histograms over the same bins.
Ref: https://en.wikipedia.org/wiki/Student%27s_t-test
For continuous features or distributions, the input can be a list of dataset stats. For discrete features or distributions, the input can be a list of tags or categories.
- name = 't-test'
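The description says the two inputs are compared on histograms over the same bins, but not how those bins are chosen. The sketch below assumes one plausible scheme: `bins` equal-width bins spanning the combined range of both inputs for numeric stats, and one bin per distinct label for tags or categories. `shared_histograms` and its `bins=10` default are hypothetical, not data_juicer's API.

```python
import numbers
from collections import Counter


def shared_histograms(p, q, bins=10):
    """Bin two non-empty samples onto the same bins (hypothetical helper).

    Numeric inputs: `bins` equal-width bins over the combined range of p and q.
    Otherwise (tags/categories): one bin per distinct label in the union of p and q.
    """
    if all(isinstance(x, numbers.Real) for x in [*p, *q]):
        lo = min(min(p), min(q))
        hi = max(max(p), max(q))
        # Fall back to width 1.0 when every value is identical (zero range).
        width = (hi - lo) / bins or 1.0

        def hist(xs):
            counts = [0] * bins
            for x in xs:
                # Clamp so the maximum value lands in the last bin.
                counts[min(int((x - lo) / width), bins - 1)] += 1
            return counts

        return hist(p), hist(q)

    # Discrete case: align counts on a shared, deterministic label order.
    labels = sorted(set(p) | set(q), key=str)
    cp, cq = Counter(p), Counter(q)
    return [cp[k] for k in labels], [cq[k] for k in labels]
```

Sharing the bin edges (or label order) is what makes the two count lists comparable position by position, which a related-samples test requires.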
- measure(p, q)
- Parameters:
p -- the first feature or distribution (stats, tags, or categories).
q -- the second feature or distribution (stats, tags, or categories).
- Returns:
a SciPy TtestResult object holding the t-test results ([ref](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats._result_classes.TtestResult.html#scipy.stats._result_classes.TtestResult))
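Given two histograms over the same bins, a related-samples (paired) t-test treats each bin as one pair: with differences d_i = a_i - b_i, the statistic is t = mean(d) / (s_d / sqrt(n)) with n - 1 degrees of freedom. A minimal stdlib sketch, assuming equal-length count lists; `paired_t_statistic` is hypothetical. It stops at the statistic and degrees of freedom because the p-value needs the Student t CDF; SciPy's `scipy.stats.ttest_rel(a, b)` computes the same statistic and also reports the p-value.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, df) for a paired (related-samples) t-test of a against b.

    d_i = a_i - b_i,  t = mean(d) / (stdev(d) / sqrt(n)),  df = n - 1,
    where stdev is the sample standard deviation (n - 1 denominator).
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = statistics.stdev(d)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return statistics.mean(d) / (sd / math.sqrt(n)), n - 1
```

For example, bin counts `[1, 2, 3, 4]` and `[0, 2, 1, 3]` give differences `[1, 0, 2, 1]`, so t = 1 / (sqrt(2/3) / 2) = sqrt(6) with 3 degrees of freedom; swapping the arguments flips the sign of t.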