API
- data_juicer.core
- data_juicer.ops
- data_juicer.ops.filter
  - AlphanumericFilter
  - AudioDurationFilter
  - AudioNMFSNRFilter
  - AudioSizeFilter
  - AverageLineLengthFilter
  - CharacterRepetitionFilter
  - FlaggedWordFilter
  - ImageAestheticsFilter
  - ImageAspectRatioFilter
  - ImageFaceCountFilter
  - ImageFaceRatioFilter
  - ImageNSFWFilter
  - ImagePairSimilarityFilter
  - ImageShapeFilter
  - ImageSizeFilter
  - ImageTextMatchingFilter
  - ImageTextSimilarityFilter
  - ImageWatermarkFilter
  - LanguageIDScoreFilter
  - LLMAnalysisFilter
  - LLMQualityScoreFilter
  - LLMDifficultyScoreFilter
  - MaximumLineLengthFilter
  - PerplexityFilter
  - PhraseGroundingRecallFilter
  - SpecialCharactersFilter
  - SpecifiedFieldFilter
  - SpecifiedNumericFieldFilter
  - StopWordsFilter
  - SuffixFilter
  - TextActionFilter
  - TextEntityDependencyFilter
  - TextLengthFilter
  - TextPairSimilarityFilter
  - TokenNumFilter
  - VideoAestheticsFilter
  - VideoAspectRatioFilter
  - VideoDurationFilter
  - VideoFramesTextSimilarityFilter
  - VideoMotionScoreFilter
  - VideoMotionScoreRaftFilter
  - VideoNSFWFilter
  - VideoOcrAreaRatioFilter
  - VideoResolutionFilter
  - VideoTaggingFromFramesFilter
  - VideoWatermarkFilter
  - WordRepetitionFilter
  - WordsNumFilter
  - GeneralFieldFilter
- data_juicer.ops.mapper
  - AudioAddGaussianNoiseMapper
  - AudioFFmpegWrappedMapper
  - CalibrateQAMapper
  - CalibrateQueryMapper
  - CalibrateResponseMapper
  - ChineseConvertMapper
  - CleanCopyrightMapper
  - CleanEmailMapper
  - CleanHtmlMapper
  - CleanIpMapper
  - CleanLinksMapper
  - DialogIntentDetectionMapper
  - DialogSentimentDetectionMapper
  - DialogSentimentIntensityMapper
  - DialogTopicDetectionMapper
  - Difference_Area_Generator_Mapper
  - Difference_Caption_Generator_Mapper
  - ExpandMacroMapper
  - ExtractEntityAttributeMapper
  - ExtractEntityRelationMapper
  - ExtractEventMapper
  - ExtractKeywordMapper
  - ExtractNicknameMapper
  - ExtractSupportTextMapper
  - ExtractTablesFromHtmlMapper
  - FixUnicodeMapper
  - GenerateQAFromExamplesMapper
  - GenerateQAFromTextMapper
  - HumanPreferenceAnnotationMapper
  - ImageBlurMapper
  - ImageCaptioningFromGPT4VMapper
  - ImageCaptioningMapper
  - ImageDiffusionMapper
  - ImageFaceBlurMapper
  - ImageRemoveBackgroundMapper
  - ImageSegmentMapper
  - ImageTaggingMapper
  - MllmMapper
  - NlpaugEnMapper
  - NlpcdaZhMapper
  - OptimizeQAMapper
  - OptimizeQueryMapper
  - OptimizeResponseMapper
  - PairPreferenceMapper
  - PunctuationNormalizationMapper
  - PythonFileMapper
  - PythonLambdaMapper
  - QuerySentimentDetectionMapper
  - QueryIntentDetectionMapper
  - QueryTopicDetectionMapper
  - RelationIdentityMapper
  - RemoveBibliographyMapper
  - RemoveCommentsMapper
  - RemoveHeaderMapper
  - RemoveLongWordsMapper
  - RemoveNonChineseCharacterlMapper
  - RemoveRepeatSentencesMapper
  - RemoveSpecificCharsMapper
  - RemoveTableTextMapper
  - RemoveWordsWithIncorrectSubstringsMapper
  - ReplaceContentMapper
  - SDXLPrompt2PromptMapper
  - SentenceAugmentationMapper
  - SentenceSplitMapper
  - TextChunkMapper
  - VideoCaptioningFromAudioMapper
  - VideoCaptioningFromFramesMapper
  - VideoCaptioningFromSummarizerMapper
  - VideoCaptioningFromVideoMapper
  - VideoExtractFramesMapper
  - VideoFFmpegWrappedMapper
  - VideoFaceBlurMapper
  - VideoRemoveWatermarkMapper
  - VideoResizeAspectRatioMapper
  - VideoResizeResolutionMapper
  - VideoSplitByDurationMapper
  - VideoSplitByKeyFrameMapper
  - VideoSplitBySceneMapper
  - VideoTaggingFromAudioMapper
  - VideoTaggingFromFramesMapper
  - WhitespaceNormalizationMapper
- data_juicer.ops.deduplicator
- data_juicer.ops.selector
- data_juicer.ops.common
- data_juicer.analysis
- data_juicer.config
- data_juicer.format