API Reference
- data_juicer.core
- data_juicer.ops
- data_juicer.ops.filter
  - AlphanumericFilter
  - AudioDurationFilter
  - AudioNMFSNRFilter
  - AudioSizeFilter
  - AverageLineLengthFilter
  - CharacterRepetitionFilter
  - FlaggedWordFilter
  - ImageAestheticsFilter
  - ImageAspectRatioFilter
  - ImageFaceCountFilter
  - ImageFaceRatioFilter
  - ImageNSFWFilter
  - ImagePairSimilarityFilter
  - ImageShapeFilter
  - ImageSizeFilter
  - ImageTextMatchingFilter
  - ImageTextSimilarityFilter
  - ImageWatermarkFilter
  - LanguageIDScoreFilter
  - InContextInfluenceFilter
  - InstructionFollowingDifficultyFilter
  - LLMAnalysisFilter
  - LLMQualityScoreFilter
  - LLMPerplexityFilter
  - LLMDifficultyScoreFilter
  - LLMTaskRelevanceFilter
  - MaximumLineLengthFilter
  - PerplexityFilter
  - PhraseGroundingRecallFilter
  - SpecialCharactersFilter
  - SpecifiedFieldFilter
  - SpecifiedNumericFieldFilter
  - StopWordsFilter
  - SuffixFilter
  - TextActionFilter
  - TextEmbdSimilarityFilter
  - TextEntityDependencyFilter
  - TextLengthFilter
  - TextPairSimilarityFilter
  - TokenNumFilter
  - VideoAestheticsFilter
  - VideoAspectRatioFilter
  - VideoDurationFilter
  - VideoFramesTextSimilarityFilter
  - VideoMotionScoreFilter
  - VideoMotionScoreRaftFilter
  - VideoNSFWFilter
  - VideoOcrAreaRatioFilter
  - VideoResolutionFilter
  - VideoTaggingFromFramesFilter
  - VideoWatermarkFilter
  - WordRepetitionFilter
  - WordsNumFilter
  - GeneralFieldFilter
- data_juicer.ops.mapper
  - AudioAddGaussianNoiseMapper
  - AudioFFmpegWrappedMapper
  - CalibrateQAMapper
  - CalibrateQueryMapper
  - CalibrateResponseMapper
  - ChineseConvertMapper
  - CleanCopyrightMapper
  - CleanEmailMapper
  - CleanHtmlMapper
  - CleanIpMapper
  - CleanLinksMapper
  - DetectCharacterAttributesMapper
  - DetectCharacterLocationsMapper
  - DetectMainCharacterMapper
  - DialogIntentDetectionMapper
  - DialogSentimentDetectionMapper
  - DialogSentimentIntensityMapper
  - DialogTopicDetectionMapper
  - Difference_Area_Generator_Mapper
  - Difference_Caption_Generator_Mapper
  - DownloadFileMapper
  - ExpandMacroMapper
  - ExtractEntityAttributeMapper
  - ExtractEntityRelationMapper
  - ExtractEventMapper
  - ExtractKeywordMapper
  - ExtractNicknameMapper
  - ExtractSupportTextMapper
  - ExtractTablesFromHtmlMapper
  - FixUnicodeMapper
  - GenerateQAFromExamplesMapper
  - GenerateQAFromTextMapper
  - HumanPreferenceAnnotationMapper
  - ImageBlurMapper
  - ImageCaptioningFromGPT4VMapper
  - ImageCaptioningMapper
  - ImageDetectionYoloMapper
  - ImageDiffusionMapper
  - ImageFaceBlurMapper
  - ImageRemoveBackgroundMapper
  - ImageSegmentMapper
  - ImageTaggingMapper
  - MllmMapper
  - NlpaugEnMapper
  - NlpcdaZhMapper
  - OptimizePromptMapper
  - OptimizeQAMapper
  - OptimizeQueryMapper
  - OptimizeResponseMapper
  - PairPreferenceMapper
  - PunctuationNormalizationMapper
  - PythonFileMapper
  - PythonLambdaMapper
  - QuerySentimentDetectionMapper
  - QueryIntentDetectionMapper
  - QueryTopicDetectionMapper
  - RelationIdentityMapper
  - RemoveBibliographyMapper
  - RemoveCommentsMapper
  - RemoveHeaderMapper
  - RemoveLongWordsMapper
  - RemoveNonChineseCharacterlMapper
  - RemoveRepeatSentencesMapper
  - RemoveSpecificCharsMapper
  - RemoveTableTextMapper
  - RemoveWordsWithIncorrectSubstringsMapper
  - ReplaceContentMapper
  - SDXLPrompt2PromptMapper
  - SentenceAugmentationMapper
  - SentenceSplitMapper
  - TextChunkMapper
  - VggtMapper
  - VideoCaptioningFromAudioMapper
  - VideoCaptioningFromFramesMapper
  - VideoCaptioningFromSummarizerMapper
  - VideoCaptioningFromVideoMapper
  - VideoExtractFramesMapper
  - VideoFFmpegWrappedMapper
  - VideoFaceBlurMapper
  - VideoRemoveWatermarkMapper
  - VideoResizeAspectRatioMapper
  - VideoResizeResolutionMapper
  - VideoSplitByDurationMapper
  - VideoSplitByKeyFrameMapper
  - VideoSplitBySceneMapper
  - VideoTaggingFromAudioMapper
  - VideoTaggingFromFramesMapper
  - WhitespaceNormalizationMapper
- data_juicer.ops.deduplicator
- data_juicer.ops.selector
- data_juicer.ops.common
- data_juicer.analysis
- data_juicer.config
- data_juicer.format