API
- data_juicer.core
- data_juicer.ops
- data_juicer.ops.filter
  - AlphanumericFilter
  - AudioDurationFilter
  - AudioNMFSNRFilter
  - AudioSizeFilter
  - AverageLineLengthFilter
  - CharacterRepetitionFilter
  - FlaggedWordFilter
  - ImageAestheticsFilter
  - ImageAspectRatioFilter
  - ImageFaceCountFilter
  - ImageFaceRatioFilter
  - ImageNSFWFilter
  - ImagePairSimilarityFilter
  - ImageShapeFilter
  - ImageSizeFilter
  - ImageSubplotFilter
  - ImageTextMatchingFilter
  - ImageTextSimilarityFilter
  - ImageWatermarkFilter
  - LanguageIDScoreFilter
  - InContextInfluenceFilter
  - InstructionFollowingDifficultyFilter
  - LLMAnalysisFilter
  - LLMQualityScoreFilter
  - LLMPerplexityFilter
  - LLMDifficultyScoreFilter
  - LLMTaskRelevanceFilter
  - MaximumLineLengthFilter
  - PerplexityFilter
  - PhraseGroundingRecallFilter
  - SpecialCharactersFilter
  - SpecifiedFieldFilter
  - SpecifiedNumericFieldFilter
  - StopWordsFilter
  - SuffixFilter
  - TextActionFilter
  - TextEmbdSimilarityFilter
  - TextEntityDependencyFilter
  - TextLengthFilter
  - TextPairSimilarityFilter
  - TokenNumFilter
  - VideoAestheticsFilter
  - VideoAspectRatioFilter
  - VideoDurationFilter
  - VideoFramesTextSimilarityFilter
  - VideoMotionScoreFilter
  - VideoMotionScorePtlflowFilter
  - VideoMotionScoreRaftFilter
  - VideoNSFWFilter
  - VideoOcrAreaRatioFilter
  - VideoResolutionFilter
  - VideoTaggingFromFramesFilter
  - VideoWatermarkFilter
  - WordRepetitionFilter
  - WordsNumFilter
  - GeneralFieldFilter
- data_juicer.ops.mapper
  - AudioAddGaussianNoiseMapper
  - AudioFFmpegWrappedMapper
  - CalibrateQAMapper
  - CalibrateQueryMapper
  - CalibrateResponseMapper
  - ChineseConvertMapper
  - CleanCopyrightMapper
  - CleanEmailMapper
  - CleanHtmlMapper
  - CleanIpMapper
  - CleanLinksMapper
  - DetectCharacterAttributesMapper
  - DetectCharacterLocationsMapper
  - DetectMainCharacterMapper
  - DialogIntentDetectionMapper
  - DialogSentimentDetectionMapper
  - DialogSentimentIntensityMapper
  - DialogTopicDetectionMapper
  - Difference_Area_Generator_Mapper
  - Difference_Caption_Generator_Mapper
  - DownloadFileMapper
  - ExpandMacroMapper
  - ExtractEntityAttributeMapper
  - ExtractEntityRelationMapper
  - ExtractEventMapper
  - ExtractKeywordMapper
  - ExtractNicknameMapper
  - ExtractSupportTextMapper
  - ExtractTablesFromHtmlMapper
  - FixUnicodeMapper
  - GenerateQAFromExamplesMapper
  - GenerateQAFromTextMapper
  - HumanPreferenceAnnotationMapper
  - ImageBlurMapper
  - ImageCaptioningFromGPT4VMapper
  - ImageCaptioningMapper
  - ImageDetectionYoloMapper
  - ImageDiffusionMapper
  - ImageMMPoseMapper
  - ImageFaceBlurMapper
  - ImageRemoveBackgroundMapper
  - ImageSAM3DBodyMapper
  - ImageSegmentMapper
  - ImageTaggingMapper
  - ImageTaggingVLMMapper
  - MllmMapper
  - NlpaugEnMapper
  - NlpcdaZhMapper
  - OptimizePromptMapper
  - OptimizeQAMapper
  - OptimizeQueryMapper
  - OptimizeResponseMapper
  - PairPreferenceMapper
  - PunctuationNormalizationMapper
  - PythonFileMapper
  - PythonLambdaMapper
  - QuerySentimentDetectionMapper
  - QueryIntentDetectionMapper
  - QueryTopicDetectionMapper
  - RelationIdentityMapper
  - RemoveBibliographyMapper
  - RemoveCommentsMapper
  - RemoveHeaderMapper
  - RemoveLongWordsMapper
  - RemoveNonChineseCharacterlMapper
  - RemoveRepeatSentencesMapper
  - RemoveSpecificCharsMapper
  - RemoveTableTextMapper
  - RemoveWordsWithIncorrectSubstringsMapper
  - ReplaceContentMapper
  - S3DownloadFileMapper
  - S3UploadFileMapper
  - SDXLPrompt2PromptMapper
  - SentenceAugmentationMapper
  - SentenceSplitMapper
  - TextChunkMapper
  - TextTaggingByPromptMapper
  - VggtMapper
  - VideoCameraCalibrationStaticDeepcalibMapper
  - VideoCameraCalibrationStaticMogeMapper
  - VideoCaptioningFromAudioMapper
  - VideoCaptioningFromFramesMapper
  - VideoCaptioningFromSummarizerMapper
  - VideoCaptioningFromVideoMapper
  - VideoCaptioningFromVLMMapper
  - VideoDepthEstimationMapper
  - VideoExtractFramesMapper
  - VideoFFmpegWrappedMapper
  - VideoHandReconstructionHaworMapper
  - VideoHandReconstructionMapper
  - VideoFaceBlurMapper
  - VideoObjectSegmentingMapper
  - VideoRemoveWatermarkMapper
  - VideoResizeAspectRatioMapper
  - VideoResizeResolutionMapper
  - VideoSplitByDurationMapper
  - VideoSplitByKeyFrameMapper
  - VideoSplitBySceneMapper
  - VideoTaggingFromAudioMapper
  - VideoTaggingFromFramesMapper
  - VideoUndistortMapper
  - VideoWholeBodyPoseEstimationMapper
  - WhitespaceNormalizationMapper
- data_juicer.ops.deduplicator
  - DocumentDeduplicator
  - DocumentMinhashDeduplicator
  - DocumentMinhashDeduplicatorWithUid
  - DocumentSimhashDeduplicator
  - ImageDeduplicator
  - RayBasicDeduplicator
  - RayDocumentDeduplicator
  - RayImageDeduplicator
  - RayVideoDeduplicator
  - RayBTSMinhashDeduplicator
  - RayBTSMinhashDeduplicatorWithUid
  - RayBTSMinhashCppDeduplicator
  - VideoDeduplicator
- data_juicer.ops.selector
- data_juicer.ops.common
- data_juicer.analysis
- data_juicer.config
- data_juicer.format