Data Juicer



Formatter

  • csv_formatter
  • empty_formatter
  • json_formatter
  • parquet_formatter
  • ray_empty_formatter
  • text_formatter
  • tsv_formatter


© Copyright 2024, Data-Juicer Team.
