DOCS
Tutorial
docs
- Operator Schemas (Operator Summary)
- Dataset Configuration Guide
- “Bad” Data Exhibition
- DJ-SORA
- DataJuicer-Agent
- DJ_service
- How-to Guide for Developers
- Distributed Data Processing in Data-Juicer
- Data Recipe Gallery
- 1. Data-Juicer Minimal Example Recipe
- 2. Reproduce Open Source Text Datasets
- 3. Improved Open Source Pre-training Text Datasets
- 4. Improved Open Source Post-tuning Text Dataset
- 5. Synthetic Contrastive Learning Image-text datasets
- 6. Improved Open Source Image-text datasets
- 7. Basic Example Recipes for Video Data
- 8. Synthesize Human-centric Video Benchmarks
- 9. Improve Existing Open Source Video Datasets
- Sandbox
- Awesome Data-Model Co-Development of MLLMs
tools
- Distributed Fuzzy Deduplication Tools
- Auto Evaluation Toolkit
- GPT EVAL: Evaluate your model with OpenAI API
- Evaluation Results Recorder
- Format Conversion Tools
- Multimodal Tools
- Post Tuning Tools
- Hyper-parameter Optimization for Data Recipe
- Label Studio Service Utility
- Metrics for Video Generation
- VBench metrics
- Postprocess Tools
- Preprocess Tools
- Data Scoring