# Notebooks Tutorials

Welcome to the Notebooks tutorial library of Data Juicer Hub! [This branch](https://github.com/datajuicer/data-juicer-hub/tree/notebook) contains a series of Jupyter notebooks that help you get started with Data Juicer quickly.

## 🚀 Quick Start

Besides the options below, you can also try the [JupyterLab Playground with Tutorials](http://8.138.149.181/).

### Option 1: GitHub Codespaces

1. **Launch a Codespace**
   - Click the `Code` button of this repository and select the `Codespaces` tab
   - Click `Create codespace on notebook` (the `+` icon) to start the environment
   - After a short wait, the VS Code web interface appears

2. **Select and run a notebook**
   - Find the `notebooks` folder in the file tree on the left
   - Click the notebook file you are interested in
   - Click the kernel selector in the upper-right corner and choose the **`data-juicer-hub`** environment (located in the `.venv` directory)
   - Start running!

### Option 2: Google Colab

Click a link below to run the tutorial online in Google Colab:

| Chapter | Title | Colab Link |
|---------|-------|------------|
| 01 | Getting Started | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datajuicer/data-juicer-hub/blob/notebook/notebooks/01_Getting_Started.ipynb) |
| 02 | Building Recipes | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datajuicer/data-juicer-hub/blob/notebook/notebooks/02_Building_Recipes.ipynb) |
| 03 | Data Formats and Loading | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datajuicer/data-juicer-hub/blob/notebook/notebooks/03_Data_Formats_and_Loading.ipynb) |
| 04 | DJ Dataset API | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datajuicer/data-juicer-hub/blob/notebook/notebooks/04_DJ_Dataset_API.ipynb) |
| 05 | Operators Usage | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datajuicer/data-juicer-hub/blob/notebook/notebooks/05_Operators_Usage.ipynb) |
| 06 | Analysis and Visualization | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datajuicer/data-juicer-hub/blob/notebook/notebooks/06_Analysis_and_Visualization.ipynb) |
| 07 | Distributed Processing with Ray | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datajuicer/data-juicer-hub/blob/notebook/notebooks/07_Distributed_Processing_with_Ray.ipynb) |
| 08 | Preprocessing | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datajuicer/data-juicer-hub/blob/notebook/notebooks/08_Preprocessing.ipynb) |
| 09 | Multimodal Data Processing | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datajuicer/data-juicer-hub/blob/notebook/notebooks/09_Multimodal_Data_Processing.ipynb) |
| 10 | Advanced Dataset Configuration | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datajuicer/data-juicer-hub/blob/notebook/notebooks/10_Advanced_Dataset_Configuration.ipynb) |

## 📝 Tutorial Overview

- [**01_Getting_Started**](./notebooks/01_Getting_Started.ipynb) - Core concepts of Data Juicer and a quick introduction
- [**02_Building_Recipes**](./notebooks/02_Building_Recipes.ipynb) - Designing and building data processing recipes
- [**03_Data_Formats_and_Loading**](./notebooks/03_Data_Formats_and_Loading.ipynb) - Supported data formats and loading methods in detail
- [**04_DJ_Dataset_API**](./notebooks/04_DJ_Dataset_API.ipynb) - Complete guide to the Dataset API
- [**05_Operators_Usage**](./notebooks/05_Operators_Usage.ipynb) - Data processing operators in detail (both YAML and Python modes)
- [**06_Analysis_and_Visualization**](./notebooks/06_Analysis_and_Visualization.ipynb) - Data analysis and visualization tools
- [**07_Distributed_Processing_with_Ray**](./notebooks/07_Distributed_Processing_with_Ray.ipynb) - Integration with the Ray distributed processing framework
- [**08_Preprocessing**](./notebooks/08_Preprocessing.ipynb) - Data preprocessing scripts
- [**09_Multimodal_Data_Processing**](./notebooks/09_Multimodal_Data_Processing.ipynb) - Multimodal data processing capabilities
- [**10_Advanced_Dataset_Configuration**](./notebooks/10_Advanced_Dataset_Configuration.ipynb) - Advanced Dataset configuration options

## 💡 Recommended Learning Paths

**Quick Start**

> 01 → 02 → 05 → 06 → 07
>
> For: learners who want to quickly understand and use the core capabilities of Data Juicer

**Data Integration**

> 03 → 08, 09, 10
>
> For: developers who already know the core capabilities of Data Juicer and need to plug their own data into the processing pipeline

**Programmatic Construction**

> 01 → 03 → 04 → 05 → 07
>
> For: engineers who want to customize data processing pipelines flexibly in code

## 📚 More Resources

- [Data Juicer Official Documentation](https://datajuicer.github.io/data-juicer/en/main/)
- [Data Juicer Hub](https://github.com/datajuicer/data-juicer-hub)

Happy Learning! 🎉