# Notebooks Tutorial

Welcome to the Notebooks Tutorial Library of Data Juicer Hub! [This branch](https://github.com/datajuicer/data-juicer-hub/tree/notebook) contains a series of Jupyter notebooks to help you get started with Data Juicer quickly.

## 🚀 Quick Start

In addition to the methods below, you can also try the [JupyterLab Playground with Tutorials](http://8.138.149.181/).

### Method 1: GitHub Codespaces

1. **Launch a Codespace**
   - Click the `Code` button in this repository and select the `Codespaces` tab
   - Click `Create codespace on notebook` (the `+` icon) to start the environment
   - Wait a moment until the VS Code web interface opens

2. **Select and Run Notebooks**
   - Find the `notebooks` folder in the file explorer on the left
   - Open the notebook file you're interested in
   - Click the kernel selector in the top-right corner and choose the **`data-juicer-hub`** environment (located in the `.venv` directory)
   - Start running!

### Method 2: Google Colab

Click the links below to run the tutorials online in Google Colab:

| Chapter | Title | Colab Link |
|---------|-------|------------|
| 01 | Getting Started | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datajuicer/data-juicer-hub/blob/notebook/notebooks/01_Getting_Started.ipynb) |
| 02 | Building Recipes | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datajuicer/data-juicer-hub/blob/notebook/notebooks/02_Building_Recipes.ipynb) |
| 03 | Data Formats and Loading | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datajuicer/data-juicer-hub/blob/notebook/notebooks/03_Data_Formats_and_Loading.ipynb) |
| 04 | DJ Dataset API | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datajuicer/data-juicer-hub/blob/notebook/notebooks/04_DJ_Dataset_API.ipynb) |
| 05 | Operators Usage | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datajuicer/data-juicer-hub/blob/notebook/notebooks/05_Operators_Usage.ipynb) |
| 06 | Analysis and Visualization | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datajuicer/data-juicer-hub/blob/notebook/notebooks/06_Analysis_and_Visualization.ipynb) |
| 07 | Distributed Processing with Ray | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datajuicer/data-juicer-hub/blob/notebook/notebooks/07_Distributed_Processing_with_Ray.ipynb) |
| 08 | Preprocessing | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datajuicer/data-juicer-hub/blob/notebook/notebooks/08_Preprocessing.ipynb) |
| 09 | Multimodal Data Processing | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datajuicer/data-juicer-hub/blob/notebook/notebooks/09_Multimodal_Data_Processing.ipynb) |
| 10 | Advanced Dataset Configuration | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datajuicer/data-juicer-hub/blob/notebook/notebooks/10_Advanced_Dataset_Configuration.ipynb) |

## 📝 Tutorial Content Overview

- [**01_Getting_Started**](./notebooks/01_Getting_Started.ipynb) - Data Juicer core concepts and quick start guide
- [**02_Building_Recipes**](./notebooks/02_Building_Recipes.ipynb) - Design and construction of data processing recipes
- [**03_Data_Formats_and_Loading**](./notebooks/03_Data_Formats_and_Loading.ipynb) - Comprehensive guide to data formats and loading methods
- [**04_DJ_Dataset_API**](./notebooks/04_DJ_Dataset_API.ipynb) - Complete Dataset API guide
- [**05_Operators_Usage**](./notebooks/05_Operators_Usage.ipynb) - Detailed explanation of data processing operators (both YAML and Python modes)
- [**06_Analysis_and_Visualization**](./notebooks/06_Analysis_and_Visualization.ipynb) - Data analysis and visualization tools
- [**07_Distributed_Processing_with_Ray**](./notebooks/07_Distributed_Processing_with_Ray.ipynb) - Integration with the Ray distributed processing framework
- [**08_Preprocessing**](./notebooks/08_Preprocessing.ipynb) - Data preprocessing scripts
- [**09_Multimodal_Data_Processing**](./notebooks/09_Multimodal_Data_Processing.ipynb) - Multimodal data processing capabilities
- [**10_Advanced_Dataset_Configuration**](./notebooks/10_Advanced_Dataset_Configuration.ipynb) - Advanced Dataset configuration options

## 💡 Recommended Learning Paths

**Quick Start**

> 01 → 02 → 05 → 06 → 07
>
> For: Learners who want to quickly understand and use Data Juicer's core capabilities

**Data Integration**

> 03 → 08, 09, 10
>
> For: Developers who already understand Data Juicer's core capabilities and need to integrate their own data into processing pipelines

**Programming and Customization**

> 01 → 03 → 04 → 05 → 07
>
> For: Engineers who want to flexibly customize data processing pipelines with code

## 📚 More Resources

- [Data Juicer Official Documentation](https://datajuicer.github.io/data-juicer/en/main/)
- [Data Juicer Hub](https://github.com/datajuicer/data-juicer-hub)

Happy Learning! 🎉