Notebooks Tutorial

Welcome to the Notebooks Tutorial Library of Data Juicer Hub! This branch contains a series of Jupyter notebooks to help you get started with Data Juicer quickly.

🚀 Quick Start

In addition to the methods below, you can also try JupyterLab Playground with Tutorials.

Method 1: GitHub Codespaces

  1. Launch Codespace

    • Click the Code button in this repository and select the Codespaces tab

    • Click Create codespace on notebook (the + icon) to start the environment

    • Wait a moment and you'll see the VS Code web interface

  2. Select and Run Notebooks

    • Find the notebooks folder in the left file directory

    • Click on the notebook file you're interested in

    • Click the kernel selector in the top right corner and choose the data-juicer-hub environment (located in the .venv directory)

    • Start running!
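If a cell fails with an import error, the notebook may be attached to the wrong kernel. The cell below is a minimal sketch for checking this, assuming the data-juicer-hub environment is a standard Python virtual environment whose root directory is named .venv. The helper names active_venv_dir and using_expected_kernel are hypothetical and not part of Data Juicer.

```python
import sys
from pathlib import Path
from typing import Optional


def active_venv_dir() -> Optional[Path]:
    """Return the root of the active virtual environment, or None outside a venv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the interpreter it was created from.
    if sys.prefix == sys.base_prefix:
        return None
    return Path(sys.prefix)


def using_expected_kernel(dir_name: str = ".venv") -> bool:
    """Check whether the kernel runs from a venv whose directory is named dir_name."""
    venv = active_venv_dir()
    return venv is not None and venv.name == dir_name


print("Kernel interpreter:", sys.executable)
print("Running from .venv:", using_expected_kernel())
```

If the second line prints False, reopen the kernel selector and pick the interpreter under .venv.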

Method 2: Google Colab

Click the links below to run the tutorials online in Google Colab:

Chapter | Title                           | Colab Link
------- | ------------------------------- | -------------
01      | Getting Started                 | Open In Colab
02      | Building Recipes                | Open In Colab
03      | Data Formats and Loading        | Open In Colab
04      | DJ Dataset API                  | Open In Colab
05      | Operators Usage                 | Open In Colab
06      | Analysis and Visualization      | Open In Colab
07      | Distributed Processing with Ray | Open In Colab
08      | Preprocessing                   | Open In Colab
09      | Multimodal Data Processing      | Open In Colab
10      | Advanced Dataset Configuration  | Open In Colab
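Unlike a Codespace, a Colab session starts on a fresh VM without the prepared data-juicer-hub environment. The tutorial notebooks may already ship their own setup cell. If one does not, the following is a minimal sketch of a first cell that installs Data Juicer only when running inside Colab. It assumes the package is published on PyPI as py-data-juicer and imports as data_juicer; the helpers in_colab and ensure_data_juicer are hypothetical.

```python
import importlib.util
import subprocess
import sys


def in_colab() -> bool:
    """Return True when the google.colab module is importable (normally only on Colab)."""
    try:
        import google.colab  # noqa: F401
    except ImportError:
        return False
    return True


def ensure_data_juicer(package: str = "py-data-juicer") -> None:
    """pip-install the package into the current kernel if data_juicer is missing."""
    if importlib.util.find_spec("data_juicer") is None:
        # Use the kernel's own interpreter so the install lands in the right environment.
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", package])


if in_colab():
    ensure_data_juicer()
```

Guarding the install with in_colab() keeps the same cell harmless when the notebook is opened in a Codespace, where the .venv environment already provides the package.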

๐Ÿ“ Tutorial Content Overview#

📚 More Resources

Happy Learning! 🎉