Data-Juicer-Hub
Community-driven data-juicer recipes and best practices for various pre-training/fine-tuning tasks.
Documentation
Detailed documentation about the recipes can be found here.
Quick Start
There are plenty of prepared recipes for data processing on different tasks. You can use them by cloning this repo and setting `--config` to the local path of the target recipe file:
# clone this repo to somewhere on your local machine
git clone https://github.com/datajuicer/data-juicer-hub.git
# run with the actual local path to the target recipe
dj-process --config <root-of-data-juicer-hub>/demo/process.yaml --dataset_path <your-dataset-path>
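To pick a recipe path to pass to `--config`, it can help to list the recipe files in your local clone first. The following is a minimal sketch, not part of this repo: the helper name `list_recipes` is hypothetical, and it assumes recipes are stored as `.yaml` or `.yml` files somewhere under the clone root.

```shell
# Hypothetical helper (not shipped with data-juicer-hub):
# print every YAML file under a local clone, one path per line, sorted.
# Assumption: recipes use the .yaml or .yml extension.
list_recipes() {
    # $1: root directory of the local data-juicer-hub clone
    find "$1" -type f \( -name '*.yaml' -o -name '*.yml' \) | sort
}
```

For example, `list_recipes <root-of-data-juicer-hub>` prints candidate paths, any of which can then be given to `dj-process --config`.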
Contributing
This is a community-driven repo, so feel free to upload your own recipes to this repo! 😄