# BLOOM Config Files

The [folder](https://github.com/datajuicer/data-juicer-hub/tree/main/reproduced_bloom) in Data-Juicer-Hub contains example configuration files for quickly reproducing the processing flow of the [ROOTS](https://github.com/bigscience-workshop/data-preparation) dataset, which was created by the BigScience initiative to train the BLOOM models.

## Oscar

The raw data files can be downloaded as described in [BLOOM/Oscar](https://github.com/bigscience-workshop/data-preparation/tree/main/preprocessing/training/01b_oscar_cleaning_and_filtering). Then use [bloom-oscar.yaml](https://github.com/datajuicer/data-juicer-hub/blob/main/reproduced_bloom/bloom-oscar.yaml) to run the full processing pipeline.

An analysis of our reproduction will be published soon.
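
As a rough sketch of how the Oscar config might be run, the snippet below assumes that Data-Juicer is installed (so its `dj-process` command is on your `PATH`), that `bloom-oscar.yaml` has been copied to the current directory, and that the `dataset_path` and `export_path` fields in that file have been edited to point at your local data. The local file name and the fallback message are illustrative choices, not part of the original instructions.

```shell
# Assumed local copy of the config; adjust the path to where you saved it.
CONFIG="bloom-oscar.yaml"

# Data-Juicer's processing entry point, given the config file.
CMD="dj-process --config ${CONFIG}"

# Run only if the tool and the config are both available;
# otherwise just print the command that would be executed.
if command -v dj-process >/dev/null 2>&1 && [ -f "$CONFIG" ]; then
  $CMD
else
  echo "Would run: $CMD"
fi
```

If you work from a source checkout of Data-Juicer instead, `python tools/process_data.py --config bloom-oscar.yaml` should be the equivalent invocation.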