data_juicer.format.load module
- data_juicer.format.load.load_formatter(dataset_path, text_keys=None, suffixes=None, add_suffix=False, **kwargs) → BaseFormatter
Load the appropriate formatter for different types of data formats.
- Parameters:
dataset_path – path to a dataset file or a dataset directory
text_keys – key names of the fields that store sample text. Default: None
suffixes – the suffixes of the files to read. Default: None
add_suffix – whether to add the file suffix to the dataset meta. Default: False
- Returns:
a dataset formatter.
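
The idea of choosing a formatter from the files found at dataset_path can be illustrated with a minimal standard-library sketch. It is not Data-Juicer's implementation: the FORMATTER_BY_SUFFIX table, the formatter names in it, and the pick_formatter_name helper are hypothetical and exist only for this example. The sketch assumes the choice is driven by file suffix and that, for a directory, the most common supported suffix wins.

```python
from collections import Counter
from pathlib import Path

# Hypothetical suffix-to-formatter table, for illustration only;
# this is not Data-Juicer's actual registry.
FORMATTER_BY_SUFFIX = {
    ".jsonl": "JsonFormatter",
    ".json": "JsonFormatter",
    ".csv": "CsvFormatter",
    ".txt": "TextFormatter",
}


def pick_formatter_name(dataset_path, suffixes=None):
    """Return the name of the formatter that matches the most files."""
    path = Path(dataset_path)
    # Accept either a single file or a directory that is scanned recursively.
    if path.is_file():
        files = [path]
    else:
        files = [p for p in path.rglob("*") if p.is_file()]

    # With suffixes=None every known suffix is allowed; otherwise only
    # the suffixes the caller listed are considered.
    allowed = set(suffixes) if suffixes else set(FORMATTER_BY_SUFFIX)
    counts = Counter(
        FORMATTER_BY_SUFFIX[f.suffix]
        for f in files
        if f.suffix in allowed and f.suffix in FORMATTER_BY_SUFFIX
    )
    if not counts:
        raise ValueError(f"no supported files found under {dataset_path}")
    return counts.most_common(1)[0][0]
```

With Data-Juicer installed, the documented function would be called in the same spirit, for example load_formatter(dataset_path, suffixes=[".jsonl"]); the exact value types accepted for text_keys and suffixes are an assumption here and should be checked against the library.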