Data-Juicer Sphinx Documentation Template

This is a unified documentation build template designed for the Data-Juicer ecosystem. Built on Sphinx and pydata-sphinx-theme, it provides multi-version and multi-language documentation capabilities, ensuring consistent documentation appearance and user experience across all subprojects.

Features

  • Unified Appearance: All subprojects share the same documentation theme and styling.

  • Multi-Version Support: Automatically builds documentation for multiple Git branches and tags.

  • Multi-Language Support: Supports both English and Chinese by default.

  • Ecosystem Interconnectivity: External links in the page header let readers move seamlessly between the documentation sites of different projects.

  • Markdown-Friendly: Automatically discovers and integrates Markdown documents within the project.

Project Structure

data-juicer-sphinx/
├── docs/
│   └── sphinx_doc/                              # Sphinx documentation build directory
│       ├── build_versions.py                    # Multi-version build script (main entry point)
│       ├── make.bat / Makefile                  # Build scripts
│       ├── redirect.html                        # Redirect page
│       └── source/                              # Documentation source files
│           ├── conf.py                          # Sphinx configuration file
│           ├── custom_myst.py                   # Custom MyST extension
│           ├── external_links.yaml              # External project link configuration
│           ├── index.rst / index_ZH.rst         # Home page (customization recommended)
│           ├── docs_index.rst / docs_index_ZH.rst # Documentation index page (customization recommended)
│           ├── api.rst                          # API documentation index (customization recommended)
│           ├── _static/                         # Static assets
│           │   ├── custom.css                   # Custom styles
│           │   └── images/                      # Logos and icons
│           └── _templates/                      # Custom templates
│               └── version-language-switcher.html
├── guides/                                      # Usage guides
├── pyproject.toml                               # Project configuration
├── README.md
└── README_ZH.md

Quick Start

Build a minimal English version of the Data-Juicer Sphinx documentation (without API docs):

git clone https://github.com/datajuicer/data-juicer-sphinx.git
cd data-juicer-sphinx

uv pip install .

cd docs/sphinx_doc
export PROJECT="data-juicer-sphinx"
python build_versions.py -A -l en

Documentation

Detailed documentation is available here; usage guides are also kept in the guides/ directory.

Core Principles

Isolated Build Environment (Git Worktree)

  • Creates an independent Git worktree for each version (branch/tag) at .worktrees/<version>.

  • Automatically cleans up after building (unless KEEP_WORKTREES=True is set in docs/sphinx_doc/build_versions.py) to avoid polluting the main working directory.
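The worktree lifecycle described above can be sketched in Python as follows. This is a minimal sketch, not the code in build_versions.py: only the .worktrees/<version> layout and the KEEP_WORKTREES switch come from this README, while the helper names (worktree_path, add_worktree_cmd, remove_worktree_cmd, build_version) are hypothetical. It relies on the standard git worktree add and git worktree remove commands.

```python
import subprocess
from pathlib import Path
from typing import Callable

# Mirrors the switch described above; the real script's default may differ.
KEEP_WORKTREES = False
WORKTREE_ROOT = Path(".worktrees")


def worktree_path(version: str) -> Path:
    """Isolated checkout location for a branch or tag: .worktrees/<version>."""
    return WORKTREE_ROOT / version


def add_worktree_cmd(version: str) -> list[str]:
    # --detach checks out the commit without claiming the branch, so a branch
    # already checked out in the main working tree can still be built.
    return ["git", "worktree", "add", "--detach", str(worktree_path(version)), version]


def remove_worktree_cmd(version: str) -> list[str]:
    # --force also removes worktrees that contain untracked build output.
    return ["git", "worktree", "remove", "--force", str(worktree_path(version))]


def build_version(version: str, build: Callable[[Path], None]) -> None:
    """Check out `version` into its own worktree, build it, then clean up."""
    subprocess.run(add_worktree_cmd(version), check=True)
    try:
        build(worktree_path(version))
    finally:
        if not KEEP_WORKTREES:
            subprocess.run(remove_worktree_cmd(version), check=True)
```

For example, build_version("v1.0", lambda path: print("building", path)) run inside a Git repository would create .worktrees/v1.0, invoke the callback, and remove the worktree again. The try/finally ensures cleanup happens even if the build step fails.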

Documentation Content Aggregation

  • Automatically scans the entire worktree to collect all .md and .rst files, excluding directories such as outputs, sphinx_doc, and .github.

  • Copies these files into a unified Sphinx source directory: docs/sphinx_doc/source/.

  • (Customized for Data-Juicer operator documentation) For subdirectories under operators/, automatically generates corresponding index.rst and index_ZH.rst files to facilitate categorized operator indexing.
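A minimal Python sketch of this aggregation step follows. It is an illustration under assumptions, not the actual logic in build_versions.py: the exclusion set, the helper names, and the operator index layout (index.rst listing pages without a _ZH suffix, index_ZH.rst listing pages with one, each as a plain toctree) are all hypothetical. Excluding sphinx_doc presumably also keeps files already copied into docs/sphinx_doc/source/ from being collected a second time.

```python
import shutil
from pathlib import Path

# Assumption: illustrative exclusion list; the real build script may use another.
EXCLUDED_DIRS = {"outputs", "sphinx_doc", ".github", ".git", ".worktrees"}
DOC_SUFFIXES = {".md", ".rst"}


def collect_docs(worktree: Path) -> list[Path]:
    """Return .md/.rst files (relative paths) under `worktree`, skipping excluded dirs."""
    docs = []
    for path in sorted(worktree.rglob("*")):
        rel = path.relative_to(worktree)
        # Skip anything whose parent directories include an excluded name.
        if any(part in EXCLUDED_DIRS for part in rel.parts[:-1]):
            continue
        if path.is_file() and path.suffix in DOC_SUFFIXES:
            docs.append(rel)
    return docs


def copy_docs(worktree: Path, source_dir: Path) -> list[Path]:
    """Copy collected docs into the Sphinx source dir, keeping relative paths."""
    docs = collect_docs(worktree)
    for rel in docs:
        dest = source_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(worktree / rel, dest)
    return docs


def write_operator_indexes(source_dir: Path) -> None:
    """Hypothetical: write index.rst / index_ZH.rst for each operators/<category>/."""
    ops_root = source_dir / "operators"
    if not ops_root.is_dir():
        return
    for category in sorted(p for p in ops_root.iterdir() if p.is_dir()):
        stems = sorted(p.stem for p in category.glob("*.md"))
        en_pages = [s for s in stems if not s.endswith("_ZH")]
        zh_pages = [s for s in stems if s.endswith("_ZH")]
        for name, pages in (("index.rst", en_pages), ("index_ZH.rst", zh_pages)):
            title = category.name
            lines = [title, "=" * len(title), "", ".. toctree::", "   :maxdepth: 1", ""]
            lines += [f"   {page}" for page in pages]
            (category / name).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Copying with relative paths preserved means links between Markdown files in the repository keep working inside the Sphinx source tree.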

Frequently Asked Questions

Q1: Build fails with “module not found” error

A: Ensure all dependencies are installed before building. Run the following from the repository root:

uv pip install .
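If the error persists, one common cause is that the Python interpreter running build_versions.py is not the environment uv installed into. The minimal sketch below asks the current interpreter which modules it cannot find. The module list is an assumption (Sphinx, pydata-sphinx-theme, and MyST-Parser, inferred from the stack this template describes), so compare it with the dependencies in pyproject.toml.

```python
import importlib.util

# Assumption: module names for the stack this template uses;
# compare with the dependencies declared in pyproject.toml.
REQUIRED_MODULES = ["sphinx", "pydata_sphinx_theme", "myst_parser"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
        print("Run `uv pip install .` from the repository root in this environment.")
    else:
        print("All required modules are importable.")
```

Run it with the same python command you use for build_versions.py; find_spec checks importability without actually importing the packages.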

Q2: API documentation isn’t generated

A: Check the following:

  • Ensure you did not pass the --no-api-doc or -A flag; the Quick Start command uses -A and therefore skips API docs

  • Verify your project contains importable Python modules

  • Confirm the CODE_ROOT environment variable is correctly set
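To check the last two points together, the sketch below lists importable top-level packages under CODE_ROOT. It assumes CODE_ROOT should point at the directory that directly contains your packages; how build_versions.py actually interprets the variable is not shown in this README, and find_packages is a hypothetical helper. Namespace packages without an __init__.py are not detected by this check.

```python
import os
from pathlib import Path
from typing import Optional


def find_packages(code_root: Optional[str]) -> list[str]:
    """Hypothetical helper: list top-level packages (dirs with __init__.py) under code_root."""
    if not code_root:
        raise ValueError("CODE_ROOT is not set")
    root = Path(code_root)
    if not root.is_dir():
        raise ValueError(f"CODE_ROOT is not a directory: {root}")
    return sorted(init.parent.name for init in root.glob("*/__init__.py"))


if __name__ == "__main__":
    try:
        print("Packages found:", find_packages(os.environ.get("CODE_ROOT")))
    except ValueError as err:
        print("Problem:", err)
```

An empty list, or a "Problem" message, suggests CODE_ROOT points at the wrong directory or is unset.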

Q3: Page doesn’t exist after switching versions

A: Documentation structures may differ between versions:

  • Older versions might lack certain new pages

  • When you switch versions, the switcher tries to open the same page path; if that page does not exist in the target version, it redirects to the homepage
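The fallback rule above can be modeled as a small Python function. This only illustrates the logic; the actual switcher lives in _templates/version-language-switcher.html and runs in the browser. The URL layout (/<version>/<page>), the pages_by_version mapping, and the function name are hypothetical.

```python
def resolve_switch_target(page: str, target_version: str,
                          pages_by_version: dict[str, set[str]],
                          home: str = "index.html") -> str:
    """Return the URL to open when switching to `target_version`.

    `page` is the current page path relative to the version root
    (hypothetical layout: /<version>/<page>).
    """
    available = pages_by_version.get(target_version, set())
    # Stay on the same path if the target version has it; otherwise go home.
    return f"/{target_version}/{page if page in available else home}"


pages = {"main": {"index.html", "new_page.html"}, "v1.0": {"index.html"}}
print(resolve_switch_target("new_page.html", "main", pages))  # /main/new_page.html
print(resolve_switch_target("new_page.html", "v1.0", pages))  # /v1.0/index.html
```

Falling back to the homepage rather than showing a 404 keeps readers inside the selected version even when an older release lacks a newer page.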

Contribution Guide

Contributions and improvements to this template are warmly welcomed! ❤