Data-Juicer Sphinx Documentation Template#
This is a unified documentation build template designed for the Data-Juicer ecosystem. Built on Sphinx and pydata-sphinx-theme, it provides multi-version and multi-language documentation capabilities, ensuring consistent documentation appearance and user experience across all subprojects.
Features#
Unified Appearance: All subprojects share the same documentation theme and styling.
Multi-Version Support: Automatically builds documentation for multiple Git branches and tags.
Multi-Language Support: Supports both English and Chinese by default.
Ecosystem Interconnectivity: Enables seamless navigation between the documentation of different projects via external links in the header.
Markdown-Friendly: Automatically discovers and integrates Markdown documents within the project.
Project Structure#
data-juicer-sphinx/
├── docs/
│   └── sphinx_doc/                            # Sphinx documentation build directory
│       ├── build_versions.py                  # Multi-version build script (main entry point)
│       ├── make.bat / Makefile                # Build scripts
│       ├── redirect.html                      # Redirect page
│       └── source/                            # Documentation source files
│           ├── conf.py                        # Sphinx configuration file
│           ├── custom_myst.py                 # Custom MyST extension
│           ├── external_links.yaml            # External project link configuration
│           ├── index.rst / index_ZH.rst       # Home page (customization recommended)
│           ├── docs_index.rst / docs_index_ZH.rst  # Documentation index page (customization recommended)
│           ├── api.rst                        # API documentation index (customization recommended)
│           ├── _static/                       # Static assets
│           │   ├── custom.css                 # Custom styles
│           │   └── images/                    # Logos and icons
│           └── _templates/                    # Custom templates
│               └── version-language-switcher.html
├── guides/                                    # Usage guides
├── pyproject.toml                             # Project configuration
├── README.md
└── README_ZH.md
Quick Start#
Build the simplest English Data-Juicer Sphinx documentation (without API docs):
git clone https://github.com/datajuicer/data-juicer-sphinx.git
cd data-juicer-sphinx
uv pip install .
cd docs/sphinx_doc
export PROJECT="data-juicer-sphinx"
python build_versions.py -A -l en
Documentation#
Core Principles#
Isolated Build Environment (Git Worktree)#
Creates an independent Git worktree for each version (branch/tag) at .worktrees/<version>.
Automatically cleans up after building (unless KEEP_WORKTREES=True is set in docs/sphinx_doc/build_versions.py) to avoid polluting the main working directory.
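The worktree lifecycle described above can be sketched roughly as follows. This is an illustration only, not the code in build_versions.py: the helper names worktree_commands and build_in_worktree are hypothetical, the default value of KEEP_WORKTREES is an assumption, and the sketch relies only on the standard git worktree add and git worktree remove subcommands.

```python
import subprocess
from pathlib import Path

# Mirrors the flag named in the text; the default value here is an assumption.
KEEP_WORKTREES = False
WORKTREE_ROOT = Path(".worktrees")


def worktree_commands(version: str, root: Path = WORKTREE_ROOT):
    """Return the (add, remove) git commands for one version's worktree."""
    path = root / version
    # --detach lets the same command work for both branches and tags.
    add = ["git", "worktree", "add", "--detach", str(path), version]
    remove = ["git", "worktree", "remove", "--force", str(path)]
    return add, remove


def build_in_worktree(version: str, build, keep: bool = KEEP_WORKTREES) -> None:
    """Check out `version` in its own worktree, call `build(path)`, then clean up."""
    add, remove = worktree_commands(version)
    subprocess.run(add, check=True)
    try:
        build(WORKTREE_ROOT / version)
    finally:
        # Cleanup is skipped only when the caller asks to keep the worktree.
        if not keep:
            subprocess.run(remove, check=True)
```

Running the cleanup in a finally block means a failed build still leaves the main working directory untouched, which matches the stated goal of isolation.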
Documentation Content Aggregation#
Automatically scans the entire worktree to collect all .md and .rst files (excluding directories like outputs, sphinx_doc, .github, etc.).
Copies these files into a unified Sphinx source directory: docs/sphinx_doc/source/.
(Customized for Data-Juicer operator documentation) For subdirectories under operators/, automatically generates corresponding index.rst and index_ZH.rst files to facilitate categorized operator indexing.
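A minimal sketch of the scan-and-copy step, under stated assumptions: the function names collect_docs and copy_docs are hypothetical, the exclusion set contains only the three directories named above (the real list is longer, as the "etc." indicates), and preserving relative paths in the target directory is an assumption about the layout.

```python
import shutil
from pathlib import Path

# Only the directories named in the text; the actual exclusion list is longer.
EXCLUDED_DIRS = {"outputs", "sphinx_doc", ".github"}
DOC_SUFFIXES = {".md", ".rst"}


def collect_docs(worktree: Path) -> list[Path]:
    """Return .md/.rst files under `worktree` (as relative paths), skipping excluded dirs."""
    found = []
    for path in sorted(worktree.rglob("*")):
        rel = path.relative_to(worktree)
        # Skip anything that lives below an excluded directory.
        if any(part in EXCLUDED_DIRS for part in rel.parts[:-1]):
            continue
        if path.is_file() and path.suffix in DOC_SUFFIXES:
            found.append(rel)
    return found


def copy_docs(worktree: Path, source_dir: Path) -> None:
    """Copy the collected files into `source_dir`, keeping their relative paths."""
    for rel in collect_docs(worktree):
        target = source_dir / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(worktree / rel, target)
```

For example, copy_docs(Path(".worktrees/main"), Path("docs/sphinx_doc/source")) would mirror the Markdown and reStructuredText files of the main worktree into the Sphinx source directory.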
Frequently Asked Questions#
Q1: Build fails with "module not found" error#
A: Ensure all dependencies are installed before building:
uv pip install .
Q2: API documentation isn't generated#
A: Check the following:
Ensure you didn't use the --no-api-doc or -A flags
Verify your project contains importable Python modules
Confirm the CODE_ROOT environment variable is correctly set
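The last two checks can be automated with a small diagnostic. This is a hedged sketch, not part of the template: it assumes CODE_ROOT holds a directory path whose immediate subdirectories are Python packages, which the text above does not confirm, and the function names find_packages and check_code_root are hypothetical.

```python
import os
from pathlib import Path


def find_packages(code_root: Path) -> list[str]:
    """List top-level directories under `code_root` that contain an __init__.py."""
    return sorted(
        child.name
        for child in code_root.iterdir()
        if child.is_dir() and (child / "__init__.py").is_file()
    )


def check_code_root() -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    value = os.environ.get("CODE_ROOT")
    if not value:
        return ["CODE_ROOT is not set"]
    root = Path(value)
    # Assumption: CODE_ROOT is a directory path (not a module name).
    if not root.is_dir():
        return [f"CODE_ROOT={value!r} is not a directory"]
    if not find_packages(root):
        return [f"no packages (directories with __init__.py) found under {root}"]
    return []
```

Calling check_code_root() before a build and printing any returned messages gives a quick first answer to why API pages are missing.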
Q3: External links aren't displayed#
A:
Verify that external_links.yaml is configured correctly
Ensure the PROJECT environment variable is properly set
Check the browser console for JavaScript errors
Q4: Chinese documentation links don't exist#
A: Ensure:
Chinese documentation files end with _ZH.md or _ZH.rst
index_ZH.rst exists and is correctly configured
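To find pages that have no Chinese counterpart, the _ZH naming rule can be checked mechanically. The sketch below is illustrative: zh_counterpart and missing_zh_docs are hypothetical helpers, and it assumes the Chinese file sits next to the English one with _ZH inserted before the extension.

```python
from pathlib import Path

DOC_SUFFIXES = {".md", ".rst"}


def zh_counterpart(doc: Path) -> Path:
    """Map an English doc path to its _ZH path, e.g. index.rst -> index_ZH.rst."""
    return doc.with_name(f"{doc.stem}_ZH{doc.suffix}")


def missing_zh_docs(source_dir: Path) -> list[Path]:
    """Return English .md/.rst files under `source_dir` lacking a _ZH counterpart."""
    missing = []
    for doc in sorted(source_dir.rglob("*")):
        if not doc.is_file() or doc.suffix not in DOC_SUFFIXES:
            continue
        # Files that already carry the _ZH suffix are the Chinese versions.
        if doc.stem.endswith("_ZH"):
            continue
        if not zh_counterpart(doc).exists():
            missing.append(doc.relative_to(source_dir))
    return missing
```

An empty result from missing_zh_docs(Path("docs/sphinx_doc/source")) suggests the naming side is fine and the problem lies in how index_ZH.rst is configured.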
Q5: Page doesn't exist after switching versions#
A: Documentation structures may differ between versions:
Older versions might lack certain new pages
Version switching attempts to access the same path; if unavailable, it redirects to the homepage
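The fallback rule can be expressed in a few lines. This sketch shows only the decision logic, written in Python for illustration; the actual switcher runs in the browser and its implementation is not shown here. The function name resolve_switch_target and the homepage default of index.html are assumptions.

```python
def resolve_switch_target(current_path: str, target_pages: set[str],
                          homepage: str = "index.html") -> str:
    """Return the page to open in the target version.

    `target_pages` is the set of page paths that exist in the target version.
    If the current page exists there, keep it; otherwise fall back to the homepage.
    """
    return current_path if current_path in target_pages else homepage


# Example: an older version that lacks a newly added page.
old_version_pages = {"index.html", "docs/install.html"}
same_page = resolve_switch_target("docs/install.html", old_version_pages)
fallback = resolve_switch_target("docs/new_feature.html", old_version_pages)
```

Here same_page keeps docs/install.html, while fallback becomes index.html because the older version never contained the new page.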
Contribution Guide#
Contributions and improvements to this template are warmly welcomed! ❤