Data-Juicer Q&A Copilot

Q&A Copilot is the question-answering component of Data-Juicer Agents. It runs as an AgentScope-based web service and answers questions about the Data-Juicer ecosystem by combining LLM reasoning, GitHub MCP retrieval, and operator lookup tools.

You can chat with Juicer on the official Data-Juicer documentation site.

Core Components

  • Agent: ReActAgent-based Q&A service

  • GitHub MCP Integration: search_repositories, search_code, and get_file_contents

  • Operator Tools: retrieve_operators_api (llm mode) and get_operator_info

  • Session Storage: JSON-based storage by default, Redis optional

  • Web API: REST endpoints for chat, memory, clear, and feedback

Quick Start

Prerequisites

  • Python >=3.10, <=3.12

  • DashScope API key

  • GitHub token

  • Redis server (only required when SESSION_STORE_TYPE=redis)

Installation

  1. Install dependencies from the parent directory, then return to qa-copilot.

    cd ..
    uv pip install '.[copilot]'
    cd qa-copilot
    
  2. Export required environment variables.

    export DASHSCOPE_API_KEY="your_dashscope_api_key"
    export GITHUB_TOKEN="your_github_token"
    
  3. (Optional) Configure session storage.

    export SESSION_STORE_TYPE="json"  # or "redis"
    
    # JSON mode
    export SESSION_STORE_DIR="./sessions"
    export SESSION_TTL_SECONDS="21600"
    export SESSION_CLEANUP_INTERVAL="1800"
    
    # Redis mode
    export REDIS_HOST="localhost"
    export REDIS_PORT="6379"
    export REDIS_DB="0"
    export REDIS_PASSWORD=""
    export REDIS_MAX_CONNECTIONS="10"
    
  4. (Optional) Configure the service.

    export DJ_COPILOT_SERVICE_HOST="127.0.0.1"
    export DJ_COPILOT_SERVICE_PORT="8080"
    export DJ_COPILOT_ENABLE_LOGGING="true"
    export DJ_COPILOT_LOG_DIR="./logs"
    export FASTAPI_CONFIG_PATH=""
    export SAFE_CHECK_HANDLER_PATH=""
    
  5. Start the service.

    bash setup_server.sh
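
Before starting the service, it can help to confirm that the two required variables from step 2 are set. The following is a minimal sketch of such a check; the function name is illustrative and the check is not part of setup_server.sh.

```python
import os

# Variable names come from step 2 above; the check itself is an
# illustrative helper and is not shipped with the service.
REQUIRED_VARS = ("DASHSCOPE_API_KEY", "GITHUB_TOKEN")


def missing_required_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_required_vars() with no arguments inspects the current process environment; an empty list means both variables are present.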
    

Runtime Behavior

Model

  • Default model: qwen3.6-plus

  • Transport: DashScope OpenAI-compatible endpoint

  • Streaming: enabled

  • The runtime applies local formatter-based truncation with OpenAIChatFormatter.

  • The provider-side context window is 1M tokens. The local formatter truncates conservatively at 0.8M tokens, which leaves headroom for mismatches between the tokenizer used by DashScope/Qwen serving and the local OpenAI-compatible token counter.
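
The idea of truncating against a budget below the provider limit can be sketched as follows. This is an illustration only: count_tokens is a hypothetical callable, and the strategy shown (dropping the oldest messages first) is an assumption, not a description of how OpenAIChatFormatter decides what to keep.

```python
PROVIDER_CONTEXT_TOKENS = 1_000_000  # provider-side window stated above
LOCAL_BUDGET_TOKENS = 800_000        # local truncation threshold stated above


def truncate_to_budget(messages, count_tokens, budget=LOCAL_BUDGET_TOKENS):
    """Keep the most recent messages whose combined token count fits the budget.

    `count_tokens` is a hypothetical callable that maps one message to an
    integer; it stands in for whatever counter the formatter actually uses.
    """
    kept = []
    used = 0
    # Walk from newest to oldest and stop at the first message that overflows.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Because the local counter may undercount relative to the serving tokenizer, the 0.2M-token gap acts as a safety margin rather than an exact limit.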

Mounted Tools

The current QA runtime mounts these tools:

  • GitHub MCP:

    • search_repositories

    • search_code

    • get_file_contents

  • Operator tools:

    • retrieve_operators_api

    • get_operator_info

retrieve_operators_api is wrapped so that QA always uses llm retrieval mode internally.
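
One common way to pin an argument like this is a thin wrapper that overrides the keyword before delegating. The sketch below uses a hypothetical stand-in for the tool; the real signature of retrieve_operators_api, including the mode and limit parameter names, is assumed here and may differ.

```python
import functools


def retrieve_operators_api(query, *, mode="auto", limit=5):
    """Hypothetical stand-in for the real tool; only the `mode` idea matters."""
    return {"query": query, "mode": mode, "limit": limit}


def pin_mode(tool, mode="llm"):
    """Return a wrapper that always calls `tool` with the given mode."""

    @functools.wraps(tool)
    def wrapper(*args, **kwargs):
        kwargs["mode"] = mode  # overrides any mode supplied by the caller
        return tool(*args, **kwargs)

    return wrapper


# The wrapped tool keeps the original name, so the agent sees the same tool.
qa_retrieve_operators = pin_mode(retrieve_operators_api)
```

With this shape, a caller that passes mode="vector" still gets llm retrieval, which matches the behavior described above.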

API

1. Q&A Conversation

POST /process
Content-Type: application/json

{
  "input": [
    {
      "role": "user",
      "content": [{"type": "text", "text": "How do I use Data-Juicer for data cleaning?"}]
    }
  ],
  "session_id": "your_session_id",
  "user_id": "user_id"
}
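
A client for this endpoint could be sketched with the Python standard library as below. The base URL assumes the default port on localhost, and the helper names are illustrative. The response format is not described in this document, so send simply returns the raw body.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumes the default port; adjust if changed


def build_process_request(text, session_id, user_id, base_url=BASE_URL):
    """Build a POST /process request carrying a single user text message."""
    payload = {
        "input": [
            {"role": "user", "content": [{"type": "text", "text": text}]}
        ],
        "session_id": session_id,
        "user_id": user_id,
    }
    return urllib.request.Request(
        url=f"{base_url}/process",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the whole response body as text.

    A streaming-aware client would read the response incrementally instead.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, send(build_process_request("How do I use Data-Juicer for data cleaning?", "your_session_id", "user_id")) issues the request shown above against a running service.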

2. Get Session History

POST /memory
Content-Type: application/json

{
  "session_id": "your_session_id",
  "user_id": "user_id"
}

3. Clear Session History

POST /clear
Content-Type: application/json

{
  "session_id": "your_session_id",
  "user_id": "user_id"
}
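
Since /memory and /clear take the same body, one small helper can build either request. This is a sketch using the Python standard library; the helper name and the default base URL are assumptions.

```python
import json
import urllib.request


def build_session_request(endpoint, session_id, user_id,
                          base_url="http://localhost:8080"):
    """Build a POST request for /memory or /clear."""
    if endpoint not in ("memory", "clear"):
        raise ValueError("endpoint must be 'memory' or 'clear'")
    payload = {"session_id": session_id, "user_id": user_id}
    return urllib.request.Request(
        url=f"{base_url}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen once the service is running.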

4. Submit User Feedback

POST /feedback
Content-Type: application/json

{
  "data": {
    "message_id": "message_id_here",
    "feedback_type": "like",
    "comment": "optional user comment"
  },
  "session_id": "your_session_id",
  "user_id": "user_id"
}

Feedback parameters:

  • message_id: target message id

  • feedback_type: like or dislike

  • comment: optional free-form comment
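
A client might validate the feedback type before sending. The sketch below builds the request body only; the helper name is illustrative, and omitting comment when none is given is an assumption based on the field being optional.

```python
ALLOWED_FEEDBACK_TYPES = ("like", "dislike")


def build_feedback_body(message_id, feedback_type, session_id, user_id,
                        comment=None):
    """Build the JSON body for POST /feedback, rejecting unknown types."""
    if feedback_type not in ALLOWED_FEEDBACK_TYPES:
        raise ValueError(
            f"feedback_type must be one of {ALLOWED_FEEDBACK_TYPES}"
        )
    data = {"message_id": message_id, "feedback_type": feedback_type}
    if comment is not None:
        data["comment"] = comment  # optional, so omitted when not given
    return {"data": data, "session_id": session_id, "user_id": user_id}
```

Checking the type on the client side gives an immediate error instead of relying on how the service handles an unexpected value.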

WebUI

You can launch the Runtime WebUI with:

npx @agentscope-ai/chat agentscope-runtime-webui --url http://localhost:8080/process

If you change DJ_COPILOT_SERVICE_PORT, update the WebUI URL accordingly.

See AgentScope Runtime WebUI for more details.

Environment Variables

JSON session settings only apply when SESSION_STORE_TYPE=json. Redis settings only apply when SESSION_STORE_TYPE=redis.

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| DASHSCOPE_API_KEY | ✅ Yes | - | DashScope API key |
| GITHUB_TOKEN | ✅ Yes | - | GitHub token for MCP integration |
| SESSION_STORE_TYPE | ❌ No | "json" | Session storage type: "json" or "redis" |
| SESSION_STORE_DIR | ❌ No | "./sessions" | Session file directory in JSON mode |
| SESSION_TTL_SECONDS | ❌ No | 21600 | Session TTL in JSON mode |
| SESSION_CLEANUP_INTERVAL | ❌ No | 1800 | Cleanup interval in JSON mode |
| REDIS_HOST | ❌ No | "localhost" | Redis host in Redis mode |
| REDIS_PORT | ❌ No | 6379 | Redis port in Redis mode |
| REDIS_DB | ❌ No | 0 | Redis database number |
| REDIS_PASSWORD | ❌ No | unset | Redis password |
| REDIS_MAX_CONNECTIONS | ❌ No | 10 | Redis max connections |
| DJ_COPILOT_SERVICE_HOST | ❌ No | "127.0.0.1" | Service host |
| DJ_COPILOT_SERVICE_PORT | ❌ No | 8080 | Service port |
| DJ_COPILOT_ENABLE_LOGGING | ❌ No | "true" | Enable session logging |
| DJ_COPILOT_LOG_DIR | ❌ No | qa-copilot/logs | Log directory. If unset, logs are written under the logs directory next to session_logger.py |
| FASTAPI_CONFIG_PATH | ❌ No | "" | Optional FastAPI config JSON file |
| SAFE_CHECK_HANDLER_PATH | ❌ No | "" | Optional safe-check handler module |
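
The rule that JSON settings apply only in JSON mode and Redis settings only in Redis mode can be sketched as a small loader. The defaults are the ones listed above; the function name and the shape of the returned dictionary are illustrative and do not reflect the service's internal configuration code.

```python
import os


def load_session_store_config(env=None):
    """Read only the settings that apply to the selected storage type."""
    env = os.environ if env is None else env
    store_type = env.get("SESSION_STORE_TYPE", "json")
    if store_type == "json":
        return {
            "type": "json",
            "dir": env.get("SESSION_STORE_DIR", "./sessions"),
            "ttl_seconds": int(env.get("SESSION_TTL_SECONDS", "21600")),
            "cleanup_interval": int(env.get("SESSION_CLEANUP_INTERVAL", "1800")),
        }
    if store_type == "redis":
        return {
            "type": "redis",
            "host": env.get("REDIS_HOST", "localhost"),
            "port": int(env.get("REDIS_PORT", "6379")),
            "db": int(env.get("REDIS_DB", "0")),
            # An empty or unset password is treated as "no password".
            "password": env.get("REDIS_PASSWORD") or None,
            "max_connections": int(env.get("REDIS_MAX_CONNECTIONS", "10")),
        }
    raise ValueError('SESSION_STORE_TYPE must be "json" or "redis"')
```

Note that in Redis mode the JSON variables are ignored even if they are exported, and the reverse holds in JSON mode.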

Troubleshooting

Common Issues

  1. Redis connection failure when SESSION_STORE_TYPE=redis

    • Run redis-cli ping and confirm that it returns PONG

    • Verify REDIS_HOST, REDIS_PORT, REDIS_DB, and REDIS_PASSWORD

  2. MCP startup failure

    • Ensure GITHUB_TOKEN is exported

    • Confirm the token has the required access for GitHub MCP

  3. DashScope authentication or quota failure

    • Verify DASHSCOPE_API_KEY

    • Check Model Studio quota and model availability

  4. Custom config or safe-check handler not loading

    • Verify FASTAPI_CONFIG_PATH points to a valid JSON file

    • Verify SAFE_CHECK_HANDLER_PATH points to an importable Python module
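
The JSON check for FASTAPI_CONFIG_PATH can be done outside the service. The helper below is a minimal sketch with an illustrative name; it only confirms that the file exists and parses as JSON, and it does not validate which keys the service expects.

```python
import json
import os


def check_fastapi_config(path):
    """Return None if `path` is empty or valid JSON, else an error string."""
    if not path:
        return None  # an empty value means the option is not in use
    if not os.path.isfile(path):
        return f"{path}: file not found"
    try:
        with open(path, encoding="utf-8") as handle:
            json.load(handle)
    except json.JSONDecodeError as exc:
        return f"{path}: invalid JSON ({exc.msg} at line {exc.lineno})"
    return None
```

Running check_fastapi_config(os.environ.get("FASTAPI_CONFIG_PATH", "")) before startup surfaces a malformed file early.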

Acknowledgments

Parts of the service scaffolding and MCP integration were adapted from AgentScope Samples - Alias.

License

This project uses the same license as the main project. See LICENSE for details.