Data-Juicer Q&A Copilot#
Q&A Copilot is the question-answering component of Data-Juicer Agents. It runs as an AgentScope-based web service and answers Data-Juicer ecosystem questions with a combination of LLM reasoning, GitHub MCP retrieval, and operator lookup tools.
You can chat with Juicer on the official Data-Juicer documentation site.
Core Components#
Agent: ReActAgent-based Q&A service
GitHub MCP Integration: search_repositories, search_code, and get_file_contents
Operator Tools: retrieve_operators_api (llm mode) and get_operator_info
Session Storage: JSON-based storage by default, Redis optional
Web API: REST endpoints for chat, memory, clear, and feedback
Quick Start#
Prerequisites#
Python >=3.10, <=3.12
DashScope API key
GitHub token
Redis server, only if you want SESSION_STORE_TYPE=redis
Installation#
Install dependencies.
cd ..
uv pip install '.[copilot]'
cd qa-copilot
Export required environment variables.
export DASHSCOPE_API_KEY="your_dashscope_api_key"
export GITHUB_TOKEN="your_github_token"
Optional session storage configuration.
export SESSION_STORE_TYPE="json"  # or "redis"

# JSON mode
export SESSION_STORE_DIR="./sessions"
export SESSION_TTL_SECONDS="21600"
export SESSION_CLEANUP_INTERVAL="1800"

# Redis mode
export REDIS_HOST="localhost"
export REDIS_PORT="6379"
export REDIS_DB="0"
export REDIS_PASSWORD=""
export REDIS_MAX_CONNECTIONS="10"
Optional service configuration.
export DJ_COPILOT_SERVICE_HOST="127.0.0.1"
export DJ_COPILOT_SERVICE_PORT="8080"
export DJ_COPILOT_ENABLE_LOGGING="true"
export DJ_COPILOT_LOG_DIR="./logs"
export FASTAPI_CONFIG_PATH=""
export SAFE_CHECK_HANDLER_PATH=""
Start the service.
bash setup_server.sh
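Before starting the service, it can help to confirm that the two required variables are actually set. The following is a minimal pre-flight sketch, not part of the project; the helper name `missing_required` is hypothetical.

```python
import os

# The two variables the installation steps mark as required.
REQUIRED_VARS = ("DASHSCOPE_API_KEY", "GITHUB_TOKEN")


def missing_required(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the current process environment; passing a dict
    makes the check easy to test. (Hypothetical helper.)
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_required()` in the shell session that will run `setup_server.sh` returns an empty list when both variables are exported.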
Runtime Behavior#
Model#
Default model: qwen3.6-plus
Transport: DashScope OpenAI-compatible endpoint
Streaming: enabled
The runtime applies local formatter-based truncation with OpenAIChatFormatter.
The provider-side context window is 1M tokens; the local formatter conservatively truncates at 0.8M tokens to leave headroom for tokenizer mismatch between DashScope/Qwen serving and the local OpenAI-compatible token counter.
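The headroom idea can be illustrated with a small sketch. This is not how OpenAIChatFormatter is implemented; it only shows one plausible truncation policy (drop the oldest messages first) under a budget below the provider limit. It assumes "1M" means 1,000,000 tokens, and `truncate_oldest` and `count_tokens` are hypothetical names.

```python
# Assumption: "1M" and "0.8M" are read as decimal millions of tokens.
PROVIDER_CONTEXT_TOKENS = 1_000_000
LOCAL_TRUNCATION_TOKENS = 800_000

# Tokens left as a safety margin for tokenizer mismatch.
HEADROOM_TOKENS = PROVIDER_CONTEXT_TOKENS - LOCAL_TRUNCATION_TOKENS


def truncate_oldest(messages, count_tokens, budget=LOCAL_TRUNCATION_TOKENS):
    """Keep the most recent messages whose total token count fits `budget`.

    `count_tokens` is any callable returning a token count for one message.
    Walks from newest to oldest and stops at the first message that would
    exceed the budget, so the kept messages stay contiguous.
    """
    kept = []
    total = 0
    for message in reversed(messages):
        cost = count_tokens(message)
        if total + cost > budget:
            break
        kept.append(message)
        total += cost
    kept.reverse()  # restore chronological order
    return kept
```

Because the local counter may undercount relative to the serving tokenizer, truncating at 80% of the provider limit keeps a request that looks full locally from overflowing remotely.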
Mounted Tools#
The current QA runtime mounts these tools:
GitHub MCP: search_repositories, search_code, get_file_contents
Operator tools: retrieve_operators_api, get_operator_info
retrieve_operators_api is wrapped so that QA always uses llm retrieval mode internally.
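One way to pin a tool to a single mode is a thin wrapper that overrides the argument before delegating. The sketch below is illustrative only: the stand-in `retrieve_operators_api` and its `mode` keyword are assumptions, not the real tool's signature.

```python
import functools


def retrieve_operators_api(query, mode="auto"):
    """Hypothetical stand-in for the real tool; it just echoes its inputs."""
    return {"query": query, "mode": mode}


def pin_llm_mode(tool):
    """Wrap `tool` so every call uses mode="llm", whatever the caller passes."""

    @functools.wraps(tool)
    def wrapper(query, **kwargs):
        kwargs["mode"] = "llm"  # override any caller-supplied mode
        return tool(query, **kwargs)

    return wrapper


# The wrapped callable keeps the original name and docstring.
qa_retrieve_operators = pin_llm_mode(retrieve_operators_api)
```

Using `functools.wraps` preserves the tool's name and docstring, which matters when an agent framework derives the tool description from them.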
API#
1. Q&A Conversation#
POST /process
Content-Type: application/json

{
  "input": [
    {
      "role": "user",
      "content": [{"type": "text", "text": "How do I use Data-Juicer for data cleaning?"}]
    }
  ],
  "session_id": "your_session_id",
  "user_id": "user_id"
}
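As a rough client-side sketch, the request above can be built with the Python standard library. The base URL assumes the service listens on `localhost:8080`, as in the configuration example, and `build_process_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_process_request(text, session_id, user_id,
                          base_url="http://localhost:8080"):
    """Build (but do not send) a POST /process request for one user message."""
    payload = {
        "input": [
            {"role": "user", "content": [{"type": "text", "text": text}]}
        ],
        "session_id": session_id,
        "user_id": user_id,
    }
    return urllib.request.Request(
        url=f"{base_url}/process",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(request)`. Since streaming is enabled, the response is best read incrementally; its exact wire format is not described here, so inspect it before writing a parser.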
2. Get Session History#
POST /memory
Content-Type: application/json

{
  "session_id": "your_session_id",
  "user_id": "user_id"
}
3. Clear Session History#
POST /clear
Content-Type: application/json

{
  "session_id": "your_session_id",
  "user_id": "user_id"
}
4. Submit User Feedback#
POST /feedback
Content-Type: application/json

{
  "data": {
    "message_id": "message_id_here",
    "feedback_type": "like",
    "comment": "optional user comment"
  },
  "session_id": "your_session_id",
  "user_id": "user_id"
}
Feedback parameters:
message_id: target message id
feedback_type: like or dislike
comment: optional free-form comment
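A client may want to validate the feedback type before sending. The sketch below builds the JSON body described above and rejects any value other than `like` or `dislike`; `build_feedback_body` is a hypothetical helper, and the client-side validation is an addition, not documented server behavior.

```python
import json

ALLOWED_FEEDBACK = {"like", "dislike"}


def build_feedback_body(message_id, feedback_type, session_id, user_id,
                        comment=None):
    """Return the JSON body for POST /feedback as a string.

    Raises ValueError for an unsupported feedback_type. The optional
    comment is included only when provided.
    """
    if feedback_type not in ALLOWED_FEEDBACK:
        raise ValueError(f"feedback_type must be one of {sorted(ALLOWED_FEEDBACK)}")
    data = {"message_id": message_id, "feedback_type": feedback_type}
    if comment is not None:
        data["comment"] = comment
    return json.dumps(
        {"data": data, "session_id": session_id, "user_id": user_id}
    )
```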
WebUI#
You can launch the Runtime WebUI with:
npx @agentscope-ai/chat agentscope-runtime-webui --url http://localhost:8080/process
If you change DJ_COPILOT_SERVICE_PORT, update the WebUI URL accordingly.
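To keep the WebUI URL in sync with the service port, the URL can be derived from the same environment variable. This is a small sketch; `process_url` is a hypothetical helper, and the fallback port `8080` is taken from the example configuration rather than a confirmed default.

```python
import os


def process_url(env=None, host="localhost"):
    """Build the /process URL from DJ_COPILOT_SERVICE_PORT.

    Falls back to port 8080 (assumed from the example configuration)
    when the variable is unset or empty.
    """
    env = os.environ if env is None else env
    port = env.get("DJ_COPILOT_SERVICE_PORT") or "8080"
    return f"http://{host}:{port}/process"
```

The returned string is the value to pass to the WebUI's `--url` option.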
See AgentScope Runtime WebUI for more details.
Environment Variables#
JSON session settings only apply when SESSION_STORE_TYPE=json. Redis settings only apply when SESSION_STORE_TYPE=redis.
| Variable | Required | Default | Description |
|---|---|---|---|
| DASHSCOPE_API_KEY | Yes | - | DashScope API key |
| GITHUB_TOKEN | Yes | - | GitHub token for MCP integration |
| SESSION_STORE_TYPE | No | json | Session storage type: json or redis |
| SESSION_STORE_DIR | No | | Session file directory in JSON mode |
| SESSION_TTL_SECONDS | No | | Session TTL in JSON mode |
| SESSION_CLEANUP_INTERVAL | No | | Cleanup interval in JSON mode |
| REDIS_HOST | No | | Redis host in Redis mode |
| REDIS_PORT | No | | Redis port in Redis mode |
| REDIS_DB | No | | Redis database number |
| REDIS_PASSWORD | No | unset | Redis password |
| REDIS_MAX_CONNECTIONS | No | | Redis max connections |
| DJ_COPILOT_SERVICE_HOST | No | | Service host |
| DJ_COPILOT_SERVICE_PORT | No | | Service port |
| DJ_COPILOT_ENABLE_LOGGING | No | | Enable session logging |
| DJ_COPILOT_LOG_DIR | No | | Log directory. If unset, logs are written under the |
| FASTAPI_CONFIG_PATH | No | | Optional FastAPI config JSON file |
| SAFE_CHECK_HANDLER_PATH | No | | Optional safe-check handler module |
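The rule that JSON settings apply only in JSON mode and Redis settings only in Redis mode can be expressed as a small selector. This is an illustrative sketch, not the service's configuration loader; `active_session_vars` is a hypothetical name, and treating the store type as case-sensitive is an assumption.

```python
import os

JSON_VARS = ("SESSION_STORE_DIR", "SESSION_TTL_SECONDS", "SESSION_CLEANUP_INTERVAL")
REDIS_VARS = ("REDIS_HOST", "REDIS_PORT", "REDIS_DB", "REDIS_PASSWORD",
              "REDIS_MAX_CONNECTIONS")


def active_session_vars(env=None):
    """Return only the session variables relevant to the selected store type.

    SESSION_STORE_TYPE falls back to "json", matching the documented default.
    Variables belonging to the other backend are ignored.
    """
    env = os.environ if env is None else env
    store_type = env.get("SESSION_STORE_TYPE", "json")
    if store_type == "json":
        names = JSON_VARS
    elif store_type == "redis":
        names = REDIS_VARS
    else:
        raise ValueError(f"unsupported SESSION_STORE_TYPE: {store_type!r}")
    return {name: env[name] for name in names if name in env}
```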
Troubleshooting#
Common Issues#
Redis connection failure in SESSION_STORE_TYPE=redis:
  Check redis-cli ping
  Verify REDIS_HOST, REDIS_PORT, REDIS_DB, and REDIS_PASSWORD
MCP startup failure:
  Ensure GITHUB_TOKEN is exported
  Confirm the token has the required access for GitHub MCP
DashScope authentication or quota failure:
  Verify DASHSCOPE_API_KEY
  Check Model Studio quota and model availability
Custom config or safe-check handler not loading:
  Verify FASTAPI_CONFIG_PATH points to a valid JSON file
  Verify SAFE_CHECK_HANDLER_PATH points to an importable Python module
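The last two checks can be scripted. The sketch below assumes SAFE_CHECK_HANDLER_PATH is a filesystem path to a `.py` file (it may instead be a dotted module name in the actual service); both function names are hypothetical. Note that the module check executes the file's top-level code.

```python
import importlib.util
import json
from pathlib import Path


def check_fastapi_config(path):
    """Return None if `path` is a readable, valid JSON file, else an error string."""
    try:
        json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:  # JSONDecodeError is a ValueError
        return f"invalid config {path!r}: {exc}"
    return None


def check_handler_module(path):
    """Return None if the Python file at `path` imports cleanly, else an error string.

    Assumption: the handler is given as a file path. Importing runs the
    module's top-level code.
    """
    spec = importlib.util.spec_from_file_location("safe_check_handler_probe", path)
    if spec is None or spec.loader is None:
        return f"{path!r} is not a loadable Python source file"
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)
    except Exception as exc:  # any import-time failure counts
        return f"import of {path!r} failed: {exc!r}"
    return None
```

Each function returns `None` on success, so a non-`None` result can be printed directly as the diagnosis.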
Acknowledgments#
Parts of the service scaffolding and MCP integration were adapted from AgentScope Samples - Alias.
License#
This project uses the same license as the main project. See LICENSE for details.