# Data-Juicer Q&A Copilot Q&A Copilot is the question-answering component of Data-Juicer Agents. It runs as an AgentScope-based web service and answers Data-Juicer ecosystem questions with a combination of LLM reasoning, GitHub MCP retrieval, and operator lookup tools. You can chat with ***Juicer*** on the official [Data-Juicer documentation site](https://datajuicer.github.io/data-juicer/en/main/index.html).
## Core Components - **Agent**: ReActAgent-based Q&A service - **GitHub MCP Integration**: `search_repositories`, `search_code`, and `get_file_contents` - **Operator Tools**: `retrieve_operators_api` (llm mode) and `get_operator_info` - **Session Storage**: JSON-based storage by default, Redis optional - **Web API**: REST endpoints for chat, memory, clear, and feedback ## Quick Start ### Prerequisites - Python `>=3.10, <=3.12` - DashScope API key - GitHub token - Redis server only if you want `SESSION_STORE_TYPE=redis` ### Installation 1. Install dependencies. ```bash cd .. uv pip install '.[copilot]' cd qa-copilot ``` 2. Export required environment variables. ```bash export DASHSCOPE_API_KEY="your_dashscope_api_key" export GITHUB_TOKEN="your_github_token" ``` 3. Optional session storage configuration. ```bash export SESSION_STORE_TYPE="json" # or "redis" # JSON mode export SESSION_STORE_DIR="./sessions" export SESSION_TTL_SECONDS="21600" export SESSION_CLEANUP_INTERVAL="1800" # Redis mode export REDIS_HOST="localhost" export REDIS_PORT="6379" export REDIS_DB="0" export REDIS_PASSWORD="" export REDIS_MAX_CONNECTIONS="10" ``` 4. Optional service configuration. ```bash export DJ_COPILOT_SERVICE_HOST="127.0.0.1" export DJ_COPILOT_SERVICE_PORT="8080" export DJ_COPILOT_ENABLE_LOGGING="true" export DJ_COPILOT_LOG_DIR="./logs" export FASTAPI_CONFIG_PATH="" export SAFE_CHECK_HANDLER_PATH="" ``` 5. Start the service. ```bash bash setup_server.sh ``` ## Runtime Behavior ### Model - Default model: `qwen3.6-plus` - Transport: DashScope OpenAI-compatible endpoint - Streaming: enabled - The runtime applies local formatter-based truncation with `OpenAIChatFormatter`. - Provider-side context window is `1M` tokens; the local formatter conservatively truncates at `0.8M` tokens to leave headroom for tokenizer mismatch between DashScope/Qwen serving and the local OpenAI-compatible token counter. ### Mounted Tools The current QA runtime mounts these tools: - GitHub MCP: - `search_repositories` - `search_code` - `get_file_contents` - Operator tools: - `retrieve_operators_api` - `get_operator_info` `retrieve_operators_api` is wrapped so that QA always uses `llm` retrieval mode internally. ## API ### 1. Q&A Conversation ```http POST /process Content-Type: application/json { "input": [ { "role": "user", "content": [{"type": "text", "text": "How do I use Data-Juicer for data cleaning?"}] } ], "session_id": "your_session_id", "user_id": "user_id" } ``` ### 2. Get Session History ```http POST /memory Content-Type: application/json { "session_id": "your_session_id", "user_id": "user_id" } ``` ### 3. Clear Session History ```http POST /clear Content-Type: application/json { "session_id": "your_session_id", "user_id": "user_id" } ``` ### 4. Submit User Feedback ```http POST /feedback Content-Type: application/json { "data": { "message_id": "message_id_here", "feedback_type": "like", "comment": "optional user comment" }, "session_id": "your_session_id", "user_id": "user_id" } ``` Feedback parameters: - `message_id`: target message id - `feedback_type`: `like` or `dislike` - `comment`: optional free-form comment ## WebUI You can launch the Runtime WebUI with: ```bash npx @agentscope-ai/chat agentscope-runtime-webui --url http://localhost:8080/process ``` If you change `DJ_COPILOT_SERVICE_PORT`, update the WebUI URL accordingly. See [AgentScope Runtime WebUI](https://runtime.agentscope.io/en/webui.html#method-2-quick-start-via-npx) for more details. ## Environment Variables JSON session settings only apply when `SESSION_STORE_TYPE=json`. Redis settings only apply when `SESSION_STORE_TYPE=redis`. | Variable | Required | Default | Description | |----------|----------|---------|-------------| | `DASHSCOPE_API_KEY` | ✅ Yes | - | DashScope API key | | `GITHUB_TOKEN` | ✅ Yes | - | GitHub token for MCP integration | | `SESSION_STORE_TYPE` | ❌ No | `"json"` | Session storage type: `"json"` or `"redis"` | | `SESSION_STORE_DIR` | ❌ No | `"./sessions"` | Session file directory in JSON mode | | `SESSION_TTL_SECONDS` | ❌ No | `21600` | Session TTL in JSON mode | | `SESSION_CLEANUP_INTERVAL` | ❌ No | `1800` | Cleanup interval in JSON mode | | `REDIS_HOST` | ❌ No | `"localhost"` | Redis host in Redis mode | | `REDIS_PORT` | ❌ No | `6379` | Redis port in Redis mode | | `REDIS_DB` | ❌ No | `0` | Redis database number | | `REDIS_PASSWORD` | ❌ No | unset | Redis password | | `REDIS_MAX_CONNECTIONS` | ❌ No | `10` | Redis max connections | | `DJ_COPILOT_SERVICE_HOST` | ❌ No | `"127.0.0.1"` | Service host | | `DJ_COPILOT_SERVICE_PORT` | ❌ No | `8080` | Service port | | `DJ_COPILOT_ENABLE_LOGGING` | ❌ No | `"true"` | Enable session logging | | `DJ_COPILOT_LOG_DIR` | ❌ No | `qa-copilot/logs` | Log directory. If unset, logs are written under the `logs` directory next to `session_logger.py` | | `FASTAPI_CONFIG_PATH` | ❌ No | `""` | Optional FastAPI config JSON file | | `SAFE_CHECK_HANDLER_PATH` | ❌ No | `""` | Optional safe-check handler module | ## Troubleshooting ### Common Issues 1. Redis connection failure in `SESSION_STORE_TYPE=redis` - Check `redis-cli ping` - Verify `REDIS_HOST`, `REDIS_PORT`, `REDIS_DB`, and `REDIS_PASSWORD` 2. MCP startup failure - Ensure `GITHUB_TOKEN` is exported - Confirm the token has the required access for GitHub MCP 3. DashScope authentication or quota failure - Verify `DASHSCOPE_API_KEY` - Check Model Studio quota and model availability 4. Custom config or safe-check handler not loading - Verify `FASTAPI_CONFIG_PATH` points to a valid JSON file - Verify `SAFE_CHECK_HANDLER_PATH` points to an importable Python module ## Acknowledgments Parts of the service scaffolding and MCP integration were adapted from [AgentScope Samples - Alias](https://github.com/agentscope-ai/agentscope-samples/tree/main/alias). ## License This project uses the same license as the main project. See [LICENSE](../LICENSE) for details. ## Related Links - [Data-Juicer Official Repository](https://github.com/datajuicer/data-juicer) - [Data-Juicer Agents](https://github.com/datajuicer/data-juicer-agents) - [AgentScope Framework](https://github.com/agentscope-ai/agentscope) - [GitHub MCP Server](https://github.com/github/github-mcp-server)