Data-Juicer Q&A Copilot#
Q&A Copilot is the intelligent question-answering component of the Data-Juicer Agents system, a professional Data-Juicer AI assistant built on the AgentScope framework.
You can chat with Juicer, our Q&A Copilot, on the official Data-Juicer documentation site. Feel free to ask Juicer anything related to the Data-Juicer ecosystem.
Core Components#
Agent: Intelligent Q&A agent based on ReActAgent
FAQ RAG System: Fast and accurate FAQ retrieval powered by Qdrant vector database and DashScope text embedding model
MCP Integration: Online GitHub search capabilities through GitHub MCP Server
Redis Storage: Supports session history and feedback data persistence
Web API: Provides RESTful interfaces for frontend integration
Quick Start#
Prerequisites#
Python >= 3.10
Docker (for running Qdrant vector database)
Redis server (optional - can be disabled with DISABLE_DATABASE=1)
DashScope API Key (for large language model calls and text embedding)
Installation#
Install dependencies
    cd ..
    uv pip install ".[qa]"
    cd qa-copilot
Install Docker (for Qdrant vector database)
    # Ubuntu/Debian
    sudo apt-get install docker.io
    sudo systemctl start docker

    # macOS
    brew install docker
Note: The system will automatically check and start the Qdrant Docker container on startup. If FAQ data is not initialized, the system will automatically read from qa-copilot/rag_utils/faq.txt and initialize the RAG data.
Install and start Redis (optional - skip if using DISABLE_DATABASE=1)
    # Ubuntu/Debian
    sudo apt-get install redis-server
    redis-server --daemonize yes

    # macOS
    brew install redis
    brew services start redis
Note: If you set DISABLE_DATABASE=1, the system will run in memory-only mode without requiring Redis. Session history will be stored in memory with automatic cleanup after 6 hours of inactivity.
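To make the memory-only behavior concrete, here is a minimal sketch of a session store that drops sessions after 6 hours of inactivity. It is not the project's implementation: the class name `InMemorySessionStore`, its methods, and the choice to evict lazily on each access are all assumptions made for illustration.

```python
import time

# 6 hours of inactivity, as stated in the note above.
SESSION_TTL_SECONDS = 6 * 60 * 60


class InMemorySessionStore:
    """Hypothetical in-memory session history with inactivity-based cleanup."""

    def __init__(self, ttl: float = SESSION_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock  # injectable so the expiry logic can be tested
        self._sessions = {}  # (user_id, session_id) -> (last_active, messages)

    def append(self, user_id: str, session_id: str, message) -> None:
        """Add a message and mark the session as active now."""
        now = self._clock()
        self._evict_expired(now)
        key = (user_id, session_id)
        _, messages = self._sessions.get(key, (now, []))
        messages.append(message)
        self._sessions[key] = (now, messages)

    def history(self, user_id: str, session_id: str) -> list:
        """Return a copy of the session history; reading counts as activity."""
        now = self._clock()
        self._evict_expired(now)
        key = (user_id, session_id)
        entry = self._sessions.get(key)
        if entry is None:
            return []
        self._sessions[key] = (now, entry[1])
        return list(entry[1])

    def _evict_expired(self, now: float) -> None:
        """Drop every session idle for longer than the TTL."""
        expired = [
            key
            for key, (last_active, _) in self._sessions.items()
            if now - last_active > self._ttl
        ]
        for key in expired:
            del self._sessions[key]
```

Everything in such a store is lost when the process restarts, which is the trade-off of running without Redis.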
Configuration#
Set environment variables
    export DASHSCOPE_API_KEY="your_dashscope_api_key"
    export GITHUB_TOKEN="your_github_token"

    # Optional: Disable database (Redis) - run in memory-only mode
    # export DISABLE_DATABASE=1
Configure FAQ file (optional)
The system uses qa-copilot/rag_utils/faq.txt as the FAQ data source by default. You can edit this file to customize FAQ content. FAQ file format example:
    'id': 'FAQ_001', 'question': 'What is Data-Juicer?', 'answer': 'Data-Juicer is a...'
    'id': 'FAQ_002', 'question': 'How to install?', 'answer': 'You can install by...'
Start the service
    bash setup_server.sh
On first startup, the system will automatically:
Check and start the Qdrant Docker container (port 6333)
Initialize FAQ RAG data (if not already initialized)
Start the Web API service
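Before starting the service with an edited FAQ file, it can help to check that every entry is well formed. The sketch below assumes one entry per line in the format shown in the FAQ example above, and parses each line as the body of a Python dict literal. The project's own loader may parse the file differently; `parse_faq_line` and `validate_faq_text` are hypothetical helpers, not part of Data-Juicer.

```python
import ast

REQUIRED_KEYS = {"id", "question", "answer"}


def parse_faq_line(line: str) -> dict:
    """Parse one line such as: 'id': 'FAQ_001', 'question': '...', 'answer': '...'"""
    # Wrapping the line in braces turns it into a dict literal (an assumption
    # about the format, based on the example in this document).
    entry = ast.literal_eval("{" + line.strip() + "}")
    if not isinstance(entry, dict):
        raise ValueError("line is not a key/value list")
    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return entry


def validate_faq_text(text: str) -> list[dict]:
    """Parse every non-empty line and reject duplicate ids."""
    entries, seen_ids = [], set()
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            entry = parse_faq_line(line)
        except (SyntaxError, ValueError) as exc:
            raise ValueError(f"line {number}: {exc}") from exc
        if entry["id"] in seen_ids:
            raise ValueError(f"line {number}: duplicate id {entry['id']!r}")
        seen_ids.add(entry["id"])
        entries.append(entry)
    return entries
```

To check the default file, read it with `open("qa-copilot/rag_utils/faq.txt", encoding="utf-8").read()` and pass the text to `validate_faq_text`.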
Usage#
Web API Interfaces#
After starting the service, the system provides the following API interfaces:
1. Q&A Conversation#
    POST /process
    Content-Type: application/json

    {
      "input": [
        {
          "role": "user",
          "content": [{"type": "text", "text": "How to use Data-Juicer for data cleaning?"}]
        }
      ],
      "session_id": "your_session_id",
      "user_id": "user_id"
    }
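As a rough illustration, the request above could be built and sent from Python using only the standard library. The base URL `http://localhost:8080` is taken from the WebUI command in this document and may differ in your deployment. The helper names are hypothetical. The response format is not documented in this section (and the model is configured for streaming), so `send` simply returns the raw response text.

```python
import json
import urllib.request

# Assumption: the service listens here, as in the WebUI command in this document.
BASE_URL = "http://localhost:8080"


def build_process_payload(text: str, session_id: str, user_id: str) -> dict:
    """Build the JSON body for POST /process from a single user message."""
    return {
        "input": [
            {"role": "user", "content": [{"type": "text", "text": text}]}
        ],
        "session_id": session_id,
        "user_id": user_id,
    }


def build_request(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Wrap a payload in a JSON POST request for the given endpoint path."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the service running, `send(build_request("/process", build_process_payload("How to use Data-Juicer for data cleaning?", "your_session_id", "user_id")))` would issue the request shown above.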
2. Get Session History#
    POST /memory
    Content-Type: application/json

    {
      "session_id": "your_session_id",
      "user_id": "user_id"
    }
3. Clear Session History#
    POST /clear
    Content-Type: application/json

    {
      "session_id": "your_session_id",
      "user_id": "user_id"
    }
4. Get Session List#
    POST /sessions
    Content-Type: application/json

    {
      "user_id": "user_id"
    }
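The three session endpoints (`/memory`, `/clear`, `/sessions`) differ only in which fields they expect. A small sketch, with a hypothetical helper name, can build the right JSON body for each and fail early when a field is missing. The field lists mirror the request bodies shown in this document; the server's own validation rules are not documented here.

```python
import json

# Fields each endpoint expects, mirroring the request bodies in this document.
REQUIRED_FIELDS = {
    "/memory": ("session_id", "user_id"),
    "/clear": ("session_id", "user_id"),
    "/sessions": ("user_id",),
}


def build_session_body(endpoint: str, **fields: str) -> str:
    """Return the JSON body for a session endpoint, or raise ValueError."""
    required = REQUIRED_FIELDS.get(endpoint)
    if required is None:
        raise ValueError(f"unknown session endpoint: {endpoint}")
    missing = [name for name in required if not fields.get(name)]
    if missing:
        raise ValueError(f"{endpoint} requires: {', '.join(missing)}")
    # Only the documented fields are sent; any extra keyword is dropped.
    return json.dumps({name: fields[name] for name in required})
```

For example, `build_session_body("/sessions", user_id="user_id")` produces the body for the session list request, ready to be sent as a POST with `Content-Type: application/json`.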
5. Submit User Feedback#
    POST /feedback
    Content-Type: application/json

    {
      "data": {
        "message_id": "message_id_here",
        "feedback_type": "like",
        "comment": "optional user comment"
      },
      "session_id": "your_session_id",
      "user_id": "user_id"
    }
Parameters:
message_id: The ID of the message to provide feedback on (required)
feedback_type: Type of feedback, either "like" or "dislike" (required)
comment: User comment text (optional)
Response example:
    {
      "status": "ok",
      "message": "Feedback recorded successfully"
    }
WebUI#
To launch the WebUI, run the following command in your terminal:
    npx @agentscope-ai/chat agentscope-runtime-webui --url http://localhost:8080/process
Refer to AgentScope Runtime WebUI for more information.
Configuration Details#
Model Configuration#
In app_deploy.py, you can configure the language model to use:
    model=DashScopeChatModel(
        "qwen-max",  # Model name
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        stream=True,  # Enable streaming response
    )
FAQ RAG Configuration#
The FAQ RAG system uses the following configuration:
Vector Database: Qdrant (running in Docker container)
Embedding Model: DashScope text-embedding-v4
Vector Dimension: 1024
Data Source: qa-copilot/rag_utils/faq.txt
Storage Location: qa-copilot/rag_utils/qdrant_storage
The system automatically checks if RAG data is initialized on startup. If not initialized, it will automatically read the FAQ file and create vector indexes.
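For readers unfamiliar with vector retrieval, the idea behind the FAQ lookup can be sketched in a few lines: each FAQ entry is stored as an embedding vector, and a question is answered by finding the stored vectors closest to the question's embedding. The toy example below uses 3-dimensional vectors and cosine similarity in plain Python. It is purely conceptual: the real system delegates this to Qdrant with 1024-dimensional DashScope embeddings, and the distance metric it uses is not stated in this document.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], indexed: dict[str, list[float]], k: int = 1) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [
        (cosine_similarity(query, vector), faq_id)
        for faq_id, vector in indexed.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [faq_id for _, faq_id in scored[:k]]


# Toy index: made-up vectors standing in for real FAQ embeddings.
toy_index = {
    "FAQ_001": [1.0, 0.0, 0.0],
    "FAQ_002": [0.0, 1.0, 0.0],
}
closest = top_k([0.9, 0.1, 0.0], toy_index, k=1)
```

In this toy index the query vector points mostly along the first axis, so `closest` contains only `FAQ_001`.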
Troubleshooting#
Common Issues#
Docker/Qdrant Issues
Ensure Docker is installed (docker --version) and the Docker service is running (docker info)
Check Qdrant container status: docker ps | grep qdrant
Manually start Qdrant container: docker start qdrant
Check if Qdrant port is occupied: netstat -tlnp | grep 6333
To reinitialize RAG data, delete the qa-copilot/rag_utils/qdrant_storage directory and restart the service
Redis connection failure
Ensure Redis service is running: redis-cli ping
Check if Redis port is occupied: netstat -tlnp | grep 6379
MCP service startup failure
Ensure GITHUB_TOKEN is set and valid
API Key error
Verify DASHSCOPE_API_KEY environment variable is correctly set
Confirm API Key is valid and has sufficient quota
FAQ retrieval returns no results
Confirm FAQ file qa-copilot/rag_utils/faq.txt exists and is properly formatted
Check if Qdrant container is running normally
Review logs to confirm RAG data was successfully initialized
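Several of the checks above can be combined into one quick preflight script. The sketch below is a hypothetical helper, not part of the project. It assumes Qdrant and Redis run on `localhost` at the ports mentioned in this section (6333 and 6379), treats both `DASHSCOPE_API_KEY` and `GITHUB_TOKEN` as required, and skips the Redis check when `DISABLE_DATABASE=1`. It only confirms that the variables are set and the ports accept connections, not that the keys are valid.

```python
import os
import socket

REQUIRED_ENV_VARS = ("DASHSCOPE_API_KEY", "GITHUB_TOKEN")


def missing_env_vars(environ=os.environ) -> list[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not environ.get(name)]


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(environ=os.environ, host: str = "localhost") -> list[str]:
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = [f"{name} is not set" for name in missing_env_vars(environ)]
    if not is_port_open(host, 6333):
        problems.append("Qdrant is not reachable on port 6333")
    # Redis is only needed when the database is not disabled.
    if environ.get("DISABLE_DATABASE") != "1" and not is_port_open(host, 6379):
        problems.append("Redis is not reachable on port 6379")
    return problems
```

Calling `preflight()` in the shell where you start the service and printing the returned list gives a first indication of which troubleshooting item to look at.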
Acknowledgments#
Parts of this project’s code are adapted from the following open-source projects:
FAQ RAG System & GitHub MCP Integration: Adapted from the implementation in AgentScope Samples - Alias
Special thanks to the AgentScope team for their excellent framework and sample code!
License#
This project uses the same license as the main project. For details, please refer to the LICENSE file.