# zrun.bat Failure Analysis and Solution

## Problem Statement
The `zrun.bat` batch file intermittently failed to start the LightRAG server, and the logs showed port-binding, environment-configuration, and encoding errors.

## Root Cause Analysis

Investigation identified three primary issues:

### 1. Port Binding Conflicts (Error 10048)
**Symptoms**: `[Errno 10048] error while attempting to bind on address ('0.0.0.0', 3015): only one usage of each socket address (protocol/network address/port) is normally permitted`

**Root Cause**:
- Previous server instances were not properly terminated
- The original `zrun.bat` had insufficient process-killing logic
- On Windows, a port sometimes remains bound even after the owning process has been terminated

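The bind failure above can be checked for before the server is launched. The following is a minimal sketch, not the logic used in the batch files: `port_is_free` is a hypothetical helper that tries to bind a throwaway socket to the same address and treats any `OSError` (which is how error 10048 surfaces in Python on Windows) as "port in use".

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Another socket already owns this address (WinError 10048 on Windows).
            return False
    return True


# Port 3015 is the server port named in this document.
print("port 3015 free:", port_is_free(3015))
```

The probe socket is closed as soon as the check finishes, so it does not hold the port itself.
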
### 2. Environment Configuration Issues
**Symptoms**:
- Server using the OpenAI endpoint (`https://api.openai.com/v1`) instead of DeepSeek
- Missing API keys causing embedding failures
- Confusion over which `.env` file applies: the one in the root directory or the one in the `LightRAG-main` directory

**Root Cause**:
- The `start_server_fixed.py` script reads `.env` from the current directory before changing to the `LightRAG-main` directory
- Environment variables were not being propagated to the server process
- The LLM configuration was defaulting to OpenAI instead of using the DeepSeek configuration

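One way to remove the ambiguity about which `.env` is read is to resolve the file against a fixed project root instead of the current directory. The sketch below is illustrative only: `find_env_file` and `load_env_file` are hypothetical helpers, the preference for `LightRAG-main/.env` over the root `.env` is an assumption, and the parser handles only plain `KEY=VALUE` lines (it does not strip inline comments).

```python
from pathlib import Path
from typing import Dict, Optional


def find_env_file(project_root: Path) -> Optional[Path]:
    """Return the first existing .env, preferring LightRAG-main/.env (assumed order)."""
    for candidate in (project_root / "LightRAG-main" / ".env", project_root / ".env"):
        if candidate.is_file():
            return candidate
    return None


def load_env_file(env_path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks, '#' comments, and malformed lines."""
    values: Dict[str, str] = {}
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

A startup script could call `find_env_file(Path(__file__).resolve().parent)` so the result no longer depends on where the batch file was launched from.
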
### 3. Encoding and Dependency Issues
**Symptoms**:
- UTF-8 encoding errors on Windows
- PyTorch DLL issues causing spaCy/torch failures
- Missing `JINA_API_KEY` causing embedding failures

**Root Cause**:
- The Windows console defaults to a legacy OEM code page such as CP850 or CP437 rather than UTF-8
- PyTorch installation conflicts with system DLLs
- Jina API key not configured in the `.env` file

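The encoding problem can be addressed from Python itself. The sketch below shows two standard mechanisms, under the assumption that the server is launched as a child Python process: `PYTHONUTF8` and `PYTHONIOENCODING` are real interpreter environment variables, while `utf8_child_env` and `force_utf8_stdio` are hypothetical helper names, not functions from the startup scripts.

```python
import os
import sys


def utf8_child_env(base_env=None):
    """Return a copy of the environment that asks Python children to use UTF-8."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYTHONUTF8"] = "1"            # enable Python's UTF-8 mode (Python 3.7+)
    env["PYTHONIOENCODING"] = "utf-8"  # encoding for the child's stdin/stdout/stderr
    return env


def force_utf8_stdio():
    """Switch this process's stdout and stderr to UTF-8 where the stream supports it."""
    for stream in (sys.stdout, sys.stderr):
        reconfigure = getattr(stream, "reconfigure", None)
        if reconfigure is not None:
            # Replace unencodable characters instead of raising UnicodeEncodeError.
            reconfigure(encoding="utf-8", errors="replace")
```

The dictionary returned by `utf8_child_env()` can be passed as the `env` argument of `subprocess.Popen` or `subprocess.run`.
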
## Solution Implemented

### 1. Enhanced Port Management
Created improved batch files with more thorough process killing:

**`zrun_fixed.bat`**:
- Uses multiple methods to kill processes on port 3015
- Checks for processes using `netstat`, `tasklist`, and PowerShell commands
- Implements retry logic for stubborn processes

**`zrun_final.bat`**:
- Simplified but robust port killing
- Better environment variable handling
- Clear error messages and troubleshooting guidance

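The `netstat`-based lookup that the batch files rely on can be sketched in Python. This is not the code in `zrun_fixed.bat`: `listening_pids` is a hypothetical helper, the sample text is made up for illustration, and the parser assumes the English-locale column layout of `netstat -ano` (Proto, Local Address, Foreign Address, State, PID).

```python
def listening_pids(netstat_output: str, port: int) -> set:
    """Extract PIDs of TCP listeners on `port` from `netstat -ano` text."""
    pids = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Keep only rows shaped like: TCP <local> <foreign> <state> <pid>
        if len(fields) != 5 or fields[0].upper() != "TCP":
            continue
        local_address, state, pid = fields[1], fields[3], fields[4]
        local_port = local_address.rsplit(":", 1)[-1]
        if state.upper() == "LISTENING" and local_port == str(port) and pid.isdigit():
            pids.add(int(pid))
    return pids


# Illustrative sample, not captured from a real machine.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:3015           0.0.0.0:0              LISTENING       4242
  TCP    0.0.0.0:30150          0.0.0.0:0              LISTENING       5151
  TCP    127.0.0.1:52011        127.0.0.1:3015         ESTABLISHED     6060
  TCP    [::]:3015              [::]:0                 LISTENING       4242
"""
print(listening_pids(SAMPLE, 3015))  # {4242}
```

Comparing the parsed local port exactly avoids the false matches that a plain substring search for `:3015` produces, such as port 30150 or a remote address on port 3015.
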
### 2. Environment Configuration Fixes
Created improved Python startup scripts:

**`start_server_fixed_improved.py`**:
- Validates environment variables before starting
- Checks for required API keys
- Provides clear error messages for missing configuration

**`start_server_comprehensive.py`**:
- Error handling for the common failure modes listed under Root Cause Analysis
- PyTorch compatibility checks
- Fallback to CPU mode when GPU dependencies fail
- UTF-8 encoding enforcement for Windows

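The pre-start validation described above could look roughly like the following. This is a sketch under stated assumptions rather than the contents of `start_server_fixed_improved.py`: the variable names come from the `.env` example in this document, and treating `JINA_API_KEY` as optional reflects the Ollama fallback noted elsewhere here. `check_environment` is a hypothetical name.

```python
import os

REQUIRED_VARS = ("OPENAI_API_KEY", "OPENAI_BASE_URL")
OPTIONAL_VARS = ("JINA_API_KEY",)  # assumed optional because of the Ollama fallback


def check_environment(env=None):
    """Return (missing_required, missing_optional) variable names."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    warnings = [name for name in OPTIONAL_VARS if not env.get(name, "").strip()]
    return missing, warnings


missing, warnings = check_environment()
for name in missing:
    print(f"ERROR: {name} is not set; add it to .env before starting the server")
for name in warnings:
    print(f"WARNING: {name} is not set; the fallback will be used")
```

A startup script would typically exit with a non-zero status when `missing` is non-empty, so the batch file can report the failure.
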
### 3. Configuration Updates
**Updated `.env` files**:
- Ensured both the root `.env` and `LightRAG-main/.env` contain the correct DeepSeek configuration
- Added the missing `JINA_API_KEY` (with fallback to Ollama)
- Configured the correct LLM endpoints for the DeepSeek API

## Key Technical Findings

### Server Startup Process
1. The server reads `.env` from the **current working directory** at startup
2. Changing directory after loading `.env` causes path resolution issues
3. The server uses environment variables set in the parent process

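The three points above suggest setting both the working directory and the environment explicitly when the server process is spawned. The sketch below demonstrates the mechanism with a trivial child command. `run_child` is a hypothetical helper; a real launcher would replace the `-c` snippet with the server command, for example `-m lightrag.api.lightrag_server` as used in the troubleshooting checklist.

```python
import os
import subprocess
import sys


def run_child(code: str, extra_env: dict, cwd: str) -> str:
    """Run a Python snippet in a child process with an explicit cwd and environment."""
    env = dict(os.environ)   # start from the parent's environment
    env.update(extra_env)    # then add or override the values the child needs
    result = subprocess.run(
        [sys.executable, "-c", code],
        cwd=cwd,             # the child starts in this directory
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# The child sees the variable because it was passed explicitly, not by accident.
print(run_child(
    "import os; print(os.environ['OPENAI_BASE_URL'])",
    {"OPENAI_BASE_URL": "https://api.deepseek.com/v1"},
    cwd=".",
))
```

Passing `cwd` to the subprocess call avoids changing the parent's directory after `.env` has been loaded.
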
### Windows-Specific Issues
1. **Encoding**: The Windows console uses a legacy OEM code page (CP850/CP437) by default, causing UTF-8 issues
2. **Process Management**: `taskkill` may not always terminate Python processes cleanly
3. **Port Binding**: Windows may keep sockets in the TIME_WAIT state after a process exits, so the port can need extra cleanup or a short wait before it is reusable

### LightRAG Configuration
1. **LLM Binding**: Defaults to OpenAI but can be configured via `--llm-binding` and environment variables
2. **Embedding**: Falls back to Ollama when the Jina API key is missing
3. **Authentication**: Uses API key `jleu1212` by default (configured in batch files)

## Verification of Solution

### Successful Server Start
The `stdout.txt` log shows a successful server startup:
```
INFO: Uvicorn running on http://0.0.0.0:3015 (Press CTRL+C to quit)
```

### Configuration Validation
The server configuration shows the following settings, one of which still needs correcting:
- LLM Host: `https://api.openai.com/v1` (should be `https://api.deepseek.com/v1`; needs a `.env` update)
- Model: `deepseek-chat` (correct)
- Embedding: `ollama` (fallback, works without a Jina API key)

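A mismatch like the LLM host above is easy to miss by eye, so the comparison can be automated. The sketch below is illustrative: the dictionary keys (`llm_host`, `model`, `embedding`) are hypothetical labels for the settings listed in this section, not field names from LightRAG, and `config_mismatches` is a hypothetical helper.

```python
EXPECTED = {
    "llm_host": "https://api.deepseek.com/v1",
    "model": "deepseek-chat",
}


def config_mismatches(actual: dict, expected: dict = EXPECTED) -> dict:
    """Map each differing key to an (actual, expected) pair."""
    return {
        key: (actual.get(key), want)
        for key, want in expected.items()
        if actual.get(key) != want
    }


# Values taken from the settings reported in this section.
observed = {
    "llm_host": "https://api.openai.com/v1",
    "model": "deepseek-chat",
    "embedding": "ollama",
}
print(config_mismatches(observed))
# {'llm_host': ('https://api.openai.com/v1', 'https://api.deepseek.com/v1')}
```

An empty result means every expected setting matched; keys that are not in `EXPECTED`, such as the embedding backend, are ignored.
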
## Recommended Actions

### 1. Update Original Files
Replace the original `zrun.bat` with `zrun_final.bat`:
```batch
copy zrun_final.bat zrun.bat
```

### 2. Environment Configuration
Ensure the `.env` file contains:
```env
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
OPENAI_BASE_URL=https://api.deepseek.com/v1
# Optional, Ollama fallback available
JINA_API_KEY=your_jina_api_key_here
```

### 3. Regular Maintenance
- Monitor `LightRAG-main/logs/lightrag.log` for errors
- Check port 3015 availability before starting the server
- Update dependencies regularly to avoid compatibility issues

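Log monitoring can be partly scripted. The sketch below assumes that log lines carry the standard Python logging level names `ERROR` or `CRITICAL` somewhere in the text; the actual format of `lightrag.log` is not shown in this document, so treat the filter as an assumption. `recent_error_lines` is a hypothetical helper.

```python
from pathlib import Path


def recent_error_lines(log_path, limit=20):
    """Return up to `limit` of the most recent lines containing ERROR or CRITICAL."""
    path = Path(log_path)
    if limit <= 0 or not path.is_file():
        return []
    # errors="replace" keeps the scan going even if the log has mixed encodings.
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    hits = [line for line in lines if "ERROR" in line or "CRITICAL" in line]
    return hits[-limit:]


for line in recent_error_lines("LightRAG-main/logs/lightrag.log"):
    print(line)
```

If the log file does not exist yet, the function returns an empty list instead of raising.
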
## Troubleshooting Checklist

If `zrun.bat` fails again:

1. **Check port 3015**: `netstat -ano | findstr :3015`
2. **Kill existing processes**: `taskkill /F /PID <pid>` for any process using port 3015
3. **Verify the `.env` file**: Ensure it exists in both the root and `LightRAG-main` directories
4. **Check API keys**: Verify `OPENAI_API_KEY` is set and valid
5. **Review logs**: Check `stdout.txt`, `stderr.txt`, and `lightrag.log`
6. **Test manually**: Run `python -m lightrag.api.lightrag_server` from the `LightRAG-main` directory

## Conclusion

The `zrun.bat` failure was caused by a combination of port binding conflicts, environment configuration issues, and Windows-specific encoding problems. The implemented solutions address the identified root causes and add error handling for future failures.

The server can now start successfully using `zrun_final.bat`, and the Web UI is accessible at `http://localhost:3015` when the server is running.