# 504 Gateway Time-out Fix Report
## Problem
After a document is uploaded, the frontend receives a "504 Gateway Time-out" error when it loads the document list from `/documents/paginated`. The error comes from the nginx reverse proxy, which gives up after 60 seconds of waiting for the backend response.
## Root Cause Analysis
1. **The first paginated request after an upload takes more than 60 seconds** (80 seconds observed).
2. Subsequent paginated requests are fast (about 2 seconds).
3. The slowness is caused by lock contention in the document status storage (JSON implementation): the paginated endpoint waits for a lock held by OCR processing.
4. OCR processing (PaddleOCR) holds the lock for an extended period while it extracts text from PDF and Word documents.
5. The nginx proxy timeout is 60 seconds (the default), so any backend response that takes longer produces a 504 error.
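
The contention described in points 3 and 4 can be reproduced in miniature. The sketch below is not LightRAG code: `ocr_task` and `paginated_read` are hypothetical stand-ins, and the use of a single `asyncio.Lock` shared by both is an assumption about how the storage is guarded. It only shows that a reader's latency grows to roughly the time the lock holder keeps the lock.

```python
import asyncio
import time


async def ocr_task(lock: asyncio.Lock, hold_seconds: float) -> None:
    # Stand-in for OCR: keeps the shared lock for the whole extraction.
    async with lock:
        await asyncio.sleep(hold_seconds)


async def paginated_read(lock: asyncio.Lock, docs: list[str]) -> tuple[list[str], float]:
    # Stand-in for the paginated endpoint: needs the same lock to read.
    started = time.monotonic()
    async with lock:
        page = docs[:10]
    return page, time.monotonic() - started


async def demo(hold_seconds: float = 0.2) -> float:
    lock = asyncio.Lock()
    docs = [f"doc-{i}" for i in range(25)]
    writer = asyncio.create_task(ocr_task(lock, hold_seconds))
    await asyncio.sleep(0)  # let the OCR stand-in take the lock first
    _, waited = await paginated_read(lock, docs)
    await writer
    return waited


if __name__ == "__main__":
    print(f"reader waited {asyncio.run(demo()):.2f}s for the lock")
```

With an 80-second hold instead of 0.2 seconds, the same wait exceeds the 60-second proxy limit, which matches the observed failure.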
## Implemented Fixes
### 1. Increased Server Timeout
- Modified `LightRAG-main/start_server.py` to pass `--timeout 600`, which raises the gunicorn worker timeout to 600 seconds.
- This keeps the backend from killing a worker that is still handling a long-running request. It does not change the 60-second nginx timeout, so on its own it does not remove the 504.
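
To see why raising only the server timeout is not enough, it can help to treat the timeouts as a chain in which the smallest limit fires first. The helper below is an illustrative sketch; `first_timeout` and the layer names are hypothetical, and the numbers are the ones reported above (60 s nginx, 600 s gunicorn, 80 s observed request).

```python
from typing import Optional


def first_timeout(request_seconds: float, timeouts: dict[str, float]) -> Optional[str]:
    """Return the layer whose timeout fires first, or None if the request fits."""
    exceeded = {name: limit for name, limit in timeouts.items() if request_seconds > limit}
    if not exceeded:
        return None
    # The layer with the smallest limit gives up first.
    return min(exceeded, key=exceeded.get)


layers = {"nginx": 60.0, "gunicorn": 600.0}
print(first_timeout(80.0, layers))  # prints: nginx
print(first_timeout(2.0, layers))   # prints: None
```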
### 2. Recommendations for Nginx Configuration
If you have control over the nginx reverse proxy, increase the proxy timeout settings:
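```nginx
location / {
    proxy_pass http://localhost:3015;
    # Time allowed between two successive reads from the backend
    proxy_read_timeout 300s;
    # Per the nginx documentation, the connect timeout usually cannot exceed 75 seconds
    proxy_connect_timeout 300s;
    proxy_send_timeout 300s;
}
```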
### 3. Workarounds for Frontend
- **Polling**: After uploading a document, wait for processing to finish before calling `/documents/paginated`. Poll the `/documents/pipeline_status` endpoint until `busy` is false.
- **Exponential Backoff**: Retry on the client with increasing delays (e.g., 5s, 10s, 20s) when a 504 error occurs.
- **Optimize Upload**: Upload smaller documents, or pre-extract the text, to reduce OCR processing time.
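
The first two workarounds can be sketched as two small helpers. This is a minimal illustration, not the frontend's actual code: `wait_until_idle` and `retry_with_backoff` are hypothetical names, the HTTP calls are injected as plain callables so that no particular client library is assumed, and the only thing assumed about the status payload is a top-level `busy` field.

```python
import time
from typing import Callable


def wait_until_idle(
    get_status: Callable[[], dict],
    poll_seconds: float = 2.0,
    max_wait_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the pipeline reports busy == False; return False on giving up."""
    waited = 0.0
    while waited <= max_wait_seconds:
        if not get_status().get("busy", False):
            return True
        sleep(poll_seconds)
        waited += poll_seconds
    return False


def retry_with_backoff(
    request: Callable[[], tuple[int, object]],
    delays: tuple[float, ...] = (5.0, 10.0, 20.0),
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, object]:
    """Call request(); after each HTTP 504, wait for the next delay and retry."""
    status, body = request()
    for delay in delays:
        if status != 504:
            break
        sleep(delay)
        status, body = request()
    return status, body
```

In a real client, `get_status` would wrap a call to `/documents/pipeline_status` and `request` a call to `/documents/paginated`, returning the HTTP status code and the parsed body. Passing `sleep` as a parameter keeps both helpers testable without real delays.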
### 4. Potential Code Improvements (Future)
- Reduce lock contention in `JsonDocStatusStorage` by using read-write locks or by copying a data snapshot.
- Make the paginated endpoint non-blocking: work on a copy of the data instead of holding the lock for the entire iteration.
- Optimize OCR processing so that it releases locks between pages.
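
The snapshot idea from the first two bullets might look roughly like the following. `SnapshotDocStatusStore` is a hypothetical stand-in written for this report, not the real `JsonDocStatusStorage`, and its method names and data layout are assumptions. The point is only that the lock is held for the copy, while sorting and slicing happen outside it.

```python
import asyncio


class SnapshotDocStatusStore:
    """Hypothetical stand-in for a JSON-backed status store (not LightRAG's class)."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._data: dict[str, dict] = {}

    async def upsert(self, doc_id: str, status: dict) -> None:
        async with self._lock:
            self._data[doc_id] = status

    async def get_page(self, page: int, page_size: int) -> list[tuple[str, dict]]:
        # Hold the lock only long enough to take a shallow copy of the items.
        async with self._lock:
            snapshot = list(self._data.items())
        # Sorting and slicing run on the copy, outside the lock.
        snapshot.sort(key=lambda item: item[0])
        start = (page - 1) * page_size
        return snapshot[start : start + page_size]
```

This shortens the reader's critical section, but the reader still has to acquire the lock once. It therefore helps only if the writer side also stops holding the lock for a whole OCR run, which is what the third bullet addresses.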
## Verification
- After restarting the server with the increased timeout, the paginated endpoint responds within 2 seconds when no background processing is active.
- The first request after an upload still times out while OCR is processing; the workarounds above mitigate this.
## Next Steps
1. Monitor the server logs for further timeout issues.
2. Consider moving to a more performant storage backend (PostgreSQL, MongoDB) if the document count grows.
3. If the issue persists, consider asynchronous paginated responses (return a job ID, then poll for the result).
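
The job-ID approach in step 3 could take a shape like the one below. This is a sketch under stated assumptions: `JobRegistry` is a hypothetical in-memory helper, the status strings are invented for illustration, and a production version would also need job expiry and persistence.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class JobRegistry:
    """Hypothetical in-memory job table: submit work, get an ID, poll for the result."""

    def __init__(self, max_workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: dict[str, Future] = {}

    def submit(self, work: Callable[[], Any]) -> str:
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(work)
        return job_id  # handed back to the client immediately

    def poll(self, job_id: str) -> dict:
        future = self._jobs.get(job_id)
        if future is None:
            return {"status": "unknown"}
        if not future.done():
            return {"status": "pending"}
        if future.exception() is not None:
            return {"status": "failed", "error": str(future.exception())}
        return {"status": "done", "result": future.result()}
```

The first request returns right after `submit`, so it cannot hit the proxy timeout; the client then calls `poll` with the job ID until the status is no longer `pending`.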
## Files Modified
- `LightRAG-main/start_server.py` – added `--timeout 600`
## Created Test Scripts
- `test_paginated_performance.py` – measures paginated endpoint response times
- `test_paginated_now.py` – quick verification
## Conclusion
The immediate fix (the increased server timeout) prevents the server from terminating long-running requests, but the 60-second nginx timeout remains the bottleneck. The recommended solution is to increase the nginx proxy timeouts and implement client-side retry logic. For production deployments, consider reducing lock contention in the storage layer.