# 504 Gateway Time-out Fix Report

## Problem

After uploading a document, the frontend receives a "504 Gateway Time-out" error when it tries to load documents from `/documents/paginated`. The error comes from the nginx reverse proxy, which gives up after waiting 60 seconds for the backend response.

## Root Cause Analysis

1. **The first paginated request after an upload takes more than 60 seconds** (80 seconds observed).
2. Subsequent paginated requests are fast (~2 seconds).
3. The slowness is caused by lock contention in the document status storage (JSON implementation): the paginated endpoint waits for a lock held by OCR processing.
4. OCR processing (PaddleOCR) holds the lock for an extended period while extracting text from PDF and Word documents.
5. The nginx proxy timeout is 60 seconds (the default), so any backend response that takes longer results in a 504 error.

## Implemented Fixes

### 1. Increased Server Timeout

- Modified `LightRAG-main/start_server.py` to pass the `--timeout 600` argument, raising the gunicorn worker timeout to 600 seconds.
- This keeps the backend from killing the worker while a slow request is still in flight. It does not remove the 504 by itself, because nginx still stops waiting after 60 seconds.

### 2. Recommendations for Nginx Configuration

If you have control over the nginx reverse proxy, increase the proxy timeout settings:

```nginx
location / {
    proxy_pass http://localhost:3015;
    # Maximum wait between two successive reads from the backend (default 60s).
    proxy_read_timeout 300s;
    # Connection setup only; the nginx docs note this usually cannot exceed 75s.
    proxy_connect_timeout 75s;
    proxy_send_timeout 300s;
}
```

### 3. Workarounds for Frontend

- **Polling**: After uploading a document, wait for processing to complete before calling `/documents/paginated`. Use the `/documents/pipeline_status` endpoint to check whether `busy` is false.
- **Exponential Backoff**: Implement client-side retry with increasing delays (e.g., 5s, 10s, 20s) when a 504 error occurs.
- **Optimize Upload**: Upload smaller documents or pre-extract text to reduce OCR processing time.

### 4. Potential Code Improvements (Future)

- Reduce lock contention in `JsonDocStatusStorage` by using read-write locks or by copying a data snapshot.
- Make the paginated endpoint non-blocking by iterating over a copy of the data instead of holding the lock for the entire iteration.
- Optimize OCR processing to release locks between pages.

## Verification

- After restarting the server with the increased timeout, the paginated endpoint responds within 2 seconds when no background processing is active.
- The first request after an upload still times out if OCR is still running; the workarounds above mitigate this.

## Next Steps

1. Monitor server logs for further timeout issues.
2. Consider moving to a more performant storage backend (PostgreSQL, MongoDB) if the document count grows.
3. If the issue persists, consider implementing asynchronous paginated responses (return a job ID, then poll for results).

## Files Modified

- `LightRAG-main/start_server.py`: added `--timeout 600`

## Created Test Scripts

- `test_paginated_performance.py`: measures paginated endpoint response times
- `test_paginated_now.py`: quick verification

## Conclusion

The immediate fix (increased server timeout) prevents the server from terminating long-running requests, but the nginx timeout remains the bottleneck. The recommended solution is to increase the nginx proxy timeout and implement client-side retry logic. For production deployments, consider reducing the lock contention in the storage layer.
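
### Sketch: client-side polling and retry

The polling and backoff workarounds described in this report could look roughly like the sketch below. It is a minimal illustration, not the frontend's actual code: `wait_until_idle` and `fetch_with_backoff` are hypothetical helper names, and the HTTP calls are passed in as plain callables so the logic stays independent of any particular HTTP client. The only details taken from this report are the `busy` field of `/documents/pipeline_status`, the 504 status, and the 5s/10s/20s delays.

```python
import time
from typing import Callable, Dict, Sequence, Tuple


def wait_until_idle(
    get_status: Callable[[], Dict],
    poll_interval: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a pipeline-status callable until its `busy` field is false.

    `get_status` is assumed to return the parsed JSON body of
    /documents/pipeline_status. Returns True once idle, False on give-up.
    """
    for _ in range(max_polls):
        if not get_status().get("busy", False):
            return True
        sleep(poll_interval)
    return False


def fetch_with_backoff(
    fetch: Callable[[], Tuple[int, str]],
    delays: Sequence[float] = (5, 10, 20),
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `fetch`; while it returns HTTP 504, wait and retry.

    `fetch` is assumed to return (status_code, body). One retry is made
    per entry in `delays`; the last response is returned either way.
    """
    status, body = fetch()
    for delay in delays:
        if status != 504:
            break
        sleep(delay)
        status, body = fetch()
    return status, body
```

In a real client, `get_status` and `fetch` would wrap requests to `/documents/pipeline_status` and `/documents/paginated`. Injecting `sleep` keeps the helpers testable without real waiting.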
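
### Sketch: paginating from a snapshot

The snapshot idea listed under the potential code improvements can be illustrated as follows. This is a sketch under stated assumptions, not LightRAG's implementation: `SnapshotDocStatusStore` is a hypothetical stand-in for `JsonDocStatusStorage`, it uses a `threading.Lock` (the real storage may use a different locking primitive), and the record layout is invented for the example. The point is only that the reader holds the lock long enough to copy the data, then sorts and slices outside it.

```python
import threading
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


class SnapshotDocStatusStore:
    """Hypothetical in-memory store; NOT the real JsonDocStatusStorage."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Dict[str, Record] = {}

    def upsert(self, doc_id: str, record: Record) -> None:
        # Writers replace the whole record instead of mutating it in place,
        # so a record already handed to a reader never changes underneath it.
        with self._lock:
            self._data[doc_id] = dict(record)

    def get_page(self, page: int, page_size: int) -> Tuple[List[Tuple[str, Record]], int]:
        """Return (items on the 1-based page, total number of documents)."""
        # Hold the lock only for the shallow copy.
        with self._lock:
            snapshot = list(self._data.items())
        # Sorting and slicing run without the lock, so writers are not blocked.
        snapshot.sort(key=lambda item: item[0])
        start = (page - 1) * page_size
        return snapshot[start:start + page_size], len(snapshot)
```

A snapshot only shortens the time the reader holds the lock. If the OCR writer keeps the lock for the whole extraction, as the root cause analysis describes, the reader still waits to acquire it, so the writer would also need to take the lock only for short updates.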