Conversation
PR Summary
Added memory monitoring functionality to track and log process memory usage and CPU metrics across the application, with persistent storage and rotation capabilities.
- Added new `celery_worker_monitoring` in `backend/supervisord.conf` to handle dedicated memory monitoring tasks
- Implemented `memory_monitoring.py` with rotating file handler (10MB limit, 5 backups) and structured logging format for memory/CPU metrics
- Added persistent `log_store` volume mounted at `/var/log/persisted-logs` across all Docker Compose configurations
- Integrated `emit_process_memory` function in indexing tasks to track worker memory usage every 60 seconds
- Added memory monitoring to both `api_server` and `background` services in production configuration for comprehensive coverage
10 file(s) reviewed, 6 comment(s) — Greptile
```python
# Create a dedicated logger for memory monitoring
memory_logger = logging.getLogger("memory_monitoring")
memory_logger.setLevel(logging.INFO)
```
style: Set `propagate=False` to prevent duplicate log records if a parent logger (e.g. the root logger) also has handlers attached.
Suggested change:

```python
memory_logger.setLevel(logging.INFO)
memory_logger.propagate = False
```
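Putting the suggestion together with the rotation settings from the PR summary (10MB limit, 5 backups), the full logger setup could look like the following sketch. The function name and format string are illustrative, not taken from the PR:

```python
import logging
from logging.handlers import RotatingFileHandler


def setup_memory_logger(log_path: str) -> logging.Logger:
    """Sketch of a dedicated memory-monitoring logger, assuming the
    10MB / 5-backup rotation described in the PR summary."""
    logger = logging.getLogger("memory_monitoring")
    logger.setLevel(logging.INFO)
    # Without this, records also bubble up to the root logger and get
    # emitted twice whenever the root logger has its own handlers.
    logger.propagate = False
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = RotatingFileHandler(
            log_path, maxBytes=10 * 1024 * 1024, backupCount=5
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```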
```yaml
# optional, only for debugging purposes
volumes:
  - log_store:/var/log/persisted-logs
```
style: Volume mount is marked as optional/debugging but is required for memory monitoring to work. Consider removing the 'optional' comment or clarifying when it's needed.
```python
except Exception as e:
    logger.error(f"Error monitoring worker memory: {e}")
```
style: Generic exception handling masks specific issues like NoSuchProcess or AccessDenied. Consider handling psutil.Error explicitly.
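A sketch of what the suggested specific-before-generic handling could look like. The function name and returned fields are hypothetical; only the exception types come from psutil's documented hierarchy:

```python
import logging

import psutil

logger = logging.getLogger(__name__)


def sample_process(pid: int):
    """Hypothetical sketch: handle specific psutil failure modes before
    falling back to the generic catch, so the log says *why* sampling
    failed rather than masking it."""
    try:
        process = psutil.Process(pid)
        memory_info = process.memory_info()
        return {"pid": pid, "rss": memory_info.rss}
    except psutil.NoSuchProcess:
        logger.warning(f"Process {pid} exited before it could be sampled")
    except psutil.AccessDenied:
        logger.warning(f"Insufficient permissions to sample process {pid}")
    except psutil.Error as e:  # any other psutil-specific failure
        logger.error(f"Error monitoring worker memory: {e}")
    return None
```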
```python
try:
    process = psutil.Process(pid)
    memory_info = process.memory_info()
    cpu_percent = process.cpu_percent(interval=0.1)
```
style: cpu_percent() with interval=0.1 blocks execution. Consider using cpu_percent() without interval for non-blocking operation.
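The non-blocking pattern relies on psutil's documented behavior: the first `cpu_percent(interval=None)` call on a `Process` instance only records a baseline and returns a meaningless `0.0`; each later call returns the usage since the previous call without sleeping. A minimal sketch:

```python
import psutil

process = psutil.Process()

# First call with interval=None just records a baseline and returns 0.0
# immediately -- it does not block for `interval` seconds.
process.cpu_percent(interval=None)

# On each later monitoring tick (e.g. every 60s), this returns the CPU
# usage since the previous call, again without blocking the worker.
usage = process.cpu_percent(interval=None)
```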
```yaml
volumes:
  - log_store:/var/log/persisted-logs
```
logic: Volume mount is only added to background service, but memory_monitoring.py suggests api_server might also need access to write logs
```yaml
volumes:
  - log_store:/var/log/persisted-logs
```
style: Consider adding size limits to the log_store volume to prevent unbounded disk usage
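One caveat on this suggestion: Docker's built-in `local` volume driver only honors a size option for `tmpfs`-backed volumes, which would trade away the persistence this PR is after. A sketch of what a size-capped (but non-persistent) variant would look like, with an illustrative 100m cap:

```yaml
volumes:
  log_store:
    driver: local
    driver_opts:
      type: tmpfs
      device: tmpfs
      o: "size=100m"   # capped, but tmpfs contents are lost on host restart
```

In practice the rotating file handler (10MB limit, 5 backups) already bounds the monitoring log itself, so disk usage stays limited as long as nothing else writes to the volume.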
```python
pid = job.process.pid
if pid is not None:
    # Only emit memory info once per minute (60 seconds)
    current_time = time.time()
```
`time.monotonic()` is preferable in most cases here: unlike `time.time()`, it is unaffected by system clock adjustments (NTP syncs, manual changes), so the measured interval can't jump backwards or forwards.
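The once-per-minute gate could be expressed with a monotonic clock as in the following sketch (the class name and interface are hypothetical, not from the PR):

```python
import time


class RateLimiter:
    """Allows an action at most once per `interval` seconds, measured on
    a monotonic clock so wall-clock adjustments can't skew the gap."""

    def __init__(self, interval: float):
        self.interval = interval
        self._last = None

    def ready(self) -> bool:
        now = time.monotonic()
        if self._last is None or now - self._last >= self.interval:
            self._last = now
            return True
        return False
```

Usage in the indexing loop would then be `if limiter.ready(): emit_process_memory(...)` with `limiter = RateLimiter(60.0)`.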
```python
# Regular application logger
logger = setup_logger()

# Set up a dedicated memory monitoring logger
```
Will there be sufficient information in the log line to distinguish between various indexing connector sources so we can zero in on what failed? Is there any intention to use this from sources other than indexing where we would need more metadata to distinguish those from indexing metrics?
it should already be possible via the process_name param (+ additional_metadata). I'm not sure if that's what you're referring to?
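To illustrate the reply: a log line carrying `process_name` plus arbitrary `additional_metadata` (e.g. the connector source) is enough to tell indexing metrics apart from other emitters. This is a sketch of that idea, not the PR's actual `emit_process_memory` signature:

```python
import json


def build_memory_log_line(rss_bytes, cpu_percent, process_name,
                          additional_metadata=None):
    """Hypothetical sketch: one structured log line carrying enough
    metadata (process_name, connector source, ...) to distinguish
    indexing metrics from any other callers."""
    record = {
        "process_name": process_name,  # e.g. "indexing_worker"
        "rss_mb": round(rss_bytes / (1024 * 1024), 2),
        "cpu_percent": cpu_percent,
        # Caller-supplied context, e.g. {"connector": "slack"}
        **(additional_metadata or {}),
    }
    return json.dumps(record)
```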
Description
Fixes https://linear.app/danswer/issue/DAN-1544/add-basic-memory-loggin
How Has This Been Tested?
Ran docker compose locally, verified the logs appeared in the expected directory, and verified that the logs were persisted across a container restart.
Backporting (check the box to trigger backport action)
Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.