Server Monitoring

Monitor CPU, memory, disk, and network metrics for your servers in real time. Track uptime and per-disk, per-interface detail.

Devpilot continuously collects resource metrics from every connected server through its management agent. The monitoring dashboard shows current and historical values for CPU, memory, disk, and network, plus an uptime history — everything you need to spot trends, investigate incidents, and plan capacity.

Accessing the Monitoring Dashboard

Open Servers, select a server, and click Monitoring. The dashboard loads the most recent values and streams updates as new samples arrive.

Monitoring depends on a working SSH connection. When a server is disconnected, Devpilot stops collecting new samples. Historical data already captured remains available, and collection resumes as soon as the connection is restored.

What Devpilot Collects

The agent samples the following metrics and stores them per-server:

CPU

The agent tracks:

  • CPU usage — Current CPU utilization as a percentage.
  • CPU cores — Number of logical cores exposed to the server.
  • Per-core breakdown — Usage per core so you can see whether load is distributed or pinned to one core.
  • Load average — 1-minute, 5-minute, and 15-minute averages, which include processes waiting on I/O.

What to watch for

  • Sustained CPU usage over 80% means the server is under real load. Optimize hot code paths, scale horizontally, or move to a larger instance type.
  • Load averages rising while CPU% stays modest usually point to I/O saturation rather than CPU saturation.
  • Near-zero CPU on a production server may indicate the app is idle, stuck, or not receiving traffic.
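The rules of thumb above can be sketched as a small classifier. This is an illustrative example only, not Devpilot's own logic: the function name, the labels, and the per-core load cutoff of 1.0 are assumptions, and the inputs are meant to be averages over a window (for example, the last 15 minutes) rather than single samples.

```python
def classify_cpu(cpu_pct: float, load_1m: float, cores: int) -> str:
    """Label a window of CPU data using the rules of thumb above.

    cpu_pct  -- mean CPU utilization over the window, 0-100
    load_1m  -- 1-minute load average at the end of the window
    cores    -- number of logical cores
    Thresholds are illustrative assumptions, not Devpilot settings.
    """
    load_per_core = load_1m / max(cores, 1)
    if cpu_pct >= 80.0:
        return "cpu-bound"
    if load_per_core > 1.0:
        # More tasks are runnable or blocked than there are cores,
        # yet the CPUs are not busy: typically tasks waiting on I/O.
        return "likely-io-wait"
    if cpu_pct < 1.0:
        return "idle-or-stuck"
    return "normal"


print(classify_cpu(92.0, 3.5, 4))   # cpu-bound
print(classify_cpu(25.0, 9.0, 4))   # likely-io-wait
```

Dividing the load average by the core count keeps the cutoff comparable across servers of different sizes.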

Memory

The agent also tracks:

  • Memory total / used / free / available — Raw values in addition to the usage percentage.
  • Usage % — Percentage of total RAM in use.

What to watch for

  • Available memory dropping below ~10% of total puts the server at risk of the Linux OOM killer terminating processes.
  • Steady upward creep over hours or days often indicates a memory leak in an application.
  • Large short-lived spikes during deployments or batch jobs are usually normal.
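To make the ~10% guideline concrete, here is a minimal sketch that computes available memory as a percentage of total from `/proc/meminfo`-style text. It assumes a Linux server; how the Devpilot agent actually reads memory is not documented here, and the helper names and sample values are hypothetical.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_pct(meminfo: Dict[str, int]) -> float:
    """Available memory as a percentage of total RAM."""
    return 100.0 * meminfo["MemAvailable"] / meminfo["MemTotal"]


# Made-up sample in the same layout as /proc/meminfo.
SAMPLE = """MemTotal:       16000000 kB
MemFree:          400000 kB
MemAvailable:    1200000 kB
"""

pct = available_pct(parse_meminfo(SAMPLE))
if pct < 10.0:
    print(f"warning: only {pct:.1f}% of memory available")
```

On a real Linux host, the same functions can be fed the contents of `/proc/meminfo` instead of `SAMPLE`.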

Disk

Per mount point:

  • Filesystem and mount point — For example /dev/sda1 mounted at /.
  • Total, used, and available space.
  • Usage % — Per-mount utilization.

What to watch for

  • Usage above 85% on any mount point deserves attention. A full root disk will destabilize the server — databases can corrupt, logging stops, SSH sessions can fail.
  • Rapid growth typically comes from logs, database files, or user-uploaded content. Put log rotation in place and archive old data before you run out of space.

A root partition that hits 100% can make the server unresponsive. Set up alerting well before that point.
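As a rough illustration of the 85% guideline and of planning ahead of rapid growth, the sketch below checks a mount point with Python's standard `shutil.disk_usage` and estimates how many days remain at a given growth rate. The function names and the growth figure are assumptions for illustration; they are not part of Devpilot.

```python
import shutil
from typing import Optional, Tuple


def usage_pct(total: int, used: int) -> float:
    """Used space as a percentage of total."""
    return 100.0 * used / total if total else 0.0


def days_until_full(free_bytes: int, growth_bytes_per_day: float) -> Optional[float]:
    """Days left at a constant growth rate, or None if usage is not growing."""
    if growth_bytes_per_day <= 0:
        return None
    return free_bytes / growth_bytes_per_day


def check_mount(path: str, threshold: float = 85.0) -> Tuple[float, bool]:
    """Return (usage percentage, whether it exceeds the threshold)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    pct = usage_pct(usage.total, usage.used)
    return pct, pct > threshold


pct, over = check_mount(".")
print(f"{pct:.1f}% used, over threshold: {over}")

# Hypothetical numbers: 10 GiB free, growing 2 GiB per day.
print(days_until_full(10 * 1024**3, 2 * 1024**3))  # 5.0
```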

Network

Per interface:

  • Bytes sent / received and packets sent / received — Cumulative counters.
  • Upload and download rate — Current throughput.
  • Errors and drops — Per-direction error and drop counts, useful for diagnosing flaky links or saturated NICs.

What to watch for

  • Unexpected traffic surges may be legitimate (an announcement, a scraper) or abusive (a DDoS, a misconfigured retry loop).
  • Rising error or drop counters suggest a hardware or driver issue, or a saturated link.
  • High outbound traffic directly drives cloud egress costs on most providers.
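Because the byte and packet values are cumulative counters, a throughput figure has to be derived from two samples. The following is a minimal sketch of that calculation, assuming each sample carries a counter value and a timestamp in seconds; the function name is hypothetical and the way Devpilot computes its rates may differ.

```python
from typing import Optional


def rate_bytes_per_sec(
    prev_bytes: int, prev_ts: float, curr_bytes: int, curr_ts: float
) -> Optional[float]:
    """Throughput between two cumulative-counter samples.

    Returns None when the rate cannot be trusted: the samples are out of
    order, or the counter went backwards (for example after a reboot).
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        return None
    delta = curr_bytes - prev_bytes
    if delta < 0:
        return None  # counter reset; skip this interval
    return delta / elapsed


# 3,000,000 bytes over 10 seconds
print(rate_bytes_per_sec(1_000_000, 100.0, 4_000_000, 110.0))  # 300000.0
```

Returning `None` on a counter reset avoids plotting a large negative or bogus spike for that interval.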

Uptime Tracking

Devpilot records a rolling uptime history, which includes:

  • Online status — Whether the server was reachable during each probe.
  • Uptime — The OS-reported uptime captured during the probe.
  • Response time — How long it took the server to respond.
  • Status message — Context when a probe failed.

The monitoring dashboard surfaces these as:

  • Current uptime — A human-readable duration.
  • Status history — A timeline showing connected vs disconnected periods across the selected time range.

Devpilot's uptime tracking measures reachability from Devpilot to your server over SSH. It isn't a substitute for an HTTP-level synthetic monitor if you need to verify a specific public endpoint is answering correctly.
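If you export or copy probe results, an availability figure can be derived from them. The sketch below assumes a hypothetical `Probe` record that mirrors the fields listed above; it is not Devpilot's data model or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Probe:
    """Hypothetical record mirroring the uptime fields described above."""
    online: bool
    response_ms: Optional[float] = None
    status_message: str = ""


def availability_pct(probes: List[Probe]) -> float:
    """Share of probes in which the server was reachable."""
    if not probes:
        return 0.0
    return 100.0 * sum(p.online for p in probes) / len(probes)


def mean_response_ms(probes: List[Probe]) -> Optional[float]:
    """Mean response time across successful probes only."""
    times = [p.response_ms for p in probes if p.online and p.response_ms is not None]
    return sum(times) / len(times) if times else None


history = [
    Probe(True, 40.0),
    Probe(True, 60.0),
    Probe(False, None, "connection timed out"),
    Probe(True, 50.0),
]
print(availability_pct(history))   # 75.0
print(mean_response_ms(history))   # 50.0
```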

Time Range Selection

Switch between time ranges at the top of the dashboard:

Range            Useful for
Last 1 hour      Real-time troubleshooting right after a change.
Last 6 hours     Observing a work session, deployment, or incident window.
Last 24 hours    A full day cycle including peak and off-peak.
Last 7 days      Weekly patterns: weekday vs weekend, business hours vs overnight.
Last 30 days     Capacity planning and longer-term trend analysis.

Resolution adjusts automatically — shorter ranges are more granular; longer ranges aggregate.
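One common way to aggregate samples for longer ranges is to average them into fixed-width time buckets. The sketch below shows that idea only; the source does not say which aggregation Devpilot uses, and the function name and bucket width are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def downsample(
    samples: List[Tuple[int, float]], bucket_seconds: int
) -> List[Tuple[int, float]]:
    """Average (timestamp, value) samples into fixed-width buckets.

    Each output point is (bucket start time, mean of values in the bucket).
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in samples:
        start = int(ts // bucket_seconds) * bucket_seconds
        buckets[start].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


raw = [(0, 10.0), (30, 20.0), (60, 30.0), (90, 50.0)]
print(downsample(raw, 60))  # [(0, 15.0), (60, 40.0)]
```

Averaging smooths out short spikes, so a brief peak visible in the 1-hour view may look lower in the 30-day view.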

Collection Behavior

  • Where it runs — The Devpilot agent installed on the server emits samples. Agent version and last heartbeat are shown on the server's Agent panel.
  • Overhead — The agent is intentionally lightweight; it should not meaningfully impact your workloads.
  • Resilience — If SSH drops, no new samples are stored until reconnection. Gaps show up as breaks on the charts, and you can see them in the uptime history.
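Gaps of the kind described under Resilience can be located by comparing consecutive sample timestamps against the expected interval. This is a minimal sketch under assumed values: the 60-second interval and the 1.5x tolerance are illustrative, since the agent's actual sampling interval is not stated here.

```python
from typing import List, Tuple


def find_gaps(
    timestamps: List[float], expected_interval: float, tolerance: float = 1.5
) -> List[Tuple[float, float]]:
    """Return (last sample before gap, first sample after gap) pairs.

    A gap is any pair of consecutive samples spaced further apart than
    expected_interval * tolerance. Timestamps must be sorted ascending.
    """
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev > expected_interval * tolerance:
            gaps.append((prev, curr))
    return gaps


# Samples every 60 s, with an outage between t=120 and t=600.
print(find_gaps([0, 60, 120, 600, 660], 60))  # [(120, 600)]
```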

Practical Scenarios

Post-deployment check

After deploying, watch the dashboard for 15–30 minutes:

  • Did CPU or memory spike beyond baseline?
  • Is memory creeping up instead of leveling off (possible leak)?
  • Is network throughput out of proportion to expected traffic?
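The "creeping up instead of leveling off" check can be approximated by fitting a straight line to memory usage over the observation window and looking at its slope. The sketch below uses an ordinary least-squares slope; the function name, the sample data, and any cutoff you apply to the result are assumptions, not Devpilot features.

```python
from typing import List, Tuple


def slope_per_minute(samples: List[Tuple[float, float]]) -> float:
    """Least-squares slope of (minutes since deploy, memory usage %) samples."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den if den else 0.0


# Made-up data: memory climbs one percentage point per minute.
climbing = [(0, 50.0), (1, 51.0), (2, 52.0), (3, 53.0)]
steady = [(0, 50.0), (1, 50.0), (2, 50.0)]
print(slope_per_minute(climbing))  # 1.0
print(slope_per_minute(steady))    # 0.0
```

A slope that stays clearly positive across the whole 15 to 30 minute window is a reason to keep watching; a slope near zero after an initial jump suggests usage has leveled off.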

Capacity planning

Review 7- and 30-day trends. If any resource sits consistently above 70% during peak hours, plan to scale before it turns into a firefight.
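The 70% guideline can be expressed as a simple check over historical samples. In this sketch, "peak hours" are assumed to be 09:00 to 17:59 and "consistently" is assumed to mean at least 90% of peak-hour samples; both are illustrative choices to adjust for your own traffic pattern.

```python
from datetime import datetime
from typing import Iterable, List, Tuple


def peak_hours_over(
    samples: List[Tuple[datetime, float]],
    threshold: float = 70.0,
    peak_hours: Iterable[int] = range(9, 18),
    min_fraction: float = 0.9,
) -> bool:
    """True if usage exceeds threshold in at least min_fraction of peak-hour samples."""
    hours = set(peak_hours)
    peak = [value for ts, value in samples if ts.hour in hours]
    if not peak:
        return False
    over = sum(1 for value in peak if value > threshold)
    return over / len(peak) >= min_fraction


# Made-up day: 80% during business hours, 20% otherwise.
busy_day = [
    (datetime(2024, 1, 1, h), 80.0 if 9 <= h < 18 else 20.0) for h in range(24)
]
print(peak_hours_over(busy_day))  # True
```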

Investigating slowness

When users report slowness, the monitoring dashboard is the first stop. High CPU points to a compute bottleneck; high memory usage with little available memory points to memory pressure; load averages that climb while CPU stays modest point to I/O wait, for example a database reading from disk instead of cache; rising network errors or drops point to link issues. Narrow the cause here before digging into logs.