In many production environments, service availability depends on external monitoring systems such as Prometheus, Zabbix, or cloud-provider health checks. While effective, these systems introduce additional complexity and dependencies.
A lesser-known but highly effective approach is to leverage systemd’s native watchdog and restart capabilities to create self-healing services that automatically detect failures and recover without external tools.
This document explains how to design and deploy a self-healing systemd service on AlmaLinux 8 using built-in systemd features only.
Typical use cases:
Critical internal services without monitoring agents
Edge or isolated systems
Minimal installations
Bootstrap environments
Fallback protection when monitoring systems fail
Prerequisites:
AlmaLinux 8
systemd (default)
Root or sudo access
A long-running service or script
Create a service script that simulates a long-running process.
nano /usr/local/bin/example-service.sh
Add the following content:
#!/bin/bash
while true; do
echo "$(date) - Service heartbeat"
sleep 5
done
Make it executable:
chmod +x /usr/local/bin/example-service.sh
Create the systemd unit file:
nano /etc/systemd/system/example-selfhealing.service
Add:
[Unit]
Description=Example Self-Healing Service
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/example-service.sh
Restart=always
RestartSec=3
WatchdogSec=10
NotifyAccess=main
[Install]
WantedBy=multi-user.target
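Modify the script to send watchdog keep-alive pings to systemd. Without them, systemd treats the service as hung once WatchdogSec (10 seconds) elapses and restarts it. Pinging every 5 seconds, half the timeout, leaves a safety margin.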
nano /usr/local/bin/example-service.sh
Replace content with:
#!/bin/bash
while true; do
systemd-notify WATCHDOG=1
sleep 5
done
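The loop above pings unconditionally, so the watchdog only catches a dead or hung shell. A minimal sketch of a variant that pings only while a health check passes is shown below. The names health_check, ping_interval, main_loop and the HEALTH_FILE path are hypothetical placeholders for illustration; WATCHDOG_USEC and NOTIFY_SOCKET are environment variables that systemd sets for the service.

```shell
#!/bin/bash
# Sketch: ping the systemd watchdog only while a health check passes.
# HEALTH_FILE and health_check are hypothetical placeholders, not part of systemd.

HEALTH_FILE="${HEALTH_FILE:-/run/example-service.ok}"

# Return 0 when the service considers itself healthy.
# Replace with a real probe (open port, fresh output file, queue depth, ...).
health_check() {
    [ -e "$HEALTH_FILE" ]
}

# Print the ping interval in seconds: half the watchdog timeout, at least 1.
# systemd exports WATCHDOG_USEC (microseconds) when WatchdogSec= is set;
# fall back to 10 s if it is missing.
ping_interval() {
    interval=$(( ${WATCHDOG_USEC:-10000000} / 2000000 ))
    [ "$interval" -ge 1 ] || interval=1
    echo "$interval"
}

main_loop() {
    systemd-notify --ready
    while true; do
        if health_check; then
            systemd-notify WATCHDOG=1
        fi
        # When the check fails no ping is sent, so the watchdog timeout
        # expires and the restart policy takes over.
        sleep "$(ping_interval)"
    done
}

# Start the loop only when systemd provides a notification socket; when run
# by hand (NOTIFY_SOCKET unset) just report the health status once.
if [ -n "${NOTIFY_SOCKET:-}" ]; then
    main_loop
elif health_check; then
    echo "healthy"
else
    echo "unhealthy"
fi
```

Run by hand, the script prints the health status once and exits, which makes the probe easy to test before deploying it. If the service runs under an unprivileged User=, NotifyAccess=all may be needed for the pings to be accepted.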
Reload systemd so it reads the new unit file, then enable and start the service:
systemctl daemon-reload
systemctl enable example-selfhealing.service
systemctl start example-selfhealing.service
Check service status:
systemctl status example-selfhealing.service
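Check the journal for watchdog events. Successful keep-alive pings are not logged at the default log level, so a healthy service shows no watchdog timeout messages: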
journalctl -u example-selfhealing.service
Forcefully terminate the service:
pkill -9 -f example-service.sh
systemd restarts it automatically after the RestartSec delay (3 seconds here).
systemctl status example-selfhealing.service
The service is running again under a new main PID, and its restart counter (the NRestarts property) has increased, confirming recovery.
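The restart counter can also be read directly rather than inferred from the status output. The sketch below wraps that in a small helper; restart_count is a hypothetical name, while NRestarts is a property reported by systemctl show.

```shell
#!/bin/bash
# Sketch: read how many times systemd has restarted a unit.
# restart_count is a hypothetical helper name; NRestarts is a systemd property.

restart_count() {
    # "systemctl show -p NRestarts UNIT" prints a line such as "NRestarts=2";
    # keep only the value after the "=".
    systemctl show -p NRestarts "$1" | cut -d= -f2
}

# Example (requires a running systemd):
#   before=$(restart_count example-selfhealing.service)
#   pkill -9 -f example-service.sh; sleep 5
#   after=$(restart_count example-selfhealing.service)
#   [ "$after" -gt "$before" ] && echo "service was restarted"
```

Comparing the value before and after the kill gives a scriptable check that the restart actually happened.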
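Limit restart loops so that a persistently failing service is not restarted forever. These settings belong in the [Unit] section and allow at most 3 starts per 60 seconds:
StartLimitIntervalSec=60
StartLimitBurst=3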
Limit resource usage in the [Service] section:
MemoryMax=256M
CPUQuota=50%
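Activate another unit when the service enters the failed state (in the [Unit] section). emergency.target affects the whole system, so reserve it for services the machine cannot run without:
OnFailure=emergency.target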
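One way to apply these settings without editing the main unit file is a drop-in file, which also makes the section placement explicit. The following is a sketch under that assumption; the helper name write_hardening_dropin and the file name 10-hardening.conf are hypothetical.

```shell
#!/bin/bash
# Sketch: write the hardening settings as a drop-in file.
# write_hardening_dropin and the file name 10-hardening.conf are hypothetical.

write_hardening_dropin() {
    # $1: drop-in directory, e.g. /etc/systemd/system/example-selfhealing.service.d
    mkdir -p "$1" || return 1
    cat > "$1/10-hardening.conf" <<'EOF'
[Unit]
# Allow at most 3 starts per 60 s; further starts are refused and the unit fails.
StartLimitIntervalSec=60
StartLimitBurst=3
# Activated when the unit enters the failed state.
# emergency.target affects the whole system, not only this service.
OnFailure=emergency.target

[Service]
MemoryMax=256M
CPUQuota=50%
EOF
}

# Example (as root):
#   write_hardening_dropin /etc/systemd/system/example-selfhealing.service.d
#   systemctl daemon-reload
#   systemctl restart example-selfhealing.service
```

Keeping the limits in a separate file means they can be removed or adjusted later without touching the base unit.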
Review recent log entries for the service:
journalctl -u example-selfhealing.service --since "10 minutes ago"
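To see at a glance whether the watchdog has fired, the journal output can be filtered. This is a rough sketch: watchdog_timeouts is a hypothetical helper, and the matched text "Watchdog timeout" is an assumption about how the installed systemd version words the log entry, so check it against real journal output first.

```shell
#!/bin/bash
# Sketch: count watchdog timeouts recorded for a unit.
# watchdog_timeouts is a hypothetical helper; the matched text "Watchdog timeout"
# is assumed to be how this systemd version words the log entry.

watchdog_timeouts() {
    # $1: unit name, $2: journalctl --since value (default: "10 minutes ago")
    journalctl -u "$1" --since "${2:-10 minutes ago}" --no-pager \
        | grep -c "Watchdog timeout"
}

# Example (requires a running systemd):
#   watchdog_timeouts example-selfhealing.service "1 hour ago"
```

A count that keeps growing suggests the service hangs repeatedly and needs a real fix rather than more restarts.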
Benefits:
No external agents required
Faster recovery than monitoring alerts
Reduced system complexity
Built-in to systemd
Highly reliable
These options are described in the systemd.service(5) man page but are easy to overlook when external monitoring is already in place.
systemd provides powerful native mechanisms for building self-healing services without relying on third-party monitoring tools.
By correctly configuring restart policies and watchdogs, AlmaLinux 8 systems can automatically recover from many service failures with minimal effort.