Mastering strace & lsof on Linux — Step‑by‑Step Debugging for Sysadmins & DevOps

When working with Linux servers and applications, silent failures and hard‑to‑trace faults are incredibly frustrating. Standard logs often don’t reveal the root cause of permission issues, network connection failures, port conflicts, or file access problems. That’s where strace and lsof come in — two powerful command‑line tools that let you inspect system calls, file usage, network sockets, and processes in real time.

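This article covers what the two tools do, how to install them, their basic usage, and three practical troubleshooting scenarios.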


What Are strace & lsof?

strace is a Linux diagnostic tool that traces system calls made by a process — such as file opens, reads, writes, network connects, and more. It can help reveal why an application fails even when logs are quiet.

lsof stands for list open files. Since “everything is a file” in Linux, lsof shows you open files, directories, network sockets, and pipes used by running processes, which helps identify file locks, port conflicts, and resource issues.


Common Uses & Benefits

Why use strace?

Why use lsof?

Together, strace and lsof help you solve deep‑seated issues faster without guesswork.


Installation

Many Linux distributions ship these tools by default, and the rest carry them in their standard repositories. If you don’t have them installed:

Debian / Ubuntu:

sudo apt update
sudo apt install strace lsof -y

Fedora / RHEL / CentOS:

sudo dnf install strace lsof -y

Arch / Manjaro:

sudo pacman -S strace lsof
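
Before reaching for a package manager, it can help to check which of the two tools is actually missing. The following is a minimal sketch; the function name missing_tools is hypothetical and not part of either tool.

```shell
# Print the name of every given command that is not found on PATH.
# missing_tools is a hypothetical helper name used only for illustration.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage sketch: list whichever of the two tools still needs installing.
# missing_tools strace lsof
```

If the call prints nothing, both tools are already available and the install step can be skipped.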

Using strace — Trace System Calls

Basic Usage

Attach to a running process:

strace -p <PID>

Run a command and trace its system calls:

strace ls -l /var/log

Filter for specific calls (e.g., file opens):

strace -e openat myservice

Example output:

openat(AT_FDCWD, "/etc/myservice/config.yaml", O_RDONLY) = -1 ENOENT (No such file or directory)

This tells you the service failed to find its config file. (back2cloud.com)
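
A trace of a real service usually contains many successful calls around the one that matters. As a rough sketch, the failing calls can be isolated by keeping only lines whose return value is -1 followed by an errno name. The function name failed_calls is hypothetical, and the pattern assumes strace's default output format.

```shell
# Keep only failed system calls from strace output read on stdin.
# Matches the default strace format, e.g. "... = -1 ENOENT (No such file ...)".
# failed_calls is a hypothetical helper name used only for illustration.
failed_calls() {
    grep -E '= -1 E[A-Z0-9]+'
}

# Usage sketch: strace writes its trace to stderr, so redirect it into the pipe.
# strace -f -e trace=openat myservice 2>&1 | failed_calls
```

Because the redirect merges the traced program's own output into the same stream, writing the trace to a file with strace's -o option and filtering that file may be cleaner for noisy programs.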


Advanced strace Tips


Using lsof — See Open Files & Sockets

Basic Commands

List open files by process ID:

lsof -p <PID>

Find who’s listening on port 8080:

lsof -i :8080

Find deleted files that a process still holds open (a common reason disk space is not freed after large files are removed):

lsof | grep deleted
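
To see which process holds how much space, the matching lines can be reduced to PID, size, and path. This is a sketch under assumptions: it expects lsof's default column layout (SIZE/OFF in the seventh column), paths without spaces, and no per-task rows. The function name deleted_open_files is hypothetical.

```shell
# Read lsof output on stdin and print "PID SIZE PATH" for each entry
# marked "(deleted)". Assumes the default column order, where SIZE/OFF
# is column 7, and paths that contain no spaces.
# deleted_open_files is a hypothetical helper name used only for illustration.
deleted_open_files() {
    awk '$NF == "(deleted)" { print $2, $7, $(NF-1) }'
}

# Usage sketch: +L1 asks lsof for open files whose link count is below 1,
# which avoids grepping the full listing.
# sudo lsof -nP +L1 | deleted_open_files
```

Restarting the process that holds the file, or having it reopen its logs, is what actually releases the space.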

Practical Real‑World Examples

Scenario 1 — Web App Hangs Due to Log Access Error

Issue: Web app becomes unresponsive when writing logs.

  1. Find open files and confirm the log is opened:

lsof -p $(pidof webapp)

You see:

webapp 9876 user 5r REG /var/log/webapp.log

  2. Check access errors with strace:

strace -e openat -p "$(pidof webapp)"

It shows:

openat(..., "/var/log/webapp.log", O_WRONLY|O_APPEND) = -1 EACCES (Permission denied)

Fix: Correct file permissions:

chmod 644 /var/log/webapp.log
chown webuser:webgroup /var/log/webapp.log

Problem solved without rebooting. (back2cloud.com)
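
EACCES can come from the file's own mode or from a parent directory that lacks search permission, so it is worth looking at both before and after the fix. A minimal sketch, assuming GNU stat (the -c option); the function name show_perms is hypothetical.

```shell
# Print octal mode, owner:group, and path for each argument.
# Relies on the -c format option of GNU stat, as found on most Linux systems.
# show_perms is a hypothetical helper name used only for illustration.
show_perms() {
    stat -c '%a %U:%G %n' -- "$@"
}

# Usage sketch with the paths from this scenario:
# show_perms /var/log /var/log/webapp.log
```

Running it once before and once after the chmod and chown commands makes it easy to confirm that the change took effect.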


Scenario 2 — Port Conflict Preventing Service Start

Web service fails to bind to port 9090.

lsof -i :9090

Shows:

nginx 4321 root 6u TCP *:9090 (LISTEN)

Solution: stop the conflicting service, or configure one of the two services to use a different port:

sudo systemctl stop nginx
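
When several sockets match, the listing can be reduced to the command name and PID of each listener. This is a sketch that reads lsof output from stdin; the function name listeners is hypothetical.

```shell
# Read lsof output on stdin and print "COMMAND PID" once per listening process.
# listeners is a hypothetical helper name used only for illustration.
listeners() {
    awk '$NF == "(LISTEN)" { print $1, $2 }' | sort -u
}

# Usage sketch: -n and -P skip host and port name lookups, -sTCP:LISTEN
# restricts the listing to listening sockets. Without root, lsof may only
# show sockets owned by your own processes.
# sudo lsof -nP -iTCP:9090 -sTCP:LISTEN | listeners
```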

Scenario 3 — Database Connection Fails Inside a Microservice

Use lsof to inspect open sockets:

lsof -p $(pidof microservice)

Then trace connection attempts:

strace -e connect -p "$(pidof microservice)"

You see:

connect(..., sin_port=htons(3306), ...) = -1 ECONNREFUSED (Connection refused)

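ECONNREFUSED means the connection was actively refused: nothing is accepting connections at that address and port. The database is probably not running, or the service is configured with the wrong host or port.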
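
To confirm the refusal independently of the microservice, one option is to attempt the same TCP connection from the shell. The sketch below relies on bash's /dev/tcp redirection (which requires a bash built with network redirection support) and on the coreutils timeout command. The function name can_connect and the host name db.example.internal are hypothetical.

```shell
# Return 0 if a TCP connection to host ($1) and port ($2) succeeds within
# 3 seconds, and non-zero if it is refused, times out, or cannot be attempted.
# can_connect is a hypothetical helper name used only for illustration.
can_connect() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage sketch (hypothetical host; 3306 is the port seen in the trace):
# can_connect db.example.internal 3306 && echo reachable || echo "refused or unreachable"
```

If the check succeeds from the shell but the service still fails, the service is likely connecting to a different host or port than expected.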


Best Practices & Tips

alias debugweb='strace -f -e trace=openat -p "$(pidof webapp)" 2>&1 | grep /var/log'

Conclusion

strace and lsof are essential troubleshooting tools for Linux sysadmins and DevOps engineers. They help diagnose elusive system issues, from permission errors to network and file resource problems, with precision and minimal guesswork. Reach for them before making destructive changes or guessing from incomplete logs.

By mastering these utilities, you’ll spend less time debugging and more time improving system reliability.


Real‑World Tips