Application-level analysis is the most critical stage of performance engineering because the application is where the “work” begins. This post serves as a quick-reference guide to the methodologies, synchronization primitives, and observability tools found in Chapter 5 of Systems Performance.
The Performance Mantras #
When optimizing, follow these in order. The fastest way to finish a task is to never start it.
- Don’t do it. (Eliminate unnecessary work)
- Do it, but don’t do it again. (Caching)
- Do it less. (Batching/Frequency reduction)
- Do it later. (Asynchronous processing)
- Do it when they’re not looking. (Background/Idle processing)
- Do it concurrently. (Parallelism)
- Do it cheaper. (Algorithm/Hardware optimization)
Core Methodologies #
The USE Method #
For every resource (CPU, Memory, Disk), check:
- Utilization: How busy is the resource?
- Saturation: Is there a queue of work waiting?
- Errors: Are there explicit error counts (logs/counters)?
Workload Characterization #
Identify the nature of the load:
- Who: PID, User, Remote IP.
- Why: Code path, API endpoint, Database query.
- What: Throughput (Ops/sec), Data size (Bytes), Latency (ms).
Synchronization Primitives #
These “traffic lights” manage access to shared memory. Choosing the wrong one wastes CPU (needless spinning) or adds latency (needless sleeping), and heavy competition for any of them shows up as Lock Contention.
| Primitive | Behavior | Use Case |
|---|---|---|
| Mutex | Sleeps (Blocks) off-CPU while waiting. | Long operations; process context. |
| Spinlock | Spins (Busy-waits) on-CPU in a loop. | Very fast operations; Interrupt handlers. |
| RW Lock | Allows multiple readers OR one writer. | Data read often but modified rarely. |
| Semaphore | A counter allowing N parallel ops. | Managing pools of resources. |
Hashed Locks (The Middle Ground) #
Instead of one Global Lock (slow) or a lock for Every Object (high memory overhead), use a Hash Table of Locks.
- Mechanism: `LockIndex = ObjectAddress % NumberOfLocks`
- Benefit: Reduces contention while keeping memory usage fixed and predictable.
Observability Tools & Commands #
Use these tools to gather data without (usually) stopping the application.
1. Basic Process Counters #
| Tool | Command | Description |
|---|---|---|
| uptime | `uptime` | Checks system load averages (1, 5, 15 min). |
| pidstat | `pidstat 1` | Per-process CPU usage every second. |
| pidstat I/O | `pidstat -d 1` | Identifies which process is hogging the disk. |
| ps | `ps -eo pid,ppid,cmd,%cpu,%mem` | Detailed process tree and resource consumption. |
2. Interface Tracing (Syscalls & Libraries) #
strace can slow down an application by 10x or more. Use it for debugging, not high-load production monitoring.
- System Call Summary:

  ```
  # See which syscalls are most frequent/slowest
  strace -cp <PID>
  ```

- Library Call Tracing:

  ```
  # See calls to shared libraries like malloc() or strlen()
  ltrace -p <PID>
  ```
3. Profiling (Sampling) #
Profilers take snapshots of the CPU state at regular intervals.
- The 99Hz Rule: Always sample at an “odd” frequency (e.g., 99Hz or 49Hz) instead of 100Hz. This prevents the profiler from syncing up with internal timers, which causes biased data.
- CPU Profiling with `perf`:

  ```
  # Record CPU stack traces for 60 seconds at 99Hz
  perf record -F 99 -p <PID> -g -- sleep 60

  # View the results in-terminal
  perf report -n --stdio
  ```
4. Advanced BPF Tools (Low Overhead) #
These tools use BPF (originally the Berkeley Packet Filter, now extended as eBPF) for low-overhead, in-kernel observability.
| Tool | Command | What it shows |
|---|---|---|
| opensnoop | `opensnoop` | Real-time file opens (process and filename). |
| execsnoop | `execsnoop` | Shows every new process as it is created. |
| ext4slower | `ext4slower 1` | Lists ext4 file I/O slower than 1 ms. |
Key Analysis Tips #
- Off-CPU Analysis: If the app is slow but CPU is at 0%, it is likely blocked on a lock, disk, or network. Traditional profilers won’t show this—you need Off-CPU tracing.
- The Streetlight Effect: Don’t look at `top` just because it’s easy. If the bottleneck is I/O, `top` won’t help you. Follow the data, not the tool.
- Instruction Pointers: A snapshot of the Instruction Pointer tells you what is running. A Stack Trace tells you why it was called.
Reference: Gregg, B. (2020). Systems Performance: Enterprise and the Cloud. 2nd Edition.