PHP-FPM (FastCGI Process Manager) is the component responsible for managing PHP worker processes in your application. Its configuration is often copy-pasted without being truly understood, and that’s exactly what causes production issues, especially in cloud-native environments like Kubernetes.

In this article, I’ll show you why you need to adapt your PHP-FPM configuration to the underlying infrastructure, and how I simplified scalability management by combining PHP-FPM in static mode with Kubernetes’ Horizontal Pod Autoscaler (HPA).
Understanding PHP-FPM Process Manager Modes
PHP-FPM offers three modes via the pm (process manager) directive:
pm = dynamic (the default)
```ini
pm = dynamic
pm.max_children = 50
pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 35
pm.max_requests = 500
```
PHP-FPM starts with a minimum number of workers and spawns new ones as load increases, up to pm.max_children. When load drops, it kills the excess workers.
Pros:
- Memory-efficient when traffic is low
- Automatically adapts to load
Cons:
- Spawning new workers introduces latency during sudden traffic spikes
- Complex tuning: you need to balance 4 parameters correctly
- Unpredictable behavior under heavy load
pm = ondemand
```ini
pm = ondemand
pm.max_children = 50
pm.process_idle_timeout = 10s
```
Workers are only created on demand and killed after an idle timeout. Even more memory-efficient, but even slower to absorb traffic spikes.
pm = static
```ini
pm = static
pm.max_children = 10
```
A fixed number of workers is started when the container launches and never changes.
The Problem with dynamic in Kubernetes
On a VM, dynamic mode makes sense: resources are shared, and PHP-FPM manages its own memory footprint based on load.
In Kubernetes, the reasoning is different. Each pod has explicitly defined resource requests and limits:
```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
```
With pm = dynamic, PHP-FPM will attempt to create up to pm.max_children workers, potentially triggering an OOMKill if memory limits are exceeded, and introducing variable latency depending on whether workers are already warm or not.
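If you suspect this is already happening, the pod records the reason for the last container termination. A quick check, assuming the container has already been restarted at least once:

```bash
# Prints "OOMKilled" if the previous container instance exceeded its memory limit
kubectl get pod <pod-name> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
```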
Horizontal scaling is the HPA’s job, not PHP-FPM’s.
My Approach: static + HPA
The idea is straightforward:
One pod = a fixed number of PHP-FPM workers. When more capacity is needed, Kubernetes creates more pods.
The Memory Contract: memory_limit × pm.max_children
This is where everything comes together. PHP exposes the memory_limit directive which caps the memory each worker can consume:
```ini
; php.ini
memory_limit = 128M
```
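A quick way to confirm what a running pod actually loaded (the Deployment name php-app matches the HPA example later in this article; adjust to yours). Note that checking with php -r queries the CLI SAPI, which can read a different php.ini than FPM, so ask the FPM binary directly:

```bash
# Print the memory_limit that the FPM SAPI loads inside the container
kubectl exec deploy/php-app -- php-fpm -i | grep memory_limit
```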
The calculation then becomes deterministic:
pm.max_children = available pod memory / memory_limit
Example with a 512Mi pod and ~64Mi system overhead:
(512 - 64) / 128 ≈ 3.5 → rounded down to 3 workers
These three settings form a coherent contract between PHP, PHP-FPM, and Kubernetes:
```ini
; php.ini
memory_limit = 128M

; php-fpm.conf
pm = static
pm.max_children = 3
```

```yaml
# Kubernetes Deployment
resources:
  limits:
    memory: "512Mi"
```
In the worst case (every worker consuming its memory_limit simultaneously), the pod uses 3 × 128Mi + ~64Mi overhead ≈ 448Mi, safely under the 512Mi limit. No OOMKill.
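If you’d rather not hard-code the result, the same arithmetic can be done when the container starts, by reading the pod’s memory limit from the cgroup filesystem. A minimal entrypoint sketch, assuming cgroup v2 and the official php:fpm image layout (the zz-pool.conf path and the 64Mi overhead figure are assumptions to adapt to your setup):

```sh
#!/bin/sh
set -eu

# Assumptions: 128 MiB memory_limit per worker, ~64 MiB reserved for
# everything else in the container (matching the example above).
WORKER_MB=128
OVERHEAD_MB=64

# Pod memory limit in MiB (cgroup v2 exposes bytes, or "max" when unlimited)
LIMIT_BYTES=$(cat /sys/fs/cgroup/memory.max)
if [ "$LIMIT_BYTES" = "max" ]; then
  LIMIT_MB=512   # fallback when no limit is set
else
  LIMIT_MB=$((LIMIT_BYTES / 1024 / 1024))
fi

# pm.max_children = (pod memory - overhead) / memory_limit, never below 1
CHILDREN=$(( (LIMIT_MB - OVERHEAD_MB) / WORKER_MB ))
if [ "$CHILDREN" -lt 1 ]; then CHILDREN=1; fi

# Pool files are included alphabetically, so this zz- file overrides www.conf
cat > /usr/local/etc/php-fpm.d/zz-pool.conf <<EOF
[www]
pm = static
pm.max_children = $CHILDREN
EOF

exec php-fpm -F
```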
The classic mistake is setting pm.max_children without accounting for memory_limit, or worse, leaving memory_limit = -1 (unlimited) thinking it boosts performance. It’s a recipe for OOMKills.
HPA Configuration
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
When the average CPU across your pods exceeds 70%, the HPA creates new pods, each with its workers already warm and ready to serve traffic.
Why This Is Better
| | dynamic on K8s | static + HPA |
|---|---|---|
| Startup latency | Variable (workers spawned on demand) | None (workers ready immediately) |
| Memory predictability | Low | High |
| Tuning complexity | 4 parameters to calibrate | 1 parameter |
| Scalability ownership | PHP-FPM (inside the pod) | Kubernetes (HPA) |
| OOMKill risk | High | Controlled |
The Most Common Mistakes I See
1. Copy-pasting pm.max_children = 50 without thinking
If your memory_limit is 128M, 50 workers = 6.4 GB of potential memory usage. If your pod has 512Mi, that’s an OOMKill waiting to happen.
2. Leaving pm = dynamic by default in Kubernetes
This is the default setting in most PHP Docker images. It was never designed for containers with fixed resource boundaries.
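One way to make the override explicit is to bake it into the image instead of relying on the stock pool file. A sketch assuming the official php:fpm image, where ini fragments live in /usr/local/etc/php/conf.d/ and pool files in /usr/local/etc/php-fpm.d/ (loaded alphabetically, which is why a zz- prefix wins over the default www.conf):

```dockerfile
FROM php:8.3-fpm

# Per-worker memory cap: the php.ini side of the contract
RUN echo "memory_limit = 128M" > /usr/local/etc/php/conf.d/memory-limit.ini

# Static pool sized for the pod (the pm = static block shown earlier)
COPY zz-static-pool.conf /usr/local/etc/php-fpm.d/zz-static-pool.conf
```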
3. Setting memory_limit = -1
Removing the per-process memory limit completely defeats the ability to reason about your pod’s memory usage. You can no longer predict how many workers are safe to run.
4. Not configuring pm.max_requests
Without this directive, PHP workers run indefinitely and accumulate memory leaks. A periodic restart after N requests is good hygiene:
```ini
pm.max_requests = 500
```
5. Ignoring PHP-FPM metrics in the HPA
CPU-based scaling works well for CPU-bound workloads. But if your app is I/O-bound (waiting on DB queries, external APIs), CPU stays low even under heavy load. In that case, exposing PHP-FPM metrics (via pm.status_path) and scaling on active worker count can be more relevant:
```ini
pm.status_path = /status
```
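Be aware that this needs a metrics pipeline the CPU example does not: typically a php-fpm exporter sidecar scraping the status endpoint, plus an adapter (prometheus-adapter, for instance) that republishes the value to the custom metrics API. With that in place, the HPA metric block could look like the sketch below; the metric name and target value are illustrative, not standard names:

```yaml
metrics:
  - type: Pods
    pods:
      metric:
        name: phpfpm_active_processes   # published by a php-fpm exporter + metrics adapter
      target:
        type: AverageValue
        averageValue: "2"               # scale out when pods average ~2 busy workers out of 3
```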
Bonus: Adapt Your Readiness Probe
With pm = static, your workers are ready immediately at startup. Make sure your readiness probe reflects that:
```yaml
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
```
No need for a high initialDelaySeconds: the pod is operational as soon as PHP-FPM has started.
Conclusion
The rule I apply:
- VM / bare metal → pm = dynamic, let PHP-FPM manage load
- Kubernetes → pm = static, let the HPA manage scalability
This clear separation of responsibilities makes the system more predictable, easier to monitor, and eliminates an entire category of memory-related bugs.
The key insight is that memory_limit, pm.max_children, and your pod’s memory limit are not independent settings: they form a contract. Define one, derive the others.
PHP-FPM configuration isn’t glamorous, but it’s often the difference between an app that holds under load and one that OOMKills at 200 req/s.
