A production database connection suddenly stopped working. No errors in the application logs, no alerts—just empty query results. Here’s what happened and how to prevent it.

The Symptom

A WordPress application connecting to a remote MySQL database through an internal proxy started returning empty results. Build times dropped sharply, not because anything got faster, but because the queries behind them were failing immediately instead of actually running.

The connection path looked like this:

WordPress → Internal Proxy (10.0.1.x:3306) → Remote DB Endpoint → MySQL Cluster
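
For context, the internal proxy here was HAProxy doing a plain TCP passthrough. A minimal sketch of that kind of setup, with illustrative frontend/backend names (the real backend was the aptible_proxysql backend shown below), looks like this:

# Illustrative TCP passthrough for MySQL - not the exact production config
frontend mysql_in
    mode tcp
    bind :3306                     # the internal 10.0.1.x address in this incident
    default_backend mysql_out

backend mysql_out
    mode tcp
    # The hostname is resolved once at startup - which is exactly what bites us later
    server remote_db remote-db-endpoint.example.com:3306 check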

The Investigation

Direct connection to the remote database worked fine:

mysql -h remote-db-endpoint.example.com -P 3306 -u db_user -p
# Success - connected immediately

But connecting through the local HAProxy failed:

mysql -h 10.0.1.x -P 3306 -u db_user -p
# ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet'

Checking HAProxy status revealed the problem:

systemctl status haproxy
Server aptible_proxysql/aptible is DOWN, reason: Layer4 timeout
backend 'aptible_proxysql' has no server available!

The Root Cause

The remote database endpoint was behind an AWS ELB. When AWS rotated the ELB IP addresses, HAProxy kept trying to connect to the old IPs it had cached at startup.

HAProxy, by default, resolves a backend hostname once via libc when the service starts and uses that cached address until the next restart or reload.
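
You can see the mismatch directly: the address HAProxy is still holding for the server (the srv_addr field of show servers state) no longer matches what DNS returns. This assumes the stats socket is enabled at /run/haproxy/admin.sock, as in the diagnostic commands at the end:

# Address HAProxy cached for the backend server
echo "show servers state aptible_proxysql" | socat stdio /run/haproxy/admin.sock

# Address DNS resolves to right now
dig +short remote-db-endpoint.example.com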

The Fix

Immediate: Reload HAProxy

systemctl reload haproxy

This forces HAProxy to re-resolve DNS and pick up the new IPs.
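
Before moving on, it's worth confirming the backend actually came back. A quick check, assuming the same admin socket path used in the diagnostics below:

# Backend should report UP again instead of DOWN
echo "show stat" | socat stdio /run/haproxy/admin.sock | grep aptible_proxysql

# And a query through the proxy should return rows again
mysql -h 10.0.1.x -P 3306 -u db_user -p -e "SELECT 1;"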

Permanent: Add DNS Resolver Configuration

Update /etc/haproxy/haproxy.cfg:

resolvers dns
    nameserver dns1 127.0.0.53:53
    resolve_retries 3
    timeout resolve 1s
    timeout retry   1s
    hold valid 10s

backend your_backend
    server backend_server your-endpoint.example.com:3306 check resolvers dns init-addr last,libc,none

Key settings:

  • hold valid 10s — Trust a valid DNS answer for at most 10 seconds, so a rotated IP is picked up within seconds of the next resolution
  • resolvers dns on the server line — Use the dns resolver section for this server instead of a one-time libc lookup at startup
  • init-addr last,libc,none — Fallback order for the initial address at startup: the last known address (from a server-state file, see the note below), then a libc lookup, then start with no address and wait for the resolver rather than refusing to start
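
One caveat on init-addr last: HAProxy can only reuse the "last" known address if it has a server-state file to read it from. A minimal sketch of that extra piece (the file path is an example):

# In haproxy.cfg
global
    server-state-file /var/lib/haproxy/server-state

defaults
    load-server-state-from-file global

# Before each reload, dump current state so "last" has something to read
echo "show servers state" | socat stdio /run/haproxy/admin.sock > /var/lib/haproxy/server-state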

Validate and reload:

haproxy -c -f /etc/haproxy/haproxy.cfg && systemctl reload haproxy
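
Once reloaded, you can confirm the resolver section is actually in use (same admin socket path as above):

# Should list the "dns" resolvers section with query/response counters per nameserver
echo "show resolvers" | socat stdio /run/haproxy/admin.sock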

Lessons Learned

  1. HAProxy caches DNS by default — Fine for static IPs, dangerous for cloud load balancers
  2. Silent failures are the worst — The application saw connection timeouts but didn’t surface them clearly
  3. Always configure DNS resolution for backends with dynamic IPs (AWS ELB, cloud endpoints, etc.)

Quick Diagnostic Commands

# Check if HAProxy backend is healthy
echo "show stat" | socat stdio /run/haproxy/admin.sock

# Test direct connectivity to backend
nc -zv your-endpoint.example.com 3306

# Check current DNS resolution
dig +short your-endpoint.example.com

# View HAProxy logs
journalctl -u haproxy --since "1 hour ago"

A simple reload fixed the immediate issue. A config change prevented it from happening again.