Introduction
Random SSL errors in Cloudflare — 525s, 520s, sites going down for a minute or two then coming back on their own — aren’t always a Cloudflare problem. Sometimes the issue is deeper, on the origin server itself.
That’s what we ran into on a RunCloud-managed server after a routine nginx-rc package upgrade. Here’s what caused it and how we fixed it.
The Symptoms, Cloudflare 525 and 520 Errors
Uptime monitoring started firing on HTTP 525 and 520 errors — Cloudflare’s codes for SSL handshake failures and unknown origin errors. Outages were hitting every 30–60 minutes, each lasting 1–3 minutes before auto-recovery.
The nginx-rc error log made the pattern clearer:
free(): invalid next size (normal)
[alert] worker process 498214 exited on signal 6 (core dumped)
free(): invalid next size (normal)
[alert] worker process 498289 exited on signal 6 (core dumped)
Signal 6 is SIGABRT — glibc detecting heap corruption and calling abort(). The process wasn’t killed from the outside. It killed itself after detecting corrupted memory. Workers were respawning and crashing again almost immediately, producing the intermittent pattern visible at the Cloudflare layer.
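To see how tightly the aborts cluster, the alert lines can be tallied per minute. This is a minimal sketch rather than something from the original investigation: the function name `count_sigabrt_by_minute` is hypothetical, and it assumes each log line carries nginx's usual `YYYY/MM/DD HH:MM:SS` timestamp prefix, which the excerpt above omits.

```shell
# Tally nginx worker aborts (signal 6) per minute from an error log on stdin.
# Assumption: every line starts with a "YYYY/MM/DD HH:MM:SS" timestamp.
count_sigabrt_by_minute() {
    awk '/exited on signal 6/ {
             minute = $1 " " substr($2, 1, 5)   # keep date plus HH:MM
             count[minute]++
         }
         END { for (m in count) print m, count[m] }' | sort
}
```

Feed it the nginx-rc error log on stdin, for example `count_sigabrt_by_minute < error.log`, substituting the log path used on your server.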
Initial Suspects, ModSecurity
The error log showed RunCloud’s nginx-rc loading ModSecurity on startup:
ModSecurity-nginx v1.0.4 (rules loaded inline/local/remote: 0/0/0) libmodsecurity3 version 3.0.12
ModSecurity 3.0.12 has documented heap corruption issues under concurrent load. We compiled libmodsecurity 3.0.14 from source, deployed it, and confirmed via /proc memory maps that the new library was loaded. The crashes continued.
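The memory-map check can be sketched as follows. The helper name `mapped_modsecurity_libs` is hypothetical, and the sketch assumes the library path contains no spaces, so it appears as the sixth field of each maps line.

```shell
# Print the distinct libmodsecurity shared objects mapped by a process.
# Argument: path to a maps file, e.g. /proc/<pid>/maps.
# Assumption: the library path has no spaces, so it is field 6.
mapped_modsecurity_libs() {
    awk '$6 ~ /libmodsecurity/ { print $6 }' "$1" | sort -u
}
```

Run it against each worker's maps file, for example `mapped_modsecurity_libs /proc/<pid>/maps`. If the output still shows the old version number, the worker predates the library swap and nginx-rc needs a full restart.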
Kernel logs added some noise: EXT4 directory index warnings on a high-inode directory, and core_pipe_limit messages showing core dumps were being skipped due to too many simultaneous crashes. Neither turned out to be relevant. ModSecurity looked like a convincing suspect, but the stack trace we captured next ruled it out completely.
Capturing Nginx Core Dumps
Workers were crashing fast enough to saturate the kernel’s concurrent core dump limit. The default systemd configuration also caps core dump size at zero, silently discarding dumps. To fix this, we raised both limits:
# Raise the kernel pipe limit
echo 64 > /proc/sys/kernel/core_pipe_limit
sysctl -w kernel.core_pattern='/tmp/nginx-core-%e-%p-%t'

# Override systemd's core size cap
mkdir -p /etc/systemd/system/nginx-rc.service.d/
cat > /etc/systemd/system/nginx-rc.service.d/coredump.conf << 'EOF'
[Service]
LimitCORE=infinity
EOF
systemctl daemon-reload && systemctl restart nginx-rc
With dumps now capturing, coredumpctl gdb nginx-rc produced the stack trace that identified the real cause:
#8 tls_early_post_process_client_hello (nginx-rc)
#9 read_state_machine (nginx-rc)
#10 ngx_ssl_handshake (nginx-rc)
#11 ngx_http_ssl_handshake (nginx-rc)
#12 ngx_epoll_process_events (nginx-rc)
#13 ngx_process_events_and_timers (nginx-rc)
#14 ngx_worker_process_cycle (nginx-rc)
The crash was occurring inside tls_early_post_process_client_hello — OpenSSL’s TLS Client Hello extension processing code — not in ModSecurity, PHP, or any application layer component.
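If the dumps land as plain files under /tmp, following the nginx-core-%e-%p-%t pattern set earlier, rather than in systemd's coredump storage, they can also be opened with gdb directly. A minimal sketch for picking the newest one; `newest_core` is a hypothetical helper name, and the default directory is an assumption taken from that pattern.

```shell
# Print the most recently written core file in a directory, if any.
# Matches the nginx-core-* naming from the core_pattern set earlier.
# Argument: directory to search (defaults to /tmp, an assumption).
newest_core() {
    ls -t "${1:-/tmp}"/nginx-core-* 2>/dev/null | head -n 1
}
```

Assuming nginx-rc is on the PATH, a backtrace can then be printed non-interactively with `gdb -batch -ex bt "$(command -v nginx-rc)" "$(newest_core)"`.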
Identifying the Root Cause, OpenSSL
With OpenSSL implicated, we checked the nginx-rc build flags:
--with-openssl=/root/Downloads/openssl-3.3.0
RunCloud’s nginx-rc binary was statically compiled against OpenSSL 3.3.0, which contains a heap corruption bug in Client Hello extension processing. The bug manifests specifically during nginx graceful reloads — when new worker processes initialize SSL contexts while old workers are still serving connections, the OpenSSL code corrupts the heap, glibc detects it, and the worker aborts.
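The build information can be pulled out of the binary's version output. This sketch assumes nginx-rc behaves like stock nginx, where `-V` prints the "built with OpenSSL" line and the configure arguments to stderr. The helper name `openssl_build_info` is hypothetical.

```shell
# Filter `nginx -V` style output on stdin down to the OpenSSL details:
# the "built with OpenSSL ..." line and any --with-openssl* configure flags.
openssl_build_info() {
    awk '/^built with OpenSSL/ { print }
         /^configure arguments:/ {
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^--with-openssl/) print $i
         }'
}
```

Usage: `nginx-rc -V 2>&1 | openssl_build_info`. The `2>&1` matters because the version output goes to stderr.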
A review of the dpkg logs confirmed the timing:
2026-04-29 06:56:55 upgrade nginx-rc 1.27.1-1+ubuntu24.04+9 → 1.29.2-1+ubuntu24.04+2
RunCloud’s agent had upgraded nginx-rc from 1.27.1 to 1.29.2 at 06:56 UTC. The first worker crash occurred 35 seconds later. nginx-rc 1.29.2 introduced the OpenSSL 3.3.0 build. Version 1.27.1 was compiled against an earlier, unaffected OpenSSL release.
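To find the same kind of upgrade event on another server, the dpkg log can be filtered by package. A minimal sketch, assuming dpkg's usual line layout of date, time, action, package with an optional architecture suffix, old version, new version. The function name `package_upgrades` is hypothetical.

```shell
# List upgrade events for one package from dpkg log lines on stdin.
# Assumed layout: date time action package[:arch] old-version new-version
package_upgrades() {
    awk -v pkg="$1" '
        $3 == "upgrade" {
            name = $4
            sub(/:.*/, "", name)    # drop an architecture suffix like ":amd64"
            if (name == pkg) print $1, $2, $5, "->", $6
        }'
}
```

Usage: `package_upgrades nginx-rc < /var/log/dpkg.log`. Comparing the printed timestamp with the first worker abort in the nginx-rc error log shows whether an upgrade lines up with the start of the crashes.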
Resolution
Downgrading to the previously stable nginx-rc version stopped the crashes immediately:
apt-get install --allow-downgrades nginx-rc=1.27.1-1+ubuntu24.04+9
The package was then pinned to prevent re-upgrade:
apt-mark hold nginx-rc
cat > /etc/apt/preferences.d/nginx-rc << 'EOF'
Package: nginx-rc
Pin: version 1.27.1-1+ubuntu24.04+9
Pin-Priority: 1001
EOF
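It is worth confirming that the hold took effect before relying on it. A small sketch, with the hypothetical helper `is_held` reading the list of held packages from stdin:

```shell
# Succeed if the named package appears in `apt-mark showhold` output on stdin.
# -x matches the whole line and -F treats the name as a literal string.
is_held() {
    grep -qxF -- "$1"
}
```

Usage: `apt-mark showhold | is_held nginx-rc && echo "nginx-rc is held"`. Running `apt-cache policy nginx-rc` afterwards shows the installed version, the candidate version, and the pin priority apt has applied.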
Key Takeaways for RunCloud Administrators
1. Get a core dump before chasing suspects. ModSecurity, PHP-FPM, and OPcache are common culprits for nginx worker crashes. In this case the stack trace ruled all of them out in seconds. Without the dump we would have continued investigating the wrong layer.
2. Enable core dumps explicitly on RunCloud servers. systemd silently discards core dumps by default. The LimitCORE=infinity service override is essential for any meaningful crash analysis on RunCloud-managed infrastructure.
3. nginx-rc 1.29.2 is affected. If you are running RunCloud and upgraded to nginx-rc 1.29.2-1+ubuntu24.04+2, downgrade to 1.27.1-1+ubuntu24.04+9 and pin the package until RunCloud releases a build compiled against a patched OpenSSL version.
4. Watch for the reload-specific pattern. This bug only triggers on graceful reload, not cold restart. If your server crashes on systemctl reload nginx-rc but not systemctl restart nginx-rc, OpenSSL shared memory or SSL context handling during worker handoff is a strong candidate.
5. Pin critical packages on production RunCloud servers. RunCloud’s agent will upgrade nginx-rc automatically. Use both apt-mark hold and an apt preferences pin for belt-and-suspenders protection on production systems.
Reporting to RunCloud
We have reported this issue to RunCloud with the full stack trace and build flag analysis. The fix on their side should be to recompile nginx-rc against OpenSSL 3.3.2 or later, which contains the upstream fix for this class of heap corruption in TLS handshake processing. If you are affected, open a support ticket with RunCloud referencing nginx-rc 1.29.2, OpenSSL 3.3.0, and tls_early_post_process_client_hello.
