Diagnosing nginx Worker Crashes in RunCloud: Tracing a Heap Corruption Bug to OpenSSL 3.3.0

Introduction
The Symptoms, Cloudflare 525 and 520 Errors
Initial Suspects, ModSecurity
Capturing Nginx Core Dumps
Identifying the Root Cause
Resolution
Key Takeaways for RunCloud Administrators
Reporting to RunCloud
Email Chain with Runcloud
Changelog

Content Error or Suggest an Edit

Notice a grammatical error or technical inaccuracy? Let us know; we will give you credit!

Introduction

Random SSL errors in Cloudflare — 525s, 520s, sites going down for a minute or two then coming back on their own — aren’t always a Cloudflare problem. Sometimes the issue is deeper, on the origin server itself.

That’s what we ran into on a RunCloud-managed server after a routine nginx-rc package upgrade. Here’s what caused it and how we fixed it.

The Symptoms, Cloudflare 525 and 520 Errors

Uptime monitoring started firing on HTTP 525 and 520 errors — Cloudflare’s codes for SSL handshake failures and unknown origin errors. Outages were hitting every 30–60 minutes, each lasting 1–3 minutes before auto-recovery.

The nginx-rc error log made the pattern clearer:

free(): invalid next size (normal)
[alert] worker process 498214 exited on signal 6 (core dumped)
free(): invalid next size (normal)
[alert] worker process 498289 exited on signal 6 (core dumped)

Signal 6 is SIGABRT — glibc detecting heap corruption and calling abort(). The process wasn’t killed from the outside. It killed itself after detecting corrupted memory. Workers were respawning and crashing again almost immediately, producing the intermittent pattern visible at the Cloudflare layer.

Initial Suspects, ModSecurity

The error log showed RunCloud’s nginx-rc loading ModSecurity on startup:

ModSecurity-nginx v1.0.4 (rules loaded inline/local/remote: 0/0/0)
libmodsecurity3 version 3.0.12

ModSecurity 3.0.12 has documented heap corruption issues under concurrent load. We compiled libmodsecurity 3.0.14 from source, deployed it, and confirmed via /proc memory maps that the new library was loaded. The crashes continued.

Kernel logs added some noise: EXT4 directory index warnings on a high-inode directory, and core_pipe_limit messages showing core dumps were being skipped due to too many simultaneous crashes. Neither turned out to be relevant. ModSecurity looked convincing. The stack trace ruled it out completely.

Capturing Nginx Core Dumps

Workers were crashing fast enough to saturate the kernel’s concurrent core dump limit. The default systemd configuration also caps core dump size at zero, silently discarding dumps. To fix thise we needed to do the following.

# Raise the kernel pipe limit
echo 64 > /proc/sys/kernel/core_pipe_limit
sysctl -w kernel.core_pattern='/tmp/nginx-core-%e-%p-%t'

# Override systemd's core size cap
mkdir -p /etc/systemd/system/nginx-rc.service.d/
cat > /etc/systemd/system/nginx-rc.service.d/coredump.conf << 'EOF'
[Service]
LimitCORE=infinity
EOF
systemctl daemon-reload && systemctl restart nginx-rc

With dumps now capturing, coredumpctl gdb nginx-rc produced the stack trace that identified the real cause:

#8  tls_early_post_process_client_hello  (nginx-rc)
#9  read_state_machine                   (nginx-rc)
#10 ngx_ssl_handshake                    (nginx-rc)
#11 ngx_http_ssl_handshake               (nginx-rc)
#12 ngx_epoll_process_events             (nginx-rc)
#13 ngx_process_events_and_timers        (nginx-rc)
#14 ngx_worker_process_cycle             (nginx-rc)

The crash was occurring inside tls_early_post_process_client_hello — OpenSSL’s TLS Client Hello extension processing code — not in ModSecurity, PHP, or any application layer component.

Identifying the Root Cause

With the SSL handshake code implicated by the stack trace, we checked the nginx-rc build flags:

--with-openssl=/root/Downloads/openssl-3.3.0

RunCloud’s nginx-rc 1.29.2 is a custom binary compiled against OpenSSL 3.3.0 with a number of third-party modules. The crash occurs inside OpenSSL’s TLS handshake code during graceful reloads — specifically in tls_early_post_process_client_hello — where heap corruption is detected and glibc aborts the worker process.

A review of the dpkg logs confirmed the timing:

2026-04-29 06:56:55 upgrade nginx-rc 1.27.1-1+ubuntu24.04+9 → 1.29.2-1+ubuntu24.04+2

RunCloud’s agent upgraded nginx-rc from 1.27.1 to 1.29.2 at 06:56 UTC. The first worker crash occurred 35 seconds later. Downgrading back to 1.27.1 stopped the crashes immediately, confirming the defect is specific to RunCloud’s 1.29.2 build.

Two upstream fixes are relevant and should be referenced in any communication with RunCloud:

nginx 1.29.2 official changelog — https://nginx.org/en/CHANGES

“Bugfix: in SSL certificate caching during reconfiguration”

“Bugfix: SSL handshake always failed when using TLSv1.3 with OpenSSL and client certificates and resuming a session with a different SNI value; the bug had appeared in 1.27.4”

SSL certificate caching during reconfiguration was introduced as a feature in nginx 1.27.2. The above fixes suggest RunCloud’s 1.29.2 build either does not include these patches or was compiled against an OpenSSL version that exposes the same code path.

OpenSSL 3.3 release notes — https://openssl-library.org/news/openssl-3.3-notes/

OpenSSL 3.3.0 was a feature release with no security patches. Bug fixes were not introduced until 3.3.1, which addressed:

“Fixed potential use after free after SSL_free_buffers() is called (CVE-2024-4741)”

“Fixed an issue where checking excessively long DSA keys or parameters may be very slow (CVE-2024-4603)”

RunCloud’s binary is statically compiled against 3.3.0 specifically — the initial feature release — meaning none of the subsequent 3.3.x bug fixes are included.

Whether the root cause is the nginx SSL certificate caching bug, the OpenSSL 3.3.0 static compile, or an incompatible combination of the two requires RunCloud to investigate their build process. Both references above should be provided to RunCloud when filing the support ticket.

Resolution

Downgrading to the previously stable nginx-rc version stopped the crashes immediately:

apt-get install --allow-downgrades nginx-rc=1.27.1-1+ubuntu24.04+9

The package was then pinned to prevent re-upgrade:

apt-mark hold nginx-rc

cat > /etc/apt/preferences.d/nginx-rc << 'EOF'
Package: nginx-rc
Pin: version 1.27.1-1+ubuntu24.04+9
Pin-Priority: 1001
EOF

Key Takeaways for RunCloud Administrators

1. Get a core dump before chasing suspects. ModSecurity, PHP-FPM, and OPcache are common culprits for nginx worker crashes. In this case the stack trace ruled all of them out in seconds. Without the dump we would have continued investigating the wrong layer.

2. Enable core dumps explicitly on RunCloud servers. systemd silently discards core dumps by default. The LimitCORE=infinity service override is essential for any meaningful crash analysis on RunCloud-managed infrastructure.

3. nginx-rc 1.29.2 is affected. If you are running RunCloud and upgraded to nginx-rc 1.29.2-1+ubuntu24.04+2, downgrade to 1.27.1-1+ubuntu24.04+9 and pin the package until RunCloud releases a newer Nginx build.

4. Watch for the reload-specific pattern. This bug only triggers on graceful reload, not cold restart. If your server crashes on systemctl reload nginx-rc but not systemctl restart nginx-rc, OpenSSL shared memory or SSL context handling during worker handoff is a strong candidate.

5. Pin critical packages on production RunCloud servers. RunCloud’s agent will upgrade nginx-rc automatically. Use both apt-mark hold and an apt preferences pin for belt-and-suspenders protection on production systems.

Reporting to RunCloud

We have reported this issue to RunCloud with the full stack trace and build flag analysis. RunCloud should be investigating their nginx-rc 1.29.2 build process — specifically whether the upstream nginx SSL certificate caching fix is included in their binary, and whether recompiling against OpenSSL 3.3.1 or later resolves the heap corruption in the TLS handshake code path.

If you are affected, open a support ticket with RunCloud referencing:

OpenSSL reference: https://openssl-library.org/news/openssl-3.3-notes/ — OpenSSL 3.3.0 is an unpatched feature release; fixes begin at 3.3.1
Package: nginx-rc 1.29.2-1+ubuntu24.04+2
Compiled against: OpenSSL 3.3.0 (statically linked)
Crash function: tls_early_post_process_client_hello
Trigger: nginx graceful reload (systemctl reload nginx-rc)
Workaround: Downgrade to nginx-rc 1.27.1-1+ubuntu24.04+9 and pin the package
nginx reference: https://nginx.org/en/CHANGES — “Bugfix: in SSL certificate caching during reconfiguration”

Email Chain with Runcloud

Email to Runcloud on 04/30/2026

I sent an original email that was pretty specific about the issue, but got a generic response to get back

Hi Jordan,

Thank you for contacting us,

To assist you further on this, please share below details:

1. Agent Access

With AgentAccess we can access your account and check this issue further.  
AgentAccess: The button to insert AgentAccess is on the top right corner of a support ticket. Check: https://manage.runcloud.io/support/tickets >> Select your ticket >> Click 'Allow Access' button. 

2. We require your confirmation to access the server via SSH using the root account. With your permission, we will add our SSH key and proceed with the login. Please confirm by replying “Yes”.

Awaiting your response.

It went to spam. So I checked in on May 12th 2026 and resent my response.

Runcloud Email #2 05/12/2026

---

ENVIRONMENT

- Server: Ubuntu 24.04
- Package: nginx-rc 1.29.2-1+ubuntu24.04+2
- Installed by: RunCloud agent auto-upgrade on 2026-04-29 at 06:56 UTC
- OpenSSL: 3.3.0 (statically compiled into the binary)

---

SYMPTOMS

Every graceful reload (systemctl reload nginx-rc) causes all worker processes to crash with the following error:

    free(): invalid next size (normal)
    [alert] worker process XXXXXX exited on signal 6 (core dumped)

This results in intermittent 525 and 520 errors at the Cloudflare layer, lasting 1-3 minutes per incident until workers respawn. The issue does not occur on a full stop/start — only on graceful reload.

---

STACK TRACE (from coredumpctl gdb nginx-rc)

    Signal: 6 (ABRT)
    #8  tls_early_post_process_client_hello (nginx-rc + 0x3aeebb)
    #9  read_state_machine (nginx-rc + 0x39afeb)
    #10 ngx_ssl_handshake (nginx-rc + 0x172ba2)
    #11 ngx_http_ssl_handshake (nginx-rc + 0x1abf63)
    #12 ngx_epoll_process_events (nginx-rc + 0x16cce6)
    #13 ngx_process_events_and_timers (nginx-rc + 0x16155a)
    #14 ngx_worker_process_cycle (nginx-rc + 0x16e770)

The crash is occurring inside OpenSSL's TLS Client Hello extension processing code, triggered during worker initialization on graceful reload.

---

ROOT CAUSE ANALYSIS

The nginx-rc 1.29.2 binary is compiled against OpenSSL 3.3.0 (--with-openssl=/root/Downloads/openssl-3.3.0). OpenSSL 3.3.0 was an unpatched feature release — bug fixes were not introduced until 3.3.1. Reference: https://openssl-library.org/news/openssl-3.3-notes/

Additionally, SSL certificate caching during reconfiguration was introduced as a feature in nginx 1.27.2. The upstream nginx project acknowledged and fixed a bug in this feature in the official nginx 1.29.2 changelog:

    "Bugfix: in SSL certificate caching during reconfiguration"
    "Bugfix: SSL handshake always failed when using TLSv1.3 with OpenSSL and client certificates and resuming a session with a different SNI value"

Reference: https://nginx.org/en/CHANGES

It appears RunCloud's nginx-rc 1.29.2 build either does not include these patches, is compiled against an OpenSSL version that exposes the same vulnerable code path, or both.

---

WORKAROUND

Downgrading to nginx-rc 1.27.1-1+ubuntu24.04+9 stops the crashes immediately. We have pinned this version on the affected server to prevent re-upgrade:

    apt-get install --allow-downgrades nginx-rc=1.27.1-1+ubuntu24.04+9
    apt-mark hold nginx-rc

---

Runcloud Response on 05/13/2026

We would like to inform you that an updated fix has been already released to production. The latest nginx-rc build has been recompiled using a newer OpenSSL version that includes the upstream fix for the TLS handshake-related heap corruption issue.

Changelog

05/19/2026 – Updated to include Runcloud’s response.
05/12/2026 – Update blog to zero on on Runcloud Nginx build being the issue most importantly, and add the Runcloud email.

Diagnosing nginx Worker Crashes in RunCloud: Tracing a Heap Corruption Bug to OpenSSL 3.3.0

Table of Contents

Introduction

The Symptoms, Cloudflare 525 and 520 Errors

Initial Suspects, ModSecurity

Capturing Nginx Core Dumps

Identifying the Root Cause

Resolution

Key Takeaways for RunCloud Administrators

Reporting to RunCloud

Email Chain with Runcloud

Email to Runcloud on 04/30/2026

Runcloud Email #2 05/12/2026

Runcloud Response on 05/13/2026

Changelog

Tags:

Diagnosing nginx Worker Crashes in RunCloud: Tracing a Heap Corruption Bug to OpenSSL 3.3.0

Table of Contents

Introduction

The Symptoms, Cloudflare 525 and 520 Errors

Initial Suspects, ModSecurity

Capturing Nginx Core Dumps

Identifying the Root Cause

Resolution

Key Takeaways for RunCloud Administrators

Reporting to RunCloud

Email Chain with Runcloud

Email to Runcloud on 04/30/2026

Runcloud Email #2 05/12/2026

Runcloud Response on 05/13/2026

Changelog

Tags:

Join our Newsletter