Introduction
When someone shares a link on Facebook, Facebook appends an extra query string parameter (fbclid) to the link that visitors click, like the following.
https://managingwp.io/live-blog/nginx-error-setrlimitrlimit_nofile-25562325-failed-1-operation-not-permitted/?fbclid=IwY2xjawE6Cz1leHRuA2FlbQIxMQABHQtQRq0Ma2pk9bXQQfM_vyGWE_YPHCMp9VhWudfQR9N6FRv7pkKjL7ZTfQ_aem_GTXCB2GK_bezDLTO7WhktA
This is problematic because common Nginx configurations won’t cache URLs that include a query string. It’s even worse on a WordPress site, since each of these requests bypasses the cache and results in a dynamic request.
Configuration for Nginx FastCGI Caching on GridPane
Unfortunately, this guide is specific to GridPane, but you can read more about doing this on Easy Engine:
https://community.easyengine.io/t/ignore-query-parameters-with-fastcgi-cache/4622
The variable $uri only resolves to index.php, and $request_uri includes both the path and the query string. We will address this problem by creating a new variable, $path_uri, using the map directive. This gives us the URI path without the query string, which is important because we don’t want to create a cache entry for every query string used. On a busy site, that would eat up the cache pretty quickly.
We will replace $request_uri in fastcgi_cache_key with $cache_uri, so that we can dynamically set $cache_uri to either $request_uri or our new $path_uri without breaking anything else that might depend on $request_uri.
Then, when $args contains one of the tracking parameters (utm_, gclid, fbclid, utm or gad_source), we set $skip_cache to 0 and set a skip reason.
Lastly, if we do see those specific query strings, we will make sure that they’re cached and use the $p
Step 1 – Set Up the Map
Create a new file at /etc/nginx/conf.d/fbutm.conf with the following code. This puts the URI path, taken from $request_uri, into the $path_uri variable.
# fbutm: Create $path_uri using map
map $request_uri $path_uri {
    "~^(?P<path>[^?]*)(\?.*)?$" $path;
}
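To check what the map captures before reloading Nginx, here is a minimal Python sketch that applies the same pattern to a few sample request URIs. It assumes Python’s re module treats this pattern the way Nginx’s PCRE engine does, which holds for the (?P<name>...) named-group syntax used here. The path_uri() helper is hypothetical and not part of Nginx or GridPane.

```python
import re

# Same pattern as the fbutm map block: capture everything before the first "?"
# as "path", then optionally consume the "?" and the rest of the query string.
MAP_PATTERN = re.compile(r"^(?P<path>[^?]*)(\?.*)?$")


def path_uri(request_uri: str) -> str:
    """Return what $path_uri would hold for a given $request_uri (hypothetical helper)."""
    match = MAP_PATTERN.match(request_uri)
    # The pattern matches any string without newlines; the fallback is defensive only.
    return match.group("path") if match else request_uri


if __name__ == "__main__":
    samples = [
        "/live-blog/nginx-error/?fbclid=IwY2xjawE6Cz1leHRuA2FlbQIxMQAB",
        "/?utm_source=newsletter&utm_medium=email",
        "/about/",
    ]
    for uri in samples:
        print(f"{uri!r} -> {path_uri(uri)!r}")
```

Tracking parameters never change the path, so every fbclid variant of a page ends up with the same $path_uri.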
Step 2 – Edit /etc/nginx/common/(domain.com)-fcgi-cache-var.conf
Edit the file /etc/nginx/common/(domain.com)-fcgi-cache-var.conf, where (domain.com) is the domain name you want to affect. This has to be done for each site and can’t be applied server-wide.
- If a request comes in with fbclid (or one of the other tracking parameters) in $args, we’ll set the skip reason. This is mostly for debugging; you don’t need to add it.
- In the same case, we set $cache_uri to $path_uri.
- Update fastcgi_cache_key, replacing $request_uri with $cache_uri.
#
#
# GridPane vHost FastCGI caching per site include
# Version 1.2.0
#
#
# fbutm: If fbclid is in url, set $cache_uri to $path_uri and set $skip_reason, also update fastcgi_cache_key
if ($args ~* "(utm_|gclid|fbclid|utm|gad_source)" ) {
    set $cache_uri $path_uri;
    # set $skip_reason "${skip_reason}-fbutm-${cache_uri}-${uri}"; # Used for debugging
    set $skip_reason "${skip_reason}-fbutm";
}
#fastcgi_cache_key "$scheme$request_method$geoip2_data_country_code$host$request_uri";
fastcgi_cache_key "$scheme$request_method$host$cache_uri";
fastcgi_cache_valid 200 300s;
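The effect of Steps 1 to 3 on the cache key can be modeled in a few lines of Python. This is a simplified sketch, not GridPane’s code. It assumes $scheme, $request_method and $host are plain strings, that $cache_uri defaults to $request_uri as in Step 3, and it reuses the tracking pattern from the if block above. The cache_uri() and cache_key() functions are hypothetical.

```python
import re

# Tracking pattern copied from the fbutm if block; ~* in Nginx means case-insensitive.
TRACKING_ARGS = re.compile(r"(utm_|gclid|fbclid|utm|gad_source)", re.IGNORECASE)


def cache_uri(request_uri: str) -> str:
    """Model of $cache_uri: $path_uri when $args matches, else $request_uri."""
    path, _, args = request_uri.partition("?")  # $args is everything after the first "?"
    if TRACKING_ARGS.search(args):
        return path  # same value the map assigns to $path_uri
    return request_uri  # default from fbutm-main-context.conf


def cache_key(scheme: str, method: str, host: str, request_uri: str) -> str:
    """Model of fastcgi_cache_key "$scheme$request_method$host$cache_uri"."""
    return f"{scheme}{method}{host}{cache_uri(request_uri)}"


if __name__ == "__main__":
    visits = [
        "/post/?fbclid=AAA111",
        "/post/?fbclid=BBB222",
        "/post/?utm_source=x&utm_medium=y",
        "/post/",
    ]
    keys = {cache_key("https", "GET", "example.com", uri) for uri in visits}
    # All four visits collapse into a single cache entry.
    print(len(keys), keys)
```

Requests without tracking parameters, such as /?s=dog+toys, keep their full $request_uri in the key. With only this file in place, /?fbclid=x&listing_type=black is keyed as /, which is the multi-argument problem covered in the Questions section.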
Step 3 – Create /etc/nginx/extra.d/fbutm-main-context.conf
Create /etc/nginx/extra.d/fbutm-main-context.conf and set the default value of $cache_uri to $request_uri.
# fbutm: We need to default $cache_uri as it's the new $request_uri
set $cache_uri $request_uri;
Step 4 – Create /etc/nginx/extra.d/fbutm-skip-fcgi-cache-context.conf
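In this file, if $args contains fbclid= (or one of the other tracking parameters), we don’t skip caching the request. The second if block does the opposite for query strings that should never be cached; listing_type and product_color are examples, so replace them with your own.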
# fbutm: If we see fbclid in $args, don't skip cache and set skip reason
if ($args ~* "(utm_|gclid|fbclid|utm|gad_source)" ) {
    set $skip_cache 0;
    #set $skip_reason "${skip_reason}-fbutm-${cache_uri}"; # For debugging
    set $skip_reason "${skip_reason}-fbutm";
}
# if you have queries that shouldn't be cached
if ($args ~* "(listing_type|product_color)") {
    set $skip_cache 1;
    set $skip_reason "-querystring-fbutm-bypass";
}
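Rewrite-module directives such as if and set run in the order they appear, so the second if block overrides the first when both match. The Python sketch below models that ordering. It assumes, for illustration only, that GridPane has already set $skip_cache to 1 for any request with a query string before this include runs. That default isn’t shown here, so check your own vhost config. The skip_cache() helper is hypothetical.

```python
import re

TRACKING_ARGS = re.compile(r"(utm_|gclid|fbclid|utm|gad_source)", re.IGNORECASE)
NEVER_CACHE_ARGS = re.compile(r"(listing_type|product_color)", re.IGNORECASE)


def skip_cache(args: str) -> int:
    """Return the final $skip_cache value for a given $args string (model only)."""
    # Assumption: any query string already forces a bypass before this include.
    value = 1 if args else 0
    # First if block: tracking parameters are allowed back into the cache.
    if TRACKING_ARGS.search(args):
        value = 0
    # Second if block runs afterwards, so it wins when both patterns match.
    if NEVER_CACHE_ARGS.search(args):
        value = 1
    return value


if __name__ == "__main__":
    for args in ["fbclid=abc", "fbclid=abc&listing_type=black", "s=dog+toys", "s=utmost", ""]:
        print(f"{args!r}: skip_cache={skip_cache(args)}")
```

The s=utmost case shows a side effect of the unanchored pattern: utm matches anywhere in $args, including inside values. Under this model, that search would be cached, and Step 2 would key it by its $path_uri.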
Questions
Here are some questions from the person looking for this solution.
Thanks for creating an article, but I don’t understand.
No worries. The article is more advanced than what the average person would understand; most of my articles are like this.
I’ll follow the steps, but I’d like to know what’s happening and if it’s standard on GP servers.
I’ve updated the page and will continue to update it. There is also a changelog section at the bottom that lists any changes I make.
The implementation uses methods that GridPane has provided to modify the Nginx config. What I’ve created, however, is to be used at your own risk since it’s not built into GridPane. At the time of this writing, it works and has been tested, and I’ll update it as needed.
Can you include “gclid”, “gad_source” and “utm_*”?
It’s been added. You can add more, and please share what you’ve added. Here is the full list from WP Rocket:
- Age Verify plugin: age-verified
- Autoptimize: ao_noptimize
- AMP: usqp
- Cookie Notice: cn-reloaded
- ShareASale: sscid
- ActiveCampaign: vgo_ee
- Adobe Advertising Cloud: ef_id
- Adobe Analytics: s_kwcid
- Bronto: _bta_tid, _bta_c
- Dotdigital: dm_i
- Facebook: fb_action_ids, fb_action_types, fb_source, fbclid
- Google Analytics and Ads: utm_source, utm_campaign, utm_medium, utm_expid, utm_term, utm_content, utm_id, utm_source_platform, utm_creative_format, utm_marketing_tactic, _ga, gclid, campaignid, adgroupid, adid, gbraid, wbraid, gad_source
- Google Web Stories: _gl
- Google DoubleClick: gclsrc
- GoDataFeed: gdfms, gdftrk, gdffi
- Klaviyo: _ke, _kx
- Listrak: trk_contact, trk_msg, trk_module, trk_sid
- Mailchimp: mc_cid, mc_eid
- Marin: mkwid, pcrid
- Matomo: mtm_source, mtm_medium, mtm_campaign, mtm_keyword, mtm_cid, mtm_content
- Microsoft Advertising: msclkid, epik, pp
- Piwik Pro: pk_source, pk_medium, pk_campaign, pk_keyword, pk_cid, pk_content
- Springbot: sb_referer_host, redirect_log_mongo_id, redirect_mongo_id
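If you add parameters from this list, the alternation in the if blocks grows quickly, and the current pattern also matches those names anywhere in $args. The minimal Python sketch below builds an alternation that only matches a whole parameter name followed by =, at the start of $args or after an &. The build_args_pattern() helper and the anchored pattern are assumptions of this example, not something the config above uses. Test any generated pattern on a staging site before placing it inside an Nginx if block.

```python
import re

# A few names from the WP Rocket list above; extend as needed.
PARAMS = ["fbclid", "gclid", "gad_source", "msclkid", "mc_cid", "age-verified"]
PREFIXES = ["utm_"]  # matches utm_source, utm_medium, utm_campaign, ...


def build_args_pattern(params, prefixes):
    """Return a regex matching a whole parameter name at the start of $args or after '&'."""
    names = [re.escape(p) for p in sorted(params)]
    # Prefix entries match any parameter name that starts with the prefix.
    names += [re.escape(p) + r"[a-z_]*" for p in sorted(prefixes)]
    return r"(^|&)(" + "|".join(names) + r")="


if __name__ == "__main__":
    pattern = build_args_pattern(PARAMS, PREFIXES)
    print(pattern)
    # Nginx's ~* operator is case-insensitive, so mirror that when testing locally.
    checker = re.compile(pattern, re.IGNORECASE)
    for args in ["fbclid=abc", "a=1&utm_source=news", "s=utmost"]:
        print(f"{args!r}: {bool(checker.search(args))}")
```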
What if the query string has multiple arguments, such as utm and listing_type?
The challenge is handling URLs with more than one query string parameter, for instance when both fbclid and listing_type are set, as in the following example.
https://domain.com/?fbclid=qwe123123&listing_type=black
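In that scenario, the cache should be bypassed, but in fact it isn’t: the request is cached because it contains one of the query strings mentioned above (utm_|gclid|fbclid|utm|gad_source). The page shouldn’t be served from cache. And because the cache key uses $path_uri in this case, the filtered page also shares a cache entry with the unfiltered one.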
The problem is that Nginx doesn’t have a great built-in way to deal with these cases, which is both good and bad. You have to handle them on a case-by-case basis, and ultimately add another rule after the one above to turn caching off for the query string "listing_type".
- Add another if block after the fbutm check that skips the cache when listing_type matches.
- Look at using Lua to strip the fbutm query args and their values, and if any other arguments are left, don’t cache. But this is problematic if an argument such as search=best+dog+toys is used: you’d be caching every search as a full page, which is inefficient since not many people run the same search. You’re better off with a search plugin that supports object caching.
- If listing_type only has 10 possible values, you could cache them, but if the results are paginated, you’ll most likely have URLs like ?listing_type=black&page=1, which is fine but not ideal.
- Convert listing_type values into slugs, so the URL becomes /listing-type/black instead. This might not be possible, since listing_type could be user-defined rather than a fixed list.
- Find a better plugin that is uber-efficient at generating page data, meaning it doesn’t use the WordPress internal tables, the WordPress REST API, or even WordPress at all. Unfortunately, WordPress loads so much crap that you’d have to go with a headless solution to get good performance.
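The Lua option boils down to removing the tracking arguments and checking whether anything is left. The sketch below shows that logic in Python rather than Lua, purely to illustrate the idea. The TRACKING set and the strip_tracking() and cacheable() helpers are hypothetical, and a real implementation would need to run inside Nginx with Lua support, which isn’t covered here.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical set of tracking parameters to strip; utm_* is handled by prefix.
TRACKING = {"fbclid", "gclid", "gad_source"}
TRACKING_PREFIX = "utm_"


def strip_tracking(args: str) -> str:
    """Remove tracking parameters from a query string and return what remains."""
    kept = [
        (name, value)
        for name, value in parse_qsl(args, keep_blank_values=True)
        if name.lower() not in TRACKING and not name.lower().startswith(TRACKING_PREFIX)
    ]
    return urlencode(kept)


def cacheable(args: str) -> bool:
    """Cache only when nothing but tracking parameters was present."""
    return strip_tracking(args) == ""


if __name__ == "__main__":
    for args in ["fbclid=abc&utm_source=fb", "fbclid=abc&listing_type=black", "s=best+dog+toys"]:
        print(f"{args!r}: remaining={strip_tracking(args)!r} cacheable={cacheable(args)}")
```

Stripping alone doesn’t solve the search case: if you keyed the cache on the remaining arguments instead of bypassing it, every distinct search would still get its own full-page cache entry.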
The best way to approach all of this is fragment caching, or using the WordPress REST API and AJAX.
Fragment caching caches the page except for the section that changes, which here would be the search results. This is possible with LiteSpeed LSCache (paid), and you would have to code the page to support it.
The other option is using the WordPress REST API and AJAX, so the page updates its data without reloading. The problem with this is that you can’t share a specific set of results on social media, as the link would always be /listing-search.
This is also a problem for WP Rocket, as described in Caching query strings – WP Rocket Knowledge Base.
Conclusion
I’ve tested this, and it works. However, I might have missed some edge cases. So, it’s up to you to test this out fully on your site.
Changelog
- 08-27-2024 – Added questions section, updated code to remove debug information, added in code to skip cache if specific queries are included.
- 08-27-2024 – Fixed some typos, better explanation, and added in “utm_|gclid|fbclid|utm|gad_source”