Live Blog

Caching Facebook Query Parameters/Strings with Nginx

Introduction

When someone clicks a link shared on Facebook, the URL they land on includes an extra query string (fbclid), like the following.

https://managingwp.io/live-blog/nginx-error-setrlimitrlimit_nofile-25562325-failed-1-operation-not-permitted/?fbclid=IwY2xjawE6Cz1leHRuA2FlbQIxMQABHQtQRq0Ma2pk9bXQQfM_vyGWE_YPHCMp9VhWudfQR9N6FRv7pkKjL7ZTfQ_aem_GTXCB2GK_bezDLTO7WhktA

This is problematic because common Nginx configurations will not cache URLs with a query string, and it’s even worse when a site is using WordPress, as these queries will result in a dynamic request.
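
For context, the rule that causes this usually looks something like the sketch below. It is illustrative only and is not GridPane's exact configuration. The variable name $skip_cache matches the one used later in this article, and the cache zone setup is left out.

```nginx
# Illustrative only: a common pattern in WordPress FastCGI cache configs.
# $skip_cache is the variable name used later in this article.
set $skip_cache 0;

# Any query string at all disables the cache for this request,
# so ?fbclid=... always goes to PHP.
if ($query_string != "") {
    set $skip_cache 1;
}

# Applied in the PHP location block: do not serve from cache,
# and do not store the response, when $skip_cache is 1.
fastcgi_cache_bypass $skip_cache;
fastcgi_no_cache $skip_cache;
```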

Configuration for Nginx FastCGI Caching on GridPane

Unfortunately, this guide is specifically for GridPane, but you can read more about doing this on Easy Engine.

https://community.easyengine.io/t/ignore-query-parameters-with-fastcgi-cache/4622

The variable $uri only resolves to index.php, and $request_uri includes the path and the query string. We will address this by creating a new variable, $path_uri, using the map directive. This gives us the URI path without the query string, which is important because we don’t want to create a separate cache entry for every query string used. That would eat up the cache pretty quickly on a busy site.

We will replace $request_uri in fastcgi_cache_key with $cache_uri, so that we can dynamically set $cache_uri to either $request_uri or our new $path_uri without breaking anything else that might depend on $request_uri.
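
To make the variables concrete, here is roughly what each one would hold for a hypothetical request. The domain and path are made up for illustration, and the cache key format is the one used in Step 2 below.

```nginx
# Hypothetical request: https://example.com/some-post/?fbclid=abc123
#
#   $uri          /index.php                  (after WordPress's internal rewrite)
#   $request_uri  /some-post/?fbclid=abc123   (original path plus query string)
#   $args         fbclid=abc123
#   $path_uri     /some-post/                 (from the map in Step 1)
#   $cache_uri    /some-post/                 (tracking parameter found, so $path_uri is used)
#
# Cache key with "$scheme$request_method$host$cache_uri":
#   httpsGETexample.com/some-post/
#
# With $request_uri in the key instead, every fbclid value would get its own entry:
#   httpsGETexample.com/some-post/?fbclid=abc123
```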

Finally, when $args contains utm_, gclid, or fbclid, we set $skip_cache to 0 and set a skip reason.

Lastly, if we do see those specific query strings, we will make sure that they’re cached and use the $p

Step 1 – Setup the Map

Create a new file at /etc/nginx/conf.d/fbutm.conf with the following code. This extracts the URI path from $request_uri into the $path_uri variable.

# fbutm: Create $path_uri using map
map $request_uri $path_uri {
  "~^(?P<path>[^?]*)(\?.*)?$"  $path;
}

Step 2 – Edit /etc/nginx/common/(domain.com)-fcgi-cache-var.conf

Edit the file /etc/nginx/common/(domain.com)-fcgi-cache-var.conf, where (domain.com) is the domain name you want to affect. This has to be done for each site and can’t be applied server-wide.

  • If a request comes in with fbclid in the args, we’ll set the skip reason. This is mostly for debugging; you don’t need to add it.
  • We’re going to set $cache_uri to $path_uri
  • Update fastcgi_cache_key, replacing $request_uri with $cache_uri.
# #
# GridPane vHost FastCGI caching per site include
# Version 1.2.0
# #


# fbutm: If fbclid is in url, set $cache_uri to $path_uri and set $skip_reason, also update fastcgi_cache_key
if ($args ~* "(utm_|gclid|fbclid|utm|gad_source)" ) {
    set $cache_uri $path_uri;
    # set $skip_reason "${skip_reason}-fbutm-${cache_uri}-${uri}"; # Used for debugging
    set $skip_reason "${skip_reason}-fbutm";
}

#fastcgi_cache_key "$scheme$request_method$geoip2_data_country_code$host$request_uri";
fastcgi_cache_key "$scheme$request_method$host$cache_uri";
fastcgi_cache_valid 200 300s;

Step 3 – Create /etc/nginx/extra.d/fbutm-main-context.conf

Create /etc/nginx/extra.d/fbutm-main-context.conf and set the default value of $cache_uri to $request_uri.

# fbutm: We need to default $cache_uri as it's the new $request_uri
set $cache_uri $request_uri;

Step 4 – Create /etc/nginx/extra.d/fbutm-skip-fcgi-cache-context.conf

If we see one of the tracking parameters (such as fbclid=) in $args, we don’t skip caching the request.

# fbutm: If we see fbclid in $args, don't skip cache and set skip reason
if ($args ~* "(utm_|gclid|fbclid|utm|gad_source)" ) {
    set $skip_cache 0;
    #set $skip_reason "${skip_reason}-fbutm-${cache_uri}"; # For debugging
    set $skip_reason "${skip_reason}-fbutm";
}

# if you have queries that shouldn't be cached
if ($args ~* "(listing_type|product_color)") {
    set $skip_cache 1;
    set $skip_reason "-querystring-fbutm-bypass";
}

Questions

Here are some questions from the person looking for this solution.

Thanks for creating an article, but I don’t understand.

No worries. The article is more advanced than what the average person would understand; most of my articles are like this.

I’ll follow the steps, but I’d like to know what’s happening and if it’s standard on GP servers.

I’ve updated the page and will continue to update it. There is also a changelog section at the bottom that lists any changes I make.

The implementation uses methods that GridPane has provided to modify the Nginx config. What I’ve created, however, is to be used at your own risk since it’s not built into GridPane. At the time of this writing, it works and has been tested, and I’ll update it as needed.

Can you include “gclid”, “gad_source” and “utm_*”?

It’s been added. You can add more and share what you’ve added. Here is a full list from WP Rocket:

Caching query strings – WP Rocket Knowledge Base (docs.wp-rocket.me)

Age Verify plugin

  • age-verified

Autoptimize

  • ao_noptimize

AMP

  • usqp

Cookie Notice

  • cn-reloaded

ShareASale

  • sscid

ActiveCampaign

  • vgo_ee

Adobe Advertising Cloud

  • ef_id

Adobe Analytics

  • s_kwcid

Bronto

  • _bta_tid
  • _bta_c

Dotdigital

  • dm_i

Facebook:

  • fb_action_ids
  • fb_action_types
  • fb_source
  • fbclid

Google Analytics and Ads:

  • utm_source
  • utm_campaign
  • utm_medium
  • utm_expid
  • utm_term
  • utm_content
  • utm_id
  • utm_source_platform
  • utm_creative_format
  • utm_marketing_tactic
  • _ga
  • gclid
  • campaignid
  • adgroupid
  • adid
  • gbraid
  • wbraid
  • gad_source

Google Web Stories:

  • _gl

Google DoubleClick

  • gclsrc

GoDataFeed

  • gdfms
  • gdftrk
  • gdffi

Klaviyo

  • _ke
  • _kx

Listrak:

  • trk_contact
  • trk_msg
  • trk_module
  • trk_sid

Mailchimp:

  • mc_cid
  • mc_eid

Marin

  • mkwid
  • pcrid

Matomo:

  • mtm_source
  • mtm_medium
  • mtm_campaign
  • mtm_keyword
  • mtm_cid
  • mtm_content

Microsoft Advertising

  • msclkid

Pinterest

  • epik
  • pp

Piwik Pro:

  • pk_source
  • pk_medium
  • pk_campaign
  • pk_keyword
  • pk_cid
  • pk_content

Springbot

  • sb_referer_host
  • redirect_log_mongo_id
  • redirect_mongo_id
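
As a sketch of how the pattern could be extended with a few entries from the list above (msclkid, mc_cid, and mc_eid are picked only as examples), the check might look like the following. Note that this version is stricter than the pattern used in the steps above: it anchors each name to the start of a parameter and requires an equals sign, which is an assumption on my part to keep short names such as pp or adid from matching inside unrelated parameters.

```nginx
# fbutm: extended example pattern (a sketch, not part of the original steps).
# (^|&) anchors the name to the start of a parameter and "=" ends it,
# so "my_fbclid=1" does not match but "fbclid=1" and "a=1&fbclid=2" do.
if ($args ~* "(^|&)(utm_[a-z_]+|gclid|fbclid|gad_source|msclkid|mc_cid|mc_eid)=") {
    set $skip_cache 0;
    set $skip_reason "${skip_reason}-fbutm";
}
```

The same pattern appears in both Step 2 and Step 4, so any change would need to be made in both files to keep the cache key and the skip logic in agreement.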

What if the query string has multiple arguments, such as utm and listing_type?

The challenge is handling URLs with more than one query string, for instance, if both fbclid and listing_type parameters are set like the following example.

https://domain.com/?fbclid=qwe123123&listing_type=black

In that scenario, the cache should be bypassed, but in fact it isn’t. The request will be cached because it contains one of the query strings we mentioned above (utm_|gclid|fbclid|utm|gad_source), even though the page shouldn’t be served from cache.

The problem is that Nginx, by default, doesn’t have a great engine to deal with these cases, which is good and bad. You would have to deal with these instances on a case-by-case basis, and ultimately add another rule below the one above to turn caching off for the query string "listing_type".

  1. You could do another if after the check for fbutm, and if it matches listing_type, skip the cache.
  2. Look at using Lua to strip the fbutm query args and values, and if any arguments are left, then don’t cache. But this is problematic if an argument such as search=best+dog+toys is used, as you’d be caching every search as a full page. This is inefficient since not many people will search for the same thing. It’s better to get a search plugin with support for object cache.
  3. If listing_type has only 10 possible values, then you could cache these, but if they’re paginated, then you’ll most likely have ?listing_type=black&page=1, which is fine but not ideal.
  4. Convert the listing_type values into slugs, so they’re instead /listing-type/black. This might not be possible, and listing_type could be user-defined rather than a set list.
  5. Find a better plugin that is uber efficient at generating page data, which means it doesn’t use the WordPress internal tables, or the WordPress REST API or even WordPress. Unfortunately, WordPress loads so much crap you’d have to go with a headless solution to get good performance.
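
For option 2, a rough alternative to Lua is a map that flags requests whose query string consists only of tracking parameters. This is a sketch under assumptions, not something verified against GridPane's config: $only_tracking_args is a hypothetical variable name, and the parameter list is just the one used earlier in this article.

```nginx
# Goes in the http context, for example next to the map from Step 1.
# $only_tracking_args is a hypothetical name, not something GridPane provides.
map $args $only_tracking_args {
    default 0;
    # 1 only when every parameter in the query string is a tracking parameter.
    # "fbclid=x&utm_source=fb"    -> 1
    # "fbclid=x&listing_type=red" -> 0
    "~*^((utm_[a-z_]+|gclid|fbclid|gad_source)=[^&]*(&|$))+$" 1;
}

# In the skip-cache context from Step 4, in place of the first if:
if ($only_tracking_args) {
    set $skip_cache 0;
    set $skip_reason "${skip_reason}-fbutm";
}
```

With this approach, a URL that mixes fbclid with any other argument is left alone and falls through to whatever the existing rules decide, so a separate bypass rule per parameter is no longer needed. The check in Step 2 would ideally use the same variable so the cache key logic stays consistent.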

The best way to approach all of this is a fragmented cache, or using the WordPress REST API and AJAX.

A fragmented cache would cache the page except for the section that changes, which would be the search results. This is possible in the paid version of LiteSpeed LSCache, and you would have to code the page to support it.

The other option is using the WordPress REST API and AJAX, so the page updates its data automatically without reloading. The problem with this is that you can’t share the page on social, as the link would always be /listing-search.

This is also a problem for WP Rocket, as per Caching query strings – WP Rocket Knowledge Base.

Conclusion

I’ve tested this, and it works. However, I might have missed some edge cases. So, it’s up to you to test this out fully on your site.

Changelog

  • 08-27-2024 – Added questions section, updated code to remove debug information, added in code to skip cache if specific queries are included.
  • 08-27-2024 – Fixed some typos, improved the explanation, and added “utm_|gclid|fbclid|utm|gad_source”.