Introduction
Bots Blocking List
1. Facebook Link Previews
Blocking Scraping Bots accessing Events Calendar Pages
1. Block with Cloudflare
2. Nginx Option 1 – Blocking Scraping Bots with Nginx
  1. Step 1 – Identify the Bots via Maps in http section
  2. Step 2 – Apply blocking to your site
Nginx Option 2 – Rate Limiting Scraping Bots
1. 1. Step 1 – Identify the Bots via Maps in http section
2. Blocking Scraping Bots with Litespeed/Openlitespeed
Confirming Blocks
User Agents
1. Collecting User Agents
2. Valid User Agents

Changelog

Introduction

I’ve had numerous clients contact me about their servers having memory and CPU issues. Upon further investigation, the issue seems to be bot’s scraping the Event’s Calendar pages, and following all of the links within the page.

Now, this wouldn’t be a problem for a site that has caching and regular pages, as the pages would be cached and the data returned without breaking a sweat as PHP is never involved.

Unfortunately with The Events Calendar, the links being scraped contain query strings and these aren’t natively cached. Furthermore, the bot’s are going through pagination and any links they can find on the page which might be the links to the iCal URL.

Bots Blocking List

This is the master list of common bot user-agents.

ahrefsbot
amazonbot
baiduspider
barkrowler
bingbot
blexbot
bytespider
claudebot
crawler4j
curl
dataforseo
datanyze
dotbot
duckduckbot
exabot
facebookexternalhit
googlebot
gptbot
liebaofast
linkedinbot
mb2345browser
meta-externalagent
micromessenger
mj12bot
petalbot
pinterest
qwantbot
scrapy
semrush
semrushbot
seznambot
slurp
sogou
twitterbot
wget
yandexbot
zh_cn

Facebook Link Previews

Just be aware that if you want Facebook link previews to work you must make sure that you remove the following user agents

facebookexternalhit
meta-externalagent

Blocking Scraping Bots accessing Events Calendar Pages

Block with Cloudflare

The solutions is pretty simple, we just need to block know bots from accessing the Events Calendar URL’s. Here’s a Cloudflare rule that does just this.

(http.request.uri.query contains "ical" and cf.client.bot) or (http.request.uri.query contains "eventDisplay" and cf.client.bot) or (http.request.uri.path contains "/events" and cf.client.bot)

The above will block requests containing the following

ical
eventDisplay
/events

This seems to encapsulate most of the requests that the bots are scraping.

Nginx Option 1 – Blocking Scraping Bots with Nginx

If you don’t use Cloudflare for your site, you can block the requests with Nginx. Here’s an example. You will have to adapt it.

Step 1 – Identify the Bots via Maps in http section

Add this to your http section of your Nginx Config.

GridPane: Create and add to new file /etc/nginx/conf.d/block-scraping-events.conf

# Source https://managingwp.io/live-blog/protecting-your-events-calendar-combatting-scraping-bots-and-resource-drains/
#
# Add to /etc/nginx/conf.d/block-scraping-events.conf 
# or add to /var/www/domain.com/nginx/block-scraping-events-root-context.conf
#
    variables_hash_max_size 2048;
    # Define bot user agents if needed
    map $http_user_agent $is_scraping_bot {
        default 0;
        ~*ahrefsbot 1;
        ~*amazonbot 1;
        ~*baiduspider 1;
        ~*barkrowler 1;
        ~*bingbot 1;
        ~*blexbot 1;
        ~*bytespider 1;
        ~*claudebot 1;
        ~*crawler4j 1;
        ~*curl 1;
        ~*dataforseo 1;
        ~*datanyze 1;
        ~*dotbot 1;
        ~*duckduckbot 1;
        ~*exabot 1;
        ~*facebookexternalhit 1;
        ~*googlebot 1;
        ~*gptbot 1;
        ~*liebaofast 1;
        ~*linkedinbot 1;
        ~*mb2345browser 1;
        ~*meta-externalagent 1;
        ~*micromessenger 1;
        ~*mj12bot 1;
        ~*petalbot 1;
        ~*pinterest 1;
        ~*qwantbot 1;
        ~*scrapy 1;
        ~*semrush 1;
        ~*semrushbot 1;
        ~*seznambot 1;
        ~*slurp 1;
        ~*sogou 1;
        ~*twitterbot 1;
        ~*wget 1;
        ~*yandexbot 1;
        ~*zh_cn 1;
    }

    # Map to handle request URIs
    map $request_uri $block_scraping_uri {
        default 0;
        "~*ical" 1;
        "~*eventDisplay" 1;
        "~*\/events" 1;
    }

    # Combine both conditions into one variable
    map "$is_scraping_bot$block_scraping_uri" $block_scraping_request{
        default 0;
        11 1; # Both $is_scraping_bot and $block_scraping_uri are true
    }

Step 2 – Apply blocking to your site

Add the following to your location / section.

GridPane: Create a file called /etc/nginx/extra.d/block-scraping-root-context.conf which will affect all sites. For a specific site set create a file called /var/www/domain.com/nginx/block-scraping-events-root-context.conf

# Source: https://managingwp.io/live-blog/protecting-your-events-calendar-combatting-scraping-bots-and-resource-drains/
#
# Add to /etc/nginx/extra.d/block-scraping-root-context.conf which will affect all sites. 
# For a specific site set create a file called /var/www/domain.com/nginx/block-scraping-events-root-context.conf
#
if ($block_scraping_request) {
    return 403; # Forbidden
}

Nginx Option 2 – Rate Limiting Scraping Bots

Step 1 – Identify the Bots via Maps in http section

Add this to your http section of your Nginx Config.

GridPane: Create and add to new file /etc/nginx/conf.d/block-scraping-events.conf

# Throttle scrapers hitting Events Calendar endpoints
# Context: http

# Helps with big maps
variables_hash_max_size 2048;

# --- Label bots by UA (single source of truth)
map $http_user_agent $ua_bot {
    default "";
    ~*\bahrefsbot\b                  ahrefsbot;
    ~*\bamazonbot\b                  amazonbot;
    ~*\bbaiduspider\b                baiduspider;
    ~*\bbarkrowler\b                 barkrowler;          # Babbar.tech
    ~*\bbingbot\b                    bingbot;
    ~*\bblexbot\b                    blexbot;
    ~*\bbytespider\b                 bytespider;
    ~*\bclaudebot\b                  claudebot;
    ~*\bcrawler4j\b                  crawler4j;
    ~*\bcurl\b                       curl;
    ~*\bdataforseo\b                 dataforseo;
    ~*\bdatanyze\b                   datanyze;
    ~*\bdotbot\b                     dotbot;
    ~*\bduckduckbot\b                duckduckbot;
    ~*\bexabot\b                     exabot;
    ~*\bfacebookexternalhit\b        facebookexternalhit;
    ~*\bgooglebot\b                  googlebot;
    ~*\bgptbot\b                     gptbot;
    ~*liebao(?:fast)?                liebaofast;          # LieBao/LieBaoFast
    ~*\blinkedinbot\b                linkedinbot;
    ~*\bmb2345browser\b              mb2345browser;
    ~*meta[-_]?externalagent         meta-externalagent;  # fixed (no space)
    ~*\bmicromessenger\b             micromessenger;
    ~*\bmj12bot\b                    mj12bot;
    ~*\bpetalbot\b                   petalbot;
    ~*\bpinterest(bot)?\b            pinterest;
    ~*\bqwantbot\b                   qwantbot;
    ~*\bscrapy\b                     scrapy;
    ~*\bsemrush(bot)?\b              semrushbot;
    ~*\bseznambot\b                  seznambot;
    ~*\bslurp\b                      slurp;
    ~*\bsogou\b                      sogou;
    ~*\btwitterbot\b                 twitterbot;
    ~*\bwget\b                       wget;
    ~*\byandex(bot)?\b               yandexbot;
    ~*zh[_-]?cn                      zh_cn;
}

# Boolean flag derived from label (keeps in sync automatically)
map $ua_bot $is_scraping_bot {
    default 0;
    ~.+     1;
}

# --- Match Events Calendar URIs (path or query)
map $request_uri $is_events_uri {
    default 0;

    # Query-parameter triggers (precise)
    ~*(^|[?&])post_type=tribe_events(?:&|$)   1;
    ~*(^|[?&])eventDisplay=                   1;
    ~*(^|[?&])(ical|outlook-ical)=1(?:&|$)    1;
    ~*tribe-bar-date                          1;

    # Common paths/slugs
    ~*/events                                  1;
    ~*/calendar                                1;
}

# Build a per-host, per-UA bucket (empty when not a known bot)
map $ua_bot $ua_bucket {
    default "";
    ~.+     "$host:$ua_bot";
}

# Final limiter key: only non-empty when events matched AND a bot was labeled
map $is_events_uri $scrape_limit_key {
    default "";
    1       $ua_bucket;
}

# --- Define zone & rate: 1 per 10s (6/min), NO burst
limit_req_zone $scrape_limit_key zone=scraping-bots-ua:10m rate=6r/m;

# --- Apply limiter (no burst => second hit inside 10s gets 429)
limit_req zone=scraping-bots-ua;
limit_req_log_level warn;

# --- Debug headers
map $scrape_limit_key $scrape_header { default "no"; ~.+ "yes"; }
add_header X-Scrape-Bot          $is_scraping_bot always;  # 1/0 flag
add_header X-Scrape-UA           $ua_bot always;           # UA label bucketed
add_header X-Scrape-URL          $is_events_uri always;    # 1 if URI matched
add_header X-Scrape-Limit-Active $scrape_header always;    # yes/no if armed
add_header X-Scrape-Limit-Rate   "6r/m per UA; burst=0" always;

Blocking Scraping Bots with Litespeed/Openlitespeed

Here is the appropriate .htaccess rules for blocking bots via Litespeed/Openlitespeed

# Define partial matches for known bots
RewriteCond %{HTTP_USER_AGENT} ahrefsbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} amazonbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} barkrowler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} bingbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} blexbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} bytespider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} claudebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} crawler4j [NC,OR]
RewriteCond %{HTTP_USER_AGENT} curl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} dataforseo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} datanyze [NC,OR]
RewriteCond %{HTTP_USER_AGENT} dotbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} duckduckbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} exabot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} facebookexternalhit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} googlebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} gptbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} liebaofast [NC,OR]
RewriteCond %{HTTP_USER_AGENT} linkedinbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} mb2345browser [NC,OR]
RewriteCond %{HTTP_USER_AGENT} meta-externalagent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} micromessenger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} mj12bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} petalbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} pinterest [NC,OR]
RewriteCond %{HTTP_USER_AGENT} qwantbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} scrapy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} semrush [NC,OR]
RewriteCond %{HTTP_USER_AGENT} semrushbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} seznambot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} slurp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} sogou [NC,OR]
RewriteCond %{HTTP_USER_AGENT} twitterbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} yandexbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} zh_cn [NC]

# Block URLs with specific query strings or paths for these bots
RewriteCond %{QUERY_STRING} (ical|eventDisplay) [NC,OR]
RewriteCond %{REQUEST_URI} ^/events/ [NC]
RewriteRule ^ - [F,L]

Confirming Blocks

You can check your access log to see if you’ve missed any user agents

tail -f /var/www/domain.com/logs/domain.com.access.log | grep -v 403 | grep event

User Agents

Collecting User Agents

If you’re on GridPane using the Openlitespeed logs, you can use the following command to get a list of unique User Agents accessing eventDisplay

cat /var/www/domain.com/logs/domain.com.access.log | grep "eventDisplay" |awk -F'"' '{print $7}' | sort | uniq

Valid User Agents

I’ve listed some common bot User Agents above, however there are also some to be aware of as actual users.

iOS/17.5.1 (21F90) dataaccessd/1.0 = iOS Calendar App

Changelog

08-28-2025 – Fixed issues with Nginx Option 1 and added Nginx Option 2 alternative that throttles instead of blocking.
01-02-2024 – Added section about Facebook Link Previews
01-01-2025 – Updated list of user agents and updated Nginx and .htaccess configs.
08-12-2024 – Updated Nginx to be correct syntax, also added more user agents and GridPane files.
08-18-2024 – Updated rules to include PetalBot

Protecting Your Events Calendar: Combatting Scraping Bots and Resource Drains

Table of Contents

Introduction

Bots Blocking List

Facebook Link Previews

Blocking Scraping Bots accessing Events Calendar Pages

Block with Cloudflare

Nginx Option 1 – Blocking Scraping Bots with Nginx

Step 1 – Identify the Bots via Maps in http section

Step 2 – Apply blocking to your site

Nginx Option 2 – Rate Limiting Scraping Bots

Step 1 – Identify the Bots via Maps in http section

Blocking Scraping Bots with Litespeed/Openlitespeed

Confirming Blocks

User Agents

Collecting User Agents

Valid User Agents

Changelog

Tags:

Protecting Your Events Calendar: Combatting Scraping Bots and Resource Drains

Table of Contents

Introduction

Bots Blocking List

Facebook Link Previews

Blocking Scraping Bots accessing Events Calendar Pages

Block with Cloudflare

Nginx Option 1 – Blocking Scraping Bots with Nginx

Step 1 – Identify the Bots via Maps in http section

Step 2 – Apply blocking to your site

Nginx Option 2 – Rate Limiting Scraping Bots

Step 1 – Identify the Bots via Maps in http section

Blocking Scraping Bots with Litespeed/Openlitespeed

Confirming Blocks

User Agents

Collecting User Agents

Valid User Agents

Changelog

Tags:

Join our Newsletter