
Robots.txt & Sitemaps: Clean Crawl for WordPress & WooCommerce

9 months ago

A clean crawl means Google spends time on the right URLs—your products, posts, and key pages—not search results, filters, or logins. This guide gives you a minimal robots.txt, shows what to noindex (not block), and sets up sitemaps the right way for WordPress + WooCommerce.

What robots.txt does (and doesn’t)

  • Does: tell crawlers which paths not to crawl.
  • Does NOT: remove URLs from Google or stop indexing if other pages link to them. For that, use meta robots noindex.
  • Rule of thumb: use robots.txt for utility paths (/wp-admin/, infinite filter URLs), and use noindex,follow for thin/utility pages (search, cart, checkout, account).

A minimal, modern robots.txt for WordPress

Start simple. Don’t copy legacy blocks like /wp-includes/ or /wp-content/: they stop Google from fetching the CSS and JS it needs to render your pages.

# robots.txt (minimal + safe)
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

# Optional: block internal search URLs (skip this if you noindex search pages instead; a blocked URL's meta tag is never read)
Disallow: /?s=
Disallow: /search/

# Optional: block known noise (adjust to your site)
Disallow: /*add-to-cart=*
Disallow: /*coupon_code=*
Disallow: /*orderby=*

# Your XML sitemap index (Yoast and RankMath: /sitemap_index.xml; WP core: /wp-sitemap.xml)
Sitemap: https://example.com/sitemap_index.xml

When to add parameter lines?
Only if your theme/plugins create infinite combinations (e.g., layered nav filters). If you rely on those for browsing, prefer noindex,follow on the template and let canonical tags point to the main category.
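
Before deploying rules like these, it can help to check how they resolve for a few real paths. The sketch below is a simplified matcher, not Google's parser: it assumes a single User-agent: * group, does no percent-encoding normalisation, and follows the precedence documented by Google and RFC 9309 (the longest matching pattern wins, and Allow wins a tie). The function names and test paths are illustrative.

```python
import re

# Rules copied from the minimal robots.txt above, as (directive, pattern) pairs.
RULES = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
    ("disallow", "/?s="),
    ("disallow", "/search/"),
    ("disallow", "/*add-to-cart=*"),
    ("disallow", "/*coupon_code=*"),
    ("disallow", "/*orderby=*"),
]


def rule_to_regex(pattern):
    """Translate a robots.txt pattern ('*' wildcard, trailing '$' anchor) to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Return True if `path` (path plus query string) may be crawled."""
    best = None  # (pattern length, is_allow); tuple order makes Allow win ties
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty Disallow blocks nothing
        if rule_to_regex(pattern).match(path):  # match() anchors at the start
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


if __name__ == "__main__":
    for path in ("/wp-admin/admin-ajax.php", "/shop/?add-to-cart=42", "/product/blue-shirt/"):
        print(path, "->", "allowed" if is_allowed(RULES, path) else "blocked")
```

Note that /?s= only matches search URLs at the site root, because patterns are anchored at the start of the path.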


What to noindex (not Disallow)

These URLs should be accessible (so Google can see the meta tag), but not indexed:

  • Search results: /search/ or /?s=… → noindex,follow
  • Cart, Checkout, My Account → noindex,follow
  • Thank you / order-received → noindex,follow
  • Pagination of archives (optional) → many sites keep indexed; if thin, noindex,follow
  • Tag archives (optional) → if thin/duplicative, noindex,follow
  • Author archives (single-author blogs) → noindex,follow

Set this in your SEO plugin (Yoast/RankMath): Titles & Metas → Archives/Pages → noindex the templates above.
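
To confirm the plugin setting actually reached the page, you can inspect the rendered HTML. The following is a minimal sketch that uses only Python's standard library. It assumes you already have the page's HTML and, optionally, its X-Robots-Tag header value as strings; it ignores user-agent prefixes inside that header, and the function names are illustrative.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html, x_robots_tag=""):
    """Merge directives from the HTML meta tags and an X-Robots-Tag header value."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header = {t.strip().lower() for t in x_robots_tag.split(",") if t.strip()}
    return parser.directives | header


def is_noindex(html, x_robots_tag=""):
    """True if the page asks not to be indexed ('none' implies noindex)."""
    found = robots_directives(html, x_robots_tag)
    return "noindex" in found or "none" in found


if __name__ == "__main__":
    cart_html = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(is_noindex(cart_html))
```

Run this against the cart, checkout, account, and search templates; each should report noindex while remaining crawlable.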


Sitemaps: what to include (and exclude)

Use your SEO plugin’s sitemap index:

Include:

  • Posts, Pages, Products
  • Product Categories (and optionally Post Categories)

Consider excluding:

  • Tag archives (if thin)
  • Author archives (single author)
  • Test/staging/landing variations
  • Anything set to noindex (a noindexed URL should never appear in a sitemap)

Make sure lastmod updates when you refresh content. After publishing or updating, request indexing in Google Search Console (GSC).
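
A quick way to audit a sitemap is to list each URL with its lastmod and flag anything that should have been excluded. This is a rough sketch using the standard library. The sample XML, its date, and the NOINDEX_PREFIXES tuple are made up for illustration; adjust the prefixes to whatever your site noindexes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Assumption: paths this site has set to noindex (taken from the article's examples).
NOINDEX_PREFIXES = ("/cart/", "/checkout/", "/my-account/")

# Illustrative sample; a real check would load the plugin-generated sitemap.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/product/blue-shirt/</loc><lastmod>2024-05-01</lastmod></url>
  <url><loc>https://example.com/cart/</loc></url>
</urlset>"""


def sitemap_entries(xml_text):
    """Return (loc, lastmod) pairs from a <urlset> or <sitemapindex> document."""
    root = ET.fromstring(xml_text)
    entries = []
    for node in root.findall("sm:url", NS) + root.findall("sm:sitemap", NS):
        loc = node.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = node.findtext("sm:lastmod", default=None, namespaces=NS)
        entries.append((loc, lastmod.strip() if lastmod else None))
    return entries


def flag_noindexed(entries, prefixes=NOINDEX_PREFIXES):
    """Return sitemap URLs whose path starts with a known noindex prefix."""
    return [loc for loc, _ in entries if urlsplit(loc).path.startswith(prefixes)]


if __name__ == "__main__":
    entries = sitemap_entries(SAMPLE)
    print("missing lastmod:", [loc for loc, lastmod in entries if lastmod is None])
    print("noindexed but listed:", flag_noindexed(entries))
```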


WooCommerce specifics (the safe defaults)

  • Noindex: /cart/, /checkout/, /my-account/, /order-received/
  • Block crawl (optional): URL parameters that explode combinations:
    • /*add-to-cart=*, /*orderby=*, /*rating=*
  • Canonical: category pages should carry a self-referencing canonical; filtered views should point their canonical back to the base category.
  • Facets/filters: Prefer noindex,follow on the filtered template instead of blocking with robots.txt—so link equity can still flow through the page.
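
The canonical rule for filtered views can be sketched as a small URL transform: drop the filter and sort parameters and keep everything else. This is only an illustration of the idea, since in practice the SEO plugin or theme emits the canonical tag. FILTER_PARAMS comes from the parameters named in this article; FILTER_PREFIXES is an assumption about common layered-navigation parameter names, so check it against your own URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters named in the article's robots.txt and WooCommerce sections.
FILTER_PARAMS = {"orderby", "rating", "add-to-cart", "coupon_code"}

# Assumption: typical layered-nav / price-filter parameter names; verify on your site.
FILTER_PREFIXES = ("filter_", "min_price", "max_price")


def canonical_url(url):
    """Strip filter/sort parameters so a filtered view maps to its base URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in FILTER_PARAMS and not key.startswith(FILTER_PREFIXES)
    ]
    # Rebuild without the fragment; an empty query yields no trailing "?".
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    print(canonical_url(
        "https://example.com/product-category/shirts/?orderby=price&filter_color=blue"
    ))
```

Comparing this output with the canonical tag each filtered page actually emits is a quick way to spot templates that canonical to themselves by mistake.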

Validate in 5 minutes

  1. robots.txt report (Search Console → Settings; it replaced the retired Robots.txt Tester) → Confirm the file is fetched and the rules match expectations.
  2. URL Inspection → Check a cart/checkout URL shows “Excluded by ‘noindex’ tag”.
  3. PageSpeed/DevTools → Ensure CSS/JS are crawlable (no blocked resources).
  4. Sitemaps in GSC → Submit sitemap_index.xml, watch for Discovered → Indexed progression.
  5. Re-check your SEO audit items:
    What is SEO & How to Audit Your Website (Free)

Common pitfalls (and quick fixes)

  • Blocking CSS/JS in /wp-content/ or /wp-includes/
    → Remove those Disallows. Google needs resources to render.
  • Disallowed but still indexed
    → You blocked crawl; Google can’t see noindex. Allow crawl and set noindex,follow.
  • Sitemap lists noindexed pages
    → Exclude them in your SEO plugin → resubmit sitemap.
  • Infinite filter URLs indexed
    → Add noindex,follow on filtered templates + canonical back to the base category. Add selective Disallows if the crawl still balloons.
  • Cart/Checkout indexed
    → Ensure they are noindex,follow and not in the sitemap. Remove any stray internal links pointing at them.
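
Two of these pitfalls are contradictory combinations of settings, which makes them easy to check mechanically once you know each URL's state. The sketch below assumes you have already gathered three facts per URL by other means; the UrlState type and the message strings are illustrative.

```python
from dataclasses import dataclass


@dataclass
class UrlState:
    url: str
    disallowed: bool  # blocked by robots.txt
    noindex: bool     # page carries a robots noindex directive
    in_sitemap: bool  # URL is listed in the XML sitemap


def diagnose(state):
    """Return the contradictory combinations described in the pitfalls list."""
    issues = []
    if state.disallowed and state.noindex:
        issues.append("blocked in robots.txt, so the noindex tag is never seen")
    if state.noindex and state.in_sitemap:
        issues.append("noindexed URL is listed in the sitemap")
    return issues


if __name__ == "__main__":
    cart = UrlState("/cart/", disallowed=True, noindex=True, in_sitemap=True)
    for issue in diagnose(cart):
        print(cart.url, "->", issue)
```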

Example: WordPress + Woo robots & meta setup

robots.txt

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /?s=
Disallow: /search/
Disallow: /*add-to-cart=*
Disallow: /*orderby=*
Sitemap: https://example.com/sitemap_index.xml

Meta robots (via SEO plugin)

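  • Search results: noindex,follow (Google only sees this if the /?s= and /search/ Disallow lines above are removed; choose one approach)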
  • Cart/Checkout/Account/Thank-you: noindex,follow
  • Author archives (single author): noindex,follow
  • Tag archives (if thin): noindex,follow

Keep crawl efficient (performance tie-ins)


FAQ

Should I disallow /wp-content/uploads/?
No. That breaks image discovery and hurts SEO.

Can I block /cart/ in robots.txt?
Better: keep it crawlable and set noindex,follow so link equity flows through the page.

Do sitemaps boost rankings?
They don’t increase rankings directly—they help discovery and freshness. Rankings improve when the right pages are crawled and are fast, unique, and useful.

Do I need tag sitemaps?
Only if tags have unique value. Most stores can exclude tags from indexing and sitemaps.


Final word

Clean crawl ≠ complicated rules. Keep robots.txt minimal, push thin/utility pages to noindex,follow, and offer a tidy sitemap index. That’s it. If you want a speed head start while Google crawls more efficiently, Pofii’s Pofii-Tuned LiteSpeed stack pairs perfectly with this setup.

