Robots.txt & Sitemaps: Clean Crawl for WordPress & WooCommerce
Author
John Cavil
A clean crawl means Google spends time on the right URLs (your products, posts, and key pages), not search results, filters, or logins. This guide gives you a minimal robots.txt, shows what to noindex (not block), and sets up sitemaps the right way for WordPress + WooCommerce.
What robots.txt does (and doesn’t)
- Does: tell crawlers which paths not to crawl.
- Does NOT: remove URLs from Google or stop indexing if other pages link to them. For that, use meta robots noindex.
- Rule of thumb: use robots.txt for utility paths (/wp-admin/, infinite filter URLs), and use noindex,follow for thin/utility pages (search, cart, checkout, account).
A minimal, modern robots.txt for WordPress
Start simple. Don’t copy legacy blocks like /wp-includes/ or /wp-content/; they often break CSS/JS crawling.
# robots.txt (minimal + safe)
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
# Optional: block internal search URLs (skip if you noindex them instead; a blocked page's meta tag is never read)
Disallow: /?s=
Disallow: /search/
# Optional: block known noise (adjust to your site)
Disallow: /*add-to-cart=*
Disallow: /*coupon_code=*
Disallow: /*orderby=*
# Your XML sitemap index (Yoast/RankMath/WP core)
Sitemap: https://example.com/sitemap_index.xml
When to add parameter lines?
Only if your theme/plugins create infinite combinations (e.g., layered nav filters). If you rely on those for browsing, prefer noindex,follow on the template and let canonical tags point to the main category.
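To sanity-check rules like the ones above before you deploy them, here is a minimal Python sketch of a path matcher. It assumes the longest-match behavior Google documents for robots.txt (the most specific rule wins, and Allow wins a tie) and treats `*` as a wildcard. The `RULES` list and the function names are illustrative, not part of any plugin or library.

```python
import re

# Rules copied from the minimal robots.txt above, as (directive, pattern) pairs.
RULES = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
    ("disallow", "/?s="),
    ("disallow", "/search/"),
    ("disallow", "/*add-to-cart=*"),
    ("disallow", "/*coupon_code=*"),
    ("disallow", "/*orderby=*"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt pattern: '*' matches any run of characters,
    and a trailing '$' anchors the end of the URL."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if the path (including its query string) may be crawled."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            is_allow = directive == "allow"
            # Longest pattern wins; on equal length, Allow wins.
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


for path in ["/wp-admin/options.php", "/wp-admin/admin-ajax.php",
             "/shop/?orderby=price", "/product/blue-shirt/"]:
    print(path, "->", "crawl" if is_allowed(path) else "blocked")
```

One thing this makes visible: `Disallow: /?s=` only matches searches on the site root, so `/shop/?s=shoes` would still be crawlable under these rules.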
What to noindex (not Disallow)
These URLs should be accessible (so Google can see the meta tag), but not indexed:
- Search results: /search/ or /?s=… → noindex,follow
- Cart, Checkout, My Account → noindex,follow
- Thank you / order-received → noindex,follow
- Pagination of archives (optional) → many sites keep indexed; if thin, noindex,follow
- Tag archives (optional) → if thin/duplicative, noindex,follow
- Author archives (single-author blogs) → noindex,follow
Set this in your SEO plugin (Yoast/RankMath): Titles & Metas → Archives/Pages → noindex the templates above.
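Once the plugin setting is saved, it helps to confirm the tag actually lands in the page source. The sketch below parses an HTML string with Python's standard-library parser and reports the meta robots directives it finds. The sample HTML is made up for illustration; in practice you would paste in the source of your own cart or search page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def robots_directives(html):
    """Return the set of meta robots directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def is_noindexed(html):
    return "noindex" in robots_directives(html)


# Illustrative page source, not taken from a real site.
cart_html = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(sorted(robots_directives(cart_html)))
```

A page with no meta robots tag returns an empty set, which search engines treat as indexable by default.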
Sitemaps: what to include (and exclude)
Use your SEO plugin’s sitemap index:
Include:
- Posts, Pages, Products
- Product Categories (and optionally Post Categories)
Consider excluding:
- Tag archives (if thin)
- Author archives (single author)
- Test/staging/landing variations
- Anything set to noindex (it should not appear in sitemaps)
Make sure lastmod updates when you refresh content. After publishing or updating, request indexing in GSC.
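A quick way to audit a sitemap against the rules above is to parse it and flag two things: URLs that sit under noindexed paths, and entries with no lastmod. This is a rough Python sketch for a single `<urlset>` file. The `NOINDEX_PREFIXES` tuple and the sample XML (including its dates) are placeholders to adjust for your own site.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Path prefixes this guide sets to noindex (an assumption; adjust to your site).
NOINDEX_PREFIXES = ("/cart/", "/checkout/", "/my-account/", "/search/")


def audit_sitemap(xml_text):
    """Return (noindexed_urls, urls_missing_lastmod) for one <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    noindexed, missing_lastmod = [], []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if urlparse(loc).path.startswith(NOINDEX_PREFIXES):
            noindexed.append(loc)
        if not lastmod:
            missing_lastmod.append(loc)
    return noindexed, missing_lastmod


# Illustrative sitemap; URLs and dates are invented for the example.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/product/blue-shirt/</loc><lastmod>2024-05-01</lastmod></url>
  <url><loc>https://example.com/cart/</loc><lastmod>2024-05-01</lastmod></url>
  <url><loc>https://example.com/about/</loc></url>
</urlset>"""

noindexed, missing = audit_sitemap(SAMPLE)
print("Noindexed but listed:", noindexed)
print("Missing lastmod:", missing)
```

A sitemap index (such as sitemap_index.xml) lists child sitemaps rather than pages, so you would run this against each child file.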
WooCommerce specifics (the safe defaults)
- Noindex: /cart/, /checkout/, /my-account/, /order-received/
- Block crawl (optional): URL parameters that explode combinations: /*add-to-cart=*, /*orderby=*, /*rating=*
- Canonical: category pages should canonical to themselves; filtered views should canonical back to the base category.
- Facets/filters: Prefer noindex,follow on the filtered template instead of blocking with robots.txt, so link equity can still flow through the page.
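The canonical and facet rules above boil down to simple URL logic: strip the filter parameters to get the base category, and mark anything that carried them as noindex,follow. Here is a small Python sketch of that logic. The parameter names come from this guide, except the `filter_` prefix, which is an assumption about how layered-nav parameters are named on your store. Your SEO plugin or theme does the real work; this only illustrates the intended outcome.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters treated as filter/sort noise. The first four come from this guide;
# the "filter_" prefix is an assumption about layered-nav parameter names.
FILTER_PARAMS = {"orderby", "rating", "add-to-cart", "coupon_code"}
FILTER_PREFIXES = ("filter_",)


def _is_filter_param(name):
    return name in FILTER_PARAMS or name.startswith(FILTER_PREFIXES)


def canonical_for(url):
    """Strip filter/sort parameters so a filtered view points at its base URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not _is_filter_param(k)]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def robots_meta_for(url):
    """Return the meta robots value this guide suggests for the URL."""
    query = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    filtered = any(_is_filter_param(k) for k, _ in query)
    return "noindex,follow" if filtered else "index,follow"


filtered_url = "https://example.com/product-category/shirts/?orderby=price&filter_color=blue"
print(canonical_for(filtered_url))    # https://example.com/product-category/shirts/
print(robots_meta_for(filtered_url))  # noindex,follow
```

Parameters that are not in the filter list are left in the canonical URL untouched.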
Validate in 5 minutes
- robots.txt report (Search Console; it replaced the older Robots.txt Tester) → Confirm the fetched file and its rules match expectations.
- URL Inspection → Check a cart/checkout URL shows “Excluded by ‘noindex’”.
- PageSpeed/DevTools → Ensure CSS/JS are crawlable (no blocked resources).
- Sitemaps in GSC → Submit sitemap_index.xml, watch for Discovered → Indexed progression.
- Re-check your SEO audit items: What is SEO & How to Audit Your Website (Free)
Common pitfalls (and quick fixes)
- Blocking CSS/JS in /wp-content/ or /wp-includes/
→ Remove those Disallows. Google needs resources to render.
- Disallowed but still indexed
→ You blocked crawl; Google can’t see noindex. Allow crawl and set noindex,follow.
- Sitemap lists noindexed pages
→ Exclude them in your SEO plugin → resubmit sitemap.
- Infinite filter URLs indexed
→ Add noindex,follow on filtered templates + canonical back to the base category. Add selective Disallows if the crawl still balloons.
- Cart/Checkout indexed
→ Ensure they are noindex,follow and not in the sitemap. Remove any stray internal links pointing at them.
Example: WordPress + Woo robots & meta setup
robots.txt
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /?s=
Disallow: /search/
Disallow: /*add-to-cart=*
Disallow: /*orderby=*
Sitemap: https://example.com/sitemap_index.xml
Meta robots (via SEO plugin)
- Search results: noindex,follow
- Cart/Checkout/Account/Thank-you: noindex,follow
- Author archives (single author): noindex,follow
- Tag archives (if thin): noindex,follow
Keep crawl efficient (performance tie-ins)
- Speed + caching: fast pages get more done per crawl budget.
LiteSpeed Cache: Safe Defaults
- Edge/CDN headers: avoid redirect chains; respect HTTPS/www canonical.
Zero-Downtime DNS TTL Playbook
- Don’t waste crawl on noisy JS/AJAX:
Stop Woo Cart Fragments
- Tidy images & LCP: helps render and indexing quality.
Woo Image Speed
- Core Web Vitals basics:
Core Web Vitals, Plain English
FAQ
Should I disallow /wp-content/uploads/?
No. That breaks image discovery and hurts SEO.
Can I block /cart/ in robots.txt?
Better: keep it crawlable and set noindex,follow so link equity flows through the page.
Do sitemaps boost rankings?
They don’t increase rankings directly—they help discovery and freshness. Rankings improve when the right pages are crawled and are fast, unique, and useful.
Do I need tag sitemaps?
Only if tags have unique value. Most stores can exclude tags from indexing and sitemaps.
Final word
Clean crawl ≠ complicated rules. Keep robots.txt minimal, push thin/utility pages to noindex,follow, and offer a tidy sitemap index. That’s it. If you want a speed head start while Google crawls more efficiently, Pofii’s Pofii-Tuned LiteSpeed stack pairs perfectly with this setup.