Adding a robots.txt file to your StoreFront
Introduction
The robots.txt file is a standard used by web crawlers to determine which areas of your site should or should not be indexed. Proper configuration of this file helps control search engine behavior, protect sensitive areas of your StoreFront, and improve crawl efficiency.
UltraCart automatically serves a default robots.txt file for all StoreFronts, but you can easily override it with a custom version via the File Manager.
Tip: Use the robots.txt file in combination with meta tags (like noindex) and password protection to comprehensively control access to your content.
Example robots.txt
Here’s a basic robots.txt file that blocks specific internal URLs while ensuring that your sitemap is accessible for indexing by search engines.
# robots.txt for UltraCart Production StoreFront
# Blocks sensitive internal areas; everything else remains crawlable
User-agent: *
# Sensitive paths to block for all crawlers
Disallow: /cgi-bin/UCEditor
Disallow: /cgi-bin/UCSearch
Disallow: /merchant/signup/signup2Save.do
Disallow: /merchant/signup/signupSave.do
# Make sure to replace the host with your actual storefront host
Sitemap: https://www.yourstore.com/sitemapsdotorg_index.xml
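You can sanity-check these rules locally before uploading. The sketch below uses Python's standard-library robots.txt parser; the host is a placeholder for your actual storefront.

```python
import urllib.robotparser

# Parse the example rules the way a well-behaved crawler would.
rules = """\
User-agent: *
Disallow: /cgi-bin/UCEditor
Disallow: /cgi-bin/UCSearch
Disallow: /merchant/signup/signup2Save.do
Disallow: /merchant/signup/signupSave.do
""".splitlines()

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

base = "https://www.yourstore.com"  # placeholder host
assert not rp.can_fetch("*", base + "/cgi-bin/UCEditor")  # blocked
assert rp.can_fetch("*", base + "/products/widget")       # crawlable
```

Because the rules are declared under `User-agent: *`, they apply to every crawler that has no more specific entry of its own.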
Enhanced robots.txt
This version provides explicit allow rules for major LLMs. If you wish to disallow a specific LLM crawler, change its Allow directive to Disallow.
# robots.txt for UltraCart Production StoreFront
# Grants AI bots access while protecting sensitive areas
# Explicitly allow search engines
User-agent: *
Allow: /
# AI and LLM Crawlers
# Change Allow to Disallow for any that you wish to restrict
User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: Amazonbot
Allow: /
User-agent: anthropic-ai
Allow: /
User-agent: CCBot
Allow: /
User-agent: YouBot
Allow: /
User-agent: NeevaBot
Allow: /
User-agent: Sogou
Allow: /
# Optional: Crawl delay to limit load
# Crawl-delay: 10
# Sensitive paths to block for all crawlers
Disallow: /cgi-bin/UCEditor
Disallow: /cgi-bin/UCSearch
Disallow: /merchant/signup/signup2Save.do
Disallow: /merchant/signup/signupSave.do
# Sitemap location
# Make sure to replace the host with your actual storefront host
Sitemap: https://www.yourstore.com/sitemapsdotorg_index.xml
You'll notice that UltraCart specifies the URL where crawlers can fetch the sitemap file.
This helps ensure crawlers properly index your entire site.
Update the sitemap URL to match the host of your storefront, then upload the robots.txt file to the root folder in the file manager.
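A quick pre-upload check can catch a forgotten sitemap host. The helper below is a hypothetical convenience, not part of UltraCart; it simply confirms every Sitemap: line in a robots.txt string points at the expected storefront host.

```python
from urllib.parse import urlparse

def sitemap_hosts_match(robots_txt: str, storefront_host: str) -> bool:
    """Return True if every Sitemap: line points at storefront_host."""
    for line in robots_txt.splitlines():
        if line.lower().startswith("sitemap:"):
            url = line.split(":", 1)[1].strip()
            if urlparse(url).netloc != storefront_host:
                return False
    return True

# Placeholder hosts for illustration.
robots = "User-agent: *\nSitemap: https://www.yourstore.com/sitemapsdotorg_index.xml\n"
print(sitemap_hosts_match(robots, "www.yourstore.com"))  # True
print(sitemap_hosts_match(robots, "shop.example.com"))   # False
```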
Example robots.txt for Development Website
You may want to prevent crawling during development. You can “lock” your storefront (recommended), or use a staging-appropriate robots.txt file that disallows all crawlers:
# robots.txt for UltraCart Staging StoreFront
# Blocks all crawlers by default to avoid indexing
User-agent: *
Disallow: /
# Optional: Uncomment to allow specific bots on staging (not recommended)
# User-agent: GPTBot
# Allow: /
# Sitemap reference (staging)
# Sitemap: https://staging.yourstore.com/sitemapsdotorg_index.xml
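You can verify that the staging rules really do block everything with the same standard-library parser; the staging host below is a placeholder.

```python
import urllib.robotparser

# "Disallow: /" under "User-agent: *" denies every path to every crawler
# that has no more specific entry.
rp = urllib.robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /"])

for path in ("/", "/products", "/cgi-bin/UCSearch"):
    assert not rp.can_fetch("GPTBot", "https://staging.yourstore.com" + path)
```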
Adding the robots.txt File to Your StoreFront
To add the robots.txt file to your storefront:
Navigate to Main Menu → StoreFronts → (Select StoreFront Host) → File Manager.
Next, click the New File button.
When the dialog appears, enter "robots.txt" and click OK.
Enter the robots.txt file content and click OK.
Best Practices for robots.txt
Here are key recommendations when creating or editing your robots.txt:
Use User-agent: * to define rules for all crawlers, or specify individual agents (e.g., Googlebot, GPTBot) for more control.
Block admin or sensitive paths that should not appear in search results.
Always include a Sitemap directive pointing to your sitemap for improved indexing.
Avoid blocking CSS or JS resources necessary for rendering the page properly.
Tip: Test your robots.txt with tools like Google Search Console’s robots.txt Tester.
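As a local complement to an online tester, you can spot-check that rendering assets such as CSS and JS stay crawlable. The paths below are illustrative, not actual UltraCart paths.

```python
import urllib.robotparser

# Blocking only the sensitive path leaves stylesheets and scripts crawlable.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /cgi-bin/UCEditor",
])

base = "https://www.yourstore.com"  # placeholder host
assert rp.can_fetch("Googlebot", base + "/css/theme.css")  # stylesheet OK
assert rp.can_fetch("Googlebot", base + "/js/main.js")     # script OK
```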
Considerations for AI and LLM Crawlers
As of now, robots.txt is advisory rather than enforceable: compliance among large language model (LLM) crawlers varies, and even crawlers whose providers claim to honor it may not do so consistently. Some do attempt to respect it on a best-effort basis.
Note: These blocks are not guaranteed to be respected but are becoming a growing industry norm. Always review the LLM provider's documentation for up-to-date crawler names.
Conclusion
Using a robots.txt file gives you control over how crawlers interact with your StoreFront. While UltraCart provides a helpful default, you can create a custom file to better align with your SEO strategy or privacy needs. Don’t forget to periodically review and update this file as your site evolves.