Search on Pages Sites
It's easy to add search functionality to a site.
We recommend Search.gov, a free site search and search analytics service for federal websites. To use it, register for Search.gov and follow its instructions to integrate the service with your Pages site. For full details, visit Search.gov.
If you'd prefer another solution, you can configure a tool like Lunr.js, which builds a search function that runs in the visitor's browser. This avoids any dependency on another service, but the search results are less robust.
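To illustrate what "search that runs in the browser" means, here is a minimal, dependency-free sketch of the idea. It is not Lunr.js and does not use its API; the function names `buildIndex` and `search` and the page object shape (`url`, `title`, `body`) are assumptions made for this example. A library such as Lunr.js adds stemming and relevance ranking on top of the same basic approach.

```javascript
// Build a tiny inverted index: each lowercase token maps to the set of
// page URLs whose title or body contains it.
function buildIndex(pages) {
  const index = new Map();
  for (const page of pages) {
    const text = `${page.title} ${page.body}`.toLowerCase();
    const tokens = text.match(/[a-z0-9]+/g) || [];
    for (const token of tokens) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(page.url);
    }
  }
  return index;
}

// Return the URLs of pages that contain every term in the query.
function search(index, query) {
  const terms = query.toLowerCase().match(/[a-z0-9]+/g) || [];
  if (terms.length === 0) return [];
  let results = null;
  for (const term of terms) {
    const matches = index.get(term) || new Set();
    results =
      results === null
        ? [...matches]
        : results.filter((url) => matches.has(url));
  }
  return results;
}
```

In a static site, the list of pages would typically be generated at build time as a JSON file and loaded by the browser, so no search request ever leaves the visitor's machine.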
Crawl/Index Pages sites
Pages automatically handles search engine visibility for preview URLs via the
Pages proxy. For traffic served through a preview site, the Pages proxy
automatically serves the X-Robots-Tag HTTP header with the directive
robots: none, so preview URLs are not crawlable or indexable by design. Only
webpages on the production domain are served with the robots: all directive,
which tells crawlers and bots such as Search.gov to index the site and enable
search capabilities.
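As a rough aid for reading these header values, the sketch below interprets a robots directive string the way crawlers conventionally do: `none` is shorthand for `noindex, nofollow`, and `all` places no restrictions. The function name `parseRobotsDirectives` is hypothetical, and the sketch ignores crawler-specific prefixes (such as a user agent name before the directive), so treat it as an illustration rather than a full parser.

```javascript
// Interpret an X-Robots-Tag (or robots meta tag) value such as "none" or "all".
// Returns whether a crawler honoring the value may index the page and follow its links.
function parseRobotsDirectives(value) {
  const tokens = String(value || "")
    .toLowerCase()
    .split(",")
    .map((token) => token.trim())
    .filter(Boolean);
  const has = (name) => tokens.includes(name);
  return {
    index: !(has("none") || has("noindex")),
    follow: !(has("none") || has("nofollow")),
  };
}

// Possible usage against a live URL (placeholder address, Node 18+ or a browser):
// const res = await fetch("https://example.gov/", { method: "HEAD" });
// console.log(parseRobotsDirectives(res.headers.get("x-robots-tag")));
```

A missing header is treated the same as `all`, which matches the default behavior described in the table below: indexing is allowed unless something says otherwise.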
If you want to disable crawling and indexing for specific pages of your
production site, you can include a robots meta tag with noindex, nofollow in
the head of those pages, or list those folders in your robots.txt, if your site
generates one.
| Method to manage robot behavior | How to prevent indexing/crawling | How to allow indexing/crawling |
|---|---|---|
| robots.txt in your Pages site: discourages robots from crawling the page or pages listed. Webpages that aren’t crawled generally can’t be indexed. | User-agent: * followed by a Disallow: line for each path to block | N/A, crawling is allowed by default |
| X-Robots-Tag HTTP header (served by Pages via the Pages proxy): encourages or discourages robots from reading and indexing the content on this page or using it to find more links to crawl | robots: none (this is automatically served to visitors of all Pages preview builds) | robots: all (this is automatically served to visitors of all Pages production sites) |
| Robots meta tag in your Pages site webpage HTML: encourages or discourages robots from reading and indexing the content on this page or using it to find more links to crawl | content="noindex, nofollow" | N/A, indexing is allowed by default |
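For the robots.txt row in the table above, the following sketch shows what a complete file could look like when generated at build time. The function name `buildRobotsTxt` and the example path `/drafts/` are hypothetical; the `User-agent` and `Disallow` lines follow the standard robots.txt format, where an empty `Disallow:` value means nothing is blocked.

```javascript
// Produce the text of a robots.txt file that asks crawlers matching
// `userAgent` not to crawl the given paths.
function buildRobotsTxt(disallowedPaths, userAgent = "*") {
  const lines = [`User-agent: ${userAgent}`];
  if (disallowedPaths.length === 0) {
    // An empty Disallow value blocks nothing, so crawling stays allowed.
    lines.push("Disallow:");
  }
  for (const path of disallowedPaths) {
    lines.push(`Disallow: ${path}`);
  }
  return lines.join("\n") + "\n";
}

// Example with a placeholder folder name.
console.log(buildRobotsTxt(["/drafts/"]));
```

Keep in mind that robots.txt only discourages crawling. A page that must stay out of search results is better served by the robots meta tag, since a crawler has to be able to read the page to see that tag.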
Conditionally set robots - Eleventy (11ty)
Take advantage of Pages-provided environment variables to enable
environment-specific functionality. Hardcode the condition and meta tag in your
template so that it checks the branch from the process.env environment
variables. This differs from how Jekyll sites handle it: in Eleventy, you can
add specificity with process.env.BRANCH. You can use this code sample:
{% unless process.env.BRANCH == 'main' %}
<meta name="robots" content="noindex, nofollow" />
{% endunless %}
See additional documentation on build environment variables.
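The Liquid sample above reads `process.env.BRANCH` inside a template. Eleventy templates generally only see values supplied through the data cascade, so if that expression comes back empty in your project, one possible approach is a global data file. The sketch below assumes a hypothetical file at `_data/process.js` (Eleventy names global data after the file, so templates would see it as `process`), and the helper name `buildProcessData` is invented for illustration. Check whether your site template already provides this before adding it.

```javascript
// _data/process.js (hypothetical path; adjust to your project's data directory)

// Build the object that templates will see as `process`.
// Only BRANCH is copied, so other environment variables stay out of templates.
function buildProcessData(env = process.env) {
  return { env: { BRANCH: env.BRANCH || "" } };
}

// Eleventy calls an exported function at build time and uses its
// return value as global data.
module.exports = () => buildProcessData();
```

With `BRANCH` unset, for example in a local build, the value is an empty string, so the `unless` condition in the sample above still emits the noindex tag. Only builds where the branch is `main` omit it.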