At the highest level, search engine optimisation is about three things: creating great content, promoting that content to generate links, and making sure that Google and other search engines can find it. It’s the last of these tasks I’m going to talk about here — specifically, how WordPress site owners can make sure that when Googlebot comes knocking, the door is open and every page the site owner wants in search results can be found.
What does it mean to be findable by a search engine crawler? The most important factor is links. Crawlers work by following links. They load a page, process its content, and follow links on the page to new pages, where the process is repeated. To be crawlable is to be connected in such a way that every important page on a site is reachable by a link, which brings us to the first part of creating a crawlable site.
Your site should have a logical and consistent structure of internal links that users, both human and software, can follow. Information architecture is part of this: the site’s page structure should be designed so that human users and bots can travel link-to-link through the whole site. There should be no dead spots or unlinked areas, because Googlebot won’t be able to find them.
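To make the idea of a "dead spot" concrete, here is a minimal sketch of how a link-following crawler sees a site. It assumes a hand-written map of internal links; the page paths and the names SITE_LINKS, reachable_pages, and orphan_pages are made up for illustration and don’t come from any WordPress or Google tool.

```python
from collections import deque

# Hypothetical internal-link map: each page lists the pages it links to.
SITE_LINKS = {
    "/": ["/blog/", "/services/"],
    "/blog/": ["/blog/first-post/", "/"],
    "/blog/first-post/": ["/services/pricing/"],
    "/services/": ["/"],
    "/services/pricing/": [],
    "/old-landing-page/": ["/"],  # links out, but nothing links to it
}


def reachable_pages(links, start="/"):
    """Return the set of pages reachable from start by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def orphan_pages(links, start="/"):
    """Return pages in the map that a link-following crawler never reaches."""
    return sorted(set(links) - reachable_pages(links, start))


print(orphan_pages(SITE_LINKS))
```

In this made-up site the only orphan is /old-landing-page/, and the deep pricing page is reachable only because a blog post links to it.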
It’s also important to cross-link pages that are deep in your site’s page hierarchy. One of the best ways to create links to deep pages is to blog, and to link from high-quality blog content to other pages on the site.
Googlebot doesn’t like slow sites any more than human users do. If a site is slow to return pages, Googlebot will reduce the rate at which it crawls that site. Ideally, sites should be crawled frequently so that new pages and content are incorporated quickly into the index.
Google has also made it clear that performance is a ranking factor. All other things being equal, faster sites tend to rank higher.
Submit A Sitemap
Sitemaps are lists of links with metadata that help Googlebot crawl a site more intelligently. A sitemap indicates where content is and what type of content it is. In an ideal world, sitemaps wouldn’t be necessary and Google would be able to figure all this out by itself, but there’s no harm in giving it a helping hand.
One of the easiest ways to generate a sitemap is to use the Yoast SEO plugin’s sitemaps module.
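If you’re curious what a sitemap actually contains, the sketch below builds a very small one by hand using Python’s standard library. This is not how Yoast generates its sitemaps; the example.com URLs, the dates, and the build_sitemap name are assumptions made for illustration. The urlset, url, loc, and lastmod elements and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical pages: (URL, last modification date in YYYY-MM-DD form).
PAGES = [
    ("https://example.com/", "2024-01-15"),
    ("https://example.com/blog/first-post/", "2024-01-10"),
]


def build_sitemap(pages):
    """Build a sitemap document as a string from (url, lastmod) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    # Prepend the XML declaration by hand so the result is predictable.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap(PAGES))
```

On a real WordPress site a plugin keeps this file up to date for you, which is why generating it by hand is rarely worth the effort.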
Robots.txt and Nofollow, Noindex
The robots.txt file contains instructions for bots, including search engine crawlers. If you get this wrong, Google may ignore part or even all of your site. A typical simple robots.txt file looks like this:
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
This says, if you’re Googlebot, go ahead and crawl everything. But if you’re some other bot, keep out. I wouldn’t recommend this in production, because there are plenty of other useful bots out there. Robots.txt would require an article of its own to explain fully, so instead I’ll link you to this excellent resource and a list of examples you can use.
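Before putting rules like these on a live site, it can help to check how a parser reads them. The sketch below uses Python’s standard urllib.robotparser module on rules matching the description above (Googlebot allowed everywhere, every other bot blocked). The is_allowed helper, the SomeOtherBot name, and the example.com URL are assumptions for illustration, and Google’s own parser may differ from Python’s in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Rules as described above: Googlebot may crawl everything,
# every other bot is kept out of the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for bot in ("Googlebot", "SomeOtherBot"):
    allowed = is_allowed(ROBOTS_TXT, bot, "https://example.com/about/")
    print(bot, "allowed" if allowed else "blocked")
```

Running the same check against your real robots.txt with the URLs you care about is a cheap way to catch a stray "Disallow: /" before a crawler does.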
Nofollow and noindex give instructions to bots too, but they are placed in the head section of the relevant page:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
Noindex instructs a crawler not to index the page, which is sometimes useful for pages you don’t want in the search index, like category pages. Nofollow instructs the crawler not to follow any of the links on the page. As with robots.txt, getting this wrong can have a negative impact on SEO. Yoast has a great article explaining the minutiae of the meta robots tag.
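Because a stray noindex can quietly remove a page from search results, it’s worth being able to check which directives a page carries. Here is a minimal sketch using Python’s standard html.parser module; the RobotsMetaParser class and robots_directives function are names made up for this example, and it only looks at the meta robots tag, not at HTTP headers or robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head>'
print(sorted(robots_directives(page)))
```

A page with no meta robots tag returns an empty set, which crawlers treat as permission to index the page and follow its links.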
This article has focused on how you can make your site friendlier to Google’s crawler, but keep in mind that no matter how crawlable your site is, it’s the quality of the content that ultimately determines your rank in the SERPs.