Good rankings depend not only on good content but also on a cleanly structured website. Kevin Jackowski from OnlineMarketingEinstieg.de explains which techniques you can use to gain an advantage.
If you want to place your website successfully in search engines, you should try to gain a deeper understanding of how they work. Targeted traffic from the search results should be seen as a privilege.
To benefit from this privilege, site operators should make it easier for search engines to crawl their website. Ultimately, search engines want to give searchers the right answers to their questions, and site owners would like those answers to come from their own website.
What do search engines value?
For your website to be found for the relevant questions or keywords, search engine crawlers must reach the relevant subpages, interpret their content and process it correctly. Only then do the pages end up in the index and, where appropriate, get delivered in response to matching search queries.
Nowadays, almost everything revolves around energy and efficiency, and search engines are no exception. Each website receives a specific index and crawl budget, which determines roughly how much time the crawlers spend on a domain. Webmasters can make critical mistakes here by spending the available budget on the wrong resources. With a few hints, you can signal the relevance of individual pieces of content. Search engines understand these hints, set priorities based on them and can thus be steered to a certain extent.
The hints in detail
Webmasters who understand that they should manage the budget carefully can benefit from the resources that the search engine makes available to them. But what do crawlers look out for? Which information is relevant to them and influences how a website is processed? This article aims to answer these questions.
1. Protect yourself from duplicate content
Many webmasters are unaware that duplicate content creates problems. On the one hand, search engines find it difficult to deliver the “correct” document for search queries. On the other hand, duplicate content wastes the existing crawl budget, so that relevant content may not get into the index. Search engines are getting better and better at identifying duplicate content and fixing the first problem. For webmasters, duplicate content still results in a waste of their own budget.
In some cases, duplicate content cannot be avoided. In that case, website operators should tell the crawlers in the source code how to handle it. The following options are helpful:
- The “noindex” meta tag: It ensures that a page is not included in the index. The meta tag is placed in the <head> of the page and looks like this: <meta name="robots" content="noindex">. It helps to prevent duplicate content from being indexed twice.
- The canonical tag: It is used when there is little or no difference between pages. The “duplicates” of a page point to the “original” using a canonical tag. Webmasters use it to tell the search engine which subpage should be included in the index and delivered in response to matching search queries. The canonical tag is also placed in the <head> of the page and looks like this: <link rel="canonical" href="http://www.beispiel.de/">.
- Correct status codes: Site operators should pay attention to correct status codes so that search engines save important resources. The following status codes are particularly relevant for search engines:
  - Status code 200 (OK): It signals to the crawlers that everything is fine with this page and that the document can be reached. Important: An error page must actually return the status code 404. If it returns 200 instead, this is called a “soft 404” error.
  - Status code 301 (Moved Permanently): If a resource is permanently available under a different URL, it should be redirected using status code 301. This status code ensures that relevant “link juice” is passed on. So if a redirect is permanent and not temporary, always use status code 301.
  - Status code 302 (Found): With status code 302, crawlers learn that a page is only temporarily accessible under a different URL. This means that no “link juice” is passed on to the new link target.
  - Status code 404 (Not Found): A 404 error tells crawlers that no document is available at the specified URL, which is a bad sign. Webmasters should keep the number of these errors low. Google Webmaster Tools help track down 404 errors on your own website.
  - Status code 500 (Internal Server Error): When a server detects an internal error, it usually returns status code 500. Crawlers often stop crawling at this point and come back later so that the server is not put under additional load.
  - Status code 503 (Service Unavailable): If status code 503 is returned, the server is overloaded or undergoing maintenance. For crawlers, this is a signal to continue their work later. With the “Retry-After” header field, webmasters can specify when the server will be able to process external requests again.
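The status codes listed above can be condensed into a small lookup. The following is a minimal Python sketch, not an official search-engine specification: the helper names `crawler_hint`, `looks_like_soft_404` and `maintenance_headers` are hypothetical, the hint texts paraphrase the list, and the soft-404 check is only a crude text heuristic.

```python
from http import HTTPStatus

# Hints paraphrased from the list above; the mapping itself is illustrative.
CRAWLER_HINTS = {
    HTTPStatus.OK: "reachable document; error pages must not return 200",
    HTTPStatus.MOVED_PERMANENTLY: "permanent redirect; use for permanent moves",
    HTTPStatus.FOUND: "temporary redirect; not meant for permanent moves",
    HTTPStatus.NOT_FOUND: "no document at this URL; keep these errors rare",
    HTTPStatus.INTERNAL_SERVER_ERROR: "server error; crawlers tend to back off",
    HTTPStatus.SERVICE_UNAVAILABLE: "overload or maintenance; send Retry-After",
}

# Assumed phrases that typically appear on error pages (hypothetical list).
ERROR_PAGE_PHRASES = ("not found", "page does not exist")


def crawler_hint(status_code: int) -> str:
    """Return a short hint on how a crawler is likely to read a status code."""
    try:
        status = HTTPStatus(status_code)
    except ValueError:
        return "unknown status code"
    default = f"{status.value} {status.phrase}: not covered here"
    return CRAWLER_HINTS.get(status, default)


def looks_like_soft_404(status_code: int, body: str) -> bool:
    """Heuristic: an error page that was delivered with status 200."""
    text = body.lower()
    return status_code == 200 and any(p in text for p in ERROR_PAGE_PHRASES)


def maintenance_headers(retry_after_seconds: int) -> dict:
    """Headers to send along with a 503 response during maintenance."""
    return {"Retry-After": str(retry_after_seconds)}
```

For example, `looks_like_soft_404(200, "<h1>Page not found</h1>")` flags a page that should have been sent with status code 404.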
2. Avoid forwarding chains
Forwarding chains (also known as “redirect chains”) rob crawlers of important resources. Webmasters should therefore avoid them as far as possible. Incorrect status codes can prevent “link juice” from being passed on, which is why site operators should point to new link targets with the correct status code (usually 301). Search engines sometimes abort the crawling of redirect chains. Redirect chains also increase loading times on mobile devices.
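Redirect chains can be detected offline once the redirects of a site are known. The sketch below assumes a hypothetical mapping of source URL to `(status code, target URL)`, for instance exported from a crawl; the function names and the hop limit are illustrative assumptions.

```python
def resolve_redirects(start_url, redirects, max_hops=10):
    """Follow redirects in a {source: (status, target)} mapping.

    Returns the final URL and the list of hops as (status, source, target).
    """
    hops = []
    seen = {start_url}
    current = start_url
    while current in redirects:
        status, target = redirects[current]
        hops.append((status, current, target))
        if target in seen:
            raise ValueError(f"redirect loop at {target}")
        if len(hops) > max_hops:
            raise ValueError("too many redirects")
        seen.add(target)
        current = target
    return current, hops


def chain_problems(hops):
    """List issues that waste crawl budget or may block link juice."""
    problems = []
    if len(hops) > 1:
        problems.append(
            f"chain of {len(hops)} redirects; link directly to the final URL"
        )
    for status, source, _target in hops:
        if status != 301:
            problems.append(f"{source} uses status {status} instead of 301")
    return problems


# Hypothetical example data: two hops, the second one only temporary.
example_redirects = {
    "http://www.beispiel.de/alt": (301, "http://www.beispiel.de/neu"),
    "http://www.beispiel.de/neu": (302, "http://www.beispiel.de/final"),
}
final_url, example_hops = resolve_redirects(
    "http://www.beispiel.de/alt", example_redirects
)
for problem in chain_problems(example_hops):
    print(problem)
```

Pointing the first URL straight at the final target with a single 301 would clear both reported problems.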
3. Structure your website with sitemaps
Sitemaps offer the opportunity to give crawlers an overview and to set priorities right from the start of their work. They can be split by content and also by data type, so there are sitemaps for:
- mobile content.
So that the sitemap can be found, it should be referenced in the robots.txt of the website. Because the maximum size of a single sitemap is limited, webmasters can create a master sitemap that references all other sitemaps.
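A master sitemap and the matching robots.txt entry could be generated roughly as follows. This is a minimal sketch that assumes the sitemap index format of the sitemaps.org protocol; the function names and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index (master sitemap) as an XML string."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(root, encoding="unicode")


def robots_txt_sitemap_line(index_url):
    """Return the robots.txt line that points crawlers to the sitemap."""
    return f"Sitemap: {index_url}"


# Hypothetical sitemaps, split by content type.
index_xml = build_sitemap_index([
    "http://www.beispiel.de/sitemap-news.xml",
    "http://www.beispiel.de/sitemap-mobile.xml",
])
print(index_xml)
print(robots_txt_sitemap_line("http://www.beispiel.de/sitemap-index.xml"))
```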
Site operators can also submit the individual sitemaps in Google Webmaster Tools. In this way it can be checked to what extent the sitemaps have already been processed by the search engine.
Webmasters can use the following tags to provide the crawler with additional information:
- <changefreq>: This tag indicates how often the content of the document changes and when a recrawl is appropriate. The following values are available: always, hourly, daily, weekly, monthly, yearly, never. With these values, website operators can, for example, mark individual page areas whose content changes less often; archives are a good example of this.
- <priority>: Webmasters who want to differentiate between the value of individual subpages in the sitemap can do so with this tag. It indicates how high the priority of an individual document is compared to all other documents of the website. The default value is “0.5”, and the full range is from “0.0” to “1.0”. With this tag, webmasters can tell the search engine which documents are particularly important to them, so that the search engine spends more resources on them.
- <lastmod>: This tag indicates when the referenced file was last changed. Important: In a sitemap index, the date refers to the listed sitemap file, not to the content of the pages. The value only needs to be updated after an actual change.
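A sitemap with these optional tags could be generated as in the following minimal sketch. It assumes the element order and the value ranges of the sitemaps.org protocol (priority from 0.0 to 1.0); the function name, the dictionary keys and the example URLs and dates are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ALLOWED_CHANGEFREQ = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def build_sitemap(entries):
    """Build a sitemap from dicts with 'loc' and optional hint fields."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(root, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        if "lastmod" in entry:
            # Expected as a date string such as YYYY-MM-DD.
            ET.SubElement(url, "lastmod").text = entry["lastmod"]
        if "changefreq" in entry:
            if entry["changefreq"] not in ALLOWED_CHANGEFREQ:
                raise ValueError(f"invalid changefreq: {entry['changefreq']}")
            ET.SubElement(url, "changefreq").text = entry["changefreq"]
        if "priority" in entry:
            priority = float(entry["priority"])
            if not 0.0 <= priority <= 1.0:
                raise ValueError(f"priority out of range: {priority}")
            ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(root, encoding="unicode")


# Hypothetical pages: a frequently updated start page and a static archive.
sitemap_xml = build_sitemap([
    {"loc": "http://www.beispiel.de/", "changefreq": "daily", "priority": 1.0},
    {
        "loc": "http://www.beispiel.de/archiv/",
        "lastmod": "2015-01-01",
        "changefreq": "yearly",
        "priority": 0.3,
    },
])
print(sitemap_xml)
```

Validating the values before writing the file keeps typos such as an unknown change frequency from reaching the crawler.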
4. Make hard-to-crawl content easy to understand
In the past, crawlers sometimes had difficulties with Ajax-based content. Although the processing has gotten better now, webmasters should help the crawlers with the processing of all subpages. If Ajax is used to dynamically reload content, the following should be observed:
- For crawlers to process content that is loaded via Ajax, this content must be flagged accordingly. To do this, the crawlers must be given alternative URLs, as Google explains in a guide.
- The URLs must contain a token in the hash fragment that signals the Ajax content to the crawlers. For unique pages, the token is an exclamation mark.
- The crawlers must receive an HTML snapshot from the server for each URL to be indexed, which contains all content visible to the user. So that the server knows which version to deliver, the crawler temporarily modifies the Ajax URL: it replaces the hash token (#!) with “?_escaped_fragment_=” and requests the snapshot.
- For pages that are to be indexed without hash fragments (for example the start page or individual subpages), the following meta tag must be inserted in the <head> of the page: <meta name="fragment" content="!">. Here, too, an HTML snapshot is required for the respective page, which crawlers can request from the server.
- The URLs that are to be indexed in this form should also be entered in the sitemaps.
To be sure that the search engine was able to process and index all Ajax content, webmasters should check it with the help of Google Webmaster Tools. In the “Crawl” area you will find the menu item “Fetch as Google”. Here users only have to enter the URL with the hash token (#!) and click on “Fetch and Render”.
5. Take advantage of internal links
With the help of internal links, page operators can define the most important sub-pages of their website. The frequency with which a document is linked signals its priority to crawlers. Accessibility is also particularly important: the faster a subpage can be reached from the start page, the greater its importance. Site operators should therefore choose a flat hierarchy or make all sub-pages available using optimized paginations, link modules or sitemaps.
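Both signals mentioned here, the distance from the start page and the number of internal links pointing to a page, can be measured on a link graph. The sketch below assumes a hypothetical dictionary that maps each page to the pages it links to; it says nothing about how search engines weight these values.

```python
from collections import deque


def click_depths(links, start):
    """Breadth-first search: minimum number of clicks from the start page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def inbound_link_counts(links):
    """Count how many different pages link to each page."""
    counts = {}
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] = counts.get(target, 0) + 1
    return counts


# Hypothetical site structure.
site_links = {
    "/": ["/blog", "/shop"],
    "/blog": ["/blog/artikel-1", "/"],
    "/shop": ["/shop/schuhe"],
    "/shop/schuhe": ["/blog/artikel-1"],
}
print(click_depths(site_links, "/"))
print(inbound_link_counts(site_links))
```

Pages that are missing from the result of `click_depths` cannot be reached from the start page at all, and pages with a large depth are candidates for additional internal links.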
6. Create connections
Crawlers have problems recognizing the connection between two subpages, especially when it comes to paginated articles. Site operators can remedy this with the link attributes rel="next" and rel="prev". They are placed in the <head> of a page, for example as <link rel="next" href="http://www.beispiel.de/artikel?seite=2">, and thus create a relationship between several documents.
The attribute rel="next" points to the next part of a document, rel="prev" to the previous part. Crawlers thus recognize not only the connection between two pages, but also their order.
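For a paginated article, the matching tags can be derived from the position of the current page. The sketch below assumes that the attributes are set on `<link>` elements in the `<head>`; the function name and the example URLs are hypothetical.

```python
from html import escape


def pagination_link_tags(page_urls, index):
    """Return the <link> tags for the page at position `index`."""
    if not 0 <= index < len(page_urls):
        raise IndexError("page index out of range")
    tags = []
    if index > 0:
        # Every page except the first points back to its predecessor.
        prev_url = escape(page_urls[index - 1], quote=True)
        tags.append(f'<link rel="prev" href="{prev_url}">')
    if index < len(page_urls) - 1:
        # Every page except the last points forward to its successor.
        next_url = escape(page_urls[index + 1], quote=True)
        tags.append(f'<link rel="next" href="{next_url}">')
    return tags


# Hypothetical article split across three pages.
article_pages = [
    "http://www.beispiel.de/artikel",
    "http://www.beispiel.de/artikel?seite=2",
    "http://www.beispiel.de/artikel?seite=3",
]
for tag in pagination_link_tags(article_pages, 1):
    print(tag)
```

The first page only receives a "next" tag and the last page only a "prev" tag, which is how the order of the series becomes visible.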
7. Provide structured content
Structured data offers website operators another way to give crawlers additional information about the content of a page. It is integrated into the subpages by means of tags and partly affects how they are displayed in the search results. In this context, one also speaks of “rich snippets”. They are available for various types of information.
To test whether this data has been marked up correctly within the document, Google offers a “Structured Data Testing Tool”.
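The article does not say which markup format is meant. As one possible illustration, the following sketch generates structured data in JSON-LD notation with the schema.org vocabulary; this choice of format, the function name and all example values are assumptions.

```python
import json


def article_json_ld(headline, author_name, date_published):
    """Return a <script> tag with schema.org Article data as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,
    }
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Keep a "</script>" inside the data from closing the tag early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Placeholder values for illustration only.
print(article_json_ld("Beispielartikel", "Erika Mustermann", "2015-01-01"))
```

The resulting tag would be placed in the HTML of the page and could then be checked with a validation tool.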
Not only unique, high-quality content but also a sound technical infrastructure boosts the rankings of your websites. Therefore, always think of your two target groups: the users who want to read your content and the crawlers who have to interpret and process it.
They both want to …
- recognize the thematic context,
- get important information at a glance,
- find thematically similar information with little effort,
- read the selected document,
- recognize a clear structure,
- receive as much relevant additional information as possible and
- spend little energy searching for relevant information.
If you respond to these wishes, you will win the goodwill of users and crawlers. And best of all, they will reward you for it.
Our article “Onpage SEO: The Most Important Ranking Factors” provides a basic introduction to the topic of onpage optimization.