Magento is widely regarded as one of the most challenging ecommerce platforms from an SEO perspective, due to the complexity of its rewrite engine, its reliance on dynamic content and its complex codebase (compared to other PHP platforms), amongst other things. That said, all of the SEO issues associated with the Magento platform can be resolved – it’s just a case of knowing what needs to be done, which is why I wrote this article. A lot of the technical fixes will require development resource, but I’d also recommend using something like MageWorx to give you more control within Magento. This makes things a bit more manageable (you can set up hreflang, improve canonical URLs, add structured data, assign noindex rules etc) than adding the functionality yourself. I previously released my own module (MageSEO), but I struggled to compete with their levels of support and regular feature releases (their module is a full-time focus, whereas mine was a side project to help clients).

Over the last few years, I’ve consulted for Magento retailers from all over the world and have worked on a host of large, complex Magento builds. During this time I’ve faced (and resolved) pretty much every technical Magento SEO issue imaginable, so I thought I’d document some of the more common issues in this article.

There are quite a lot of considerations when configuring your Magento implementation, purely because of the complexity of the platform. Here are some of the key areas that need to be factored into your Magento Community or Enterprise implementation.

Configurable and simple product configuration

If you’re using configurable products alongside simple products in Magento, you need to ensure that you’re not allowing the duplicate variants to be indexed. Lots of people use simple products for things like different colours or different versions of a product, so that they can then be displayed independently on product list pages.

If you’re not intentionally creating simple products and adding additional unique content to the pages, I’d suggest using the canonical tag to illustrate that the simple products are variants of the main configurable product.
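A minimal sketch of how this looks (using hypothetical URLs) – the tag sits in the <head> of each simple product page and points at the configurable product:

<link rel="canonical" href="http://www.example.com/example-shirt" />

Here /example-shirt is the configurable product, and the same tag would sit on variants such as /example-shirt-blue and /example-shirt-red.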

Canonical URLs in general

In the back-end of Magento there are options to set the canonical tag on product and category pages, which will ensure that the canonical tag points to the primary version of these pages at all times. In theory this should help to prevent dynamic variants of categories from being indexed; in practice it often doesn’t, because the filtered pages are so different from the parent category. Unless you’re adding a bespoke canonical tag implementation, I’d suggest enabling both of these options, which can be found in the configuration section of the Magento admin interface.

You will also need to add the canonical tag to your homepage and CMS pages, although this will need to be done manually. In more recent versions of Magento CE and EE, Magento automatically uses the canonical tag to canonicalise hierarchical product URLs (those with the category path) to the top-level URLs.
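For CMS pages (and the homepage), one hedged route in Magento 1.x is the ‘Layout Update XML’ field on the page’s Design tab – the head block supports an addLinkRel method, so a sketch would look something like this (the URL is a placeholder):

<reference name="head">
    <action method="addLinkRel">
        <rel>canonical</rel>
        <href>http://www.example.com/about-us</href>
    </action>
</reference>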

Product title tag conventions

Another consideration is the way that Magento assigns product title tags, which is usually simply the name of the product. I’d suggest either setting a convention that includes variables based on product characteristics (e.g. gender, colour etc) or assigning them manually, which is always going to be the preference.

Headings

Magento has a tendency to use headings incorrectly, most commonly by assigning H2 tags to product listings on category pages. I’d suggest checking this, as I’ve also seen implementations with multiple H1s on different page types and others with no headings at all. This is a minor consideration compared to some of the other points listed in this article, and it can be a result of the theme / template being used.

Product URLs

I’d suggest setting your Magento implementation to use top-level product URLs, rather than selecting the option to include category path in the URLs. I cover this in more detail later on.

Redirects

There’s an option in the back-end of Magento to create permanent redirects when a URL is changed – I’d suggest setting this to ‘yes’, although be aware of potential rewrite issues if you’re changing URL keys regularly or doing CSV uploads incorrectly. Again, this is covered in more detail later on.

Layered / faceted navigation issues

It is very common for layered or faceted navigation to cause duplicate content issues with Magento, as filters that create new URLs will often get indexed by search engines. Out of the box, Magento appends filters as query string parameters, but a number of third-party modules use directories and various other conventions, which still lead to the same outcome. This is especially something to review if you’re using Amasty or Manadev.
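To illustrate the problem (with made-up filter parameters), a single category can generate endless near-identical URLs:

http://www.example.com/clothing
http://www.example.com/clothing?colour=blue
http://www.example.com/clothing?colour=blue&size=m
http://www.example.com/clothing?size=m&colour=blue

The last two serve exactly the same products – reordering the parameters alone doubles the number of URLs.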

One of the biggest mistakes I see retailers make in ecommerce SEO is simply changing the dynamic convention that the URLs and meta titles are based on – effectively making the URL look clean and the title tag slightly more targeted. This is not a good idea: due to the volume of low quality pages being indexed, it leaves a website susceptible to Google Panda, which specifically targets websites with large numbers of indexable pages containing very little unique content.

Resolutions

Prevent search engines from crawling the pages

If you have a large store or you’re using multi-select layered navigation, I’d suggest blocking these pages via the robots.txt file or nofollowing the links to the pages – or both, which is what I’d probably do, in case your robots.txt file gets overwritten or goes missing.
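As a quick sketch (using a hypothetical colour filter), a nofollowed filter link would look like this:

<a href="http://www.example.com/clothing?colour=blue" rel="nofollow">Blue</a>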

This will help to prevent the pages from using up crawl budget, although it will also make it harder for search engines to access products. I’d suggest this route for larger merchants – but if you have a relatively small site, I’d probably suggest using meta robots tags instead.

Use a noindex, follow tag

I regularly opt to use a noindex directive to tell search engines not to index layered navigation pages. You can use either meta robots tags or the X-Robots-Tag HTTP header to do this – both still meet the requirements for removal requests in Google Webmaster Tools and ultimately serve the same message to search engines.
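As a minimal sketch, the meta tag version sits in the <head> of each filtered page, while the header version is sent in the HTTP response:

<meta name="robots" content="noindex, follow" />

X-Robots-Tag: noindex, follow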

The main benefit of using the noindex, follow tag is that search engines can still crawl the pages and the links on the pages, but they’re instructed not to index them. This can be a good thing if you have issues with search engines crawling products (particularly if you’re heavily merchandising a set of products and others are only accessible via dynamic pages).

This can be achieved by using the MageSEO plugin (my Magento SEO module) or most other SEO modules – these allow you to assign manual meta robots rules, give you greater control over the canonical tag and also let you edit the robots.txt file from the Magento back-end.

If you’re using the multi-select option on your layered navigation or if you’re facing issues with crawl budget, I’d suggest actually blocking the URLs.

Use the canonical tag

The canonical tag can be a very good solution for preventing duplicate pages from being indexed, by canonicalising back to the parent category page (this will also pass value back to that page). I do have a couple of reservations with the canonical tag. The first is that I’ve seen many occasions where search engines have ignored it when the pages are very different (because of the different products being served). The second is that if you already have the pages in the index (and you need to resolve the issue fast), it can take a long time for Google and other search engines to adapt to the change. You also won’t be able to manually remove the pages, as they won’t meet Google’s removal guidelines.

AJAX navigation

Using AJAX navigation will enable you to filter the products featured on your website without changing the URL, leaving only the original category page. If you don’t have an experienced Magento developer on hand, this is likely to be a bit of a nightmare. There are modules that can help with this, but to be honest, if you don’t have a good developer to support you, I would avoid making this switch. It can also cause a lot of additional technical issues – I’ve seen retailers adopt this method and end up hiding product links behind excess JavaScript on multiple occasions. If you choose to go down this route (which should be about user experience as well as SEO), make sure you consult an experienced SEO consultant before making the switch.

If you’re using AJAX for layered navigation, make sure the filtered links aren’t actually linking to a URL in the background, as this can cause additional issues with dynamic pages being indexed, even though the URLs aren’t actively seen when browsing the site.

Parameter handling in Google Webmaster Tools

I’ve always suggested that the parameter handling tool in Google Webmaster Tools isn’t overly effective, especially for faceted navigation duplicate content issues – however, I’ve heard lots of very good SEOs say it’s got a lot better.

Like the canonical tag, I’d recommend implementing this even if you’re using other solutions.

This can generally be managed very effectively using the MageWorx SEO Suite, which allows you to control canonical URLs (far more comprehensively), assign noindex rules and manage the robots.txt file (as well as lots of other things) from the Magento back-end.

Search pages

Another very annoying issue with Magento is when catalogue search pages get indexed by Google, as generally there will be thousands of them. To prevent this, I would simply recommend assigning noindex, follow meta robots tags to the search page and the child query URLs contained within the directory. If you’re having issues with crawl budget, I’d suggest using the robots.txt file instead of meta robots tags.

Sitemaps

Magento is known for being pretty rubbish when it comes to creating XML sitemaps, as it just provides the option of creating one standalone sitemap, which often contains pages that you don’t want to be indexed / accessed by search engines.

I usually suggest creating two XML sitemaps, one for products and one for category and content pages – this allows for better visibility of indexation of pages (via Google Webmaster Tools) and also makes it easier for Google to access all of your pages if you have a large site. For very large websites, I usually suggest splitting products out by brand or type, again providing more insight into indexation. You would then reference the different sitemaps in an index sitemap.
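As a minimal sketch (using hypothetical sitemap filenames), the index sitemap would reference each split sitemap like this:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap-products.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap-categories-content.xml</loc>
  </sitemap>
</sitemapindex>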

Creating these sitemaps can either be done by your developers or via a third party module.

Issues with URL rewriting

Another surprisingly common issue with Magento is URL rewriting, which can cause category or product page URLs to revert back to the original /catalog/ URLs (which aren’t rewritten based on the title of the page), or for both versions of the URLs to run in parallel.

I would recommend blocking these /catalog/ URLs and keeping an eye on them, as I’ve had issues in the past (caused by problems with refreshing the URL rewrites) where all of the URLs reverted back to the old structure without 301 redirects being applied.

Another issue with Magento rewrites is when Magento appends a number to the end of product and category URLs – -1, -2, -3 for example. This happens when the URL key is already in use in the rewrite table, so Magento appends a number to keep the new rewrite unique rather than overwriting the original. This can be a huge issue and usually requires development resource to resolve. If you have any questions about this, feel free to drop me an email.

Another cause of the number being appended comes from updating products via a CSV file – when this happens, you simply need to ask your developer to remove the redundant rewrites from the rewrite table, although this should be tested extensively before being applied to your live site.
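If you need to see which rewrites have had numbers appended, here’s a minimal diagnostic sketch, assuming Magento 1.x’s core_url_rewrite table and MySQL (run it against a staging database first):

-- list non-system rewrites where a number has been appended to the URL key
SELECT request_path, target_path
FROM core_url_rewrite
WHERE is_system = 0
AND request_path REGEXP '-[0-9]+(\\.html)?$';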

Secure pages

It’s really common for secure pages to be indexed by Google (on non-secure websites), which can be very annoying from a duplicate content perspective. For these pages, I would recommend either canonicalising the https pages to the http versions or applying a rewrite rule – both should be applied with exclusions for pages that should remain https, such as checkout or account pages.

The issue with https pages being indexed is usually caused by internal links not being absolute, meaning that once a crawler lands on an https page, every relative link keeps it on the https version of the site. I’d also recommend changing the links to be absolute.
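As a hedged sketch of the rewrite rule approach (assuming Apache and Magento’s default /checkout/ and /customer/ paths), something like this would sit in your .htaccess:

# redirect https to http, excluding checkout and account pages
RewriteEngine On
RewriteCond %{HTTPS} on
RewriteCond %{REQUEST_URI} !^/(checkout|customer)/
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1 [R=301,L]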

Duplicate product page content because of hierarchical URLs (including category path within URLs)

If you’re using hierarchical URLs, which include the category name within the product page URL path, you’re setting yourself up to have duplicate variations of a product within multiple categories. For this reason, I would strongly recommend using top-level product URLs, as this will prevent issues like this and allow you to have one single representative version of each product.

If you’re already using category paths within your product URLs, make sure you’re using the canonical tag to point to the primary version of each product – this will also resolve the issue, although top-level products would be the best option.
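A quick sketch with hypothetical URLs – each category-path version of the product carries a canonical tag pointing at the top-level URL:

<!-- on http://www.example.com/clothing/shirts/example-product-1 -->
<link rel="canonical" href="http://www.example.com/example-product-1" />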

Also, if you’re switching to top-level products, make sure you get your developer to create a rewrite rule to redirect old product URLs, as by default, Magento will not redirect them.

Pagination

Magento product list pages use pagination to break up large lists of products, which usually means the same content being served on different variations of the same page. These pages should really use the rel next and prev tags, which were introduced by Google in 2011 to help webmasters indicate that pages are part of a paginated series.

The tags, which can be added to the pagination links or via the <head>, would look like this (the example below is from page 5 of a category):

<link rel="next" href="http://website.com/clothing?p=6" />

<link rel="prev" href="http://website.com/clothing?p=4" />

I’d suggest using a noindex, follow meta robots tag alongside rel next and prev.
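Combined, as a minimal sketch, the <head> of page 5 from the example above would contain:

<link rel="prev" href="http://website.com/clothing?p=4" />
<link rel="next" href="http://website.com/clothing?p=6" />
<meta name="robots" content="noindex, follow" />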

Session IDs / SIDs

Session IDs are used to record a user’s session, commonly when they move from one domain to another. The URLs they generate can cause all kinds of duplicate content issues if they are indexed, and there’s no limit to the number of URLs that can be created.

If you do have an issue with session IDs, I’d suggest using either the robots.txt file or meta robots tags to prevent the pages from being indexed. This will also allow you to submit removal requests if the pages are already in the index.

Magento performance

Magento is notoriously slow, which, in very severe cases, can impact organic search rankings – it also impacts crawlability, making performance optimisation really important for larger sites. There are a number of things you can do to optimise the speed of your Magento website, such as:

  • Using an optimised and well-configured server
  • Disabling Magento logs and enabling log cleaning (in the Magento back-end)
  • Enabling merging of CSS and JavaScript (in the Magento back-end)
  • Leveraging browser caching
  • Using a CDN for images (also look at image compression services)
  • Optimising front-end assets

This is something I’d strongly recommend anyway from a user experience / conversion optimisation perspective and I’d suggest getting your developers to allocate some time to improving the above aspects (and lots more) with a view to optimising the speed of your store.
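As one hedged example of the browser caching point above (assuming Apache with mod_expires enabled), you could add something like this to your .htaccess file:

<IfModule mod_expires.c>
    ExpiresActive On
    # cache images for longer than CSS / JS, which tend to change more often
    ExpiresByType image/jpeg "access plus 1 month"
    ExpiresByType image/png "access plus 1 month"
    ExpiresByType text/css "access plus 1 week"
    ExpiresByType application/javascript "access plus 1 week"
</IfModule>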

Ensure that 301 redirects are being used

By default, older versions of Magento use 302 (temporary) redirects, which prevents redirects and rewrite rules from passing link value. I would strongly recommend ensuring that your redirects are set to 301 so that you’re making the most of your link value.
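For reference (assuming Magento 1.x), one place this surfaces is the base URL redirect setting in the admin:

System > Configuration > General > Web > Url Options > Auto-redirect to Base URL: Yes (301 Moved Permanently)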

Duplicate product content via multi-store

If you’re using multi-store to manage multiple Magento instances, that’s not a valid reason for using the same product content across multiple websites (unless you’re using it for internationalisation, or you’re using the canonical tag to prevent this from being an issue – which most won’t be). I’ve worked with lots of merchants who have used multi-store to manage the same catalog across multiple websites, which means all of the product content is duplicated across each domain. I’d suggest creating a second set of product content if you’re using multi-store in this way.

International targeting with Magento

For one reason or another, Magento merchants seem to really struggle with international store implementations from an SEO perspective. The most common configurations I see are merchants using one catalog with an extension that adds additional back-end fields (meaning the rest of the page remains the same), or using multi-store and making the mistake of re-using the same URLs, meta data and copy (in some places), just creating variants for the different countries they want to target.

The first of those two options (an international extension) also usually serves the pages via odd query string URLs, as each page is served as a variant of the original. Also, where pages don’t have their own content, they will usually carry the content of the original page – which is bad from a Panda perspective, as it means you’re going to have duplicates of the original pages.

I would suggest using multi-store, noindexing pages until they have unique content and also using the hreflang tag, which I’ve provided more detail on below. You can choose to do this via sub-directories or separate ccTLDs. The hreflang tag should be implemented so that the equivalent URL is referenced from the other stores on a page-level basis – I see too many sites that just reference the homepage from every page of the site.

Using the hreflang tag with Magento

The rel alternate hreflang tag lets you tell Google that you have an alternative international version of a given page, and is a fundamental part of international SEO. The hreflang tag will help to ensure that the right version of a site or individual page is served in the right regional version of Google.

There are a few modules available, however I’ve not properly used any of them, so I’d suggest using your development resource to implement it. The implementation itself will differ depending on whether you’re using multi-store or implementing international pages via a plugin or custom fields. Here’s an example (using different ccTLDs) of how it should look if implemented correctly on the page (it can also be done via your XML sitemap):

<link rel="alternate" hreflang="en-gb" href="http://www.example.co.uk">

<link rel="alternate" hreflang="en-au" href="http://www.example.com.au">

<link rel="alternate" hreflang="fr-fr" href="http://www.example.fr">

Here’s how it’d look on a page-level basis, again across different ccTLDs (this uses a product example):

<link rel="alternate" hreflang="en-gb" href="http://www.example.co.uk/example-product-1">

<link rel="alternate" hreflang="en-au" href="http://www.example.com.au/example-product-1">

<link rel="alternate" hreflang="fr-fr" href="http://www.example.fr/example-product-1">
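If you’d rather implement it via the XML sitemap (as mentioned above), the equivalent markup for the same product would look like this – note the additional xhtml namespace on the urlset element:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>http://www.example.co.uk/example-product-1</loc>
    <xhtml:link rel="alternate" hreflang="en-gb" href="http://www.example.co.uk/example-product-1" />
    <xhtml:link rel="alternate" hreflang="en-au" href="http://www.example.com.au/example-product-1" />
    <xhtml:link rel="alternate" hreflang="fr-fr" href="http://www.example.fr/example-product-1" />
  </url>
</urlset>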

I did a project for a mid-sized furniture retailer around a year ago – they were using an extension and around 20% of the pages had unique content, while the rest served the content from the original page (so English). They had no hreflang tags and lots of low quality pages in the index. I implemented the hreflang tag, removed lots of pages (until we’d published new content) and paid lots of students to write localised content for product and category pages – they saw an increase in organic visibility of over 350% over the next year, including serious improvements for international phrases.

Magento OOTB review functionality

If you’re using the out of the box Magento review functionality, all of your product reviews are published on the product page and also on a separate page, which aggregates all of the reviews for that product. Each product has one of these pages, which essentially duplicates the product review content. I’ve seen a lot of issues arise from this, including cannibalisation (the lower quality duplicate page ranking instead of the primary product page) and a huge number of thin pages being indexed for products that don’t have any reviews (as the pages exist as soon as a product is added).


Having an aggregated page for review content is fairly common, and most major review vendors do this with an OOTB implementation (BazaarVoice, Yotpo and Power Reviews to name a few). It is something to be mindful of though, as these pages can cause a lot of issues.
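As a minimal sketch, assuming the default Magento /review/ URL structure, the aggregated review pages could be kept out via the robots.txt file:

User-agent: *
Disallow: /review/

Just bear in mind (as covered earlier) that robots.txt stops crawling rather than removing pages that are already indexed – meta robots tags are the slower but more thorough route in that case.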

Feed-related duplicate content

If you’re using your feed to send content to affiliate websites or resellers, this can cause a huge amount of issues with duplicate content. This is one of the most common causes of Panda penalties and can have a significant impact on your organic visibility.

I’d suggest having two feeds, one for your website and one for third party websites, as this will eliminate the risk of your visibility being impacted as a result.

Magento configuration

If you’re about to launch a new Magento store, I’d suggest following the points below to ensure that your new store is fully optimised. If you have an existing site, however, making these changes can impact your visibility and will require more planning.

Here are some of the recommendations I generally make for configuring your Magento store:

  • Enable canonical URLs for product and category pages (config > catalog > search engine optimization)
  • Set product URLs to top-level (config > catalog > search engine optimization – allow category path in product URLs, set to ‘no’)
  • Enable XML sitemap and create at least one (catalog > Google Sitemap)
  • If you use list mode for product list pages, don’t allow product content (descriptions) to be displayed, to avoid duplicating product page copy
  • Enable the creation of redirects when URLs are changed (but keep an eye on rewrites if numbers are appended to URLs)

Robots.txt configuration with Magento

The robots.txt file is very valuable, especially for large ecommerce websites, but it can also cause issues with crawl fragmentation if used incorrectly. I’d suggest only using the robots.txt file if you want / need search engines not to crawl specific sections of the website (such as search pages, pages with session IDs, multi-select layered navigation etc). The robots.txt file should be used to prevent pages from being crawled, but remember there are other routes to blocking pages (lots of Magento merchants seem to block everything via the robots.txt file because of some early robots.txt templates that were released).

I would generally suggest that you block the following using the robots.txt:

  • Catalogsearch pages
  • /catalog/ URLs (unless you have images and CSS & JS within this folder)
  • URLs with session ID parameters
  • Layered navigation pages (if you have multi-select filters or a large website)
  • Sort / order parameters
  • Admin pages

So if you were to follow this, your robots.txt would look something like the below (I use the * prefix in case you have multi-store under sub-directories):

User-agent: *

Disallow: /admin/
Disallow: *price=*
Disallow: *dir=*
Disallow: *order=*
Disallow: *limit=*
Disallow: */catalog/*
Disallow: */catalogsearch/*
Disallow: */customer/*
Disallow: *SID=*

Sitemap: ###

Magento SEO modules

I previously created my own Magento SEO module (called MageSEO), but I quickly found that supporting it was very difficult without dedicated technical resource. For the last few months I’ve been using and recommending the MageWorx SEO Suite, which is a very comprehensive module that’s been around for a long time.

The MageWorx module offers a huge amount of SEO features, including:

  • Hreflang support – includes lots of options to ensure it suits your store’s catalog setup
  • Canonical URL options for product catalog setup – this is a commonly required feature where merchants have multiple versions of the same product, usually simple, bundled or configurable products. The module allows you to configure canonical URLs to reference the primary version of each product. So, for example, if you have a shirt that’s a configurable product with different sized versions as simple products, you could set the canonical URL on each simple product to point to the configurable version.
  • Rules for meta robots tags – this is a feature I use on all stores I work on. You’re able to set rules (with wildcards) to assign meta robots directives to groups of pages (e.g. noindex, follow on all URLs containing ?price=).
  • Ability to edit your robots.txt file from the Magento back-end
  • Control over https / http duplicate URLs
  • Ability to create more advanced rules for canonical URLs (for things like trailing slashes etc)
  • Options for rewriting URLs
  • Options around creating meta title conventions (across different pages)
  • Advanced XML sitemap options
  • Pagination options
  • The ability to override various other store-level settings

These are just a few of the core features – there are loads more. It does ‘pretty much’ everything you need, which I attribute to it being the most popular module and the developers getting lots of feature requests from merchants and agencies.

I’ve used this module on 20+ stores and haven’t yet had any real issues with it – although you do need to be careful when configuring it. I’d suggest matching it exactly to your current setup to start with and then applying fixes, as some of the settings could cause issues around how your site is crawled (e.g. incorrect canonical URLs etc).

If you have any questions around configuring – feel free to email me ([email protected]).

Other modules that I’ve used before that have been pretty good include:

  • CreareSEO (free, with fewer features than most of the premium modules – it does the fundamentals though)
  • MageWorx sitemap module (this is useful as a standalone module for extending the Magento sitemap)
  • Mirasvit module (good module that covers the fundamentals with a few extra options)

I also wrote this Magento SEO module comparison piece to make it easier to decide on which module to use.

If you have any questions about anything I’ve mentioned in this blog post or anything else related to Magento, please feel free to drop me a message via this form. I also provide Magento SEO audits and consultancy for large Magento projects.