Arclab® Website Analyzer

 

Search Your Website for Duplicate HTML Content
Find All Duplicate Pages without a Canonical Link Tag

 

 


What Does "Duplicate HTML Content" Mean?

In this context, "Duplicate HTML Content" means that exactly the same HTML page can be accessed under at least two different URIs. To detect this, the program creates a checksum for each HTML page and compares the checksums of all pages found during the scan with each other.
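
The checksum comparison can be sketched in a few lines of Python. This is only an illustration of the idea, not the program's actual implementation: the function name find_duplicates, the choice of SHA-256 as checksum, and the sample URIs and page contents are all assumptions made for this example.

```python
import hashlib
from collections import defaultdict


def find_duplicates(pages):
    """Group URIs whose HTML content has the same checksum.

    pages: dict mapping a URI to the raw HTML bytes of that page.
    Returns a list of URI groups, each with at least two entries.
    SHA-256 is an assumption; any stable checksum works for this sketch.
    """
    by_checksum = defaultdict(list)
    for uri, html in pages.items():
        checksum = hashlib.sha256(html).hexdigest()
        by_checksum[checksum].append(uri)
    # Only checksums shared by two or more URIs count as duplicates.
    return [sorted(uris) for uris in by_checksum.values() if len(uris) > 1]


# Hypothetical scan result: two URIs deliver identical HTML.
pages = {
    "https://www.yourdomain.tld/some-page.html": b"<html><body>Hello</body></html>",
    "https://www.yourdomain.tld/old-page.html": b"<html><body>Hello</body></html>",
    "https://www.yourdomain.tld/other.html": b"<html><body>Other</body></html>",
}

duplicates = find_duplicates(pages)
print(duplicates)
```

Because the checksum is computed over the complete page, even a single differing character yields a different checksum, so only exact copies are grouped together.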

 

Excepted from the report are:

How can I Find "Duplicate HTML Content" on My Website?

Troubleshooting is easy with Arclab® Website Analyzer. First, let the program scan your website.
After the scan of your website is completed, you will receive a detailed report containing all errors found on your website and other information:

 

Website Analyzer Report

 

In the "Duplicate HTML Content" line, click "Show Details" to display details about the errors:

 

Details: Duplicate HTML Content

 

  • At least two of the HTML pages listed under (A) contain exactly the same content, which means they have the same checksum (B).
  • This error can occur, for example, if you have renamed pages but the old page is still on the server and there are still links to the old page.
  • However, if the duplication is intentional or exists for technical reasons, you should insert a "rel=canonical" link tag in the HTML header of the affected pages to avoid negative effects on the search results.

"rel=canonical" HTML Header Link Tag

The "rel=canonical" link tag in the HTML header is used to prevent duplicate content issues in search engine optimization (SEO) by indicating the "canonical" or "preferred" version of a web page.

 

Sample:

<!doctype html>
<html lang="en">
<head>
...
<link rel="canonical" href="https://www.yourdomain.tld/some-page.html">
...
</head>

...

</html>

 

Insert the same (!) "rel=canonical" link tag in both pages, so that both pages name the one URI that should be used as the preferred version.
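
To verify that two duplicate pages really carry the same canonical link tag, a small check along the following lines may help. It is a minimal sketch using Python's standard html.parser module; the class CanonicalFinder, the helper canonical_of, and the sample pages are hypothetical names chosen for this example and are not part of Arclab® Website Analyzer.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values; compare case-insensitively.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def canonical_of(html):
    """Return the canonical URI declared in the HTML text, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical duplicate pages that both point to the preferred URI.
page_a = """<!doctype html>
<html lang="en"><head>
<link rel="canonical" href="https://www.yourdomain.tld/some-page.html">
</head><body>Hello</body></html>"""
page_b = page_a  # exact copy, served under a second URI

print(canonical_of(page_a) == canonical_of(page_b))
```

If the two calls return different URIs, or one of them returns None, the pages do not declare a common preferred version.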