Arclab® Website Analyzer
Search Your Website for Duplicate HTML Content
Find All Duplicate Pages without a Canonical Link Tag
What Does "Duplicate HTML Content" Mean?
In this context, "Duplicate HTML Content" means that exactly the same HTML page can be accessed under two or more different URIs. To detect this, a checksum is created for each HTML page, and the checksums of all pages found during the scan are compared with each other.
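The checksum comparison described above can be sketched in a few lines of Python. This is only an illustration, not the program's actual implementation: the source does not say which checksum algorithm is used, so SHA-256 is an assumption here, and the function name `find_duplicates` and the sample URIs are hypothetical.

```python
import hashlib
from collections import defaultdict


def find_duplicates(pages):
    """Group URIs whose HTML is byte-for-byte identical.

    `pages` maps each URI to the raw HTML bytes fetched from it.
    SHA-256 is an assumed checksum; any stable hash would work the same way.
    """
    by_checksum = defaultdict(list)
    for uri, html in pages.items():
        checksum = hashlib.sha256(html).hexdigest()
        by_checksum[checksum].append(uri)
    # Only checksums shared by two or more URIs indicate duplicate content.
    return {c: sorted(uris) for c, uris in by_checksum.items() if len(uris) > 1}


# Hypothetical sample data: the first two URIs serve identical HTML.
pages = {
    "https://www.yourdomain.tld/old-page.html": b"<html><body>Hello</body></html>",
    "https://www.yourdomain.tld/new-page.html": b"<html><body>Hello</body></html>",
    "https://www.yourdomain.tld/other.html": b"<html><body>Other</body></html>",
}
duplicates = find_duplicates(pages)
```

Grouping by checksum avoids comparing every page against every other page: each page is hashed once, and any checksum that maps to more than one URI marks a set of duplicates.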
The following are excluded from the report:
- HTML pages that already contain a "rel=canonical" link tag in the HTML header.
- The folder "/" and the default page in that folder, e.g. "index.html".
- In the "Website Settings", you can add or remove default page names, for example if you want such pages to be included in the report.
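The default-page exception can be pictured as a normalization step: a URI ending in a default page name is treated as the URI of its folder, so the two are not counted as duplicates of each other. The sketch below is an assumption about how such a rule might look, not the program's own logic. The set `DEFAULT_PAGE_NAMES` and the function `folder_uri` are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical list; the real names are configured in the "Website Settings".
DEFAULT_PAGE_NAMES = {"index.html", "index.htm"}


def folder_uri(uri, default_names=DEFAULT_PAGE_NAMES):
    """Map e.g. /docs/index.html to /docs/ so both forms compare as equal."""
    parts = urlsplit(uri)
    # Split the path into everything up to the last "/" and the file name.
    head, sep, tail = parts.path.rpartition("/")
    if tail in default_names:
        parts = parts._replace(path=head + sep)
    return urlunsplit(parts)
```

With this sketch, `folder_uri("https://www.yourdomain.tld/index.html")` yields the folder URI `https://www.yourdomain.tld/`, while a URI such as `https://www.yourdomain.tld/some-page.html` is returned unchanged.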
How Can I Find "Duplicate HTML Content" on My Website?
Troubleshooting is easy with Arclab® Website Analyzer.
First, let the program scan your website.
After the scan is complete, you will receive a detailed report listing all errors found on your website, along with other information.
In the "Duplicate HTML Content" line, click "Show Details" to display details about the errors:
- At least 2 HTML pages listed under (A) contain exactly the same content or have the same checksum (B).
- This error can occur, for example, if you have renamed pages but the old page is still on the server and there are still links to the old page.
- However, if the duplication is intentional or exists for technical reasons, you should insert a "rel=canonical" link tag in the HTML header to avoid negative effects on the search results.
"rel=canonical" HTML Header Link Tag
The "rel=canonical" link tag in the HTML header is used to prevent duplicate content issues in search engine optimization (SEO) by indicating the "canonical" or "preferred" version of a web page.
Sample:
<!doctype html>
<html lang="en">
<head>
...
<link rel="canonical" href="https://www.yourdomain.tld/some-page.html">
...
</head>
...
</html>
Insert the same (!) "rel=canonical" link tag in both pages. Its href attribute specifies which URI should be used.
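To verify that a page actually carries the tag, the HTML header can be checked with a small script. The following is a minimal sketch using Python's standard `html.parser` module. The class `CanonicalFinder` and the function `canonical_uri` are hypothetical names chosen for illustration, and the sketch says nothing about how Arclab® Website Analyzer performs this check internally.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head and self.canonical is None:
            attr = dict(attrs)
            # rel may hold several space-separated tokens, in any letter case.
            rel = (attr.get("rel") or "").lower().split()
            if "canonical" in rel:
                self.canonical = attr.get("href")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def canonical_uri(html):
    """Return the canonical URI declared in the page, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

Running `canonical_uri` on both duplicate pages should return the same URI. If it returns `None` for either page, or two different URIs, the tag is missing or inconsistent.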