Here you can get community support related to Tag Meta.

Best way to solve this duplicate content issue?

11 years 7 months ago #212 by yaksushi
Google Webmaster Tools is flagging the following example URLs as duplicate content. What would be the best way to solve this?

www.domain.com/forum/20-water-cooler/280-is-it-for-me

www.domain.com/forum/20-water-cooler/280-is-it-for-me?limit=10&start=10

Can I use Tag Meta to declare www.domain.com/forum/20-water-cooler/280-is-it-for-me as the canonical link for www.domain.com/forum/20-water-cooler/280-is-it-for-me?limit=10&start=10,

or should I just set www.domain.com/forum/20-water-cooler/280-is-it-for-me?limit=10&start=10 to noindex with the following rule?
.*limit=10&start*.

Thanks for a great component!

11 years 7 months ago #214 by admin
The first solution is better for Google and safer. The second rule could also match other pages that you did not intend to exclude.
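
To see why the second rule is risky, here is a minimal Python sketch. It assumes the rule is applied as an unanchored regular expression against the full URL; how Tag Meta actually evaluates its rules is not shown in this thread, and the second and third URLs are made up for illustration.

```python
import re

# The rule exactly as posted above. Note that "start*." means "star",
# then zero or more "t", then any single character.
RULE = re.compile(r".*limit=10&start*.")


def rule_matches(url: str) -> bool:
    """Return True if the rule matches anywhere in the URL (assumed semantics)."""
    return RULE.search(url) is not None


urls = [
    # The paginated URL the rule was written for.
    "www.domain.com/forum/20-water-cooler/280-is-it-for-me?limit=10&start=10",
    # Hypothetical: a different topic with the same pagination parameters.
    "www.domain.com/forum/21-other/300-another-topic?limit=10&start=20",
    # Hypothetical: an unrelated parameter that merely begins with "star".
    "www.domain.com/search?limit=10&starred=1",
]

for url in urls:
    print(url, "->", rule_matches(url))
```

Under these assumptions all three URLs match, so the noindex would apply to more than the one page it was meant for.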

Generally speaking, if you have many pages with pagination variables, and these variables are always "limit" and "start", you could also use the Google Webmaster Tools (GWT) panel to tell Google to ignore these variables.

You could also use the new Tag Meta macros to build the canonical URL automatically.
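
As an illustration of what the canonical URL looks like in the pagination case, the following Python sketch strips the "limit" and "start" parameters from a URL. This is not how the Tag Meta macros are implemented; the function name and the http:// scheme are assumptions made for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query parameters that only control pagination (taken from the thread).
PAGINATION_PARAMS = {"limit", "start"}


def canonical_url(url: str) -> str:
    """Return the URL without pagination parameters and without a fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in PAGINATION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


paginated = "http://www.domain.com/forum/20-water-cooler/280-is-it-for-me?limit=10&start=10"
print(canonical_url(paginated))
```

Any other query parameters are kept, so only the pagination variants collapse onto the same canonical URL.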

Kind regards,
Luigi

11 years 1 week ago #808 by marcel555
Hi Luigi,

I have a similar problem, but I fear it's trickier. For example, the following URLs occur on my website:
  • /category/123-testpage.html (correct)
  • /category/123-testpage/ (garbage)
  • /category/123-testpage (garbage)
  • /category/123-te (garbage)

I have no idea where all this stuff is coming from or who/what is requesting those URLs. Some of it is even indexed by Google, although I have not linked to it anywhere. (My XML sitemap is OK, too.) I'm not interested in creating 301 redirects to the correct URLs for all this garbage.

I know it is a big general weakness of Joomla that such stuff is possible at all. Joomla only cares about the content ID. I'm not a programmer, but shouldn't it be relatively easy to also consider the URL alias, the configuration setting for the .html suffix, and the category structure? I mean, all the necessary information is already there, it "just" needs to be considered! In fact it IS considered when the correct URLs are created, for example in Joomla's own navigation menus. It should also be considered on every page request. Or am I wrong? Is it so hard in the Joomla framework to find out that the URL "/category/123-te" does not exist and that the 404 page should be shown instead?

The reason I'm posting this here is that I imagine you could help the community with this. I already use your ReDJ and Tag Meta. You really know about URL issues, and your tools work absolutely reliably and smoothly. If you could extend one of them (perhaps ReDJ?) to fight such garbage (as mentioned above), as far as I can see you would once more be the one and only in the Extension Repository. ;)

Very kind regards
Marcel

11 years 1 week ago - 11 years 1 week ago #814 by admin
Hi Marcel,
to be honest, I tried to solve this well-known Joomla issue some time ago, for this site. I was able to find a good solution and added it to Tag Meta, but it is not documented yet... :(

I had a similar problem with K2 articles, so I tried to fix it by adding a canonical URL (the right one) to every page. To get the canonical URL automatically, I added a new macro, {routeurl}, and this is how it is configured on this site:



So, for this page:

selfget.com/products/redj.html

You can also access it at:

selfget.com/component/k2/item/1.html

But the canonical URL for the page is still:
<link rel="canonical" href="http://selfget.com/products/redj.html" />

So the biggest issue here is that nobody knows about this feature... except me!!! :)

I also have a video tutorial that shows how to use this macro. I will try to publish it on this site very soon (I'm working on improving the documentation).

Kind regards,
Luigi
Attachments:
Last edit: 11 years 1 week ago by admin.

11 years 1 week ago - 11 years 1 week ago #816 by marcel555
Hi Luigi,

Is there any chance of having this without K2? And wouldn't it be more appropriate to use 301 redirects in this case instead of canonical tags? Because if the garbage pages exist, I have at least two problems:
  • the cache becomes much larger (I don't have to worry about disk space, but I assume 20,000 cache files instead of 10,000 can affect system performance, can't it?)
  • users could link to garbage pages (that's perhaps not really bad because of the canonical tag, but in the long run I don't like the idea of getting links to garbage pages. And it could happen that the garbage pages no longer exist in the future and don't receive a 301 redirect.)

Yesterday I did some further research, and at the moment I think the solution I have could work in the long run:
  • In ReDJ I added redirects from the old articles to the new ones (I restructured my website a month ago) and made the "From URL" as general as possible, e.g. "/category-layer2/81-" or "/category-layer1/([a-zA-Z0-9-]*)/260-"
  • In addition, I added some code to my .htaccess, see below (my website is configured to have the .html suffix in every URI):
RewriteCond %{REQUEST_URI} ([a-zA-Z0-9]+) [NC]
RewriteCond %{REQUEST_URI} !(\.) [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .* index.php [R=404,L]

(Explanation for other forum users: Line 1: the URI contains at least one letter or digit, so it is not the homepage; Line 2: the URI does not contain a dot; Line 3: the request is not for an existing file; Line 4: the request is not for an existing directory; Line 5: if all of these conditions hold, return a 404 error.)

Returning a 404 for garbage is the rudest way, but it is the web standard for non-existent pages. The solution has just one small drawback so far: it does not return Joomla's designed 404 page but the raw standard Apache (?) error page. But that's OK for me, and it only affects pages without the .html suffix.
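
For readers who want to check which URLs the rules above would reject, the conditions can be sketched in Python. This is only a model of the logic, not something Apache runs: the function name is made up, and the hypothetical existing_paths set stands in for the real file and directory checks (-f and -d).

```python
import re


def returns_404(request_uri: str, existing_paths: frozenset = frozenset()) -> bool:
    """Model the four RewriteCond lines; True means the RewriteRule would fire."""
    # Line 1: the URI contains at least one letter or digit (not the homepage).
    has_alnum = re.search(r"[a-zA-Z0-9]+", request_uri) is not None
    # Line 2: the URI contains no dot, so it cannot end in ".html".
    has_no_dot = "." not in request_uri
    # Lines 3 and 4: the request does not map to an existing file or directory.
    # existing_paths is an assumption that replaces the filesystem lookup.
    not_on_disk = request_uri not in existing_paths
    return has_alnum and has_no_dot and not_on_disk


for uri in ["/category/123-testpage.html", "/category/123-testpage/",
            "/category/123-testpage", "/category/123-te", "/"]:
    print(uri, "->", "404" if returns_404(uri) else "passed on")
```

As far as the mod_rewrite documentation describes it, when the R flag carries a status code outside the 300-399 range, the substitution (index.php here) is dropped, which would explain why Apache's own error page appears instead of Joomla's.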

What do you think about this?


Many greetings
Marcel
Last edit: 11 years 1 week ago by marcel555.

11 years 1 week ago #817 by admin
You prefer to block every "non-canonical" URL, which is a possible solution in your case.

But it is not a general solution, because not all sites have a common, well-known pattern for all canonical URLs. In those cases redirection is not always possible, because you would need to create redirection rules for every non-canonical URL, which is a little tricky... :)

So the canonical tag is an alternative solution that is safe and appreciated by Google. And the {routeurl} macro works with any extension, not just K2.

Best regards,
Luigi
