




Google crawler stealing all the bandwidth traffic..

Started by hartiberlin, January 09, 2008, 05:10:48 AM


hartiberlin

Hi All,
Although I am still on vacation, I was able to analyse the traffic on this forum, and unfortunately the Googlebot crawler generates about 10 to 20 times more traffic than all the power users here combined.
I already set the option in Google Webmaster Tools to crawl this site more slowly, but that did not help.
Googlebot also does not honor the Crawl-delay directive in robots.txt.

Is there any other way to stop Googlebot from crawling so fast?
Maybe it is because of the AdSense ads?
Any help would be greatly appreciated.
Maybe I could somehow return error 503 (temporarily unavailable), so that the site is not thrown out of the index?
Many thanks.
Regards, Stefan.
Stefan Hartmann, Moderator of the overunity.com forum
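
The 503 idea from the post above could be sketched roughly as follows. This is only an illustration, not how the forum software works: the function names, the `busy` flag and the one-hour `Retry-After` value are all assumptions, and matching on the User-Agent string is a crude check that can be spoofed. A 503 with `Retry-After` is the standard HTTP way to say "temporarily unavailable, come back later"; whether a crawler honors it, and for how long it tolerates it before dropping pages, is up to the crawler.

```python
def is_googlebot(user_agent: str) -> bool:
    """Rough check based on the User-Agent header (can be spoofed)."""
    return "googlebot" in user_agent.lower()


def throttle_response(user_agent: str, busy: bool, retry_after_s: int = 3600):
    """Return (status, headers) for an incoming request.

    While the site is busy, Googlebot gets 503 plus a Retry-After header;
    every other request is answered normally with 200.
    """
    if busy and is_googlebot(user_agent):
        # Retry-After is given in seconds; 3600 is an arbitrary example value.
        return 503, {"Retry-After": str(retry_after_s)}
    return 200, {}
```

A real setup would call something like `throttle_response` from the web application or server hook that sees each request, with `busy` derived from the current load.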

helmut

Hi Stefan
Don't forget to enjoy your vacation.
The world will keep on turning.

helmut

Earl

If your Web server is running under Linux, you could use a cron job to overwrite robots.txt, so that for only x hours per night robots.txt says

User-agent: *
Allow: /

The rest of the day it is overwritten to show

User-agent: *
Disallow: /

For example, make a file called allow.txt, and the cron job would say
cp allow.txt robots.txt

and a file called disallow.txt, and the cron job would say
cp disallow.txt robots.txt

allow.txt and disallow.txt are identical except for one line, and each contains
the entire robots.txt.
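
As a rough sketch of the same idea in one script instead of two cron entries: the script below decides from the time of day which file should become robots.txt and copies it into place. The 01:00 to 05:00 crawl window and the function names are made up for illustration; allow.txt and disallow.txt are the two files described above. Note that crawlers may cache robots.txt for a while, so a switch probably does not take effect the moment the file changes.

```python
import shutil
from datetime import time
from pathlib import Path

# Hypothetical crawl window: bots are allowed from 01:00 until 05:00 server time.
CRAWL_START = time(1, 0)
CRAWL_END = time(5, 0)


def pick_source(now: time) -> str:
    """Return the name of the file that should become robots.txt at this time."""
    return "allow.txt" if CRAWL_START <= now < CRAWL_END else "disallow.txt"


def install_robots(webroot: Path, now: time) -> str:
    """Copy the chosen file's contents over robots.txt and return its name."""
    source = pick_source(now)
    shutil.copyfile(webroot / source, webroot / "robots.txt")
    return source
```

A cron job could run this once an hour and call `install_robots(Path(webroot), datetime.now().time())`, with `webroot` set to wherever the site's document root actually is.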


For your info, Slurp (Yahoo/AV) and the MSFT bots obey Crawl-delay;
Googlebot does not yet, but most likely will in 2.1+.


Fewer than 35 percent of servers have a robots.txt file.


This is crazy, but over 75,000 robots.txt files have pictures in them!

Regards, Earl


"It is through science that we prove, but through intuition that we discover." - H. Poincare

"Most of all, start every day asking yourself what you will do today to make the world a better place to live in."  Mark Snoswell

"As we look ahead, we have an expression in Shell, which we like to use, and that is just as the Stone Age did not end for the lack of rocks, the oil and gas age will not end for the lack oil and gas, but rather technology will move us forward." John Hofmeister, president Shell Oil Company

hartiberlin

Hi Earl,
nice idea !
This sounds like an easy solution.
Many thanks for this tip.

I just wonder whether Google will try again after a few hours to access
my site once it has already been blocked?

amigo

You could use .htaccess in the web root together with mod_rewrite (if this server runs Apache) to block Google outright, or to set up time-based rules tied to scripts that check the last visited time, etc.

It really depends on what the ultimate goal is, but mod_rewrite is pretty powerful, though it has a steep learning curve to begin with. :)
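
The "check last visited time" part could look something like the sketch below. It is not a mod_rewrite rule, only the kind of logic a helper script behind such a rule might implement, and the class name, the client key and the interval are all invented for the example. Each client is allowed at most one request per interval; anything sooner is refused, and the caller would then answer with an error such as 503.

```python
class CrawlThrottle:
    """Allow a given client at most one request per min_interval_s seconds."""

    def __init__(self, min_interval_s: float):
        self.min_interval_s = min_interval_s
        # Maps a client key (e.g. "googlebot") to the time of its last allowed request.
        self._last_allowed = {}

    def allow(self, client: str, now: float) -> bool:
        """Return True if the client may be served at time `now` (in seconds)."""
        last = self._last_allowed.get(client)
        if last is not None and now - last < self.min_interval_s:
            return False  # too soon since the last allowed request
        self._last_allowed[client] = now
        return True
```

With `CrawlThrottle(10.0)`, a bot that asks again 5 seconds after an allowed request is refused, while a request 10 seconds later goes through; other clients are tracked separately.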