Robots pwn3d my site

I’m not sure how many of my very small number of readers have to administer a real website, and this is probably kind of ‘duh’ for those who do, but even I managed to overlook it. Robots.txt is a very important file, especially on a bandwidth-limited server. I knew it was there and I kind of knew what it was used for, but I hadn’t imagined how important it is until the Campus Bible Fellowship website got pwn3d by Googlebot. Googlebot consumed just over 2GB of bandwidth in less than a month. Luckily we have a good amount of bandwidth available (2.5GB a month, I think) for a relatively small website. There was a little panic when I started getting the e-mail notifications that the site was approaching its bandwidth limit. Luckily we didn’t reach that point until late in the month. So this brought the need for robots.txt back to my attention.

Robots.txt is a file that resides in your web root. It is read by well-behaved robots or crawlers, the automated things that go out and scan the internet to add information to the search engine databases. The file tells a robot what it can and cannot do: you list the folders and files a robot is or isn’t allowed to look at, using a simple format that is very easy to understand. You can write rules for one particular robot or for all of them at once, and the major crawlers, Googlebot included, also understand wildcards in paths. Keep in mind that it only works on robots that choose to obey it.
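To make that a bit more concrete, here is a rough sketch of what such a file could look like and how a well-behaved robot would read it. It uses Python’s standard `urllib.robotparser` module. The folder names (`/downloads/`, `/media/`, `/private/`) and the file names are made up for illustration, they are not the actual layout of the site, and the rules are only one guess at how you might keep a crawler away from the heavy stuff.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the folder names are assumptions
# chosen for illustration, not taken from any real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /downloads/
Disallow: /media/

User-agent: *
Disallow: /private/
"""


def build_parser(text):
    """Parse robots.txt text the way a well-behaved crawler would."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Googlebot is kept out of the bandwidth-heavy folders but may read pages.
print(parser.can_fetch("Googlebot", "/downloads/sermon.mp3"))
print(parser.can_fetch("Googlebot", "/index.html"))

# Other robots fall under the "*" rules instead.
print(parser.can_fetch("SomeOtherBot", "/downloads/sermon.mp3"))
print(parser.can_fetch("SomeOtherBot", "/private/notes.html"))
```

One thing to note: as far as I can tell this standard-library parser only does simple prefix matching on paths, so it won’t interpret wildcard patterns inside a path the way Googlebot does.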

So the moral of the story here is: do NOT ignore robots.txt as though it’s some formality that can wait till later. The same should be said for any measure designed to protect what you plan to hang out there for all the world to see and, it would seem, also abuse.
