
Secure Apache: Out, Damned Bot!

By Ken Coar
December 4, 2008


Dynamic robots.txt

So, let's make robots.txt a dynamic document – a PHP script. That allows us to scan a database that can be updated in real time by other processes, making our robots.txt rules really dynamic.

Thoreau had a good idea when he advised us to 'simplify, simplify!' Let's assume you have different restrictions for different bots: one set for Google and another for Yahoo!, say. If your robots.txt file is static, it needs a stanza for each bot's specific rules, which means that every bot can see what its competitors' access rules are. (Can you see that I'm going after case #4?)

If it's a dynamic document, however, we can feed each bot only those rules that apply to it and it alone. The robots.txt that the bot will see is much simpler than our overall set of rules (that's where 'simplify, simplify' comes in), even though it means a little more work being done by our server. One of the truisms of security work is that increased security always costs something. I say that a lot, so get used to it.
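As a rough illustration of that idea, here is a minimal sketch that returns only the stanza matching the requesting bot. It assumes PHP 7 or later. The rule table, the bot tokens, the paths, and the function name robots_for are all hypothetical, and the "deny everything" default for unrecognized bots is just one possible policy. In a real setup the table would come from the database mentioned earlier rather than being hard-coded.

```php
<?php
// Hypothetical per-bot rule table; tokens and paths are illustrative only.
// In practice these rows would be read from a database.
$rulesByBot = [
    'googlebot' => ["Disallow: /private/"],
    'slurp'     => ["Disallow: /private/", "Disallow: /archive/"],
];

// Assumed policy for bots we do not recognize: keep them out entirely.
$defaultRules = ["Disallow: /"];

/**
 * Build a robots.txt body holding only the rules for the requesting bot.
 * Matching is a case-insensitive substring test on the User-Agent string.
 */
function robots_for(string $userAgent, array $rulesByBot, array $defaultRules): string
{
    foreach ($rulesByBot as $token => $rules) {
        if (stripos($userAgent, $token) !== false) {
            // "User-agent: *" avoids revealing which token was matched.
            return "User-agent: *\n" . implode("\n", $rules) . "\n";
        }
    }
    return "User-agent: *\n" . implode("\n", $defaultRules) . "\n";
}

// Emit the rules as plain text for whoever is asking.
header('Content-Type: text/plain');
echo robots_for($_SERVER['HTTP_USER_AGENT'] ?? '', $rulesByBot, $defaultRules);
```

Keep in mind that the User-Agent header is supplied by the client and is trivially forged, so this selects which rules a bot is shown; it does not authenticate the bot.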

As a first step, let's turn your existing robots.txt file into a PHP script that just returns its current contents. Nothing new and fancy, just making it dynamic in the most basic way. Where I need to mention server configuration directives, I'll use those for the Apache Web server. Make adjustments as appropriate for whatever server you're using.

1. First, make the server aware that robots.txt is a script and not an actual text file. Add the following to your httpd.conf file and then restart Apache.

<Files "robots.txt">
   SetHandler application/x-httpd-php
</Files>

2. Edit your robots.txt file and add the following to the top:

<?php
   // Send the rules as plain text rather than PHP's default text/html.
   header('Content-Type: text/plain');
?>

3. Try to access it from your Web browser. If all is working properly, you'll see only the normal rules, and not the PHP code segment you just added.

(If you're not familiar or comfortable with PHP, feel free to use some other scripting language of your choice. All my code examples are going to be in PHP, though.)

You should now have a basic dynamic robots.txt document. Go right ahead and play with it to see what you can do. You may actually want to work with a different file (called new-robots.txt or something like that), so that if you make any mistakes you won't screw up the rules spiders are currently using to crawl your site.
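One practical detail if you take the test-file route: the <Files> block from step 1 matches only robots.txt, so a copy under another name would be served as raw text, PHP source included. A sketch of the extra configuration, assuming the test file is named new-robots.txt as suggested above:

```
<Files "new-robots.txt">
   SetHandler application/x-httpd-php
</Files>
```

As with step 1, restart Apache after adding it, and remove the block once you've copied your tested script over the real robots.txt.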

In my next article I'll go into much more detail about fleshing this basic script out to do some actual work. This one is primarily intended to raise your consciousness, get you started, and possibly spur you to do a little research on your own.
