How to block banner advertisements using squid

by Craig Sanders <cas@taz.net.au>

UPDATE 1999-05-02

Added a NULLJS keyword (and accompanying do_nothing.js file) for blocking out unwanted javascript files from SCRIPT SRC="...url..." tags.
 
Geocities and imgis seem to be the main culprits for crap like this at the moment.

UPDATE 1998-09-22

Danny Sauer (Cloudmaster <sauer@cloudmaster.ml.org>) sent me an HTML page containing the javascript code to close a window. This is useful for getting rid of those annoying popup advertising windows that Geocities and Tripod are so fond of.
 
This (and the CLOSEME keyword for the redir file) has been included in the latest version of squid-redir.

UPDATE 1998-08-04

Please note that there is a much newer and better version of this script available. I've been using this "new" version for over a year; I just haven't got around to fixing up this page yet. One of these days I'll update this page properly to reflect the new version.
 
In the meantime, you can view the README or download squid-redir.tar.gz (only 4KB).
 
You can view the perl code and my latest redir rules file before you download.
 
BTW, feel free to email me and give me any good banner blocking regexps you come up with. I don't want to set up any automated system for sharing regexps (that would be too easy to abuse), but I'm willing to share/swap redir files.

An increasing number of web sites have advertising banners embedded in them. At first, these banners were fairly unobtrusive and didn't bother me much at all - in fact, every once in a while I'd see something I was actually interested in.

Over the last 6-12 months or so, however, the use of animated GIFs has become almost universal. What used to be static, mostly inoffensive banners have now become ugly, garish flashing annoyances which commit the unforgivable crime of distracting from the content. It's pretty hard to read the text of a page while some obnoxious graphic is cycling through its animation loop.

Several months ago, I was annoyed enough by one particularly irritating advertisement (I believe it was a Microsoft ad encountered while searching for something on Alta Vista) to take the time to figure out how to use Squid's redirection facility to block them out.

Squid has the ability to spawn a few child processes whose job is to rewrite URLs on the fly. Mostly, this is used to redirect requests for commonly downloaded files to a local mirror. It can also be used to redirect advertising banners to a GIF on the local web server.

When configured to use a redirector, squid passes the URL and other information to one of its redirector children. It does this by printing the following information to the child's stdin, each field separated by a space:

URL ip-address/fqdn ident method

URL: the URL requested.
ip-address/fqdn: the IP address or fully qualified domain name of the client (web browser) which requested the page.
ident: the identity of the user running the web browser. Unless you configure squid to do ident lookups, this will be "-".
method: the request method: "GET", "POST", "HEAD".

All of these fields can be used by the redirector script to decide whether or not to rewrite the URL. For my purposes, I'm only interested in the URL: if it looks like a known advertisement, then rewrite it. If the redirector script determines that a URL should be rewritten, it returns the modified URL on stdout. If not, it should return a blank line.
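As a rough illustration of that exchange, the sketch below turns one input line into one reply line. The sub name redirect_reply, the ads.example.com pattern and the sample client address are made up for this example; only the field layout and the "blank line means no change" convention come from the description above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes one line as squid would send it
# ("URL ip-address/fqdn ident method") and returns the reply line.
sub redirect_reply {
    my ($line) = @_;
    chomp $line;

    # split the four space-separated fields
    my ($url, $address, $ident, $method) = split ' ', $line;
    return "\n" unless defined $url;

    # made-up rule: anything served from ads.example.com is a banner
    if ($url =~ m{^http://ads\.example\.com/}i) {
        return "http://www.taz.net.au/blank_ad.gif\n";
    }

    # blank line: leave the URL unchanged
    return "\n";
}

# one sample request, written the way squid would send it
print redirect_reply("http://ads.example.com/banner.gif 10.0.0.1/- - GET\n");
```

A real redirector would call something like this once per line read from stdin, with output buffering turned off so that squid is never left waiting for a reply.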

I decided to write my redirector script in perl, and also make it use a hashed database file to store the redirection database. Using a database allows me to change the redirection list without having to restart squid. It was also a good practice exercise for the Perl DB techniques I learnt from the March 1997 Linux Journal.

What you'll need

Squid

Squid is a freely available caching proxy server. You'll need version 1.1 or later (versions before 1.1 didn't have any redirection ability).

The Squid proxy/cache is available from http://squid.nlanr.net/. If you're running Debian, just install the squid .deb package.

A web server

Any web server will do. I use apache.

makemap from sendmail

My script uses a hashed database file for its redirection-rule database. The easiest way to generate a hashed db file is using the makemap program which is a part of sendmail.

I could have just designed the program so that any matched pattern was replaced by the hardcoded string "http://www.taz.net.au/blank_ad.gif". This would work fine, but wouldn't allow me to do any other form of redirection.

With this format, I can not only block out advertising banners, I can also redirect commonly downloaded files to either a local mirror, or to my preferred source site. Either will reduce bandwidth consumption. For example:

.*/n16e301.exe$      ftp://ftp.netscape.com.au/pub/navigator/3.01/windows/n16e301.exe

redirects all requests for Netscape 3.01 for Windows to the Australian Netscape site. This can make a huge saving in bandwidth, especially for an ISP with many customers downloading copies of Netscape.
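To see why that rule replaces the whole URL rather than just the filename, here is a small sketch of a single substitution. The sub name apply_rule and the request URL are invented for the example; the pattern and replacement are the ones from the rule above. Because the pattern starts with a greedy ".*", the match begins at the first character of the URL, so everything is swapped for the replacement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The rule from the example: pattern, then replacement.
my $pattern     = '.*/n16e301.exe$';
my $replacement = 'ftp://ftp.netscape.com.au/pub/navigator/3.01/windows/n16e301.exe';

# Hypothetical helper: apply one rule to a URL.
# Returns the rewritten URL, or undef if the pattern does not match.
sub apply_rule {
    my ($url, $pat, $repl) = @_;
    return $url =~ s/$pat/$repl/i ? $url : undef;
}

# made-up request URL
my $new = apply_rule('http://home.example.com/pub/n16e301.exe',
                     $pattern, $replacement);
print defined $new ? "$new\n" : "no match\n";
```

Note that the dots in the pattern are regexp wildcards, so it would also match names such as "n16e301xexe"; in practice that is harmless here, but it is worth remembering when writing tighter rules.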

A replacement graphic

I use this one which I made with The Gimp. It's the same size (460x60) as most banner advertisements.

[blank_ad.gif]

Alternatively, you can redirect to a non-existent graphic on your web server. This has three disadvantages:

Putting it all together

The following instructions assume that you are running a Debian GNU/Linux system. They should be easily adapted to other non-Debian systems - about the only thing you'll have to change is the pathnames for the files: /etc/squid.conf, /usr/lib/squid/squid.redir, /usr/lib/squid/redir and /usr/lib/squid/redir.db

Configuring Squid

You need the following line in your /etc/squid.conf file:

redirect_program /usr/lib/squid/squid.redir

If you run a busy squid, you may also need to increase the number of redirector child processes. For example:

redirect_children 15

You also need to put a copy of the redirector script into /usr/lib/squid:

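  #!/usr/bin/perl

  use strict;
  use DB_File;
  use Fcntl;

  # unbuffered output, so squid sees each reply immediately
  $| = 1;

  my $redir_file = '/usr/lib/squid/redir.db';

  my %redir_db;
  tie (%redir_db, 'DB_File', $redir_file, O_RDONLY, 0644, $DB_HASH)
      || die ("Cannot open $redir_file: $!");

  while (<>) {

      chomp;
      # URL ip-address/fqdn ident method
      my ($url, $address, $ident, $method) = /(\S+)\s+(\S+)\s+(\S+)\s+(\S+)/;

      # rewrite with the first rule that matches; the keys are regexps
      if (defined $url) {
          foreach my $key (keys %redir_db) {
              if ($url =~ s/$key/$redir_db{$key}/i) {
                  print $url;
                  last;
              }
          }
      }

      # always end the reply; a blank line means "leave the URL alone"
      print "\n";

  }

  untie %redir_db;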
    

You can download squid.redir here.

Remember to restart squid after you've made these changes.

On a Debian system, type: /etc/init.d/squid reload.
On other Linux systems, type: killall -1 squid.

Caution: If you're not using Linux, do not use the killall command - on Linux it means "kill process by name". On some variants of unix (e.g. SCO) it means "kill all processes".

Configuring the Web Server

No special configuration is needed. Just make sure that your replacement image is accessible under the Document Root.

Generating the redir.db

Create a file called /usr/lib/squid/redir. It is the source text file containing two fields per record: the regular expression pattern to match, and the associated replacement string. The two fields should be separated by white space. Comments can be embedded in the file.

Once you have a redir file, make a hashed database file from it by running the following commands:

cd /usr/lib/squid
makemap -f hash redir <redir

Whenever you edit the redir text file, you must remember to regenerate the .db file.

Here's my redir file:

  # Redirection rules for squid  
  #  
  # generate redir.db from this file by typing:  
  #     makemap -f hash redir <redir
  #  
  //.*/Adverts/.*    //www.taz.net.au/blank_ad.gif
  //.*/adverts/.*    //www.taz.net.au/blank_ad.gif
  //.*/gifs/ads/.*    //www.taz.net.au/blank_ad.gif
  //.*/graphics/advert.*    //www.taz.net.au/blank_ad.gif
  //.*/home/ads/.*    //www.taz.net.au/blank_ad.gif
  //.*/image.ng.*    //www.taz.net.au/blank_ad.gif
  //.*/image/ads/.*    //www.taz.net.au/blank_ad.gif
  //.*/images/adds/.*    //www.taz.net.au/blank_ad.gif
  //.*/images/ads/.*    //www.taz.net.au/blank_ad.gif
  //.*/img/ads/.*    //www.taz.net.au/blank_ad.gif
  //.*/logoshowad.*    //www.taz.net.au/blank_ad.gif
  //.*/pictures/sponsor/.*    //www.taz.net.au/blank_ad.gif
  //.*/sponsors/images/.*    //www.taz.net.au/blank_ad.gif
  //.*ancestry.com/ads/.*    //www.taz.net.au/blank_ad.gif
  //.*apcmag.com/ads/.*    //www.taz.net.au/blank_ad.gif
  //.*cmpnet.com/ads/graphics/.*    //www.taz.net.au/blank_ad.gif
  //.*cnet.com/Banners/.*    //www.taz.net.au/blank_ad.gif
  //.*cnnfn.com/ads/.*    //www.taz.net.au/blank_ad.gif
  //.*dejanews.com/gtplacer.*    //www.taz.net.au/blank_ad.gif
  //.*desktoppublishing.com/ad/.*    //www.taz.net.au/blank_ad.gif
  //.*doubleclick.net/ad/.*    //www.taz.net.au/blank_ad.gif
  //.*doubleclick.net/viewad/.*    //www.taz.net.au/blank_ad.gif
  //.*eads.com/graphics/ads/.*    //www.taz.net.au/blank_ad.gif
  //.*excite.com/img/ads/.*    //www.taz.net.au/blank_ad.gif
  //.*focalink.com/SmartBanner/    //www.taz.net.au/blank_ad.gif
  //.*four11.com/g/ads/.*    //www.taz.net.au/blank_ad.gif
  //.*gamelan.com/Advertisements/images/.*    //www.taz.net.au/blank_ad.gif
  //.*i8.net/worldnet/ad.cgi.*    //www.taz.net.au/blank_ad.gif
  //.*imgis.com/?adserv.*    //www.taz.net.au/blank_ad.gif
  //.*imgis.com/images/.*    //www.taz.net.au/blank_ad.gif
  //.*infoseek.com/doc/sponsors/images/.*    //www.taz.net.au/blank_ad.gif
  //.*infoworld.com/ads/gif/.*    //www.taz.net.au/blank_ad.gif
  //.*intellicast.com/ads/.*    //www.taz.net.au/blank_ad.gif
  //.*looksmart.com/r?.*gif    //www.taz.net.au/blank_ad.gif
  //.*mcp.com/ad_banners/.*    //www.taz.net.au/blank_ad.gif
  //.*miningco.com/zadz/.*    //www.taz.net.au/blank_ad.gif
  //.*motherjones.com/global/ADVERTISEMENTS/.*    //www.taz.net.au/blank_ad.gif
  //.*movielink.com/media/imagelinks/MF.ad.*    //www.taz.net.au/blank_ad.gif
  //.*movielink.com/media/imagelinks/MF.sponsor.*    //www.taz.net.au/blank_ad.gif
  //.*mrshowbiz.com/ad/.*    //www.taz.net.au/blank_ad.gif
  //.*msn.com/ads/.*    //www.taz.net.au/blank_ad.gif
  //.*mudconnect.com/ads/    //www.taz.net.au/blank_ad.gif
  //.*mydesktop.com/img/ads/.*    //www.taz.net.au/blank_ad.gif
  //.*netscape.com.au/ads/images/.*    //www.taz.net.au/blank_ad.gif
  //.*netscape.com.au/inserts/images/.*    //www.taz.net.au/blank_ad.gif
  //.*netscape.com/ads/images/.*    //www.taz.net.au/blank_ad.gif
  //.*netscape.com/inserts/images/.*    //www.taz.net.au/blank_ad.gif
  //.*news.com/Banners/Images/.*    //www.taz.net.au/blank_ad.gif
  //.*riddler.com/Commonwealth/bin/statdeploy.*    //www.taz.net.au/blank_ad.gif
  //.*safe-audit.com/exposure.cfm.*    //www.taz.net.au/blank_ad.gif
  //.*shareware.com/Banners/Images/.*    //www.taz.net.au/blank_ad.gif
  //.*sjmercury.com/advert/logos/.*    //www.taz.net.au/blank_ad.gif
  //.*smh.com.au/adproof.*    //www.taz.net.au/blank_ad.gif
  //.*techweb.com/ads/.*    //www.taz.net.au/blank_ad.gif
  //.*tqn.com/zadz/.*    //www.taz.net.au/blank_ad.gif
  //.*tripod.com/ads/.*    //www.taz.net.au/blank_ad.gif
  //.*tucows.wire.net.au/images/adds/.*    //www.taz.net.au/blank_ad.gif
  //.*webreview.com/universal/graphics/ads/.*    //www.taz.net.au/blank_ad.gif
  //.*windows95.com/gifs/ads/.*    //www.taz.net.au/blank_ad.gif
  //.*winntmag.com/images/.*    //www.taz.net.au/blank_ad.gif
  //.*winntmag.com/titlebar/titlebar.stm    //www.taz.net.au/blank_ad.gif
  //.*wire.net.au/images/adverts/.*    //www.taz.net.au/blank_ad.gif
  //.*wired.com/advertising/.*    //www.taz.net.au/blank_ad.gif
  //.*wisewire.com/ClickAd.emc.*    //www.taz.net.au/blank_ad.gif
  //.*wisewire.com/SKB/.*    //www.taz.net.au/blank_ad.gif
  //.*worldvillage.com/adds/banners/    //www.taz.net.au/blank_ad.gif
  //.*yahoo.com/adv/.*    //www.taz.net.au/blank_ad.gif
  //.*zdnet.com/adverts/.*    //www.taz.net.au/blank_ad.gif
  //204.123.2.101/ads/.*    //www.taz.net.au/blank_ad.gif
  //images.yahoo.com/promotions/.*    //www.taz.net.au/blank_ad.gif

You can download redir here.
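Before regenerating redir.db it can help to check which rule, if any, a given URL would hit. The following is only a sketch of one way to do that in plain perl, without touching the database: the sub names parse_rules and first_match are invented for this example, and it assumes that comment lines start with "#" and that the two fields are separated by white space, as described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse redir-style text: "pattern <whitespace> replacement" per line,
# skipping blank lines and lines starting with '#'.
sub parse_rules {
    my ($text) = @_;
    my @rules;
    foreach my $line (split /\n/, $text) {
        next if $line =~ /^\s*(#|$)/;
        my ($pattern, $replacement) = split ' ', $line, 2;
        next unless defined $replacement;
        $replacement =~ s/\s+$//;    # drop trailing white space
        push @rules, [$pattern, $replacement];
    }
    return @rules;
}

# Return the pattern of the first rule matching $url, or undef.
sub first_match {
    my ($url, @rules) = @_;
    foreach my $rule (@rules) {
        my ($pattern) = @$rule;
        return $pattern if $url =~ /$pattern/i;
    }
    return undef;
}

my @rules = parse_rules(<<'END');
# sample rules in the redir format
//.*doubleclick.net/ad/.*    //www.taz.net.au/blank_ad.gif
//.*yahoo.com/adv/.*         //www.taz.net.au/blank_ad.gif
END

# made-up URL to test against the rules
my $hit = first_match('http://ad.doubleclick.net/ad/some/banner.gif', @rules);
print defined $hit ? "matched: $hit\n" : "no rule matched\n";
```

This checks the rules in file order, whereas the redirector walks the keys of a hash, whose order is not defined. If two patterns can match the same URL, the rule that wins in the running redirector may therefore differ from the one reported here.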

Similar hacks

There are other methods for doing this. All of them that I know of are much slower than using Squid because they use forking proxy servers to do the job.

WebFilter is a patch for CERN httpd's proxy function which can also block out advertising. In some ways it is more capable than my method as it rewrites the text within a page before passing it on to the client web browser. A great idea, but IMO unacceptably slow for anything but a small, lightly used proxy. This idea really needs a non-forking server on a fast machine.

OreO from the OSF is a general purpose filtering/transformation model for proxy servers.

V6 from Inria is another general purpose proxy with filtering capabilities based on WebFilter.

It may be possible to use or modify the proxying module in apache to do the same thing.

Junkbusters have a program which can block advertising banners and cookies too. Their FAQ is good reading. Junkbusters also have a lot of information about blocking spam.

Final Comments

The web is a lot nicer place without all the flashing advertisements. Not as nice as it was a few years ago before the net went commercial, but those days are gone forever - the best we can do now is take control of our own personal view of the net and customise it to suit ourselves.

This script and the techniques illustrated here can be used for other purposes, as the Netscape example above shows.

Theoretically, it is possible for an ISP to use the techniques outlined here to redirect advertising banners and URLs to a local CGI script which then redirects to an advertising site which generates income for the ISP. While this may be an ethically questionable thing to do, it is a good example of subverting the net for your own purposes.

Thanks

Thanks to the people on the debian-user and melblinux mailing lists (especially Roderick Schertler <roderick@argon.org> and Manoj Srivastava <srivasta@datasync.com>) for pointing out that makemap needs the -f option to stop it from converting all input to lowercase, and suggesting some improvements to my script.

References