UPDATE 1999-05-02
Added a NULLJS keyword (and accompanying do_nothing.js file) for blocking
out unwanted javascript files from SCRIPT SRC="...url..." tags.
 
Geocities and imgis seem to be the main culprits for crap like this at the
moment.
UPDATE 1998-09-22
Danny Sauer (Cloudmaster <sauer@cloudmaster.ml.org>) sent me an
HTML page containing the javascript code to close a window. This is
useful for getting rid of those annoying popup advertising windows that
geocities and tripod are so fond of.
 
This (and the CLOSEME keyword for the redir file) has been included in the
latest version of squid-redir.
UPDATE 1998-08-04
Please note that there is a much newer and better version of this script
available. i've been using this "new" version for over a year...I just
haven't got around to fixing up this page yet. One of these days i'll
update this page properly to reflect the new version.
 
In the meantime, you can view the README
or download squid-redir.tar.gz (only 4KB).
 
You can view the perl code and my latest redir rules file before you
download.
 
BTW, feel free to email me and give me any good banner blocking regexps
you come up with. i don't want to set up any automated system for
sharing regexps (that would be too easy to abuse), but i'm willing to
share/swap redir files.
An increasing number of web sites have advertising banners embedded in them. At first, these banners were fairly unobtrusive and didn't bother me much at all - in fact, every once in a while i'd see something i was actually interested in.
Over the last 6-12 months or so, however, the use of animated GIFs has become almost universal. What used to be static, mostly inoffensive banners have now become ugly, garish flashing annoyances which commit the unforgivable crime of distracting from the content. It's pretty hard to read the text of a page while some obnoxious graphic is cycling through its animation loop.
Several months ago, I was annoyed enough by one particularly irritating advertisement (I believe it was a Microsoft ad encountered while searching for something on Alta Vista) to take the time to figure out how to use Squid's redirection facility to block them out.
Squid has the ability to spawn a few child processes whose job is to rewrite URLs on the fly. Mostly, this is used to redirect requests for commonly downloaded files to a local mirror. It can also be used to redirect advertising banners to a GIF on the local web server.
When configured to use a redirector, squid passes the URL and other information to one of its redirector children. It does this by printing the following information to the child's stdin, each field separated by a space:
URL ip-address/fqdn ident method
URL: the URL requested.
ip-address/fqdn: the IP address or fully qualified domain name of the client (web browser) which requested the page.
ident: the identity of the user running the web browser. Unless you configure squid to do ident lookups, this will be "-".
method: the request method: "GET", "POST", "HEAD".
All of these fields can be used by the redirector script to decide whether or not to rewrite the URL. For my purposes, I'm only interested in the URL: if it looks like a known advertisement, then rewrite it. If the redirector script determines that a URL should be rewritten, it returns the modified URL on stdout. If not, it should return a blank line.
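The exchange described above can be sketched in a few lines of Perl. This is only an illustration of the protocol, not the script used later on this page: the sub names parse_request and reply_for, the single hard-coded rule and the sample request lines are all made up for the example. A real redirector would read its requests from stdin rather than from a list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one request line from squid into its four fields:
# URL, ip-address/fqdn, ident, method.
# Returns an empty list if the line doesn't have exactly four fields.
sub parse_request {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return @fields == 4 ? @fields : ();
}

# Decide what to send back: a rewritten URL, or '' (a blank line)
# meaning "leave this URL alone".
# The single hard-coded rule is a placeholder for a real rule list.
sub reply_for {
    my ($url, $address, $ident, $method) = @_;
    return '' unless defined $url;
    return 'http://www.taz.net.au/blank_ad.gif' if $url =~ m{/adverts/}i;
    return '';
}

# Hypothetical request lines, in the format squid writes to the child.
my @sample = (
    'http://www.example.com/adverts/banner.gif 192.168.1.10/- - GET',
    'http://www.example.com/index.html 192.168.1.10/- - GET',
);

$| = 1;    # unbuffered, as a real redirector must reply line by line
foreach my $line (@sample) {
    print reply_for(parse_request($line)), "\n";
}
```

Only the URL is looked at here, but the other three fields are available to reply_for if a rule ever needs them (for example, to rewrite only GET requests).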
I decided to write my redirector script in perl, and also make it use a hashed database file to store the redirection database. Using a database allows me to change the redirection list without having to restart squid. It was also a good practice exercise for the Perl DB techniques I learnt from the March 1997 Linux Journal.
The Squid proxy/cache is available from http://squid.nlanr.net/. If you're running Debian, just install the squid .deb package.
makemap program which is a part of sendmail.
I could have just designed the program so that any matched pattern was replaced by the hardcoded string "http://www.taz.net.au/blank_ad.gif". This would work fine, but wouldn't allow me to do any other form of redirection.
With this format, I can not only block out advertising banners, I can also redirect commonly downloaded files to either a local mirror, or to my preferred source site. Either will reduce bandwidth consumption. For example:
.*/n16e301.exe$ ftp://ftp.netscape.com.au/pub/navigator/3.01/windows/n16e301.exe
redirects all requests for Netscape 3.01 for Windows to the Australian Netscape site. This can make a huge saving in bandwidth, especially for an ISP with many customers downloading copies of Netscape.
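To see what such a rule does to a URL, here is a small sketch that applies a pattern/replacement pair as a case-insensitive substitution. The helper name apply_rule and the request URLs are invented for the example; only the two rules themselves come from this page.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Apply one pattern/replacement pair as a case-insensitive substitution.
# Returns the rewritten URL, or undef if the pattern did not match.
sub apply_rule {
    my ($url, $pattern, $replacement) = @_;
    return $url =~ s/$pattern/$replacement/i ? $url : undef;
}

my @rules = (
    [ '.*/n16e301.exe$',
      'ftp://ftp.netscape.com.au/pub/navigator/3.01/windows/n16e301.exe' ],
    [ '//.*/adverts/.*', '//www.taz.net.au/blank_ad.gif' ],
);

# Made-up request URLs, one per rule plus one that matches nothing.
my @urls = (
    'http://www.example.com/pub/n16e301.exe',
    'http://www.example.com/adverts/banner.gif',
    'http://www.example.com/index.html',
);

foreach my $url (@urls) {
    my $new;
    foreach my $rule (@rules) {
        $new = apply_rule($url, @$rule);
        last if defined $new;
    }
    print defined $new ? "$url -> $new\n" : "$url -> (unchanged)\n";
}
```

Note the difference between the two rules. The first pattern starts with `.*`, so it swallows the whole URL and the replacement has to be a complete URL. The second starts at `//`, so the `http:` in front of the match is kept and only the rest is replaced.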
Alternatively, you can redirect to a non-existent graphic on your web server. This has three disadvantages:
/etc/squid.conf, /usr/lib/squid/squid.redir, /usr/lib/squid/redir and /usr/lib/squid/redir.db
redirect_program /usr/lib/squid/squid.redir
If you run a busy squid, you may also need to increase the number of redirector child processes. For example:
redirect_children 15
You also need to put a copy of the redirector script into /usr/lib/squid:
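#!/usr/bin/perl
# squid.redir: rewrite URLs which match a pattern in redir.db

use DB_File;
use Fcntl;

$| = 1;    # unbuffered output: squid waits for each reply line

$redir_file = '/usr/lib/squid/redir.db';

tie(%redir_db, 'DB_File', $redir_file, O_RDONLY, 0644, $DB_HASH)
    || die("Cannot open $redir_file: $!");

while (<>) {
    chomp;
    # input format: URL ip-address/fqdn ident method
    ($url, $address, $ident, $method) = /(\S+)\s+(\S+)\s+(\S+)\s+(\S+)/;

    # keys are regexps, values are the replacement strings
    foreach $key (keys %redir_db) {
        if ($url =~ s/$key/$redir_db{$key}/i) {
            print $url;    # first matching rule wins
            last;
        }
    }
    print "\n";    # ends the reply; on its own, a blank line means "no change"
}

untie %redir_db;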
You can download squid.redir here.
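Two things about the loop above are worth keeping in mind: `keys` returns the patterns in hash order rather than file order, so you can't rely on one rule being tried before another, and a pattern held in a plain string may be recompiled as the loop moves from key to key. The sketch below is one possible variation, not a replacement for the script: it keeps the rules in an ordered list and compiles each pattern once with `qr//`. The sub names compile_rules and rewrite and the request URLs are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compile each pattern once, keeping the rules in the order given.
# Each argument is a [pattern, replacement] pair.
sub compile_rules {
    my @pairs = @_;
    my @compiled;
    foreach my $pair (@pairs) {
        my ($pattern, $replacement) = @$pair;
        push @compiled, [ qr/$pattern/i, $replacement ];
    }
    return @compiled;
}

# Return the rewritten URL for the first matching rule,
# or '' (i.e. a blank reply line) if no rule matches.
sub rewrite {
    my ($url, @compiled) = @_;
    foreach my $rule (@compiled) {
        my ($re, $replacement) = @$rule;
        return $url if $url =~ s/$re/$replacement/;
    }
    return '';
}

my @compiled = compile_rules(
    [ '//.*doubleclick.net/ad/.*', '//www.taz.net.au/blank_ad.gif' ],
    [ '//.*/adverts/.*',           '//www.taz.net.au/blank_ad.gif' ],
);

# Made-up request URLs.
print rewrite('http://ad.doubleclick.net/ad/some/banner.gif', @compiled), "\n";
print rewrite('http://www.example.com/index.html', @compiled), "\n";
```

The trade-off is that a list built at startup has to be reloaded somehow when the rules change, which is exactly what the tied database file avoids.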
Remember to restart squid after you've made these changes.
On a Debian system, type: /etc/init.d/squid reload
On other Linux systems, type: killall -1 squid
Caution: If you're not using Linux, do not use the killall command - on Linux it means "kill process by name". On some variants of unix (e.g. SCO) it means "kill all processes".
/usr/lib/squid/redir is the source text file containing two fields per record: the regular expression pattern to match, and the associated replacement string. The two fields should be separated by white space. Comments can be embedded in the file.
Once you have a redir file, make a hashed database file from it by running the following commands:
cd /usr/lib/squid
makemap -f hash redir <redir
Whenever you edit the redir text file, you must remember to regenerate the .db file.
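Since a pattern that isn't a valid regexp would make the redirector die when it is tried, it can be worth checking the text file before regenerating the database. The following is a rough sketch of such a check, assuming the two-field format described above with `#` comment lines; the sub name parse_rules and the deliberately broken third rule are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse redir-style text: "pattern <whitespace> replacement" per line,
# skipping blank lines and lines starting with '#'.
# Returns two array refs: usable [pattern, replacement] pairs,
# and descriptions of any lines that look wrong.
sub parse_rules {
    my ($text) = @_;
    my (@rules, @problems);
    my $n = 0;
    foreach my $line (split /\n/, $text) {
        $n++;
        next if $line =~ /^\s*(#|$)/;
        my ($pattern, $replacement, @extra) = split ' ', $line;
        if (!defined $replacement || @extra) {
            push @problems, "line $n: expected exactly two fields";
            next;
        }
        # Compile the pattern the same way it will be used (with /i).
        if (!eval { my $re = qr/$pattern/i; 1 }) {
            push @problems, "line $n: pattern does not compile";
            next;
        }
        push @rules, [ $pattern, $replacement ];
    }
    return (\@rules, \@problems);
}

# A small sample; the last rule has an unmatched "(" on purpose.
my $text = <<'END';
# sample rules
//.*/adverts/.*           //www.taz.net.au/blank_ad.gif
//.*doubleclick.net/ad/.* //www.taz.net.au/blank_ad.gif
//.*/broken(/.*           //www.taz.net.au/blank_ad.gif
END

my ($rules, $problems) = parse_rules($text);
print scalar(@$rules), " usable rules\n";
print "$_\n" for @$problems;
```

To check a real file, read /usr/lib/squid/redir into a string and pass it to parse_rules instead of the sample text.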
Here's my redir file:
# Redirection rules for squid
#
# generate redir.db from this file by typing:
# makemap -f hash redir <redir
#
//.*/Adverts/.* //www.taz.net.au/blank_ad.gif
//.*/adverts/.* //www.taz.net.au/blank_ad.gif
//.*/gifs/ads/.* //www.taz.net.au/blank_ad.gif
//.*/graphics/advert.* //www.taz.net.au/blank_ad.gif
//.*/home/ads/.* //www.taz.net.au/blank_ad.gif
//.*/image.ng.* //www.taz.net.au/blank_ad.gif
//.*/image/ads/.* //www.taz.net.au/blank_ad.gif
//.*/images/adds/.* //www.taz.net.au/blank_ad.gif
//.*/images/ads/.* //www.taz.net.au/blank_ad.gif
//.*/img/ads/.* //www.taz.net.au/blank_ad.gif
//.*/logoshowad.* //www.taz.net.au/blank_ad.gif
//.*/pictures/sponsor/.* //www.taz.net.au/blank_ad.gif
//.*/sponsors/images/.* //www.taz.net.au/blank_ad.gif
//.*ancestry.com/ads/.* //www.taz.net.au/blank_ad.gif
//.*apcmag.com/ads/.* //www.taz.net.au/blank_ad.gif
//.*cmpnet.com/ads/graphics/.* //www.taz.net.au/blank_ad.gif
//.*cnet.com/Banners/.* //www.taz.net.au/blank_ad.gif
//.*cnnfn.com/ads/.* //www.taz.net.au/blank_ad.gif
//.*dejanews.com/gtplacer.* //www.taz.net.au/blank_ad.gif
//.*desktoppublishing.com/ad/.* //www.taz.net.au/blank_ad.gif
//.*doubleclick.net/ad/.* //www.taz.net.au/blank_ad.gif
//.*doubleclick.net/viewad/.* //www.taz.net.au/blank_ad.gif
//.*eads.com/graphics/ads/.* //www.taz.net.au/blank_ad.gif
//.*excite.com/img/ads/.* //www.taz.net.au/blank_ad.gif
//.*focalink.com/SmartBanner/ //www.taz.net.au/blank_ad.gif
//.*four11.com/g/ads/.* //www.taz.net.au/blank_ad.gif
//.*gamelan.com/Advertisements/images/.* //www.taz.net.au/blank_ad.gif
//.*i8.net/worldnet/ad.cgi.* //www.taz.net.au/blank_ad.gif
//.*imgis.com/?adserv.* //www.taz.net.au/blank_ad.gif
//.*imgis.com/images/.* //www.taz.net.au/blank_ad.gif
//.*infoseek.com/doc/sponsors/images/.* //www.taz.net.au/blank_ad.gif
//.*infoworld.com/ads/gif/.* //www.taz.net.au/blank_ad.gif
//.*intellicast.com/ads/.* //www.taz.net.au/blank_ad.gif
//.*looksmart.com/r?.*gif //www.taz.net.au/blank_ad.gif
//.*mcp.com/ad_banners/.* //www.taz.net.au/blank_ad.gif
//.*miningco.com/zadz/.* //www.taz.net.au/blank_ad.gif
//.*motherjones.com/global/ADVERTISEMENTS/.* //www.taz.net.au/blank_ad.gif
//.*movielink.com/media/imagelinks/MF.ad.* //www.taz.net.au/blank_ad.gif
//.*movielink.com/media/imagelinks/MF.sponsor.* //www.taz.net.au/blank_ad.gif
//.*mrshowbiz.com/ad/.* //www.taz.net.au/blank_ad.gif
//.*msn.com/ads/.* //www.taz.net.au/blank_ad.gif
//.*mudconnect.com/ads/ //www.taz.net.au/blank_ad.gif
//.*mydesktop.com/img/ads/.* //www.taz.net.au/blank_ad.gif
//.*netscape.com.au/ads/images/.* //www.taz.net.au/blank_ad.gif
//.*netscape.com.au/inserts/images/.* //www.taz.net.au/blank_ad.gif
//.*netscape.com/ads/images/.* //www.taz.net.au/blank_ad.gif
//.*netscape.com/inserts/images/.* //www.taz.net.au/blank_ad.gif
//.*news.com/Banners/Images/.* //www.taz.net.au/blank_ad.gif
//.*riddler.com/Commonwealth/bin/statdeploy.* //www.taz.net.au/blank_ad.gif
//.*safe-audit.com/exposure.cfm.* //www.taz.net.au/blank_ad.gif
//.*shareware.com/Banners/Images/.* //www.taz.net.au/blank_ad.gif
//.*sjmercury.com/advert/logos/.* //www.taz.net.au/blank_ad.gif
//.*smh.com.au/adproof.* //www.taz.net.au/blank_ad.gif
//.*techweb.com/ads/.* //www.taz.net.au/blank_ad.gif
//.*tqn.com/zadz/.* //www.taz.net.au/blank_ad.gif
//.*tripod.com/ads/.* //www.taz.net.au/blank_ad.gif
//.*tucows.wire.net.au/images/adds/.* //www.taz.net.au/blank_ad.gif
//.*webreview.com/universal/graphics/ads/.* //www.taz.net.au/blank_ad.gif
//.*windows95.com/gifs/ads/.* //www.taz.net.au/blank_ad.gif
//.*winntmag.com/images/.* //www.taz.net.au/blank_ad.gif
//.*winntmag.com/titlebar/titlebar.stm //www.taz.net.au/blank_ad.gif
//.*wire.net.au/images/adverts/.* //www.taz.net.au/blank_ad.gif
//.*wired.com/advertising/.* //www.taz.net.au/blank_ad.gif
//.*wisewire.com/ClickAd.emc.* //www.taz.net.au/blank_ad.gif
//.*wisewire.com/SKB/.* //www.taz.net.au/blank_ad.gif
//.*worldvillage.com/adds/banners/ //www.taz.net.au/blank_ad.gif
//.*yahoo.com/adv/.* //www.taz.net.au/blank_ad.gif
//.*zdnet.com/adverts/.* //www.taz.net.au/blank_ad.gif
//204.123.2.101/ads/.* //www.taz.net.au/blank_ad.gif
//images.yahoo.com/promotions/.* //www.taz.net.au/blank_ad.gif
You can download redir here.
WebFilter is a patch for CERN httpd's proxy function which can also block out advertising. In some ways it is more capable than my method as it rewrites the text within a page before passing it on to the client web browser. A great idea, but IMO unacceptably slow for anything but a small, lightly used proxy. This idea really needs a non-forking server on a fast machine.
OreO from the OSF is a general purpose filtering/transformation model for proxy servers.
V6 from Inria is another general purpose proxy with filtering capabilities based on WebFilter.
It may be possible to use or modify the proxying module in apache to do the same thing.
Junkbusters have a program which can block advertising banners and cookies too. Their FAQ is good reading. Junkbusters also have a lot of information about blocking spam.
This script and the techniques illustrated here can be used for other purposes, as the Netscape example above shows.
Theoretically, it is possible for an ISP to use the techniques outlined here to redirect advertising banners and URLs to a local CGI script which then redirects to an advertising site which generates income for the ISP. While this may be an ethically questionable thing to do, it is a good example of subverting the net for your own purposes.
makemap needs the -f option to stop it from converting all input to lowercase, and suggesting some improvements to my script.