A quick hack:
goog-love.pl – find out where your site’s Google juice comes from
This script will grind through your web site’s “access.log” file (which must be in the “combined” log format). It’ll pick out the top 100 Google searches found in the referer field, re-run those searches, and determine which ones are giving your website all the linky Google love; in other words, the searches that your site ‘wins’ on.
The output is both plain text and a chunk of HTML.
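For illustration, here is a minimal sketch of the log-parsing step (finding the most common Google searches in the referer field). It is written in Python rather than the script's Perl, and it is not taken from goog-love.pl: the regex, the function names google_query and top_google_searches, and the assumption that Google search referers look like http://www.google.<tld>/search?q=... are all hypothetical. Queries are counted without case folding, since the sample results below list "SKY NEWS IRELAND" and "sky news ireland" separately. The re-run step (querying Google via SOAP::Lite) is not shown.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Combined log format (assumed layout):
#   host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \S+ \S+ "(?P<referer>[^"]*)" "[^"]*"'
)

# Hypothetical host pattern: www.google.com, google.ie, www.google.co.uk, ...
GOOGLE_HOST_RE = re.compile(r'^(www\.)?google\.[a-z.]+$')


def google_query(referer):
    """Return the search terms from a Google search referer, or None."""
    parts = urlsplit(referer)
    host = parts.hostname or ""  # urlsplit lowercases the hostname
    if not GOOGLE_HOST_RE.match(host) or parts.path != "/search":
        return None
    # parse_qs decodes %XX escapes and turns '+' into spaces
    terms = parse_qs(parts.query).get("q")
    if not terms:
        return None
    query = terms[0].strip()
    return query or None


def top_google_searches(lines, limit=100):
    """Count Google queries in combined-format log lines; return the top `limit`."""
    counts = Counter()
    for line in lines:
        m = COMBINED_RE.match(line)
        if not m:
            continue  # malformed line, or not in combined format
        query = google_query(m.group("referer"))
        if query:
            counts[query] += 1
    return counts.most_common(limit)
```

To use it on a log, pass an open file (or sys.stdin) to top_google_searches; it returns (query, hit count) pairs, most frequent first.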
usage:
goog-love.pl sitehost google-api-key < access.log > out.html
e.g.
cat /var/www/logs/taint.org.* | goog-love.pl \
taint.org 0xb0bd0bb5yourgoogleapikeyhere0xdeadbeef | tee out.html
NOTE: this script requires the SOAP::Lite module to be installed. Install it using apt-get install libsoap-lite-perl or cpan SOAP::Lite.
It also requires a Google API key.
For example, here are the current results for this site. You can immediately see some interesting stuff that’s not obvious otherwise, such as my site being the top hit for [beardy justin] ;)
- #1 for kriskat225: http://taint.org/2006/01/20/220239a.html
- #1 for kriskat224: http://taint.org/
- #1 for mailman rss: http://taint.org/mmrss/index.html
- #1 for ray is naked: http://taint.org/2005/05/27/195421a.html
- #1 for beardy justin: http://taint.org/2005/09/10/002323a.html
- #1 for threadless rss: http://taint.org/2005/05/25/060857a.html
- #1 for louis fitzgerald: http://taint.org/2005/05/12/020118a.html
- #1 for download JusteTune: http://taint.org/index.php?tag=apple
- #1 for mobile repair delhi: http://taint.org/2005/11/11/032651a.html
- #1 for site:taint.org mythtv: http://taint.org/index.php?tag=hdtv
- #1 for “Google Map” IDS rulesets: http://taint.org/2005/09/
- #1 for spam email “prank a friend”: http://taint.org/2004/11/
- #1 for site:taint.org mythtv freevo: http://taint.org/index.php?tag=mythtv
- #1 for world map desktop background: http://taint.org/xplanet/
- #1 for kate thornton + Samuel L jackson: http://taint.org/2003/12/10/185721a.html
- #1 for when did chris horn leave iona technologies?: http://taint.org/2003/05/
- #2 for natkat224: http://taint.org/
- #2 for itms linux: http://taint.org/2005/09/20/022107a.html
- #2 for msn IDs hacking software: http://taint.org/index.php?tag=hacking
- #3 for gmail spam filter: http://taint.org/2004/04/15/033025a.html
- #3 for live world map on desktop: http://taint.org/xplanet/
- #4 for moin mozex: http://taint.org/2004/10/08/081409a.html
- #4 for editable p45: http://taint.org/2005/01/27/025238a.html
- #4 for urban dead exploits: http://taint.org/index.php?tag=games
- #4 for gmail spam filtering: http://taint.org/2004/04/15/033025a.html
- #4 for world map desktop wallpaper: http://taint.org/xplanet/
- #5 for cdwow.ie: http://taint.org/2003/12/04/185038a.html
- #5 for life hacking: http://taint.org/2005/10/17/210751a.html
- #5 for Adelphi Charter: http://taint.org/index.php?tag=politics
- #6 for irish SME: http://taint.org/2005/06/23/212513a.html
- #6 for urbandead: http://taint.org/index.php?tag=hacks
- #6 for SKY NEWS IRELAND: http://taint.org/2004/05/12/205717a.html
- #7 for daniel cuthbert: http://taint.org/2005/10/12/205836a.html
- #7 for SAMUEL L. JACKSON QUOTES: http://taint.org/2003/12/10/185721a.html
- #7 for cool background pictures: http://taint.org/xplanet/
- #8 for CDWOW: http://taint.org/2003/12/04/185038a.html
- #8 for urban dead: http://taint.org/2005/10/29/224403a.html
- #8 for korea porn: http://taint.org/2003/07/12/031422a.html
- #8 for BBC port 8998: http://taint.org/2003/08/
- #8 for iftop documentation wrt: http://taint.org/index.php?tag=freevo
- #8 for php mail injection spam: http://taint.org/2005/12/08/202248a.html
- #8 for fake open source software : http://taint.org/index.php?tag=open-source
- #9 for faad symbian: http://taint.org/index.php?tag=apple
- #9 for sky news ireland: http://taint.org/2004/05/12/205717a.html
- #9 for telemarketing counter speech: http://taint.org/2002/11/12/130851a.html
- #10 for “Scratch Heads Over”: http://taint.org/2003/07/12/031422a.html
- #10 for web scraper linux console: http://taint.org/2004/06/05/023726a.html
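As a rough illustration of how results like the list above could be rendered as both plain text and HTML, here is a small Python sketch. It is not the script's actual output code. The (rank, query, url) tuple shape, the format_results name, and the <ul>/<li> markup are assumptions. Ties within a rank are ordered by query length, which is how the sample list above appears to be sorted.

```python
from html import escape


def format_results(results):
    """Render (rank, query, url) tuples as (plain_text, html).

    Only searches where the site actually appeared should be passed in.
    Sorted by rank, with ties broken by query length.
    """
    ordered = sorted(results, key=lambda r: (r[0], len(r[1])))
    text_lines = [f"#{rank} for {query}: {url}" for rank, query, url in ordered]
    html_items = [
        # escape() protects against queries or URLs containing <, & or quotes
        f'<li>#{rank} for {escape(query)}: '
        f'<a href="{escape(url, quote=True)}">{escape(url)}</a></li>'
        for rank, query, url in ordered
    ]
    text = "\n".join(text_lines)
    html = "<ul>\n" + "\n".join(html_items) + "\n</ul>"
    return text, html
```

Escaping matters here because search terms come straight from visitors' referers, so a query containing markup would otherwise end up verbatim in the HTML chunk.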
Download here (5 KiB Perl script).
Notes:
- If you see a lot of “502 Bad Gateway” errors, it’s probably overzealous anti-bot ACLs on Google’s side. Try from another host.
- Read the comments for notes on a bug in recent releases of SOAP::Lite; please let me know if you hear of it getting fixed ;)