
So in the process of developing my scholar system, which uses Google Scholar to get citation counts and automatically insert them into (and color) lists of scientific references, I managed to trip the Google robot blocker. Of course I'd love to be using their API, but Google has not opened up Google Scholar as part of their API yet. I've been blogging about this for a few days - I think it would be a hugely popular move. The reason I got myself blocked yesterday was that I had finally worked out some regex to semi-reliably strip the titles from random scientific references. As I was debugging the colorization process I was hitting Google Scholar a bit too frequently, and then the queries started failing. I got the above image in my browser, which very kindly allowed me to re-access Google via the web once I'd typed in the captcha.
Of course my Ruby script was still blocked, so I gave up work for the evening, but it was working again this morning - thanks Google - and so I immediately implemented a caching mechanism so that I don't hit Google Scholar each time I go through a debug cycle. Even with caching, though, releasing what I've created as a service would be problematic: it wouldn't scale, since lots of users looking up fresh references would quickly trip the blocker again. This is another reason why it would be beneficial for Google to release an API for Google Scholar and allow systems to access it through authenticated keys that distinguish them from robots. Then the services that I and others have built could be available to lots of other academics. Stay tuned for my next blog post, in which I'll post some images from my new service ... (not really such a cliff-hanger - gotta have lunch first :-)
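
P.S. For anyone curious what a cache like the one mentioned above might look like, here is a minimal sketch in Ruby. It is not the code from my script, just an illustration of the idea: results are stored on disk, keyed by a hash of the reference title, so a repeated lookup during debugging never goes back to Google Scholar. The class name `CitationCache`, the one-JSON-file-per-title layout, and the block that does the real lookup are all assumptions made for the example.

```ruby
require "json"
require "digest"
require "fileutils"

# Hypothetical file-backed cache: one small JSON file per title.
# Nothing here talks to Google Scholar; the caller supplies the lookup as a block.
class CitationCache
  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(@dir)
  end

  # Returns the stored count for +title+ if there is one.
  # Otherwise runs the block once, saves its result, and returns it.
  def fetch(title)
    # Normalise whitespace and case so trivially different titles share an entry.
    key = Digest::SHA256.hexdigest(title.strip.downcase)
    path = File.join(@dir, "#{key}.json")
    return JSON.parse(File.read(path))["count"] if File.exist?(path)

    count = yield(title)
    File.write(path, JSON.generate("title" => title, "count" => count))
    count
  end
end
```

You would use it as something like `cache.fetch(title) { |t| lookup_citations(t) }`, where `lookup_citations` stands in for whatever actually queries Google Scholar. Entries never expire in this sketch, which is fine for a debug cycle but would need rethinking for anything long-running, since citation counts change.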