SURNAME = '[A-Z][a-z]{1,}'
INITIALS = '((\s[A-Z](\.|,|\.,))(\s?[A-Z](\.|,|\.,))*)'
TITLE = '(([A-Za-z:,\r\n]{2,}\s?){3,})'
REGEX = /([^e][^d][^s][^\.]\s|\d+\.?\s|^)(#{SURNAME},?#{INITIALS})(\s?(,|and|&|,\s?and)?\s?(#{SURNAME},?#{INITIALS}))*\s*(\(?\d\d\d\d\)?\.?)?\s*("|“)?(#{TITLE})\.?("|”)?/
Now I am sure that this can be improved upon, but with a little web interface I have cooked up I can take the following:
1. Erickson, T. & Kellogg, W. A. “Social Translucence: An Approach to Designing Systems that Mesh with Social Processes.” In Transactions on Computer-Human Interaction. Vol. 7, No. 1, pp 59-83. New York: ACM Press, 2000.
2. Erickson, T. & Kellogg, W. A. “Knowledge Communities: Online Environments for Supporting Knowledge Management and its Social Context” Beyond Knowledge Management: Sharing Expertise. (eds. M. Ackerman, V. Pipek, and V. Wulf). Cambridge, MA, MIT Press, in press, 2001.
3. Erickson, T., Smith, D.N. Erickson, T., Smith, D.N., Kellogg, W. A., Laff, M. R., Richards, J. T., and Bradner, E. (1999). “Socially translucent systems: Social proxies, persistent conversation, and the design of Babble.” Human Factors in Computing Systems: The Proceedings of CHI ‘99, ACM Press.
4. Goffman, E. Behavior in Public Places: Notes on the Social Organization of Gatherings. New York: The Free Press, 1963.
5. Heath, C. and Luff, P. Technology in Action. Cambridge: Cambridge University Press, 2000.
6. Smith, C. W. Auctions: The Social Construction of Value. New York: Free Press, 1989
7. Whyte, W. H., City: Return to the Center. New York: Doubleday, 1988.
and turn it into the following, with a citation count added after each title:
1. Erickson, T. & Kellogg, W. A. “Social Translucence: An Approach to Designing Systems that Mesh with Social Processes (Cited by 78).” In Transactions on Computer-Human Interaction. Vol. 7, No. 1, pp 59-83. New York: ACM Press, 2000.
2. Erickson, T. & Kellogg, W. A. “Knowledge Communities: Online Environments for Supporting Knowledge Management and its Social Context (Cited by 52)” Beyond Knowledge Management: Sharing Expertise. (eds. M. Ackerman, V. Pipek, and V. Wulf). Cambridge, MA, MIT Press, in press, 2001.
3. Erickson, T., Smith, D.N. Erickson, T., Smith, D.N., Kellogg, W. A., Laff, M. R., Richards, J. T., and Bradner, E. (1999). “Socially translucent systems: Social proxies, persistent conversation, and the design of Babble (Cited by 284).” Human Factors in Computing Systems: The Proceedings of CHI ‘99, ACM Press.
4. Goffman, E. Behavior in Public Places: Notes on the Social Organization of Gatherings (Cited by 822). New York: The Free Press, 1963.
5. Heath, C. and Luff, P. Technology in Action (Cited by 408). Cambridge: Cambridge University Press, 2000.
6. Smith, C. W. Auctions: The Social Construction of Value (Cited by 210). New York: Free Press, 1989
7. Whyte, W. H., City: Return to the Center (Cited by 14). New York: Doubleday, 1988.
Which I think is pretty damn useful. I'm getting about a 70% hit rate on other lists of references, and I'm sure that can be improved. There are also changes I might make to the color gradation. At the moment I'm just setting the red value of the text from 0 to 255 based on the number of citations, so anything with more than 255 citations doesn't get any redder. I'd like to normalise the color so that the highest citation count in the references corresponds to full red, with every other entry graded in between. Ideally I'd also slide between white and red instead of black and red, and change the background color rather than the text, but that's all icing on the cake really.
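The normalised white-to-red gradation described above could look something like the sketch below. This is not what the web interface currently does; the `citation_color` helper is a hypothetical name, and the linear scaling is just one assumption about how to grade the entries. The counts are the ones from the example list.

```ruby
# Hypothetical helper: map a citation count to a CSS hex color that slides
# from white (0 citations) to full red (the highest count in the list).
def citation_color(count, max_count)
  return '#ffffff' if max_count <= 0
  # Clamp the ratio to 0.0..1.0 so out-of-range counts cannot break the color.
  ratio = [[count.to_f / max_count, 0.0].max, 1.0].min
  # Red stays at ff; green and blue fade out together as the ratio grows.
  fade = (255 * (1.0 - ratio)).round
  format('#ff%02x%02x', fade, fade)
end

counts = [78, 52, 284, 822, 408, 210, 14]
max_count = counts.max
colors = counts.map { |count| citation_color(count, max_count) }

counts.zip(colors).each do |count, color|
  puts "#{count} citations -> background-color: #{color}"
end
```

Because the scale is relative to the largest count in each list, a single very highly cited reference will push everything else towards white; a logarithmic scale might spread the colors more evenly, but that is a guess I haven't tried.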
What I'd most like to see is this as a web service that everyone could use, and an ongoing group effort to improve the regex further and get as many title matches as possible. If you're interested, please add your vote to the Google Scholar feature request.
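For anyone who wants to experiment with the regex, here is a minimal sketch of pulling the title out of one reference and appending a count. It is not the web interface itself: the named `title` group, the `annotate` helper, and the hard-coded count (copied from the example list above) are assumptions added for illustration.

```ruby
SURNAME  = '[A-Z][a-z]{1,}'
INITIALS = '((\s[A-Z](\.|,|\.,))(\s?[A-Z](\.|,|\.,))*)'
TITLE    = '(([A-Za-z:,\r\n]{2,}\s?){3,})'

# Same pattern as the post's REGEX, except the title is wrapped in a named
# group (an assumption, for readability). In Ruby, once a named group is
# present the plain parentheses stop capturing, so there is no need to
# count group numbers to find the title.
TITLE_REGEX = /([^e][^d][^s][^\.]\s|\d+\.?\s|^)(#{SURNAME},?#{INITIALS})(\s?(,|and|&|,\s?and)?\s?(#{SURNAME},?#{INITIALS}))*\s*(\(?\d\d\d\d\)?\.?)?\s*("|“)?(?<title>#{TITLE})\.?("|”)?/

# Hypothetical helper: returns the reference with " (Cited by N)" added
# after the matched title, or the reference unchanged if nothing matches.
def annotate(reference, count)
  match = TITLE_REGEX.match(reference)
  return reference unless match
  # The title group can end with trailing whitespace, so trim it first.
  title = match[:title].rstrip
  reference.sub(title, "#{title} (Cited by #{count})")
end

reference = '5. Heath, C. and Luff, P. Technology in Action. Cambridge: Cambridge University Press, 2000.'
puts annotate(reference, 408)
```

In a real version the count would come from whatever citation source is available rather than being passed in by hand, and references the regex misses are simply left alone.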
6 comments:
FYI, the way Google folks track how many people are interested in an issue is by how many people have "starred" it. Annoyingly, you'll get an email message every time someone leaves another "me too" comment on the issue, but it pays to be counted.
Hi Sam,
I am doing a software engineering project for school, and I was wondering if I could use your regular expression as a base for developing it.
Thanks
Raymond
Ray,
I'm doing a similar project in LIS. If you're interested in exchanging ideas and experiences, please contact me (email, see profile).
kb
Hi Ray, please do use my regular expressions, and do post back any improvements you manage to make. I have put this project on a back burner since there doesn't seem much chance of Google releasing an official API. There is a Thomson ISI API for similar data that you can use if your school or organization is subscribed. Of course you might just be interested in the regular expressions. For me they were just part of a bigger project to grab citation data from web services and annotate bibliographies.
Would love to hear more about your project.
CHEERS, SAM
Hi, just wondering if this project has progressed any further. I've been waiting for a Google Scholar API for years and have seen a few attempts to mine the output and construct reference trees and such. Your script might help push those ideas forward. It looks great.
Hi Myq,
No further progress, I'm afraid. It's been suggested to me that Google pays a lot for their access to the citation services and is not in a position to provide an API; but who knows, maybe one day.
If you are in an academic setting there is a citation API available through somebody like ISI or Thomson or someone that can be used purely within the academic institution, but I didn't look into it any further ...