As mentioned in Robert's last blog post, we have set up a scraping service that supports users working with citations by automatically extracting references from digital library and publisher websites. We use a very similar service in BibSonomy to support our users when they post a new reference. However, the service is independent of BibSonomy. Our main goal is to make the metadata of other websites easily accessible to anyone who needs bibliographic metadata. We therefore offer the extracted information in BibTeX format. Most tools can import BibTeX, so it should be very easy for everyone to get the data into their own tool. The service is running at the following URL:
http://scraper.bibsonomy.org/
Currently we support more than 60 different websites (see the full list), and we are working on further extensions. In the near future we will make the source code of our scrapers publicly available under the GPL, and we hope that other people will find it useful and start to help us by implementing their own scrapers.
How does the service work?
In principle, there are two ways to use the service: one uses a so-called bookmarklet, and the other is simply based on the URL. If you have a webpage from a supported site, e.g. the following page from the ACM Digital Library:
Logsonomy - social information retrieval with logdata
then you can copy this URL into the form on the service homepage, and the service will return the extracted BibTeX information. As this is not a very convenient way to access the data, we provide a ScrapePublication button. This button is a small piece of JavaScript that can be copied to the browser's toolbar. When you press it while visiting a digital library webpage, the page's URL is automatically sent to the scraping service, which extracts the metadata.
The service has three parameters that can be used to customize it and to make it useful for other systems. The first, obviously, is the URL itself, which is also used by the bookmarklet. The second is the selection parameter, which allows text to be sent to the service, and the last changes the output format from HTML to plain BibTeX. This last parameter makes integration with other systems very simple.
If needed, we can provide the metadata in other formats as well, but currently we support only BibTeX.
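To illustrate how another system might call the service with these three parameters, here is a minimal Python sketch. Note that the endpoint path (/service) and the parameter names (url, selection, format) as well as the value bibtex are assumptions made for this example and are not confirmed in this post; the example page URL is hypothetical, too. Please check the service homepage for the actual names before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the "/service" path is illustrative; only the host
# http://scraper.bibsonomy.org/ is given in the post.
SERVICE_URL = "http://scraper.bibsonomy.org/service"


def build_scraper_url(page_url, selection=None, output_format=None):
    """Build a request URL for the scraping service.

    The parameter names "url", "selection" and "format" are assumed
    names for the three options described above.
    """
    params = {"url": page_url}
    if selection:
        # Text selected on the page, sent along with the URL.
        params["selection"] = selection
    if output_format:
        # E.g. "bibtex" (assumed value) to get plain BibTeX instead of HTML.
        params["format"] = output_format
    return SERVICE_URL + "?" + urlencode(params)


def fetch_bibtex(page_url):
    """Ask the service for plain BibTeX and return it as a string."""
    request_url = build_scraper_url(page_url, output_format="bibtex")
    with urlopen(request_url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


if __name__ == "__main__":
    # Hypothetical page URL; replace it with a page from a supported site.
    print(build_scraper_url("http://example.org/paper?id=1",
                            output_format="bibtex"))
```

Building the request URL is kept separate from fetching it, so the URL can be inspected or logged before anything is sent over the network.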