One feature we added recently is the detection of duplicate references in a user's publication list. While designing the system, we discussed how to find links between references of different users when the entries are not identical. We had to solve two problems: first, we have to find the duplicate entries, and second, the check has to be fast, because nearly all pages look for duplicate entries to provide smooth browsing.
The solution we came up with was hash keys. The system can handle four different hash keys. Currently we use two of them, the intrahash and the interhash.
The intrahash avoids duplicates within a single user's library and matches only entries that are almost identical. To compute this hash we use the title, type, author, editor, year, journal, booktitle, volume, and number fields with only minor normalization. This hash also ensures that a user can have a given publication only once in their library, although two entries count as the same publication only if they are nearly 100% identical.
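To make the idea concrete, here is a minimal sketch of an intrahash-like key in Python. It is not our actual implementation: the function names, the field order, the separator, the choice of SHA-256, and the reading of "minor normalization" as whitespace collapsing are all assumptions made for illustration.

```python
import hashlib

# Fields named in the post; order and separator are assumptions.
INTRA_FIELDS = ("title", "type", "author", "editor", "year",
                "journal", "booktitle", "volume", "number")


def light_normalize(value):
    # Assumed "minor normalization": collapse runs of whitespace only.
    return " ".join(str(value).split())


def intrahash(entry):
    # Missing fields count as empty strings so every entry yields a key.
    parts = [light_normalize(entry.get(field, "")) for field in INTRA_FIELDS]
    # The hash function is an assumption; any stable digest would do here.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()
```

With a key like this, two entries collide only if all nine fields agree after whitespace cleanup, so even a slightly different booktitle produces a different key.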
The interhash key was designed to find as many similar publications as possible, to support browsing within the system and to point users to other users with similar interests. The hash key is therefore based only on heavily normalized title, year, and author/editor information. In this way we can also match entries that differ in, e.g., the spelling of author names.
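A looser key in the spirit of the interhash could be sketched as follows. Again, this is only an illustration: the exact normalization rules are not spelled out in this post, so the steps below (strip accents, lowercase, drop everything that is not a letter or digit), the fallback from author to editor, and the function names are assumptions. The sample entries are made up.

```python
import hashlib
import re
import unicodedata


def heavy_normalize(text):
    # Assumed normalization: strip accents, lowercase, drop non-alphanumerics.
    decomposed = unicodedata.normalize("NFKD", str(text))
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]", "", ascii_only.lower())


def interhash(entry):
    # Use the authors if present, otherwise fall back to the editors.
    people = entry.get("author") or entry.get("editor") or ""
    parts = [heavy_normalize(entry.get("title", "")),
             heavy_normalize(entry.get("year", "")),
             heavy_normalize(people)]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


# Two made-up entries that differ in case, punctuation and an umlaut.
first = {"title": "Social Bookmarking: An Example", "year": "2007",
         "author": "Müller, Anna"}
second = {"title": "social bookmarking an example", "year": 2007,
          "author": "Muller, Anna"}
print(interhash(first) == interhash(second))
```

This sketch only smooths over accents, case, and punctuation. Matching abbreviated first names or reordered name parts would need further rules that are not shown here.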
The new duplicate detection feature uses the interhash to detect duplicates in a user's library. Because the intrahash key reacts to nearly every change in an entry, it also allows very similar entries to be stored, e.g. entries that differ only by a small change in the booktitle. The interhash key detects those similar entries, and the feature lists all publications that appear at least twice within the user's publication list. By checking this list you can remove unwanted duplicates and clean up your publication list.
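The listing step amounts to grouping a user's entries by the loose key and keeping the groups with more than one member. Here is a minimal sketch, assuming entries are plain dictionaries; `simple_key` is a hypothetical stand-in for the interhash, and `find_duplicates` is an illustrative name, not part of our code base.

```python
from collections import defaultdict


def simple_key(entry):
    # Hypothetical stand-in for the interhash: title, year and author,
    # lowercased and reduced to letters and digits.
    fields = (entry.get("title", ""), entry.get("year", ""),
              entry.get("author", ""))
    return tuple("".join(ch for ch in str(field).lower() if ch.isalnum())
                 for field in fields)


def find_duplicates(entries, key=simple_key):
    # Group entries by key; one pass, so the cost grows linearly with the list.
    groups = defaultdict(list)
    for entry in entries:
        groups[key(entry)].append(entry)
    # Keep only publications that appear at least twice.
    return [group for group in groups.values() if len(group) > 1]


# Made-up library: the first two entries differ only in the booktitle.
library = [
    {"title": "An Example Title", "year": "2007", "author": "Doe, Jane",
     "booktitle": "Proc. of X"},
    {"title": "an example title.", "year": "2007", "author": "Doe, Jane",
     "booktitle": "Proceedings of X"},
    {"title": "Another Paper", "year": "2006", "author": "Roe, Richard"},
]
for group in find_duplicates(library):
    print([entry["booktitle"] for entry in group])
```

Because each entry's key can be computed once and stored, the grouping itself is a cheap lookup, which fits the requirement that the check be fast.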
We hope this feature is helpful. Have fun!
Andreas
Friday, November 2, 2007