BibSonomy Blog: November 2007

Thursday, November 29, 2007

Feature of the Week: Character encoding of imported files

For this week's feature of the week, I'll first briefly discuss what a "character encoding" is and then explain why it matters during BibTeX import.

At a very low level, computers only understand zeros and ones. Hence, a mechanism is needed to encode symbols like letters and numbers as sequences of zeros and ones. A "table" which assigns to each symbol its corresponding zero-one sequence is called a character encoding (or character set). This table allows a computer to interpret the data in a file and show the correct symbol on the screen (or printer). Unfortunately, several such character encodings exist. Depending on the chosen character encoding, the same sequence of ones and zeros might stand for different symbols. To correctly display a piece of data, the computer must know how to interpret it, i.e., it must know its character encoding.
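To make this concrete, here is a small Python sketch (the sample name "Müller" is just an illustration, not taken from any real file) that reads the same byte sequence under two different encodings:

```python
# The same byte sequence, interpreted with two different character encodings.
data = "Müller".encode("utf-8")         # the bytes as stored in the file

as_utf8 = data.decode("utf-8")          # correct interpretation
as_latin1 = data.decode("iso-8859-1")   # wrong interpretation, but no error

print(data)       # the raw bytes
print(as_utf8)    # Müller
print(as_latin1)  # MÃ¼ller
```

The bytes never change; only the table used to read them does. Note that the wrong interpretation does not raise an error here, it just produces strange-looking characters.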

When you upload a BibTeX (or EndNote) file to BibSonomy, we face the same problem: we have to interpret the file with the correct character encoding. Typically, it is not possible to guess the encoding reliably (it is just an interpretation of the data, and several interpretations could be valid), so there is an option on the post_bibtex page which allows you to specify the character encoding of the file to upload. A click on the options link reveals a dropdown list with a choice of typical character encodings. The default is "UTF-8", which is becoming more and more common. However, older files might have a different encoding like "ISO-8859-1" (also known as "latin1"), which is very common in Europe, too. If you're unsure about your data, UTF-8 is a good choice. If this gives you errors during import or strange-looking characters afterwards, try another encoding.
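The "try UTF-8 first, then another encoding" advice can be sketched in Python as follows. The function name `decode_upload` and the sample data are hypothetical and only illustrate the idea; this is not BibSonomy's actual import code.

```python
def decode_upload(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode uploaded bytes strictly, so a wrong encoding fails where it can."""
    return raw.decode(encoding, errors="strict")

# A hypothetical older file saved as ISO-8859-1.
latin1_file = "author = {Müller}".encode("iso-8859-1")

try:
    decode_upload(latin1_file)   # default: UTF-8
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False              # import error: time to try another encoding

text = decode_upload(latin1_file, "iso-8859-1")
print(utf8_ok, text)
```

ISO-8859-1 assigns a symbol to every possible byte value, so decoding with it never fails. That is one reason the encoding cannot simply be guessed by trial and error: a successful decode does not prove the choice was right.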

Tuesday, November 27, 2007

Search functionality is available again after some difficulties

During a server update on the afternoon of November 27th, 2007, we ran into a technical difficulty which temporarily affected the search feature of BibSonomy. As a consequence, it was not possible to search BibTeX or bookmark entries, because the search tables in the database were corrupted. We apologize for any inconvenience this incident has caused.

In the meantime, we have located and eliminated the problem, and we are now happy to offer you the complete BibSonomy functionality which you are used to.

Best,
Dominik

Thursday, November 15, 2007

Feature of the Week

As explained in the last feature of the week, BibSonomy allows users to structure their content via SUPERTAG <- SUBTAG relations. The resulting tag concepts are available for searching and navigating our folksonomy system. As seen in the figure above, this is easy to do; each step is marked with a circle. Simply choose “concepts” (step 1) as the search option and type a tag which you are interested in (step 2). In the last step (step 3), you get the resources and a visual presentation of your concept as a hierarchy.

Miranda

Wednesday, November 14, 2007

Feature of the week: Retrieve resources by disjunction of tags

A very common way to browse through your own or other people's repository on BibSonomy is via one or more tags, e.g.

http://www.bibsonomy.org/tag/semantic+web.

Here, tag-based retrieval is done in a conjunctive manner, i.e., the result of this query will comprise all bookmarks and publications tagged with semantic AND web. We are often asked whether we offer any other way of combining tags in a query, e.g. by disjunction - one might, for example, be interested in all resources tagged with semanticweb OR ontologies.

This behaviour is not accessible via a specific URL scheme, but can be achieved by invoking an old BibSonomy buddy - namely concepts! As you will know, BibSonomy allows you to define relations between tags in the form:

SUPERTAG <- SUBTAG

(see also http://www.bibsonomy.org/relations). A supertag along with all its subtags is called a concept in BibSonomy, and it can be used to retrieve resources like this:

http://www.bibsonomy.org/concept/tag/ontology

The characteristic of this retrieval method is that it returns all resources which are tagged with ontology OR one of its subtags. This is, in effect, a retrieval of resources by a disjunction of tags. We are aware that this has a limitation: a concept has to be defined before this type of query is possible. Facing a tradeoff between efficient query processing and freedom of query formulation, we chose this approach, with the ultimate goal of keeping our service highly responsive for all of you, our users.
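The difference between the two retrieval modes can be sketched in a few lines of Python. The posts, tags, relations and function names below are made up for illustration, and the sketch says nothing about how BibSonomy implements these queries internally.

```python
# Hypothetical posts (id -> tags) and SUPERTAG <- SUBTAG relations.
posts = {
    "p1": {"semantic", "web"},
    "p2": {"ontology"},
    "p3": {"owl", "web"},
    "p4": {"folksonomy"},
}
relations = {"ontology": {"owl", "taxonomy"}}  # ontology <- owl, ontology <- taxonomy

def by_tags(posts, tags):
    """Conjunctive query: a post must carry ALL of the given tags."""
    wanted = set(tags)
    return sorted(pid for pid, post_tags in posts.items() if wanted <= post_tags)

def by_concept(posts, relations, supertag):
    """Disjunctive query: a post must carry the supertag OR any of its subtags."""
    concept = {supertag} | relations.get(supertag, set())
    return sorted(pid for pid, post_tags in posts.items() if concept & post_tags)

print(by_tags(posts, ["semantic", "web"]))       # ['p1']
print(by_concept(posts, relations, "ontology"))  # ['p2', 'p3']
```

Post p3 is found by the concept query even though it is not tagged with ontology itself, because owl has been declared a subtag of ontology.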

Friday, November 9, 2007

Feature of the week: A tagcloud for the ISWC + ASWC 2007

Next week, the 6th International Semantic Web Conference and the 2nd Asian Semantic Web Conference will be hosted in Busan, Korea. About 120 conference, workshop and doctoral papers will be presented and discussed. With the help of an RDF dump, publication metadata and hyperlinks are now available in BibSonomy.

The system contains all accepted papers, together with the keywords (tags) that authors have associated with their papers or that show up in the paper titles. To help conference participants find interesting work, a web front-end has been created which shows a tag cloud of the most important keywords. The color of each tag indicates the track to which most abstracts annotated with that tag belong. Clicking on a tag (keyword) will retrieve from BibSonomy the abstracts that have been tagged with it.

While attending the conference, participants can further collect, annotate and share publications using BibSonomy. The "cool" stuff is presented when clicking on "See you what your collegues find cool". A specific search showing all publications of a searched author completes the retrieval facilities of BibSonomy.

The idea of enabling publication sharing at conferences was first put into practice at the Statphys23 conference in 2007 under the umbrella of the Tagora project. The ISWC + ASWC tagcloud has been realized with the support of Nepomuk.

Given the necessary BibTeX entries to store publication abstracts, metadata and associated keywords in BibSonomy, we can provide BibSonomy web front-ends presenting a conference's tag cloud and interests (cool stuff). With this initiative we hope to enhance and round up discussions and information sharing among research communities.

Beate

Friday, November 2, 2007

Detecting duplicates in BibSonomy

One feature we added recently is the detection of duplicate references in a user's publication list. During the design of the system, we discussed how to find links between references of different users even if the references are not identical. We had to solve two problems: first, we have to find the duplicate entries, and second, this has to be fast, as nearly all pages check for duplicate entries to provide a smooth browsing experience.

The solution we came up with was hash keys. The system is able to handle four different hash keys. Currently we use two of them, the intrahash and the interhash.

The intrahash avoids duplicates within a user's library and matches only entries that are almost identical. To compute this hash, we use the title, type, author, editor, year, journal, booktitle, volume, and number fields with only minor normalization. This hash ensures that a user can have a given publication only once in his or her library, but two entries have to be nearly 100% identical to count as the same publication.

The interhash key was designed to find as many similar publications as possible, to support browsing within the system and to point users to other users with similar interests. Therefore, this hash key is based only on heavily normalized title, year and author/editor information. In this way we can also identify entries which differ in the spelling of, e.g., author names.
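The idea behind the two keys can be sketched in Python. The field lists follow the description above, but everything else is an assumption made for illustration: the concrete normalization rules, the use of MD5, the field separator and the function names are not BibSonomy's actual implementation.

```python
import hashlib
import re

# Fields named in the text for the intrahash.
INTRA_FIELDS = ("title", "type", "author", "editor", "year",
                "journal", "booktitle", "volume", "number")

def _digest(parts):
    # Assumption: MD5 over the joined field values; any stable hash would do.
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()

def intrahash(entry):
    """Many fields, minor normalization (here: only whitespace is collapsed)."""
    return _digest(" ".join(entry.get(f, "").split()) for f in INTRA_FIELDS)

def _heavy(text):
    # Assumed heavy normalization: lowercase, keep only ASCII letters and digits.
    return re.sub(r"[^a-z0-9]", "", text.lower())

def interhash(entry):
    """Few fields (title, year, author or editor), heavy normalization."""
    person = entry.get("author") or entry.get("editor", "")
    return _digest(_heavy(v) for v in (entry.get("title", ""),
                                       entry.get("year", ""), person))

# Two hypothetical entries that differ only in booktitle and title punctuation.
a = {"title": "Detecting Duplicates", "author": "A. Author", "year": "2007",
     "booktitle": "Proc. of Some Conference"}
b = {"title": "Detecting duplicates.", "author": "A. Author", "year": "2007",
     "booktitle": "Proceedings of Some Conference"}

print(intrahash(a) == intrahash(b))  # False: treated as different entries
print(interhash(a) == interhash(b))  # True: treated as the same publication
```

The design choice is visible in the last two lines: the strict key lets both entries coexist in one library, while the loose key still links them as the same publication.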

The new duplicate detection feature uses the interhash to detect duplicates in a user's library. As the intrahash key reacts to nearly every change in an entry, it also allows very similar entries to be stored, e.g. entries that differ only by a small change in the booktitle. The interhash key is able to detect those similar entries and list all publications which appear at least twice within the user's publication list. By checking this list, you can remove unwanted duplicates and clean up your publication list.
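Listing the duplicates then amounts to grouping a user's entries by the looser key and keeping the groups with more than one member. The following Python sketch uses a simplified, hypothetical stand-in for the interhash (`similarity_key`); the entries are invented for illustration.

```python
from collections import defaultdict

def similarity_key(entry):
    """Hypothetical stand-in for the interhash: title, year and author,
    lowercased and stripped of everything except letters and digits."""
    return tuple(
        "".join(ch for ch in entry.get(field, "").lower() if ch.isalnum())
        for field in ("title", "year", "author")
    )

def find_duplicates(entries):
    """Return the groups of entries that share the same similarity key."""
    groups = defaultdict(list)
    for entry in entries:
        groups[similarity_key(entry)].append(entry)
    return [group for group in groups.values() if len(group) > 1]

library = [
    {"title": "Tagging Basics", "year": "2006", "author": "A. Author",
     "booktitle": "Proc. X"},
    {"title": "Tagging basics.", "year": "2006", "author": "A. Author",
     "booktitle": "Proceedings of X"},
    {"title": "Another Paper", "year": "2007", "author": "B. Writer"},
]

duplicates = find_duplicates(library)
for group in duplicates:
    print([entry["title"] for entry in group])
```

Because each entry is hashed once and looked up in a dictionary, the check stays cheap even for large libraries, which fits the requirement that duplicate detection has to be fast.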

We hope this feature is helpful. Have fun

Andreas