Scutter

From FOAF Wiki

In the context of RDFWeb and FOAF, a scutter is simply a computer program that loads, parses, interprets and acts upon the contents of a Web of interconnected RDF/XML documents. In this sense it is just a Semantic Web variant on the old theme of distributed Web indexing, sometimes called a 'harvester', 'spider', or 'robot'. The links between RDF documents are usually, but not necessarily, expressed using RDF's 'rdfs:seeAlso' property.

As of 2009, the most up-to-date and LinkedData-friendly scutter is Slug; see http://code.google.com/p/slug-semweb-crawler/


We call RDFWeb/FOAF indexers 'scutters' in tribute to the robots from the UK TV series Red Dwarf. The name also fits because, metaphorically, they *scutter* around the Web looking for stuff. As a verb, "scutter" means more or less the same as "scurry" or "scamper": to move briskly, or to move around in an agitated, confused, or fluttering manner. I point this out because I've heard that such metaphors (e.g. 'harvester') can be confusing for people who have English as their second language. See the RedDwarf page for more background.

So, Scutters are simple computer programs that consume and act upon RDF documents discovered in the Web. They typically depend upon other programs, such as RDF parsers and storage systems, to do anything interesting. Given an RDF toolkit that provides parsing and storage facilities, it is a fairly modest task to write a Scutter.
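To give a feel for how modest the task is, here is a minimal sketch of a scutter's main loop in Python. It is an illustration under stated assumptions, not the code of any particular scutter: parsing is stood in for by a caller-supplied `fetch_triples` function (hypothetical) that returns (subject, predicate, object) tuples for a URL, the "store" is a plain list, and the `example.org` documents are invented. A real scutter would plug an RDF toolkit's parser and store into the same places.

```python
from collections import deque

SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def scutter(start_url, fetch_triples, max_docs=100):
    """Breadth-first walk over documents linked by rdfs:seeAlso.

    fetch_triples(url) is a hypothetical stand-in for "download and parse";
    it must return an iterable of (subject, predicate, object) tuples.
    """
    store = []                 # stand-in for an RDF store: (s, p, o, source_url)
    visited = []               # documents successfully loaded, in order
    seen = {start_url}         # URLs already queued, so cycles are harmless
    queue = deque([start_url])
    while queue and len(visited) < max_docs:
        url = queue.popleft()
        try:
            triples = list(fetch_triples(url))
        except Exception:
            continue           # unreachable or unparsable document: skip it
        visited.append(url)
        for s, p, o in triples:
            store.append((s, p, o, url))   # remember where each triple came from
            if p == SEE_ALSO and o not in seen:
                seen.add(o)
                queue.append(o)
    return visited, store


# Invented toy "Web": two documents that point at each other, plus a dead link.
TOY_WEB = {
    "http://example.org/alice.rdf": [
        ("_:alice", FOAF_NAME, "Alice"),
        ("_:alice", SEE_ALSO, "http://example.org/bob.rdf"),
    ],
    "http://example.org/bob.rdf": [
        ("_:bob", FOAF_NAME, "Bob"),
        ("_:bob", SEE_ALSO, "http://example.org/alice.rdf"),
        ("_:bob", SEE_ALSO, "http://example.org/carol.rdf"),
    ],
}


def toy_fetch(url):
    return TOY_WEB[url]        # raises KeyError for documents that do not exist


if __name__ == "__main__":
    visited, store = scutter("http://example.org/alice.rdf", toy_fetch)
    print(visited)
    print(len(store), "triples stored")
```

Recording the source URL next to each triple is a design choice rather than a requirement; it makes it possible to tell later which document made which claim.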

Scutters use RDF's flexible, extensible approach to Web data structures, so they do not need to have any specific knowledge of particular XML markup or RDF vocabularies. The only thing that a Scutter really needs to know is how one RDF file might mention another RDF file elsewhere in the Web. RDF's mechanism for this is the 'rdfs:seeAlso' property, and this provides the basis for a Scutter's ability to treat RDF documents as an interconnected Web. In addition to this, Scutters that care about data merging may also need to know which RDF properties are uniquely identifying. For example, the property 'foaf:mbox' uniquely picks out individuals that have this property; this is also true of 'foaf:homepage'. Rather than being FOAF specific, a good Scutter will keep a list of such properties, to allow it to merge together scattered fragments of RDF that mention the same entities. This is sometimes called 'smushing', but might also be referred to as identity-based reasoning.
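The merging step can be sketched in a few lines of Python. This is one possible approach, assuming triples are plain (subject, predicate, object) tuples and that the list of uniquely identifying properties is supplied as a set; the `smush` function, the blank node labels, and the sample data are invented for illustration. Nodes that share a value for any listed property are grouped with a small union-find structure and rewritten to a single canonical identifier.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Properties treated as uniquely identifying (the list is configurable,
# not hard-wired to FOAF).
IDENTIFYING = {FOAF + "mbox", FOAF + "homepage"}


def smush(triples, identifying=IDENTIFYING):
    """Merge nodes that share a value for a uniquely identifying property."""
    triples = list(triples)
    parent = {}                      # union-find forest over merged node ids

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            if rb < ra:              # keep the smaller id as the canonical one,
                ra, rb = rb, ra      # so results do not depend on input order
            parent[rb] = ra

    first_owner = {}                 # (property, value) -> first node seen with it
    for s, p, o in triples:
        if p in identifying:
            key = (p, o)
            if key in first_owner:
                union(s, first_owner[key])
            else:
                first_owner[key] = s

    def canon(x):
        # Only nodes that took part in a merge are in the forest.
        return find(x) if x in parent else x

    return {(canon(s), p, canon(o)) for s, p, o in triples}


if __name__ == "__main__":
    scattered = [
        ("_:a", FOAF + "mbox", "mailto:alice@example.org"),
        ("_:a", FOAF + "name", "Alice"),
        ("_:b", FOAF + "mbox", "mailto:alice@example.org"),
        ("_:b", FOAF + "homepage", "http://example.org/~alice/"),
        ("_:c", FOAF + "homepage", "http://example.org/~alice/"),
        ("_:c", FOAF + "knows", "_:d"),
        ("_:d", FOAF + "name", "Dave"),
    ]
    for triple in sorted(smush(scattered)):
        print(triple)
```

In the sample data `_:a` and `_:b` share a mailbox and `_:b` and `_:c` share a homepage, so all three collapse into one node even though `_:a` and `_:c` have no property value in common. The sketch does not distinguish literals from resources, which a real RDF toolkit would do.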

The ScutterSpec page has some more information about scutters, as do the ScutterPlan and ScutterVocab pages.