\documentclass[a4paper,10pt]{article} %opening \title{RDF and the Semantic Web} \author{Jimmy O'Regan} \begin{document} \maketitle \begin{abstract} A brief introduction to RDF, the Semantic Web, FOAF, and DOAP. \end{abstract} \section{Introduction} RDF is a framework for defining metadata; data that describes data. It was developed by the W3C, based on work by Ramanathan V. Guha, and was originally used in Netscape Navigator 4.5's Smart Browsing (``What's related?'') feature, and by Open Directory. RDF followed from work Guha had done earlier, both on the Cyc project, and on Apple's Hotsauce project. RDF is currently being used in several areas: MusicBrainz uses it to provide playlist information about albums and songs, as a replacement for formats such as WinAmp's M3U files, or Microsoft's ASX files; Mozilla uses it for several purposes, e.g. recording the details of downloaded files (sample), remembering which action to perform for each MIME type, and so on. RDF is also used in some variants of RSS to provide details about syndicated articles. RDF is a key component of the Semantic Web. In a May 2001 article in Scientific American, Tim Berners-Lee et al. described this concept: \begin{quotation} The Semantic Web is an extension of the current web where information is given well-defined meaning, better enabling computers and people to work in cooperation. \end{quotation} The Semantic Web is the next ``big deal'' for the Web: the W3C is developing several technologies, based around RDF, which provide a machine-readable way of representing metadata. Various schema are already in existence: RSS 1.0, for summarising the contents of a website (though RSS is mostly used by news sites to provide a ``feed'' of the news they provide); OWL, which provides a basis for creating new schema while providing compatibility with similar schema; FOAF, which provides a way of describing people and the relationships between them. There are several others as well. Edd Dumbill has even proposed that file metadata in GNOME be represented in RDF. \section{A brief history of RDF} Cyc is a project to create an artificial intelligence with basic ``common sense''. It has a large database containing definitions such as: a tree is a kind of plant, a sycamore is a kind of tree, and so on. Cyc is then able to deduce from these definitions that a sycamore is a plant. Though work continues on a commercial version of Cyc, an open source version, OpenCyc, is also available. Hotsauce was a browser plug-in created by Apple in 1996 that allowed users to use 3D navigation on a website that included an MCF definition. MCF, Meta Content Framework, was a text-based definition that used an RFC822-style format to describe a site. Following the examples I found on the net, I've created an example of my own that describes a simple site layout. When Guha moved to Netscape, he met with Tim Bray, co-author of XML 1.0, and the two worked on creating an XML-based version of MCF, submitted to the W3C in May 1997. (I've created a simple example of this too). With the addition of name spaces to XML, RDF began to take its current form. RDF uses XML name spaces to extend its vocabulary using RDF schema, though it's worth noting that although XML is the most common container format for RDF, other formats are in use. The W3C uses another format called N3 (Notation3), which bears greater resemblance to LISP-style languages, such as CycL and KIF. RDF schema separate the RDF metadata - definitions of how new terms relate to each other - from the ``normal'' metadata. Instead of defining the relationships among items, such as ``A is a child of B, and is a kind of C'', as was done in MCF and MCF-XML, these are defined in separate schema, which may be referred to using XML name spaces and reused (though it is still possible to include schema within the document). \section{RDF Statements} Anyone interested in learning about RDF would do well to read the W3C's RDF Primer and/or ``A no-nonsense guide to Semantic Web specs for XML people'', but I'll give my best attempt to explain the concepts. Each RDF statement is called a ``triple'', meaning it consists of three parts: subject, predicate, and object; the subject is either an RDF URI, or a blank node (I haven't seen a good explanation why these nodes are ``blank''\footnotemark, so I'll just refer to them as nodes). So, rephrasing the sentence ``Linux Gazette is the name of the website at http://linuxgazette.net'' looks like this: \footnotetext{ After this article was published, I received this e-mail from Frank Manola: I couldn't help commenting on your statement: ``Each RDF statement is called a ``triple'', meaning it consists of three parts: subject, predicate, and object; the subject is either an RDF URI, or a blank node (I haven't seen a good explanation why these nodes are ``blank'', so I'll just refer to them as nodes). '' I guess the Primer didn't make the connection explicit enough. Under Figure 6 in the RDF Primer, the text says ``Figure 6, which is a perfectly good RDF graph, uses a node without a URIref to stand for the concept of ``John Smith's address''. This blank node...'' The intent was to indicate that a blank node was one without a URIref. In the figure, the blank node is an ellipse with no label, i.e., ``blank''. The term ``nodes'' without qualification refers to all those parts of graphs that aren't arcs, including URIrefs, literals, and blank nodes (see the discussion of the graph model in Section 2.2 of the Primer, or Section 3.1 of RDF Concepts and Abstract Syntax). } \textbf{http://linuxgazette.net} has the property \textbf{name} with the value \textbf{``Linux Gazette''} Predicates may only be RDF URIs - references to a schema definition of the property - while objects may be URIs, blank nodes, or literals. We can then express the previous triple as an N-Triple, which is the simplest valid form of RDF, like this: \begin{verbatim} # Note that each URI most be enclosed, and each triple must end with a '.' "Linux Gazette". \end{verbatim} In XML, the general style is: \begin{verbatim} Object \end{verbatim} so, using XML QNames, which are valid URIs, our example becomes: \begin{verbatim} Linux Gazette \end{verbatim} RDF allows multiple properties to be abbreviated by including them within the same set of tag, so: \begin{verbatim} Linux Gazette LG \end{verbatim} is the same as: \begin{verbatim} Linux Gazette LG \end{verbatim} \begin{verbatim} Ben Okopnik \end{verbatim} These nodes can be given identifiers, and referred to by identifier, for clarity: \begin{verbatim} Ben Okopnik \end{verbatim} \section{FOAF} FOAF stands for ``Friend of a Friend''. FOAF was designed as a way to represent information about people online, and the relationships between them. E-mail addresses are used as identifiers, though given how much spam we all seem to receive these days, it is also possible to use a sha1sum of an e-mail address, or both. FOAF is being used by sites such as plink (``People Link'') where people can meet each other, and sites like LiveJournal generate FOAF so that users can have their relationships automatically described. There is also SharedID, which is attempting to build a Single Sign-On Service based on FOAF. Rather than talk about it in nauseating detail, I'll show a simple example, which should be relatively self-explanatory. (This is based on an example generated by FOAF-a-matic). \begin{verbatim} Jimmy O'Regan Mr. Jimmy O'Regan jimregan 9642c26da203ef143f884488d49194eb0747547c Mark Hogan 7dbf56320b204be2e2bee161abed3ffc5825b590 \end{verbatim} Most elements in FOAF can be added as many times as you like - many of us have several e-mail addresses, homepages, etc. Anything you add to your own $<$foaf:Person$>$ definition, you can add to any other node within a $<$foaf:knows$>$ node. \begin{verbatim} Mark Hogan 7dbf56320b204be2e2bee161abed3ffc5825b590 Sprogzilla 1997-06-29 \end{verbatim} \section{More Detail} There are quite a lot of things you can describe using FOAF, from the useful to the amusing (such as the $<$foaf:dnaChecksum$>$ tag). Let's add a few of the more useful items, such as those generated by LiveJournal: \begin{verbatim} 1979-07-31 LiveJournal.com Profile Full LiveJournal.com profile, including information such as interests and bio. \end{verbatim} The $<$dc:title$>$ and $<$dc:description$>$ tags are from the Dublin Core schema, so we would have to change our namespace definitions to add the url: \begin{verbatim} \end{verbatim} There are also several ways of identifying various chat accounts you have online: \begin{verbatim} jimregan@jabber.org 113804615 \end{verbatim} and many other, miscellaneous details. Now, that's all well and good, but what about the less useful stuff? FOAF has tags to add silly things like Myers-Briggs personality types, geekcode blocks, and .plan information: \begin{verbatim} ENTP GCS/IT/MU/TW/O d-(+)>--- s: a--(-) C++(+++)>++++$ UBLAC++(on)>++++$ P++(+++)>++++ L+++>++++$ E+(+++) W+++>$ N+ o K++ w(++)>-- O- M !V PS+(+++) PE(--) Y+>++ PGP-(+)>+++ t+() !5 X+ !R tv+ b++(+++)>++++$ DI++++ D++ G e h r- y--**(++++)>+++++ Get a better job \end{verbatim} \section{Other Schemata} FOAF contains a lot of useful and amusing stuff, but there are many other schemata waiting to be used. On the amusing side: \begin{verbatim} Leo Jimmy Doyle a0610d4a3086354b9ef1daf50d24de232115c965 \end{verbatim} There are also more useful external schemata - geo, to describe the location of a place; and srw to describe the languages you speak, read, write, and master. \begin{verbatim} 52.6796 -7.8221 en fr ga \end{verbatim} One of the cool things you can do with geographic locations is to run FoafGeoGraph on your FOAF file. This looks for geographic locations for people, or for links to their FOAF files, from which it generates input for XPlanet which can be used to show your position relative to those of your friends. Nifty, eh? You can also refer to other documents (though this is part of RDF). For example, there's a schema that represents iCalendar data in RDF. Using this, I can provide my work timetable and some birthdays from my foaf file (work.cal.rdf.txt and birthdays.cal.rdf.txt, generated with this Perl script, which I've lost the link for): \begin{verbatim} \end{verbatim} I've already mentioned that it's possible to depict chat accounts with services such as Jabber, and MSN. It's also possible to represent other accounts. Chat accounts which don't have a specific tag available can still be represented, using the $<$foaf:OnlineChatAccount$>$ tag, while other accounts use the $<$foaf:OnlineAccount$>$ tag. \begin{verbatim} jimregan jimregan \end{verbatim} \subsection{Specifying a relationship} FOAF was originally designed to described the way in which people were related to each other, though this seems to have become unworkable. It's still possible to convey this information though: \begin{verbatim} Mark Hogan 7dbf56320b204be2e2bee161abed3ffc5825b590 Sprogzilla 1997-06-29 \end{verbatim} \subsection{Web of trust} Identity theft is becoming more and more of an issue on the 'net. To protect against this, there is a schema available which describes various aspects of a PGP signature, and to allow a FOAF file to be digitally signed. (For more information on signing a FOAF document, see Edd Dumbill's page). \begin{verbatim} 773730F8 1024 D675 279B 24AC BDC3 BAE9 BB7A 666C 30CA 7737 30F8 \end{verbatim} \subsection{Co-Depiction} Co-depiction is a hot topic in FOAF circles. The logical extension to saying you know someone is to prove it with a photo. Let's expand the first example to show a co-depiction of Mark and me: \begin{verbatim} Mark Hogan 7dbf56320b204be2e2bee161abed3ffc5825b590 \end{verbatim} Nothing special, just the same $<$foaf:depiction$>$ tag in two or more $<$foaf:Person$>$ tags, which makes it easy for a computer to find where two people are co-depicted. The next step in co-depiction is going to be the addition of SVG outlines, to show who is who in a photo, though unfortunately, no one has yet come up with a way of referring to the SVG data in a computer readable way. (SVG, as an XML-based format, is capable of including RDF metadata without difficulty). Another problem is that SVG hasn't quite caught on yet, though the latest version of KDE supports it, and future versions of Mozilla should support it. As it is, though, it can be useful to show humans who's who. Take this picture, which shows my son, his friend Adam, and me in the background. In the future, it will be possible to use an SVG image like this - which contains an invisible outline, with some additional metadata, to give machine readable information, similar to HTML's image maps. At the moment, we can use an image like this, which contains the same outlines, but is filled with separate colours for each person depicted, and simply say that Mark is represented by the green area, Adam by the red, and me by the blue. (I use the svgdisplay program that comes with KSVG; those without an SVG viewer can look at this PNG to see what I'm talking about). \section{DOAP} DOAP stands for "Description of a Project". DOAP is an RDF vocabulary, influenced by FOAF, which was created by Edd Dumbill to provide details about an open source software project. At the time of writing, DOAP, as a project, is one week old, and already has impressive momentum. DOAP is designed as a replacement for several different formats for project description, such as that used by Freshmeat or the GNU Free Software Directory, which complements FOAF. Here's an example which describes LG, generated using DOAP-a-matic (text version): \begin{verbatim} Linux Gazette LG 1995-07 Linux Gazette is a monthly Linux webzine, dedicated to providing a free magazine for the Linux community and making Linux just a little more fun! Linux Gazette is a monthly Linux webzine. HTML Linux Ben Okopnik Ben 30073b332dac6fc812dcdd806ac1e9715ceac46a \end{verbatim} DOAP-a-matic doesn't have support for the OPL, the licence used by LG, but it's simple to define a new one (text version): \begin{verbatim} OPL Open Publication License \end{verbatim} \section{Conclusion} RDF and the Semantic Web are exciting topics, and I hope I've piqued someone's interest with this article. Anyone wishing to find out more may simply Google for RDF, or try the list of bookmarks I accumulated while reading about RDF. \end{document}