Introduction to Linked Data

In 2006 Berners-Lee wrote an influential note suggesting principles for the publication of data on the semantic web. The original text can be found at this web address:

Since then the volume of data has grown from around 2 billion triples in 2007 to over 30 billion in 2011, interconnected by over 500 million RDF links, the main purpose of which is to establish chains of URIs that refer to the same individuals. Through such links, published datasets are combined into a vast body of data known as a "cloud".

Figure 5: Linked data cloud (2007)
Citation: Linking Open Data cloud diagram (2007), by Richard Cyganiak and Anja Jentzsch.
License: CC-BY-SA

Figure 5 shows a diagram of the linked data cloud for 2007, in which nodes represent published datasets, and links represent sets of RDF triples through which the URIs in one dataset are paired with their counterparts in another dataset. Thus the link from DBpedia to MusicBrainz means that DBpedia includes not only RDF triples that give information about the world, but also triples that link some DBpedia names to their synonyms in MusicBrainz. We have seen examples of such statements in the last section, including the following triple, which links the two names for the Beatles.


Note that since the "sameAs" relation is transitive and symmetric, two statements of the form "X sameAs Y" and "Y sameAs Z" (or equivalently "Z sameAs Y") can be combined to infer "X sameAs Z"; in this way, lists of synonymous names can be derived from the cloud.
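
The inference just described can be sketched in a few lines of Python. This is only an illustration, not how any particular triple store works: it groups names into sets of synonyms with a simple union-find structure, and the prefixed names in `links` are hypothetical placeholders rather than real identifiers.

```python
from collections import defaultdict

def same_as_clusters(pairs):
    """Group names into sets of synonyms from (x, y) sameAs pairs."""
    parent = {}

    def find(name):
        # Each name starts as its own representative.
        parent.setdefault(name, name)
        # Walk up to the representative, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for x, y in pairs:
        # Symmetry: the order of x and y makes no difference here.
        parent[find(x)] = find(y)

    # Transitivity: names sharing a representative form one cluster.
    clusters = defaultdict(set)
    for name in parent:
        clusters[find(name)].add(name)
    return list(clusters.values())

# Hypothetical names, for illustration only.
links = [
    ("dbpedia:The_Beatles", "musicbrainz:beatles"),  # X sameAs Y
    ("other:Beatles", "musicbrainz:beatles"),        # Z sameAs Y
    ("dbpedia:Berlin", "geonames:berlin"),
]
clusters = same_as_clusters(links)
```

Here the first two pairs end up in one cluster of three names, even though no pair links "dbpedia:The_Beatles" to "other:Beatles" directly.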

Principles

In his 2006 note, Berners-Lee set out four simple principles for publishing data on the web. These are best seen as rules of best practice rather than rules that must be obeyed: the idea is that the more people follow these principles, the more their data will be usable by others.

In brief, the principles are as follows:

  1. Use URIs to identify things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF, RDFS, SPARQL).
  4. Include links to other URIs, so that they can discover more things.

The rationale for these principles is probably obvious. By using URIs to identify individuals, classes, and properties, we obtain names that perform a double duty: as well as referring to the relevant thing, they give us a location on the web where we may look for information about that thing. Other naming schemes accomplish only the first of these duties. However, to obtain benefit from a name that also serves as a web address, the URI should not be a broken link. It should point to relevant information, encoded in one of the expected formats. This benefit will be enhanced further if the information includes URIs that point to other locations on the web from which additional relevant information might be recovered.
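
As a rough sketch of what "looking up" a name can mean in practice, the Python below prepares an HTTP request for a URI and asks for an RDF serialisation through the `Accept` header. The use of content negotiation, the `text/turtle` media type, the helper names, and the DBpedia URI are all assumptions made for illustration; whether a given server answers with RDF depends on how the publisher has set it up.

```python
from urllib.request import Request, urlopen

def build_rdf_request(uri, media_type="text/turtle"):
    """Prepare an HTTP GET that asks for RDF rather than an HTML page."""
    return Request(uri, headers={"Accept": media_type})

def fetch_rdf(uri, media_type="text/turtle", timeout=10):
    """Dereference the URI and return the response body as text."""
    request = build_rdf_request(uri, media_type)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")

# Illustrative URI; building the request does not contact the server.
request = build_rdf_request("http://dbpedia.org/resource/The_Beatles")
```

Calling `fetch_rdf` with the same URI would perform the actual lookup; the triples returned may in turn contain further URIs that can be looked up in the same way.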

Rating published datasets

In 2010 Berners-Lee extended the note referenced above to propose a system for rating datasets, based on the five-star rating system used for hotels. Closely related to the principles just listed, the system is as follows:

  • One-star (*): The data is available on the web with an open license.
  • Two-star (**): The data is structured and machine-readable.
  • Three-star (***): The data does not use a proprietary format.
  • Four-star (****): The data uses only open standards from W3C (RDF, SPARQL).
  • Five-star (*****): The data is linked to that of other data providers.

Note that every level here includes the previous levels: thus for instance three-star data must also be available on the web in machine-readable form.
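
The cumulative nature of the scheme can be made concrete with a small Python sketch. The `Dataset` class and its field names are invented for this illustration and are not part of any standard; the function simply counts how many criteria are met in order before the first one fails.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    # Hypothetical yes/no answers to the five criteria, in order.
    open_license: bool
    machine_readable: bool
    non_proprietary_format: bool
    uses_w3c_standards: bool
    linked_to_other_data: bool

def star_rating(ds):
    """Count the criteria met in order, stopping at the first failure."""
    checks = [
        ds.open_license,
        ds.machine_readable,
        ds.non_proprietary_format,
        ds.uses_w3c_standards,
        ds.linked_to_other_data,
    ]
    stars = 0
    for met in checks:
        if not met:
            break
        stars += 1
    return stars

# An openly licensed file in a structured, non-proprietary format.
open_file = Dataset(True, True, True, False, False)
# Linked RDF data that lacks an open license earns no stars at all.
closed_rdf = Dataset(False, True, True, True, True)
```

Under these assumptions `star_rating(open_file)` gives 3 and `star_rating(closed_rdf)` gives 0, because a higher level counts only when every lower level is also satisfied.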

Growth of linked data on the web

We have shown above a diagram of the linked data cloud for 2007 (Figure 5). For comparison, Figure 6 shows the corresponding diagram for 2014 (the last year for which we have data), which illustrates the expansion that took place over those years. The diagram covers only datasets reachable by the [[1]] crawler, so today it gives a very partial picture.

Figure 6: Linked data cloud (2014)
Citation: Linking Open Data cloud diagram (2014), by Richard Cyganiak and Anja Jentzsch.
License: CC-BY-SA

The colours on this diagram provide a broad categorisation of the domains of the various datasets.

Examples

To explore the possibilities of linked data browsers and data mashups (which combine data from many sources), look at these examples of working websites.

Illustreets

Illustreets is a web application developed by Manuel Timita and Kateryna Koval, available at free of charge

This application is particularly useful when looking to buy or rent property anywhere in England, UK. Illustreets puts deprivation, crime, education, transport, environment, and census data on an interactive, searchable map, helping you to compare locations on the fly.

You can hover over the map to get useful information about any location, in real time. Then, as you click on a coloured area, you get full details about that particular neighbourhood.

Also, users can filter the neighbourhoods by a number of criteria. For example, house hunters can filter the results by the number of bedrooms and the property price. Unaffordable areas are then shaded out on the map and users can view detailed summaries of the areas where they can afford to live.

Figure: Illustreets screenshot
Citation: Manuel Timita and Kateryna Koval. 

Illustreets uses the following datasets:

  • English Indices of Deprivation
  • Lower Layer Super Output Area (LSOA) boundaries
  • 2011 Census
  • Street level crime
  • National Public Transport Access Nodes (NaPTAN)
  • Independent school inspections and outcomes
  • Maintained school inspections and outcomes

Figure: Illustreets screenshot
Citation: Manuel Timita and Kateryna Koval. 

Some other sites

For further examples of sites using Linked Data, see the following.

BBC Music
The BBC has launched a music portal based on Linked Data at
The University of Leipzig has a community project providing street map information based on Linked Data, at
US government data
In 2009 the US and UK governments made commitments to open data. The US government data site is at
UK government data
Available at with over 8000 datasets published at the time of writing.

Exercise

Study the following RDF statements, expressed in the Turtle syntax, then attempt the exercises that follow.

@base <http://LinkedData.Center/home/about#> .
@prefix org: <http://www.w3.org/ns/org#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

vocab:ResearchProject rdfs:subClassOf foaf:Group .

vocab:consortiumMember rdfs:subPropertyOf foaf:member .

<enrico> a foaf:Person ;
        foaf:givenName "Enrico" ;
        foaf:familyName "Fagnoni" ;
        org:headOf <http://LinkedData.Center/> .

<http://LinkedData.Center/> a org:Organization ;
        rdfs:label "LinkedData.Center company" ;
        rdfs:comment "Linked Data as Service provider"@en,
         "fornitore di servizi di Linked Data"@it .

  1. Re-express the statements in N-Triples (i.e. remove all prefixes and abbreviations to give full triples with absolute URIs).
  2. Add a resource representing yourself, attaching your name using the FOAF properties.
  3. Add a property to this resource asserting that you know Enrico (i.e. foaf:knows).