Semantic Technologies and Linked Data Foundations
We will describe a set of technologies that allows datasets to be published over the web, and queried effectively by applications. Compared with search engines such as Google and Yahoo, which are based on text-string matching, these technologies are "semantic". This means that information is represented not in a natural language like English or Spanish, but in a graph-based data model that facilitates extension, integration, inference and uniform querying. As a realistic application of semantic technologies, we consider the provision of a portal through which users can retrieve resources and information in the world of music. Consider for example the following tasks:
- Retrieve a performance of the Beethoven violin concerto by a Chinese orchestra
- Retrieve a photograph of the conductor of this performance
- List male British rock musicians married to Scandinavians
Attempts to answer such queries through text-based search are unreliable: we might equally retrieve a performance in which the soloist was Chinese, or a rock musician who plays Scandinavian music. Using semantic technologies, resources such as the audio file of the performance, or the photograph of the conductor, can be annotated using the Resource Description Framework (RDF). In this framework, formal names can be assigned to what are called resources, which would include Beethoven, his violin concerto, the orchestra, and the conductor. Names can also be assigned to types (or classes) of resource (composers, concertos, etc.), and to relationships (or properties) that link resources (e.g., the "composed-by" relationship between composition and composer). By reasoning over facts encoded in this way, a query system can confirm that a performance was given by the Beijing Symphony Orchestra, that this orchestra is based in Beijing, that Beijing is located in China, and so forth -- thus combining geographical and musical knowledge in order to retrieve an answer.
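The chain of reasoning just described can be illustrated with a minimal sketch in plain Python, in which each fact is a (subject, property, object) triple. All names here (the `ex:` prefix, `ex:performedBy`, `ex:basedIn`, `ex:locatedIn`, the two performances and the second orchestra) are hypothetical and chosen only for illustration; a real dataset would use full URIs and a dedicated RDF store and query language.

```python
# A toy in-memory set of facts; every "ex:" name is a hypothetical placeholder.
TRIPLES = {
    ("ex:performance1", "ex:performanceOf", "ex:BeethovenViolinConcerto"),
    ("ex:performance1", "ex:performedBy", "ex:BeijingSymphonyOrchestra"),
    ("ex:BeijingSymphonyOrchestra", "ex:basedIn", "ex:Beijing"),
    ("ex:Beijing", "ex:locatedIn", "ex:China"),
    ("ex:performance2", "ex:performanceOf", "ex:BeethovenViolinConcerto"),
    ("ex:performance2", "ex:performedBy", "ex:OtherOrchestra"),
    ("ex:OtherOrchestra", "ex:basedIn", "ex:Vienna"),
    ("ex:Vienna", "ex:locatedIn", "ex:Austria"),
    ("ex:BeethovenViolinConcerto", "ex:composedBy", "ex:Beethoven"),
}


def objects(subject, prop):
    """All objects o such that (subject, prop, o) is a known fact."""
    return {o for s, p, o in TRIPLES if s == subject and p == prop}


def subjects(prop, obj):
    """All subjects s such that (s, prop, obj) is a known fact."""
    return {s for s, p, o in TRIPLES if p == prop and o == obj}


def located_in(place, region):
    """Follow ex:locatedIn links transitively from place towards region."""
    seen, frontier = set(), [place]
    while frontier:
        current = frontier.pop()
        if current == region:
            return True
        if current in seen:
            continue
        seen.add(current)
        frontier.extend(objects(current, "ex:locatedIn"))
    return False


def performances_by_orchestra_in(work, country):
    """Performances of work whose orchestra is based somewhere in country."""
    results = set()
    for performance in subjects("ex:performanceOf", work):
        for orchestra in objects(performance, "ex:performedBy"):
            bases = objects(orchestra, "ex:basedIn")
            if any(located_in(base, country) for base in bases):
                results.add(performance)
    return results


print(performances_by_orchestra_in("ex:BeethovenViolinConcerto", "ex:China"))
```

The query combines musical facts (which orchestra gave the performance) with geographical facts (where the orchestra's home city lies), and so does not confuse the orchestra's nationality with that of a soloist.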
A key decision in the design of these semantic technologies was to leave open the naming of resources and properties, provided that names conform to the format for web resource names -- that is, provided they are Uniform Resource Identifiers or URIs.
All four of the above could be names for Beethoven, illustrating that the URI need not be human-readable (e.g., it might be an arbitrary string of letters and numbers), although identifiers should be resolvable to RDF representations that include human-readable labels, as explained below. If data from different sources are to be combined, it is therefore important to establish links, for instance through statements indicating that the above four URIs are synonymous. These statements, which can also be expressed in RDF, provide a means by which data published by many people or organisations can be combined into linked data.
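As a rough sketch of how such linking statements allow data to be combined, the following Python fragment merges two small sets of triples whose publishers chose different URIs for the same composer. The `example.org` URIs and the `ex:composed` property are hypothetical placeholders. The use of `owl:sameAs` as the linking property is an assumption of this sketch: it is one widely used way of stating that two URIs denote the same thing, but the text above does not prescribe a particular property.

```python
SAME_AS = "owl:sameAs"  # assumed linking property; prefixes left unexpanded

# Two hypothetical sources that name the same composer differently.
SOURCE_A = {
    ("http://example.org/a/Beethoven", "rdfs:label", "Ludwig van Beethoven"),
}
SOURCE_B = {
    ("http://example.org/b/artist/x7q2", "ex:composed",
     "http://example.org/b/work/violin-concerto"),
}
# A linking statement asserting that the two URIs are synonymous.
LINKS = {
    ("http://example.org/a/Beethoven", SAME_AS,
     "http://example.org/b/artist/x7q2"),
}


def canonical_map(triples):
    """Group URIs connected by sameAs links; map each to one representative."""
    parent = {}

    def find(name):
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for s, p, o in triples:
        if p == SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                # Use the lexicographically smaller URI as the representative.
                keep, drop = sorted((root_s, root_o))
                parent[drop] = keep
    return {name: find(name) for name in list(parent)}


def merge(*sources):
    """Union the sources and rewrite synonymous URIs to one representative."""
    all_triples = set().union(*sources)
    canon = canonical_map(all_triples)
    return {
        (canon.get(s, s), p, canon.get(o, o))
        for s, p, o in all_triples
        if p != SAME_AS
    }


for triple in sorted(merge(SOURCE_A, SOURCE_B, LINKS)):
    print(triple)
```

After merging, both the label from the first source and the composition from the second are attached to a single name, so a query about the composer sees the facts from both publishers.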
In the following chapters, we will show through practical examples how to describe resources in RDF, how to convert data from other formats to RDF, how to publish RDF data, and how to link published RDF to other datasets. We will also consider how to use existing linked data in applications for querying, analysis, mining, and visualisation.