Linked Data Visualization
4.3 LD Visualization Techniques[edit | edit source]
Linked Data visualization techniques aim to provide graphical representations of some information of interest within a dataset. Visualizations need to be selected that match the type of data, for example whether it be numerical data or location information. Visualizations also need to be selected that match the task that the user is trying to perform, bringing to the foreground the types of data and patterns in the data that they wish to work with.
Figure 3 illustrates the way in which raw RDF data needs to be transformed to produce visualizations. First, the data of interest is extracted from the dataset. Performing a SPARQL query can do this. Second, the data needs to be transformed in order that it can be displayed with the intended visualization methods. A simple example would be to extract a numerical value from a string so that it could then be visualized on a bar chart. Third, the data needs to be mapped to the constructs of the visualization. For example, the numerical value could be mapped to the y-axis of a bar chart. The view resulting form this process may not be a static image. It may provide ways in which the user can interact with the data, by zooming or clicking to trigger further visualizations.
Figure 3: Linked Data visualization process (partially based on )
An example of the Linked Data visualization process is shown in figure 4. A SPARQL query is used to extract the data of interest from the MusicBrianz dataset, in this case the number of Beatles releases per country. In the second step the string value representing the country is transformed into a country code that can be used on the visualization. In the third step, this data is passed to a heatmap visualization. The country code is used to identify an area on the map. The number of releases is mapped to the warmth of the colour of that region. The resulting heatmap is shown in figure 5.
Figure 4. Example of the Linked Data visualization process.
Figure 5: Heatmap visualization of Beatles releases.
4.4 Challenges for Linked Data visualization[edit | edit source]
The domain of Linked Data delivers a number of challenges for Linked Data. First, there is the challenge of scalability. The data of interest, for example returned from a SPARQL query of a dataset could be vast. Visualization techniques need to be used that can scale to large amounts of data. The visualizations tools also need to be powerful and efficient enough to render the information with an acceptable timescale.
One useful way of addressing scale is to provide visualizations that enable user interaction. All data of potential interest then does not need to be provided within a single static view. The user has control over the visualization, allowing navigation through the data. User interaction functionality may also provide support for user editing of the data or annotation of the visualization itself. When visualizing the data, the user may spot errors or omissions that could be fixed interactively through the visualization. The user may also wish to highlight or make comments about some region of the data, essentially adding metadata to the dataset.
Linked data visualizations and the software mechanisms used to construct them should ideally be reusable. Developing tools to produce visualizations such as such as maps and timelines involves a lot of effort. It is therefore more efficient to produce generic tools that can be reused with many datasets. The emergence of standards for representing types of data (such as time and location information) facilitates the use of visualization tools. Ideally, the resulting visualizations should also be reusable and sharable using standard formats.
4.5 Challenges for Open Linked Data visualization[edit | edit source]
When we consider Linked Open Data rather than just Linked Data, further challenges need to be addressed. First, the data of interest may be partitioned across different repositories. Assembling the data of interest will therefore require access to multiple datasets. Second, the assembled heterogeneous data may model the concepts in different ways. Alternative formats may also be used for values. For example in different repositories date information may be variously represented as DD-MM-YYYY, MM-DD-YYYY or just YYYY. Third, working with a dataset assembled from multiple repositories increases the likelihood of missing data. Visualizations will need to be able to handle the level of missing data and perhaps also indicate to the user data that cannot be represented in the selected visualization.
4.6 Classification of visualization techniques[edit | edit source]
Visualization techniques can be classified according to the type of analytical task that the user is attempting to perform on the data. Visualization techniques such as pie charts are appropriate for comparing the attributes or values of different variables within the dataset. If the user wants to analyse relationships and hierarchies then graphs and other related techniques can be used. The analysis of data in time or space can be supported with timelines and maps. A scatter plot can be used to analyse three-dimensional data. Higher dimensional data can be visualised using techniques such as radar charts. The following subsections will describe in more detail a range of example visualization techniques and how they can be used.
Figure 6: Visualization techniques appropriate for different data analysis tasks.
4.6.1 Comparison of attributes/values[edit | edit source]
The most appropriate visualization for comparing attributes or values will depend on the nature of the data and the task. To compare absolute values (such as total number of sales) associated with a list of items (such as different albums) then a bar chart would be appropriate. If only relative rather than absolute values were of interest then a pie chart could be used. For bar charts and pie charts, the items associated with value (e.g. albums) do not necessarily have any predefined order or position in the chart. If the items do have a pre-define order (for example the release date of the albums) a line chart can be used to show the trend. Finally, frequency distributions for an ordered variable can be visualised using a histogram. This could be used for example to plot the frequency of tracks of varying lengths.
Figure 7: Visualizations for comparing attributes or values. Top left: Using a bar chart to compare values across a set of categories . Top right: Using a pie chart to compare proportions . Bottom left: Using a line chart to visualise a series of data points against an ordered set of points on the x-axis . Bottom right: Using a histogram to visualise frequency distribution .
4.6.2 Analysis of relationships and hierarchies[edit | edit source]
Relationships between nodes can be visualised using a standard graph notation in which relationships are represented as lines. Graphs data can also be visualized using an arc diagram in which the nodes are organised linearly. Relationships between nodes are represented as half circles connecting the two nodes. When using an adjacency matrix, the nodes of the graph and placed on both the x-axis and y-axis. Relationships between the nodes are represented as entries in the grid.
Figure 8. Visualizing relationships using a graph (left), an arc diagram (middle) and adjacency matrix diagram (right) .
Some visualizations are specifically designed for hierarchical graph data. The indented tree is a familiar formalism commonly used for visualising hierarchies and navigating file directories. The node-link tree is a tree visualization in which the root node is placed in the centre. This provides a visual cue as to the population level of different sections of the hierarchy.
Figure 9: Visualizing hierarchies using an indented tree (right) and node-link tree (right) .
A number of space filling visualization techniques have been designed specifically to give an indication of the population level of different parts of the hierarchy. Treemaps visualise nodes within a hierarchy as a set of rectangles. Containment can be used to represent hierarchical relationships between nodes. The size of the rectangle is generally used to represent the number of individuals of a node (i.e. class) within the dataset. Colour can be used to represent some feature of the nodes (i.e. classes) such as the discrete set of superclasses to which they belong.
The icicle visualization can be used to show a node hierarchy and gives a clear indication of depth at different parts of the hierarchy. The sunburst essentially folds the icicle visualization into a circle. Similarly, the rose diagram uses the size of sectors to indicate the population of parts of the hierarchy. Finally, a circle-packing visualization uses containment to represent the hierarchy and size of the circle to represent containment.
Figure 10: Space filling visualization of a hierarchy using treemaps (left) and icicles (right) .
Figure 11: Space filling visualization of a hierarchy using sunburst (left), circle-packing (middle) and a rose diagram (right) .
4.6.3 Analysis of temporal or geographical events[edit | edit source]
Timeline visualizations can be used in combination with both discrete data where, for example, individual events are marked as dots on the timeline. Timelines can also be used to represent continuous data. For example, changes in the frequency of different types of event over time could be represented using colour to indicate event type and thickness of the band to indicate the frequency of those event types at that point in time.
Figure 12: Visualizing discrete  (left) and continuous  (right) data over time.
Data can also be associated with different types of map visualization. This could involve plotting coordinate points on the map. If the granularity of interest is areas (such as countries) rather than specific points in space, then a choropleth maps can be used. Colour can indicate some feature of the area such as the number of specific data points associated with it. The heat map of Beatles releases shown in figure 5 is an example of a choropleth map. If it is not necessary to indicate the borders between areas (such as country borders) then a Dorling cartogram can be used in which the centre of each circle falls within its associated area and both colour and size of the circle are used to visualize additional data.
Figure 13: Visualising data on a map  (left), choropleth map  (middle) and Dorling cartogram  (right).
4.6.4 Analysis of multidimensional data[edit | edit source]
Some visualizations can be used to represent data having 3 or more dimensions. Three-dimensional data can be represented using a scatter plot. As well as the x-axis and y-axis of the chart, the size of each dot placed on the scatter plot is used to represent a third dimension. Radar charts or parallel coordinates can be used to represent higher dimension data. In a radar (or star) chart, each multi-dimensional point is represented as a shape whose border connects each axis. The axes of a radar chart are represented as spokes of a wheel. In a parallel coordinates visualization, the axes correspond to vertical lines. A multi-dimensional point is show as a line connecting each axis.
Figure 14: Visualizing multidimensional data using a scatter plot (left) and radar or star chart (right) .
Figure 15: Visualizing multidimensional data using parallel coordinates .
4.6.5 Other visualization techniques[edit | edit source]
Text-based visualizations use word size to represent frequency. In a standard tag cloud, words or phrases indicate tags or annotations that has been associated with resources. The larger the word, the more commonly it has been used to tag the resources. A variation on the standard tag clod is the phrase net visualization. This is often used to visualise a document or larger text corpus. Size reflects frequency of the word and also lines connect words that are in close proximity in the text.
Figure 16: Text-based visualization of a tag cloud  (left) and network of phrases  (right).
4.7 Applications of Linked Data visualization techniques[edit | edit source]
Visualizations can be applied to Linked Data in order to satisfy a number of aims. First, particularly given the potential scale of the Linked Data of interest, visualizations can be used to provide an overview of the data to guide further analysis. Visualization can be used to identify and analyse relevant resources, classes or properties in the dataset.
Visualizations can be used to reveal a great deal about the vocabulary or taxonomy used in the dataset to define resources. For example, the various methods of visualising a hierarchy, shown in section 4.6.2, would quickly reveal attributes related to the depth and breadth of the hierarchy and the frequency of use of different concepts.
Visualizations can also be used to reveal different types of desired and undesired patterns in the data. A graph visualization could reveal missing links between nodes or uncover new paths between resources. Visualizations may also be used to uncover hidden patterns, errors or outliers in the data (along the lines illustrated in figure 2).
4.8 Summary of Linked Data visualization tool requirements[edit | edit source]
As described in section 4.4 Linked Data visualizations need to offer data navigation and exploration capabilities. Particularly, given the scale of the data, it is unlikely that a static visualization will provide an adequate view on the data for all purposes. This will involve providing user interaction capabilities such as being able to query data of interest, filtering values and folding or expanding parts of the visualization. Visualizations should also exploit the data structures that are inherent in Linked Data such as such as ontology or taxonomy hierarchies. Ideally, it should be possible for the user to publish or share their visualizations using standard presentation formats for easy distribution. This ability to share should also apply to the extracted data, that is, the particular viewpoint on the data established in the selection and user manipulation of the visualization tool.
4.9 Linked Data visualization tool types[edit | edit source]
Linked Data visualization tools can be organised into four categories. These range from text-based Linked Data browsers to toolkits with extensive functionality for data transformation and graphical visualization.
1) Linked Data browsers with text-based representation
A text-based LD browser dereferences URIs to retrieve a resource description that is then presented to the user. Most LD browsers will present not only text descriptions but also other media such as images associated with the resource. Text based browsers will generally include hypertext links to connected resources. These may be part of the immediate description (i.e. triples that link from this to another resource) or backlinks (i.e. triples that link from other resources to this resource). See section 3.5.2 of chapter 3 for guidance on presenting resource descriptions.
2) Linked Data and RDF browsers with visualization options
Some Linked Data and RDF browsers make more extensive use of media associated with the resource and organise this media into more coherent or engaging presentations of the data. These browsers also offer greater use interaction. As well as links to connected resources they provide ways of querying and filtering the data. These types of browser can therefore be used to analyse as well as traverse the data.
3) Visualization toolkits
Visualization toolkits bring together a range of visualization techniques. They generally incorporate methods for transforming the raw data in order that it can be rendered by the visualization. Some visualization toolkits are specifically designed to consume Linked Data.
4) SPARQL visualization
Finally, SPARQL visualizations are tools that dynamically transform the output of SPARQL queries to produce visualizations. These tools can be used to support analysis of the dataset that is accessed by the SPARQL endpoint.
As summarised in the figure below, a number of current tools fall into these four categories. In the next section we will introduce Sig.ma and Sindice as examples of Linked Data browsers and then the Information Workbench will be introduced as an example of a visualization toolkit that can also visualize the results of SPARQL queries.
Figure 17: Types of Linked Data visualization tool.
4.10 Linked Data visualization examples[edit | edit source]
4.10.1 Sig.ma[edit | edit source]
Sig.ma  is a text-based Linked Data browser. Interaction is usually initiated by a text search. Sig.ma returns all triples associated with the search terms. The returned triples are grouped according to predicate. Sig.ma lists all found values for each predicate. The sources of each triple are displayed in the right hand panel. Each source has a number that is used to reference each triple in the main body of the page.
Figure 18: The Sig.ma Linked Data browser.
For a particular predicate, such as title or label, Sig.ma may display the same information in multiple languages. Property values that are URIs can be followed in order to view the RDF data of that connected resource.
Figure 19: The Sig.ma Linked Data browser showing multiple values for a predicate.
4.10.2 Sindice[edit | edit source]
Interaction with Sindice  also begins with a keyword search. A set or results that match the query are displayed. These can be filtered by document type, for example filtering results to only RDF documents. For each document, the user can inspect the RDF triples that it contains. The user can either inspect a cached set of triples or retrieve triples live from the resource.
Figure 20: The Sindice Linked Data browser.
This list of cached or live triples is displayed as a subject/predicate/object table. Sindice also offers other viewing options such as a graph of the triples.
Figure 21: Using Sindice to view the live or cached triples contained in a document.
4.10.3 Information Workbench[edit | edit source]
The Information Workbench  is a platform targeted at the whole lifecycle of linked data application development including integrating, managing, analysing and exploring linked data. Here we focus on the visualization capabilities of the Information Workbench. Generally, visualizations are constructed using data returned from a SPARQL query. A number of different visualization techniques can be applied to the data including bar charts, pie charts, Google maps and timelines. The visualizations allow for user interaction including browsing and exploring the data. A demo system visualizing MusicBrianz using the Information Workbench is available at .
As with Sig.ma and Sindice, user interaction may begin via text search. In the figure below “The Beatles” has been used as a search term. The user may select full text search. In this case, different types of resources will appear in the results set. For example the resource representing the album “With The Beatles” will be returned as well as a resource representing the band itself. The search can also be limited to particular types of resources, such as music artists. More structured searches can be specified, for example, matching the query only to artists from a specified country. Structured search queries are re-represented as SPARQL queries issued against the dataset.
Figure 22: Keyword and structured search using the Information Workbench.
Having searched for the Beatles and selected the band from the results page, the user is directed to an information page about the Beatles. The top of the page is composed of mash-ups with web services. The panel to the top right show tweets mentioning the Beatles. Below this is a Google map showing the UK as their country of origin. Bottom right is a mash-up with last.fm and YouTube data.
Figure 23: Top of the Beatles page showing mash-ups with web services.
Further down the page we see actual visualizations of the data, each making use of particular SPARQL queries against the dataset. The visualizations show: a table of the track numbers and playing times for Beatles albums (top right), the locations of Beatles releases (top left), a timeline of Beatles release (bottom right) and the number of releases for different albums (bottom right).
Figure 24: Visualizations of Beatles data using a table, map, timeline and bar chart.
The visualizations provided by the Information Workbench also enable user interaction. To the left of the figure below we see a tag cloud of music artists. The size of the text represents number of releases. The text label of each artist is a hypertext link that directs the user to data associated with the resource representing that artist. From there, the user can continue to navigate across the dataset.
Figure 25: Linking from a tag cloud to an information page about this artist.
In the Information Workbench all visualizations are implemented as widgets. The region of the dataset of interest is specified using a SPARQL query. The SPARQL query below requests the top ten Beatles releases based on their duration. The easiest way to visualize a result set is displaying it in a table.
Figure 26: Returning a result set for a SPARQL query.
For many visualizations some level of configuration may need to be specified as well as the SPARQL query. In the example below, we can see that the type of visualization is specified on the first line. This is followed by the SPARQL query. Finally, any configuration settings are provided. If the returned data is to be presented on a bar chart then the author of the visualization will need to specify in the configuration settings the variables returned from the SPARQL query that correspond to the x-axis and y-axis. In the example below, release labels are placed on the x-axis and number of releases is placed on the y-axis. Other features of the visualization may be configured such as the colour and height.
Figure 27: Using a widget to specify a bar chart.
Using the same SPARQL query but with different widgets and configuration setting it is possible to create other visualizations of the same data such as a line chart or pie chart.
Figure 28: Line chart and pie chart visualizations of the same SPARQL query result.
The system can also suggest appropriate widgets for visualization depending on the returned data. In the example below, the results return the playing times for different Beatles albums (labelled 2). The widget auto suggestion link (labelled 1), provides a list of visualization types on the left. On the right (labelled 3) is the selected bar chart visualization. The auto-suggest facility provides a good way to experiment with different ways of visualizing the data.
Figure 29: Using the Information Workbench to auto-suggest visualization widgets.
4.11 Other Linked Data visualization tools[edit | edit source]
There are other tools available for the visualization of Linked Data. LOD live  provides a graph visualization of Linked Data resources. Clicking on the nodes can expand the graph structure. LOD live can be used for live access to SPARQL endpoints. LOD visualization  can produce visual hierarchies using treemaps and trees from live access to a SPARQL endpoint.
Figure 30: LOD live  and LOD visualization  tools.
4.12 Visualizing the Linking Open Data cloud[edit | edit source]
In chapter 3 (section 3.8.1) we looked at the Linking Open Data cloud diagram that represents connections between the Linked Open Data datasets. This is constructed by hand but there are also tools that can visualize connections between datasets.
Figure 31: The Linking Open Data cloud diagram .
Gephi is a platform for visualizing networks, graphs and hierarchies . Gephi can be used to visualize the Linking Open Data cloud. As in the hand-crafted representation, dbpedia.org is the largest node in the network densely connected to other datasets. Colour as well as size is used to represent properties of the dataset. Link length is also used to encode information about the data structure.
Figure 32: Linking Open Data cloud generated using the Gephi platform .
Protoviz  can also be used to automatically visualize the Linking Open Data cloud. The colour of the node reflects the CKAN rating for the dataset (see section 3.8.1 of chapter 3 for a description of CKAN). The intensity of the colour reflects the number of ratings. The proximity of nodes reflects the level of interconnection between the datasets. Outlying nodes in the graph could indicate broken links to other datasets or a genuine lack of semantic relatedness to other datasets . Clicking on a node takes the user to the CKAN page for that dataset.
Figure 33: Linked Open Data Graph by Protovis .
4.13 Linked Data reporting and Google’s Structured Data Dashboard[edit | edit source]
Visualization techniques can also be used in the creation of reports that provide descriptive statistics for a dataset. Often visualizations are displayed in a dashboard that enables user interaction. Several tools exist that can be used for the construction of dashboards including Google Webmaster tools , Information Workbench  and eCloudManager . We saw earlier some of the visualization capabilities of Information Workbench. The eCloudManager is a specific solution for data centres and cloud management. In the rest of this section we focus on Google Webmaster tools and how they can be used to provide webmasters with information about structured data embedded in websites that is recognised by the Google search engine.
Google Webmaster tools can be used to provide general data about a website, in terms of its traffic and how it is indexed by the Google search engine. As a part of this, Google Webmaster tools provides dashboards on the structured data within a website. The Structured Data Dashboard has three levels. A site-level view aggregates the structured data across all pages according to the classes defined in the vocabulary. An item-type-level view provides separate details for each type of resource. A page-level view shows attributes of every type of resource in a given web page.
Figure 34: A site-level view showing the number of resources of different types that have been detected. The chart shows how the amount of structured data is evolving over time .
Figure 35: A page-level view showing the metadata of the imaginary product featured on that page of the website. The detected metadata defines the resource type, image, name and description .