Linked Data Search
4.14 Semantic search process[edit | edit source]
The figure below provides a way of thinking about semantic search and the potential roles for visualization within that. A user query will be specified possibly as keywords or in natural language. This query can be matched to the underlying graph of data, in order to rank and retrieve results to be returned to the user. Visualization can play two roles. First, visualization can be used to present the search results. For example, search results related to locations may be visualized on a map. Second, visualization can be used to present the query. The user may be able to refine the query by interacting directly with the visualization. Faceted search (discussed more later) would be one example of this. In this case the description of the resources returned in the search can be used to construct a range of filters based on their properties such as time, people and locations. Selecting from the facets limits the part of the data graph of interest.
The user can alter the query by manipulating the visualization. This allows them to modify the underlying SPARQL query with needing to use SPARQL directly.
Figure 36: The role of visualization in search (based on ).
4.15 Semantic search[edit | edit source]
One possible method of input to semantic search is a pseudo natural language query. Entity extraction techniques can then be applied to the text query (a number of entity extraction tools were presented in chapter 3, section 3.9.3).
Figure 37: Extraction of entities from a pseud-natural language query.
By a process termed query expansion, the entities can be mapped to resources within our dataset. For an entity such as ”song” this may involve expanding to a number of candidate synonyms and then attempting to map to these to resources in the dataset.
Figure 38: Mapping the entity “song” to the class Track in the music ontology.
Similarly, the entity “written by” may be mapped to the entity composer in the music ontology.
Figure 39: Mapping the “written by” entity to the composer property.
In some cases there may be a direct mapping between an entity and resource in the dataset. For example, the entity “member (of)” may be mapped to the member_of property in the music ontology. This is the inverse of the member property in the music ontology that is used to define artists as members of groups.
Figure 40: Mapping the “member (of)” entity.
A process of contextual analysis is used to decide between candidate mappings. If we start to piece together the parts of the query into the same subgraph we see that the range of the mo:member property is the class mo:MusicGroup. According to this subgraph we would expect the “the Beatles” to be a music group. We would therefore select this expansion of the entity over musical works, posters and books that also have “the Beatles” as their label.
Figure 41: Piecing together a subgraph for the query.
Once the entity mappings have been established, the pseudo natural language query can then be expressed as a SPARQL query (shown visually below) that retrieves tracks (variable ?x) composed by someone (variable ?y) who is a member of The Beatles. (See chapter 2 for an introduction to SPARQL.)
Figure 42: The SPARQL query (shown visually) for tracks by members of The Beatles.
4.16 Semantic search versus SPARQL query[edit | edit source]
As we can see from the above example, semantic search aims at understanding the meaning of the entities specified in the query. This can involve expanding the entity into a number of candidate synonyms and then matching those to the dataset. Contextual analysis can also be used to decide between alternative meanings by comparing against the subgraph that is being produced by mapping the entities. In some cases (though not seen in the example above) reasoning may be applied in order to derive answers that are not explicitly contained in the data, but can be derived from the data.
Comparing semantic search against querying a dataset using SPARQL we see a number of differences. In semantic search, entities extracted from the search string can be expanded and mapped to resources in the dataset. In SPARQL, generally direct reference will be made to the resources. Semantic search may also allow fuzzy matching in which a weighting or certainty is applied to a mapping between a search term and a resource. Both SPARQL and semantic search work with graph patterns. SPARQL queries are graph patterns applied against the dataset. In semantic search, the sub-graph built up to represent the query can also be used to analyse context. Finally, semantic search can apply reasoning to identify new paths in the data and derive links between resources not explicit in the dataset.
Figure 43: A comparison of semantic search and SPARQL queries.
4.17 Semantic search in Google[edit | edit source]
Many of the features of semantic search are increasingly found in web search. For a number of queries, Google, as well as providing a page ranked list of links, uses the Google Knowledge Graph to provide direct answers to questions. As we see below, a search for “paul mccartney albums” results in a horizontal panel of albums. On the right we see the disambiguation pane (described in section 3.8.2 of chapter 3 in the context of Rich Snippets) giving additional information about the primary entities extracted from the query and mapped to the Google Knowledge Graph.
Figure 44: Semantic search in Google.
4.18 Semantic search in DuckDuckGo[edit | edit source]
Similar semantic features are found in other web search engines such as DuckDuckGo  that combines pseudo natural language expressions and semantics to assist the user in focussing down their search. In the example below a natural language query has produced a list of answers rather than the more conventional list of documents matching the search query.
Figure 45: DuckDuckGo producing a list of answers.
DuckDuckGo can also perform disambiguation to offer alternative matches for a search query. In the example below, alternative meanings of the search term jaguar have been offered, grouped into classes such as companies and music. The user can select one of these classes to drill down to their intended meaning.
Figure 46: Query disambiguation in DuckDuckGo.
4.19 Faceted Search[edit | edit source]
The Information Workbench, which we saw earlier, is one tool that can provide faceted search over a dataset. Facets are derived from the properties used to describe the resources returned from the search query. From within each facet, the user can select the values. In the figure below the location facet (using the foaf:based_near property) allows the user to drill down to artists (represented as images) from a particular county. The values of the property are sorted according to frequency, therefore giving a higher rank to values that identify the largest number of resources.
Your browser does not support the video tag.
Movie 5: An introduction to faceted search and the challenges associated with supporting faceted search.
If the facets are largely independent (i.e. do not demark the same subsets of resources) then a small set of facets can be used to filter quickly a large dataset to a small number of items of interest.
Figure 47: Faceted search using the Information Workbench.
An interesting challenge when supporting faceted search is determining which of the potentially large number of facets to prioritize in the interface. This can be a particular problem with heterogeneous data, typical of Linked Open Data, where different properties (and therefore facets) apply to different parts of the dataset. Addressing this challenge involves not only prioritizing facets that have good coverage of the resources of interest but also have values that discriminate between them. Facets having values that split the resources into subsets of similar size would then be optimal for filtering.
As the most appropriate facets depend on the properties and values of the resources of interest, they can be expected to change if keyword search is used to select an initial set of resources for faceted browsing.
Another challenge is the real time computation of previews . With a large dataset it can be computationally too expensive to calculate the frequencies for all values of all facets. Counts then need to be predicted from a sample giving the user an indication of what they can expect.
4.20 FacetedDBLP[edit | edit source]
A well-known example of faceted search is FacetedDBLP , an interface for browsing DBLP , a bibliography of computer science publications. Facetted browsing is generally initiated by a keyword search to identify a region of interest. The returned results (i.e. publication records) can then be filtered using facets to restrict the results in terms of publication year, publication types, venue and authors. The publication records queried via FacetedDBLP are stored in a MySQL database. An RDB2RDF server is used to provide a SPARQL interface to the dataset (see section 3.9.2 of chapter 3 for a description of how to map a relational database to RDF).
Figure 48: FacetedDBLP.
4.22 Searching for semantic data[edit | edit source]
In the examples we have seen so far, search is primarily about finding content. Semantics assist in the search for content by, for example, disambiguating search terms. However, some search engines are directed at finding semantic data. These tools can be used to search for ontologies, vocabularies or particular RDF documents. These tools are therefore aimed more at data specialists rather than the end user.
Figure 51. Watson semantic web search engine .
Vocabularies can be searched using the LOV portal (this was mentioned as a source of reusable vocabularies in section 3.4 of chapter 3). A keyword search can be used to retrieve a set of vocabularies that can then be filtered using a number of provided facets. The results returned also have a confidence score giving an explicit indication of relevance of that vocabulary to the query. Similar to Watson, the snippets or previews also provide an indication of how the search terms match the vocabulary.
Figure 52: Using the LOV portal to search for vocabularies.