Relationship Discovery from Open Knowledge Graphs and Its Applications
Abstract:
Open Knowledge Graphs (KGs), such as Wikidata and Freebase, have become increasingly popular. They contain hundreds of millions of facts about real-world entities. A relationship path connecting two entity nodes in a KG forms a chain of facts that explicitly describes a relationship between the entities. This kind of rich information can aid many tasks, such as question answering, fact-checking, and recommendation. However, extracting and using such information is challenging because these KGs are heterogeneous and very large. In this thesis, we propose novel approaches for efficient and effective relationship discovery from open KGs, as well as novel applications built upon the discovered relationships.
In the first work, we propose to query and discover semantically relevant relationships by adopting the keyword search methodology. Typically, a keyword search engine takes as input a keyword query comprising a set of entity labels and outputs a concise subgraph that closely connects all the input labels. The paths in the resulting subgraph are explicit relationships that describe how the entities are related to each other. The distinguishing advantage of the keyword search methodology is that it accepts a simple input format, so users do not need to know any complex structured query language or the underlying data schema. This is highly desirable when searching heterogeneous data sources such as open KGs. Unfortunately, the huge data volume poses significant computational challenges for keyword search in KGs, and few existing approaches achieve real-time responses. Hence, we propose WikiSearch, a real-time search engine that uses keyword search for relationship discovery. WikiSearch harnesses the parallel computational power of modern hardware, e.g., multi-core CPUs and GPUs, to reduce the processing time to milliseconds.
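To make the input and output of keyword search concrete, the following is a minimal single-threaded sketch, not WikiSearch's algorithm. It assumes a tiny hypothetical KG stored as undirected adjacency lists, runs a breadth-first search from each keyword entity, picks the node with the smallest total distance to all keywords, and returns the union of the shortest paths to that node as the connecting subgraph. The graph contents and the names `TOY_KG`, `bfs_parents`, `path_to_root`, and `connect_keywords` are illustrative assumptions.

```python
from collections import deque

# Hypothetical toy KG (undirected adjacency lists); entity names are made up.
TOY_KG = {
    "Alice": ["AcmeCorp"],
    "Bob": ["AcmeCorp", "Springfield"],
    "AcmeCorp": ["Alice", "Bob", "Springfield"],
    "Springfield": ["AcmeCorp", "Bob", "Carol"],
    "Carol": ["Springfield"],
}


def bfs_parents(graph, source):
    """Return {node: parent} for a BFS tree rooted at source (root maps to None)."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return parents


def path_to_root(parents, node):
    """Follow parent links from node back to the BFS root."""
    path = []
    while node is not None:
        path.append(node)
        node = parents[node]
    return path


def connect_keywords(graph, keywords):
    """Return (center, edges) of a small subgraph connecting all keywords, or None."""
    trees = {k: bfs_parents(graph, k) for k in keywords}
    # Only nodes reachable from every keyword can serve as the meeting point.
    candidates = set(graph)
    for tree in trees.values():
        candidates &= set(tree)
    if not candidates:
        return None

    def total_distance(node):
        return sum(len(path_to_root(trees[k], node)) - 1 for k in keywords)

    # Sorting first makes tie-breaking deterministic.
    center = min(sorted(candidates), key=total_distance)
    edges = set()
    for k in keywords:
        path = path_to_root(trees[k], center)
        edges.update(frozenset(pair) for pair in zip(path, path[1:]))
    return center, edges


if __name__ == "__main__":
    print(connect_keywords(TOY_KG, ["Alice", "Carol"]))
```

Each path in the returned edge set is one explicit relationship chain between two query entities. The per-keyword searches are independent of one another, which gives a rough intuition for why this kind of workload can benefit from parallel hardware; how WikiSearch actually parallelizes the search is not specified here.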
Our first work makes it possible to discover relationship paths efficiently and effectively when handling end-user queries. In the second work, we develop a novel approach that mines relationship paths from KGs to empower intuitive news search. News search tools help end-users identify relevant news stories. However, existing search approaches often operate as a "black-box" process, offering little intuition to help users understand how the results are related to the query. Thus, we propose a novel news search framework, called NewsLink, that empowers intuitive news search by using relationship paths discovered from open KGs. Specifically, NewsLink embeds both a query and news articles into subgraphs of the KG, called subgraph embeddings. The overlap between these embeddings induces relationship paths between the entities involved. Incorporating subgraph embeddings into search yields two major advantages. First, they enrich the search context, leading to more robust results. Second, the relationship paths linking entities within and across news articles help users better understand and digest the results for a given query.
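The idea of comparing a query and an article through overlapping subgraphs can be illustrated with a small sketch. It does not reproduce NewsLink: the abstract does not state how subgraph embeddings are built, so this example assumes, purely for illustration, that an embedding is the set of KG nodes within a fixed number of hops of the entities mentioned in a text, and that overlap is scored with Jaccard similarity. The toy graph and the names `TOY_KG`, `neighborhood`, and `overlap` are hypothetical.

```python
# Hypothetical toy KG (undirected adjacency lists); entity names are made up.
TOY_KG = {
    "PlayerA": ["TeamX"],
    "PlayerB": ["TeamX", "CityY"],
    "TeamX": ["PlayerA", "PlayerB", "LeagueZ"],
    "CityY": ["PlayerB"],
    "LeagueZ": ["TeamX"],
}


def neighborhood(graph, entities, hops=1):
    """Stand-in 'subgraph embedding': nodes within `hops` of the given entities."""
    seen = {e for e in entities if e in graph}
    frontier = set(seen)
    for _ in range(hops):
        frontier = {n for node in frontier for n in graph[node]} - seen
        seen |= frontier
    return seen


def overlap(graph, query_entities, article_entities, hops=1):
    """Return (Jaccard score, shared nodes) for the two neighborhoods."""
    query_nodes = neighborhood(graph, query_entities, hops)
    article_nodes = neighborhood(graph, article_entities, hops)
    shared = query_nodes & article_nodes
    union = query_nodes | article_nodes
    score = len(shared) / len(union) if union else 0.0
    return score, shared


if __name__ == "__main__":
    # The shared nodes are the places where a query-to-article path can pass.
    print(overlap(TOY_KG, ["PlayerA"], ["PlayerB"]))
```

In this toy setting the query and the article mention different entities, yet their neighborhoods meet at a common node, which is what allows a relationship path between them to be shown to the user as an explanation of the match.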
The above two works focus on the discovery and application of relationship paths. In contrast, our third work studies how patterns of relationship paths can help discover Context-aware Outstanding Facts (COFs) from open KGs. An Outstanding Fact (OF) is an attribute that makes a target entity stand out from its peers. The mining of OFs has important applications, especially in Computational Journalism, such as news promotion, fact-checking, and news story finding. However, existing approaches to OF mining (i) disregard the context in which the target entity appears, and hence may report facts irrelevant to that context; and (ii) require relational data, which are often unavailable or incomplete in many application domains. Therefore, we introduce the novel problem of mining COFs for a target entity under a given context specified by a context entity. We propose FMiner, a context-aware mining framework that leverages KGs for COF mining. Context-awareness is ensured by using the target entity's relevant relationships with the context entity to derive the peer entities for COF extraction. Consequently, by incorporating a context entity, FMiner steers the search towards context-aware OFs.
For each approach proposed in this thesis, we conduct extensive experiments on very large real-world open KGs to validate both its efficiency and its effectiveness. We believe that the proposed approaches can facilitate many downstream applications that make use of relationships discovered from KGs.