Exploring by Category
Now that we have a knowledge graph, we have several options for how the user can explore concepts and their relations. In a classic search engine, we perform exploratory queries by entering the terms we want to explore. A navigational query, on the other hand, uses terms that are distinctive of the specific document or documents desired. With a knowledge graph, we have another option for exploratory querying. We can find the category of concepts that we wish to explore and then refine our search, retrieve the relevant documents, or extract category-level insights such as trends or prominent authors.
Exploring categories like this generally follows the pattern of identifying the concepts in the category, and then calculating aggregates for each member of the category. This is different from exploring documents where the user query is used, perhaps multiple times, to find documents that may be relevant. The documents are then reviewed and used to either look up related documents, cited papers for example, or to do another round of refined querying. The process where a user performs multiple searches to find their desired results is called session querying. This is still a useful concept in a biomedical knowledge engine. People often do not have a crisp and clear definition of the category they are interested in. The system used to explore these categories must also help the user in refining their definitions.
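The pattern of identifying the members of a category and then aggregating over each member can be sketched in a few lines. This is a minimal illustration, not the system's actual implementation: the names CATEGORY_MEMBERS, DOCUMENTS, and document_counts, as well as the toy data, are assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy data: category -> member concepts,
# and document id -> concepts mentioned in that document.
CATEGORY_MEMBERS = {
    "leukemia": {"hairy cell leukemia", "acute myeloid leukemia"},
}
DOCUMENTS = {
    "doc1": {"hairy cell leukemia", "cisplatin"},
    "doc2": {"acute myeloid leukemia"},
    "doc3": {"hairy cell leukemia"},
}

def document_counts(category):
    """Step 1: identify the category's members.
    Step 2: compute one aggregate (a document count) per member."""
    members = CATEGORY_MEMBERS[category]
    counts = Counter({member: 0 for member in members})
    for concepts in DOCUMENTS.values():
        # Only concepts that belong to the category are counted.
        for concept in concepts & members:
            counts[concept] += 1
    return dict(counts)

counts = document_counts("leukemia")
```

Here the aggregate is a simple document count, but the same two-step shape would apply to other per-member aggregates, such as author counts or counts per year.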
To understand how these categories may be defined we have to take into consideration how information may be organized. Let’s take a look at two common ways of categorizing things: thematic and class-based.
Thematic categories are organized by concepts that are highly related or occur in similar contexts. For example, glioblastoma, spinal cord, and TTF therapy often occur in the same contexts.
Class-based categories are organized by what kind of thing a concept is. For example, hairy cell leukemia is a kind of leukemia, which in turn is a kind of cancer. Relationships of this kind are sometimes referred to as “is-a” relationships.
Cisplatin is a drug used to treat multiple types of cancer. If we are looking at things thematically, Cisplatin is in the Cancer category. Viewed by class, Cisplatin is in the Pharmaceutical category. Fortunately, a well-built knowledge graph will contain the relationships necessary to view concepts either way. By making both of these organizational styles available, we let users explore concepts in a manner closer to the way humans naturally understand them.
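The two views of Cisplatin can be illustrated with a toy graph. The sketch below assumes a simplified graph in which each concept has at most one “is-a” parent and in which thematic links are stored as a separate relation; the names IS_A, RELATED_TO, class_categories, and thematic_categories are hypothetical, and real ontologies often allow several parents per concept.

```python
# Hypothetical "is-a" edges: concept -> its parent class.
IS_A = {
    "hairy cell leukemia": "leukemia",
    "leukemia": "cancer",
    "cisplatin": "pharmaceutical",
}

# Hypothetical thematic links (for example, derived from "treats"
# relations or from frequent co-occurrence in documents).
RELATED_TO = {
    "cisplatin": {"cancer"},
}

def class_categories(concept):
    """Follow is-a edges upward and collect every ancestor class."""
    ancestors = []
    while concept in IS_A:
        concept = IS_A[concept]
        ancestors.append(concept)
    return ancestors

def thematic_categories(concept):
    """Return the themes the concept is linked to, in sorted order."""
    return sorted(RELATED_TO.get(concept, set()))

by_class = class_categories("cisplatin")
by_theme = thematic_categories("cisplatin")
```

With this data, the class-based view places cisplatin under pharmaceutical while the thematic view places it under cancer, and the is-a chain for hairy cell leukemia passes through leukemia to cancer.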
There are both active and passive kinds of exploration.
Passive exploration helps a user get an understanding of the data available to them. These passive insights are generally very high level, but the user can use them as an easy starting point.
Active exploration requires that the user define their own entrance into the data. They decide on a query or category that they want to understand. In either case, the user needs to be able to refine the insight.
One of the most important kinds of refinement is the drill down. Most people who have used some sort of data searching solution have a notion of what drill down means, but in a knowledge graph there are actually multiple dimensions along which a user can drill down. In search engine terms, drill down often means refining a query based on one or a small number of particularly interesting results. In a dashboard powered by structured data, drill down generally means looking at a more detailed display of data, often by adding more groupings or by looking at items within a group. In our system, both of these kinds of drill down are available, depending on the nature of the insight.
Beyond drilling down, the knowledge graph also lets us move sideways to related things. We can start by looking at prominent authors within a category. We can then try to expand our query to include concepts related to these authors that have not yet been included. The freedom and power given by the unified data mean that the user can explore in ways that are impossible in siloed, heterogeneous data.
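As a rough sketch of this author-based expansion, the example below ranks authors by how many of their papers mention a category concept and then lists concepts from those authors' papers that the category does not yet contain. The PAPERS records, the placeholder author names, and both function names are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical paper records with placeholder author names.
PAPERS = [
    {"authors": ["author_a", "author_b"],
     "concepts": {"glioblastoma", "TTF therapy"}},
    {"authors": ["author_a"],
     "concepts": {"glioblastoma", "spinal cord"}},
    {"authors": ["author_c"],
     "concepts": {"cisplatin"}},
]

def prominent_authors(category, top_n=1):
    """Rank authors by the number of papers that mention a category concept."""
    counts = Counter()
    for paper in PAPERS:
        if paper["concepts"] & category:
            counts.update(paper["authors"])
    return counts.most_common(top_n)

def expansion_candidates(category, authors):
    """Concepts in the given authors' papers that are not yet in the category."""
    candidates = set()
    for paper in PAPERS:
        if set(paper["authors"]) & set(authors):
            candidates |= paper["concepts"] - category
    return sorted(candidates)

category = {"glioblastoma"}
top = prominent_authors(category)
candidates = expansion_candidates(category, [name for name, _ in top])
```

The candidates are only suggestions; in an interactive setting the user would decide which of them belong in the refined category.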
One of the most important kinds of insight that users will want to explore is change over time. With our data, time is ambiguous because it can mean two different things. A user may wish to monitor which entities are in a particular kind of relationship as data is added to the graph over time. More likely, time comes from the documents themselves. For example, a user may want to know which concepts within a category are trending over time. Being able to tie statements within the knowledge graph to evidence means that we can also tie them to when the statement was made. This allows users to filter to newer information.
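A small sketch shows how document-derived time might support both trend counts and filtering to newer information. It assumes each statement carries the publication year of its evidence document; the STATEMENTS list and the mentions_by_year function are hypothetical names, and the years are made-up sample values.

```python
from collections import defaultdict

# Hypothetical statements: (concept, year of the evidence document).
STATEMENTS = [
    ("cisplatin", 2018),
    ("cisplatin", 2019),
    ("cisplatin", 2019),
    ("TTF therapy", 2019),
]

def mentions_by_year(statements, concepts, since=None):
    """Count statements per concept per year, optionally keeping only
    statements whose evidence is from the year `since` or later."""
    counts = defaultdict(lambda: defaultdict(int))
    for concept, year in statements:
        if concept in concepts and (since is None or year >= since):
            counts[concept][year] += 1
    # Convert to plain dicts with years in ascending order.
    return {c: dict(sorted(years.items())) for c, years in counts.items()}

trend = mentions_by_year(STATEMENTS, {"cisplatin"})
recent = mentions_by_year(STATEMENTS, {"cisplatin", "TTF therapy"}, since=2019)
```

Comparing the per-year counts for each concept in a category gives a simple view of which concepts are trending, while the since argument corresponds to filtering to newer information.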
Term exploration in the UI