Natural Language Querying in Orpheus
In modern search engines there is another kind of exploratory query - a question.
Questions are queries that are not looking for a document, but instead for an answer. This type of query is exploratory in the sense that the user does not have a correct document in mind, but it differs from other exploratory queries in that an answer ostensibly exists in one or more of the documents.
The current approach to question answering (QA) is to split the task into two parts.
- The first part is retrieving the documents that hopefully contain the answer.
- The second part is identifying the snippet of text that answers the question.
The first is done with a traditional search engine, and the second is done with a deep learning model.
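The two-part split above can be sketched in a few lines of Python. This is a minimal illustration, not the Orpheus implementation: the function names (`retrieve`, `extract_snippet`), the sample documents, and the scoring are all assumptions made for the example. In particular, simple word overlap stands in for both the search engine's ranking and the deep learning model that would normally pick the answer span.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase the text and keep alphanumeric runs as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def overlap_score(question, text):
    # Count how many question tokens also appear in the text.
    q = Counter(tokenize(question))
    t = Counter(tokenize(text))
    return sum(min(q[w], t[w]) for w in q)

def retrieve(question, documents, k=2):
    # Part 1: rank documents against the question and keep the top k.
    ranked = sorted(documents, key=lambda d: overlap_score(question, d), reverse=True)
    return ranked[:k]

def extract_snippet(question, documents):
    # Part 2: pick the single sentence that best matches the question.
    # A real system would use a trained reader model here (assumption:
    # word overlap is only a stand-in for that model).
    best_sentence, best_doc, best_score = None, None, -1
    for doc in documents:
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            score = overlap_score(question, sentence)
            if score > best_score:
                best_sentence, best_doc, best_score = sentence, doc, score
    return best_sentence, best_doc

# Hypothetical documents, invented for illustration only.
docs = [
    "Aspirin is a common pain reliever. Aspirin inhibits the enzyme cyclooxygenase.",
    "Metformin is used to treat type 2 diabetes. It lowers glucose production in the liver.",
    "The liver is a large organ. It filters blood.",
]
question = "Which enzyme does aspirin inhibit?"
snippet, source = extract_snippet(question, retrieve(question, docs))
print(snippet)
```

Note that the function returns the source document along with the snippet, which is the property of extraction-based QA that matters for attribution.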
Here is a high-level overview of how this works in the Wisecube Orpheus product:
A newer approach to question answering is to generate the answer directly, generally with a language model. Language models are machine learning models (now almost entirely deep learning models) that have learned to take a piece of incomplete text and predict what comes next. In this case, the language model is trained to predict answers directly from questions.
This has the advantages of not needing the document search step and of not being limited to text in the retrieved documents. The downside is the loss of attribution: extraction-based QA inherently supplies the source document of the answer, whereas generative QA does not, since it generates the answer rather than locating it.
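To make "predict what comes next" concrete, here is a toy sketch of a count-based bigram model. It is only an assumption-laden stand-in: real generative QA systems use deep learning models trained on large corpora, and the names `train_bigram_model` and `generate` and the tiny corpus are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    # For each word, count which words follow it in the training text.
    counts = defaultdict(Counter)
    for text in corpus:
        tokens = text.lower().split()
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts

def generate(model, prompt, max_tokens=5):
    # Greedily extend the prompt with the most frequent next word.
    tokens = prompt.lower().split()
    if not tokens:
        return ""
    for _ in range(max_tokens):
        candidates = model.get(tokens[-1])
        if not candidates:
            break  # the model has never seen anything follow this word
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)

# Hypothetical training text, invented for illustration only.
corpus = [
    "metformin lowers glucose",
    "metformin lowers glucose production",
    "metformin treats diabetes",
]
model = train_bigram_model(corpus)
completion = generate(model, "metformin")
print(completion)
```

The output is assembled from learned statistics rather than copied from one document, so there is no single source to cite for it.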
There is an additional kind of QA that is possible with a knowledge engine: multihop QA, which answers questions that need multiple sources of information. For example, consider the question “What drugs that interact with a pathway related to disease X have reached phase 3 for any disease?” We are unlikely to find a snippet in any single document that answers this question.
Similarly, a language model is limited to the text it was trained on, and a question like this generally falls outside the usual kinds of questions that generative QA models are trained on. Answering it requires that we translate the natural language question into a query against the knowledge graph itself.
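As a rough sketch of what querying the graph means for the example question, the snippet below chains three hops over a handful of triples. Everything here is hypothetical: the triples, the predicate names (`related_to`, `interacts_with`, `reached_phase_3_for`), and the helper functions are invented for illustration and do not reflect the Orpheus schema. The step of translating the natural language question into this query is written by hand and not shown.

```python
# Hypothetical (subject, predicate, object) triples, for illustration only.
TRIPLES = [
    ("pathway_A", "related_to", "disease_X"),
    ("pathway_B", "related_to", "disease_Y"),
    ("drug_1", "interacts_with", "pathway_A"),
    ("drug_2", "interacts_with", "pathway_A"),
    ("drug_3", "interacts_with", "pathway_B"),
    ("drug_1", "reached_phase_3_for", "disease_Z"),
    ("drug_3", "reached_phase_3_for", "disease_Y"),
]

def subjects(triples, predicate, obj):
    # All subjects linked to `obj` by `predicate`.
    return {s for s, p, o in triples if p == predicate and o == obj}

def has_edge(triples, subject, predicate):
    # True if `subject` has at least one outgoing `predicate` edge.
    return any(s == subject and p == predicate for s, p, _ in triples)

def phase3_drugs_for_disease_pathways(triples, disease):
    # Hop 1: pathways related to the disease.
    pathways = subjects(triples, "related_to", disease)
    # Hop 2: drugs that interact with any of those pathways.
    drugs = set()
    for pathway in pathways:
        drugs |= subjects(triples, "interacts_with", pathway)
    # Hop 3: keep drugs that reached phase 3 for any disease.
    return sorted(d for d in drugs if has_edge(triples, d, "reached_phase_3_for"))

answer = phase3_drugs_for_disease_pathways(TRIPLES, "disease_X")
print(answer)
```

No single triple contains the answer; it only emerges by joining facts that may have come from different sources, which is why neither snippet extraction nor a generative model is a good fit here.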