Transforming LLM Reliability with Pythia: Wisecube’s AI Hallucination Detector

Healthcare decisions have a significant impact on human life. However, hallucinations in LLM-generated content can lead to misleading decisions that slow the progress of biomedical research. LLM hallucinations come in various types, including factual hallucinations, nonsensical outputs, hallucinations of bias, and illusions of understanding.

While traditional methods for minimizing AI hallucinations have limitations, knowledge graphs offer a way to ensure factual integrity in AI-generated content by querying and analyzing it against credible datasets. Because they capture the relationships between data points, knowledge graphs can detect deeper hallucinations and avoid errors caused by LLM misinformation.

Enter Pythia, an open-source library for constructing knowledge graphs. Its open-source nature makes it accessible to a wide audience. Using Pythia to build knowledge graphs, researchers can strengthen their defenses against LLM hallucinations, validating and correcting distortions in AI-generated content.

Recognizing the potential of knowledge graphs to enhance LLM reliability, Wisecube leveraged its billion-scale knowledge graph to build an AI Fact Checker. Wisecube AI Fact Checker ensures the factual integrity of AI-generated content. 

In this article, we discuss Wisecube's LLM fact-checking process, a solution that overcomes the limitations of traditional methods. We'll see why addressing hallucinations is critical in biomedicine and how Wisecube's ability to detect finer errors gives healthcare researchers a competitive advantage.

Limitations of Traditional Hallucination Detection Methods

Traditional methods for AI hallucination detection involve extracting sentences from LLM-generated content and comparing them against references. This comparison allows the detection of where hallucinations have occurred in AI-generated content. However, this approach doesn't reliably capture language nuances, factual inconsistencies, or contextual deviations. As a result, traditional methods are limited by:

  • Surface-level analysis: Comparing complete phrases or sentences only highlights surface-level misinformation and ignores contextually misleading information.
  • Difficulty identifying connections: Traditional methods struggle to identify underlying connections between facts spread across multiple sentences, so they miss both the big picture and the granular details.

Significance of AI Fact-Checking

Inaccurate LLM-generated content can hinder or mislead biomedical research. Currently, LLMs are far from generating highly reliable output, as the following figures suggest:

  1. Industry-leading LLMs have a 27% hallucination rate.
  2. 15-20% of text generated by general-purpose LLMs is unreliable.
  3. GPT-4 Turbo's text summarization has a roughly 3% hallucination rate.

With hallucinations occurring this frequently, LLMs can't be leveraged to their full potential in critical healthcare decisions. Wisecube combines knowledge triplets with a semantic data model to surpass the limited accuracy of LLMs, producing highly accurate claims.

Wisecube’s AI Fact Checker to the Rescue

Wisecube’s AI Fact Checker enables accurate and context-aware fact-checking in LLMs. 

It leverages Orpheus, a foundational graph AI model built upon our billion-scale knowledge graph, to strengthen the AI's contextual understanding. Orpheus's knowledge graph currently provides access to 10 billion biomedical facts and 30 million biomedical articles while capturing the relationships between them.

The Wisecube Fact Checker is built on a specialized knowledge base designed specifically for biomedical research. Beyond its domain-specific knowledge, it leverages a semantic data model for enhanced hallucination detection. Together, these elements address, among other things, the shortcomings of traditional methods:

  • Deeper analysis: Wisecube’s Fact Checker allows deeper analysis of content with the help of knowledge triplets.
  • Relationship detection: The billion-scale knowledge graph captures the relationships between biomedical facts and figures, making the AI context-aware.

Demystifying Wisecube's LLM Fact-Checking Process

Wisecube's LLM fact-checking process aims to detect hallucinations in AI output by comparing it against underlying references. The process runs on robust frameworks for finding references, setting the granularity of evaluation, and categorizing the claims in responses.

1. LLM Usage Patterns and References

Wisecube's AI Fact Checker uses different approaches to finding references, depending on the type of context in the data:

  1. Zero Context: The Fact Checker can directly compare references it finds with the LLM’s responses to assess accuracy.
  2. Noisy Context: When a question includes some context but is either noisy or incomplete, the Fact Checker might consult authoritative data before the LLM generates a response.
  3. Accurate Context: When the context is complete and reliable, the Fact Checker can use the references to provide a comprehensive summary or information in response.
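These three modes can be pictured as a dispatch step ahead of the comparison. The sketch below is purely illustrative; the function and mode names are assumptions, not Wisecube's actual API:

```python
from enum import Enum

class ContextMode(Enum):
    ZERO = "zero"          # no context supplied with the query
    NOISY = "noisy"        # partial or unreliable context
    ACCURATE = "accurate"  # complete, trustworthy context

def plan_references(mode: ContextMode, query: str) -> str:
    """Hypothetical dispatcher: pick a reference strategy per context mode."""
    if mode is ContextMode.ZERO:
        # Retrieve references independently, then compare with the LLM answer.
        return f"retrieve references for: {query}"
    if mode is ContextMode.NOISY:
        # Consult authoritative data before the LLM generates a response.
        return f"augment context with authoritative data for: {query}"
    # Accurate context: use the supplied references directly.
    return f"summarize from supplied references for: {query}"

plan = plan_references(ContextMode.NOISY, "Does metformin treat type 2 diabetes?")
```

In each mode the downstream comparison is the same; only the source of the references changes.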

2. Evaluation Granularity

Unlike existing methods that analyze whole paragraphs or sentences, the Claim Checker breaks LLM responses into knowledge triplets. This enables testing the factuality of individual knowledge points and provides more informative, precise insights.

Inspired by knowledge graphs, each triplet takes the form <subject, predicate, object>, capturing finer-grained information about the content of LLM-generated text. Unlike traditional methods, this makes it possible to understand the context and check the correctness of each individual point.
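To see why triplets are more granular than sentences, consider how one sentence decomposes into several independently checkable claims. This is a hand-worked illustration (the predicate names are made up for the example; a production system would use an NLP extractor):

```python
from typing import NamedTuple

class Triplet(NamedTuple):
    subject: str
    predicate: str
    object: str

# One sentence yields multiple independently checkable claims.
sentence = "Metformin treats type 2 diabetes and is taken orally."
claims = [
    Triplet("metformin", "treats", "type 2 diabetes"),
    Triplet("metformin", "route_of_administration", "oral"),
]
rendered = [f"<{c.subject}, {c.predicate}, {c.object}>" for c in claims]
```

A sentence-level comparison would accept or reject the whole sentence at once; here, each triplet can be verified, contradicted, or flagged separately.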

3. Semantic Data Model

The semantic data model is the foundation of this methodology, allowing the fact checker to effectively analyze data from different LLM frameworks.

The semantic data pipeline transforms a natural-language query into Resource Description Framework (RDF) format, a standard for representing interconnected data on the web. The mapping from natural language to RDF defines how text corresponds to the entities and relationships in the model, so data is accurately represented in the graph database. The entities and their relationships are then examined to understand the context.

The semantic data model also identifies gaps in natural-language text through a fusion and integration process, which enriches the data by consulting external, trusted sources such as third-party datasets.
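As a rough illustration of the mapping step, a single triplet can be serialized as one line of RDF in N-Triples syntax. The URIs below are placeholders, not Wisecube's actual vocabulary; a real pipeline would map terms to established biomedical ontology URIs:

```python
def to_ntriples(subject: str, predicate: str, obj: str,
                base: str = "http://example.org/") -> str:
    """Serialize one <subject, predicate, object> triplet as an RDF
    N-Triples line. The base URI is an illustrative placeholder."""
    return f"<{base}{subject}> <{base}{predicate}> <{base}{obj}> ."

line = to_ntriples("metformin", "treats", "type2_diabetes")
```

Once claims are in RDF, they can be loaded into a graph database and queried alongside the reference knowledge graph.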

4. Claim Extraction

The Fact Checker extracts sentences and phrases from LLM-generated text and breaks them into knowledge triplets. Triplets let the system analyze connections within sentences the way humans do, enhancing the AI's contextual understanding. This approach makes it possible to assess both correctness and interpretation.

5. Claim Categorization

Once the claims are extracted in the form of triplets, the Wisecube Fact Checker compares responses against references and categorizes them into four categories:

  1. Entailment: Claims that are present in both response and references, indicating accurate outputs.
  2. Contradiction: Claims present in LLM responses but contradicted by the references.
  3. Missing facts: Claims present in references but absent in LLM responses, representing gaps in LLM responses.
  4. Neutral: Claims present in the LLM response that are neither contradicted nor confirmed by the references.

This categorization allows claims to be evaluated against metrics and an audit report on LLM performance to be generated.
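The four categories can be sketched as set comparisons over triplets. This is a minimal illustration that assumes a claim contradicts a reference when both share a subject and predicate but differ in object; Wisecube's actual matching is presumably more sophisticated:

```python
def categorize(response_claims, reference_claims):
    """Toy four-way categorization of (subject, predicate, object) claims."""
    ref_set = set(reference_claims)
    ref_by_key = {(s, p): o for s, p, o in reference_claims}
    resp_keys = {(s, p) for s, p, _ in response_claims}
    result = {"entailment": [], "contradiction": [], "neutral": []}
    for s, p, o in response_claims:
        if (s, p, o) in ref_set:
            result["entailment"].append((s, p, o))
        elif (s, p) in ref_by_key:  # same subject/predicate, different object
            result["contradiction"].append((s, p, o))
        else:
            result["neutral"].append((s, p, o))
    # Reference claims the response never addressed at all.
    result["missing"] = [c for c in reference_claims
                         if (c[0], c[1]) not in resp_keys]
    return result

result = categorize(
    [("patient", "has_dx", "diabetes"), ("patient", "age", "82"),
     ("coumadin", "treats", "headaches")],
    [("patient", "has_dx", "diabetes"), ("patient", "age", "58"),
     ("patient", "takes", "metformin")],
)
```

Here the diagnosis claim is entailed, the age claim is contradicted, the coumadin claim is neutral, and the metformin claim is missing from the response.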

6. Claim Checker Pipeline

The claim checker pipeline is a framework designed to ensure the accuracy and reliability of LLM-generated outputs. The pipeline consists of two key modules: the Claim Extractor and the Hallucination Checker.

The process begins with the LLM generating an output in response to an input query. To audit the LLM pipeline, the claim checker extracts claims from relevant references (datasets, research papers, ontologies, etc.), selected for their relevance and authority on the subject matter. It then extracts claims from the LLM responses as well, breaking both down into knowledge triplets that represent factual information in a structured format.

The claim checker then compares the response triplets with the reference triplets to identify entailments, contradictions, missing facts, and neutral claims. Finally, it aggregates the findings from the categorization stage, compiling performance metrics into a detailed audit report that highlights the system's strengths and areas for improvement.

After the comparison, the claims are categorized and aggregated for the audit report as follows:

  • Claims that occur in both the LLM responses and the references are verified claims. For example, <patient, has_dx, diabetes>.
  • Contradicting claims, representing factual errors, are hallucinations. For example, <patient, age, 82> in the LLM response vs. <patient, age, 58> in the references.
  • Claims that are present in the references but missing from the LLM responses are missing claims.
  • Neutral claims, present in the LLM response but missing from the references, are potential hallucinations: they are either fabricated by the LLM or actual facts absent from the references. For example, <coumadin, treats, headaches> appears in the LLM response but not in the references.

Neutral claims, as potential hallucinations, give an upper bound on LLM hallucinations, i.e., the extent to which LLM responses could deviate from the facts. Furthermore, feeding neutral triplets into Wisecube's link prediction model predicts a confidence level for each neutral claim.
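Putting the counts together, the aggregation step might compute both a confirmed hallucination rate and the neutral-claim upper bound described above. This is a sketch with assumed metric names, not Wisecube's actual report format:

```python
def audit_metrics(categorized):
    """Compile toy audit metrics from categorized claims. The upper bound
    counts confirmed contradictions plus unverified (neutral) claims."""
    n = sum(len(categorized[k]) for k in ("entailment", "contradiction", "neutral"))
    return {
        "claims_checked": n,
        "confirmed_hallucination_rate": len(categorized["contradiction"]) / n,
        "hallucination_upper_bound":
            (len(categorized["contradiction"]) + len(categorized["neutral"])) / n,
    }

report = audit_metrics({
    "entailment": [("patient", "has_dx", "diabetes")],
    "contradiction": [("patient", "age", "82")],
    "neutral": [("coumadin", "treats", "headaches")],
})
```

In this toy example, one of three claims is a confirmed hallucination, but up to two of three could be wrong once unverified neutral claims are counted.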

With all these benefits, Wisecube offers biomedical researchers a competitive advantage. The Fact Checker can be easily integrated into existing workflows, providing real-time content verification and factual integrity. 

Contact us today to get started with the Wisecube AI Fact Checker, audit your healthcare LLM pipelines, and drive breakthroughs in healthcare research.