Transfer Learning for Link Prediction on Knowledge Graphs

In the world of data and knowledge graphs, organizations face a significant challenge: implementing transfer learning when data is scattered and labeled instances are scarce. This hurdle questions the prevailing belief that abundant labeled data is essential, and it presents an opportunity to rethink that paradigm.

Navigating this complexity means testing how well transfer learning works without abundant labeled data. Consider a scenario where a model that is adept at predicting edges in a general knowledge graph refines its capabilities on a smaller, more specialized counterpart.

Why does this matter for your organization? It shows that transfer learning can be effective even when data is dispersed. This is not the typical deep learning story of training on a large labeled dataset; it is about guiding an experienced model to acquire new, specialized skills in a diverse data landscape.

In this blog, discover how your organization can navigate this challenge using the Orpheus knowledge graph and a specialized graph. The specialized graph results from intersecting the Orpheus graph with the CAS COVID-19 Antiviral Candidate SAR dataset.

Decoding the Orpheus Knowledge Graph

The construction of the Orpheus graph involves a meticulous combination of structured and unstructured datasets. While structured datasets require minimal processing—matching entities to a common identifier—extracting graph elements from unstructured (text) data is a more involved process. 

Let’s walk through how the Orpheus Knowledge Graph is constructed, one key component at a time.

Orpheus Pipeline

1. Natural Language Processing (NLP) Pipeline

The NLP pipeline follows three distinct paths, each with its own steps.

NLP Pipeline

Embeddings

Embeddings fall into two types:

  • Static Embeddings: These provide a single representation for each token. A common example is word2vec. Their advantage is efficiency: pre-computed representations are simply looked up, with no neural network model needed at inference time. However, static embeddings struggle with words that have multiple meanings, because every sense of a word shares one vector.
  • Non-Static Embeddings: Unlike static embeddings, non-static (contextual) embeddings, such as those from BERT, generate representations based on context. BERT uses a neural network trained to predict masked words, which yields more nuanced representations. The trade-off is that you must run that model in production, which is resource-intensive.
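To make the static-embedding limitation concrete, here is a toy Python sketch. The three-dimensional vectors and the `STATIC_EMBEDDINGS` table are made up for illustration; real static embeddings such as word2vec have far more dimensions learned from a corpus. Because a lookup ignores context, both senses of "bank" receive the same vector, which is what a contextual model like BERT avoids.

```python
# Hypothetical static embedding table: one fixed vector per token.
STATIC_EMBEDDINGS = {
    "bank": [0.2, -0.1, 0.7],
    "river": [0.9, 0.0, 0.1],
    "loan": [-0.3, 0.8, 0.2],
}

def static_embed(tokens):
    """Return the pre-computed vector for each token (None if unknown)."""
    return [STATIC_EMBEDDINGS.get(tok) for tok in tokens]

# "bank" as a riverside vs. "bank" as a lender: the lookup cannot tell them apart.
v_river = static_embed(["river", "bank"])[1]
v_money = static_embed(["loan", "bank"])[1]
print(v_river == v_money)  # True
```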

Regular Normalization

Effective text processing is crucial for entity extraction. Tokenization divides the text into words, and normalization standardizes the tokens. For precise phrase matching on the CORD-19 dataset, a light normalization is applied: fixing parsing artifacts and lower-casing the text. This keeps entity extraction accurate.
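A minimal sketch of regular normalization follows, assuming the parsing artifacts are words hyphenated across line breaks, bracketed numeric citation markers, and irregular whitespace. The artifacts actually cleaned for CORD-19 are not listed here, so the patterns and the `regular_normalize` name are illustrative.

```python
import re

def regular_normalize(text: str) -> str:
    """Light normalization for phrase matching (artifact patterns are assumptions)."""
    # Rejoin words hyphenated across a line break: "anti-\nviral" -> "antiviral".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop bracketed numeric citation markers such as "[12]" or "[3, 4]".
    text = re.sub(r"\s*\[\d+(?:,\s*\d+)*\]", "", text)
    # Collapse runs of whitespace (including newlines) into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()

print(regular_normalize("Remdesivir is an anti-\nviral  drug [12]."))
# remdesivir is an antiviral drug.
```

Rejoining every hyphenated line break is a trade-off: it also merges terms that were genuinely hyphenated, which can matter for phrase matching.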

Strong Normalization

To explore your corpus, use topic modeling to summarize it efficiently and identify trends. This first requires a strong normalization: remove non-word tokens, eliminate stop words (common words like “the”, plus domain-specific terms), and apply lemmatization to convert each token to its dictionary form. This streamlined approach strengthens your NLP pipeline and surfaces valuable insights from the corpus.
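The strong normalization steps can be sketched as follows. The stop list and lemma table are small hypothetical stand-ins: in practice you would use a full stop-word list plus your own domain terms, and a real lemmatizer (for example from spaCy or NLTK) in place of the `LEMMAS` dictionary.

```python
import re

# Hypothetical stop list: common English words plus assumed domain-specific
# terms that appear in nearly every document and carry little topical signal.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "is", "are", "was", "were",
              "covid", "sars", "cov"}

# Toy stand-in for a real lemmatizer: inflected form -> dictionary form.
LEMMAS = {"proteins": "protein", "inhibited": "inhibit", "binds": "bind",
          "viruses": "virus"}

def strong_normalize(text: str) -> list[str]:
    """Word tokens only, stop words removed, each token lemmatized."""
    tokens = re.findall(r"[a-z]+", text.lower())  # drops digits and punctuation
    return [LEMMAS.get(tok, tok) for tok in tokens if tok not in STOP_WORDS]

print(strong_normalize("The compound inhibited 3 viral proteins in SARS-CoV-2."))
# ['compound', 'inhibit', 'viral', 'protein']
```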

2. Entity Extraction

In Orpheus graph construction, extract entity references from text using either a named-entity recognition (NER) model or phrase matching. A reliable NER model is preferable, but phrase matching against an alias list is a pragmatic alternative. The structured data supplies entity metadata such as aliases, which simplifies the process. If you use an NER model, map each recognized entity to an alias in the entity data so that it links to the correct node.
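Phrase matching against an alias list can be as simple as a greedy longest-match scan over normalized tokens. The sketch below assumes a hypothetical alias table mapping already-normalized aliases to entity IDs (the `CHEM:`/`PROT:` ID scheme is illustrative). Preferring the longest alias at each position keeps multi-word names such as "main protease" intact.

```python
# Hypothetical alias list: normalized alias -> entity identifier.
ALIASES = {
    "remdesivir": "CHEM:remdesivir",
    "gs-5734": "CHEM:remdesivir",
    "main protease": "PROT:3CLpro",
    "3clpro": "PROT:3CLpro",
}
MAX_ALIAS_LEN = max(len(a.split()) for a in ALIASES)

def match_entities(tokens: list[str]) -> list[tuple[int, int, str]]:
    """Greedy longest-match phrase matching; returns (start, end, entity_id)."""
    matches, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first.
        for n in range(min(MAX_ALIAS_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in ALIASES:
                matches.append((i, i + n, ALIASES[phrase]))
                i += n
                break
        else:
            i += 1  # no alias starts at this token; move on
    return matches

print(match_entities("gs-5734 inhibits the main protease".split()))
# [(0, 1, 'CHEM:remdesivir'), (3, 5, 'PROT:3CLpro')]
```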

3. Literature-based Relationships

Once entities are identified, you define relationships through co-occurrence, a bit like connecting the dots. Choose a context (e.g., a paragraph) and a measure (like Jaccard similarity), then apply filters to refine the relationships: remove rare entities, and drop edges whose measure falls below a minimum acceptable value.
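Here is one way the co-occurrence step might look with a paragraph as the context and Jaccard similarity as the measure: the number of paragraphs containing both entities divided by the number containing either. The thresholds `min_count` and `min_jaccard` and the toy paragraphs are assumptions for illustration, not the values used to build Orpheus.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_edges(paragraphs, min_count=2, min_jaccard=0.3):
    """Literature-based edges from per-paragraph entity sets.

    paragraphs: list of sets of entity IDs, one set per paragraph (the context).
    min_count: entities seen in fewer paragraphs than this are dropped as rare.
    min_jaccard: minimum acceptable Jaccard similarity for an edge.
    """
    contexts = defaultdict(set)  # entity -> indices of paragraphs mentioning it
    for idx, entities in enumerate(paragraphs):
        for ent in entities:
            contexts[ent].add(idx)
    kept = sorted(e for e, ctx in contexts.items() if len(ctx) >= min_count)
    edges = {}
    for a, b in combinations(kept, 2):
        shared = len(contexts[a] & contexts[b])
        if shared == 0:
            continue
        jaccard = shared / len(contexts[a] | contexts[b])
        if jaccard >= min_jaccard:
            edges[(a, b)] = jaccard
    return edges

# Toy paragraphs (hypothetical), each reduced to the entities found in it.
paragraphs = [{"remdesivir", "3CLpro"}, {"remdesivir", "3CLpro", "ACE2"},
              {"ACE2", "TMPRSS2"}, {"remdesivir"}, {"TMPRSS2", "ACE2"}]
print(cooccurrence_edges(paragraphs))
# {('3CLpro', 'remdesivir'): 0.6666666666666666, ('ACE2', 'TMPRSS2'): 0.6666666666666666}
```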

4. Graph Summaries

Graph summaries are like snapshots: concise overviews of the graph’s essential components. Amid the intricacies of data analysis, they offer a quick, efficient way to understand the Orpheus graph.

5. Graph Fusion

Graph fusion integrates the diverse components of the graph. It goes beyond connecting the dots: it merges elements from different sources into one graph, making it richer and more interconnected. This adds depth and completeness to the analysis and allows a more nuanced understanding of the relationships within the data.
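One simple way to picture fusion, assuming every source has already been mapped to shared entity identifiers, is to take the union of all edges and record which sources support each one. The triples and source names below are hypothetical, and real fusion may also need to reconcile relation types or weights, which this sketch does not attempt.

```python
from collections import defaultdict

def fuse_graphs(sources):
    """Merge (head, relation, tail) triples from several sources.

    sources: dict of source name -> iterable of triples that already use
    the shared entity identifiers.
    Returns a dict mapping each distinct triple to the set of sources asserting it.
    """
    fused = defaultdict(set)
    for source, triples in sources.items():
        for triple in triples:
            fused[triple].add(source)
    return dict(fused)

sources = {
    "dataset_a": [("compound:C1", "targets", "protein:P1")],
    "dataset_b": [("compound:C1", "targets", "protein:P1")],
    "literature": [("compound:C1", "co_occurs", "protein:P1")],
}
fused = fuse_graphs(sources)
print(len(fused))  # 2 distinct edges; the "targets" edge has two supporting sources
```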

6. Graph Overlap

Graph overlap reveals an important aspect of the data landscape: the connections and intersections between nodes within the graph. Rather than showing isolated entities, it highlights where different elements converge to form a web of interconnected relationships. Exploring this overlap is essential for understanding the interdependencies that make the Orpheus graph comprehensive.

7. Specialized Graph

In Orpheus, a specialized graph is a refined subset derived from the intersection of the broader Orpheus graph and the CAS COVID-19 Antiviral Candidate SAR dataset. It captures the nodes and relationships that match the CAS data, with compounds identified by CAS Registry Number (CAS RN) and proteins by name.

Network Diagram of the Specialized Graph

The edges mirror those in the broader graph, ensuring relevance. This precision makes the specialized graph a focused tool for extracting insights specific to the COVID-19 Antiviral Candidate SAR dataset within the Orpheus framework.
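Assuming the specialized graph is the induced subgraph of the Orpheus graph on the entities that also appear in the CAS dataset, building it reduces to a node-set intersection and an edge filter. The identifiers below are hypothetical placeholders, not real CAS RNs.

```python
def build_specialized_graph(general_edges, cas_nodes):
    """Induced subgraph of the general graph on nodes shared with the CAS data.

    general_edges: iterable of (node_a, node_b) pairs from the broader graph.
    cas_nodes: node IDs found in the CAS dataset (CAS RNs, protein names).
    """
    general_nodes = {n for edge in general_edges for n in edge}
    nodes = general_nodes & set(cas_nodes)  # entities present in both sources
    edges = {(a, b) for a, b in general_edges if a in nodes and b in nodes}
    return nodes, edges

general_edges = [("CASRN-A", "3CLpro"), ("CASRN-A", "fever"),
                 ("CASRN-B", "ACE2"), ("3CLpro", "ACE2")]
cas_nodes = {"CASRN-A", "CASRN-B", "CASRN-C", "3CLpro", "ACE2"}
nodes, edges = build_specialized_graph(general_edges, cas_nodes)
print(sorted(edges))
```

"CASRN-C" is dropped because it never appears in the general graph, and the edge to "fever" is dropped because that node is not in the CAS data.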

Link Prediction Modeling: Building Links with DeepWalk

The first step in constructing a link prediction model is generating vector representations for the graph nodes using DeepWalk. DeepWalk runs random walks over the graph to create fixed-length sequences of nodes, then trains a single-layer neural network, as in word2vec’s skip-gram technique, to predict the nodes around each central node in a walk sequence.

Once these vector representations are established, training on node pairs can begin.
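The two DeepWalk stages can be sketched in plain Python: uniform random walks from every node, then (center, context) pairs drawn from a sliding window, which is the training data a skip-gram model consumes. The walk count, walk length, and window size are illustrative defaults. In practice the walks are usually handed as "sentences" to an existing word2vec implementation, for example gensim's `Word2Vec` with `sg=1`, rather than training skip-gram by hand.

```python
import random

def random_walks(adj, num_walks=10, walk_length=8, seed=0):
    """DeepWalk-style uniform random walks: num_walks walks from every node.

    adj: dict mapping each node to a list of its neighbours.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # visit start nodes in a new order each pass
        for start in nodes:
            walk = [start]
            # Extend until the walk is long enough or hits a dead end.
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

def skipgram_pairs(walks, window=2):
    """(center, context) pairs within `window` positions, as in skip-gram."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs

adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"], "d": []}
walks = random_walks(adj, num_walks=2, walk_length=5)
print(skipgram_pairs([["a", "b", "c"]], window=1))
# [('a', 'b'), ('b', 'a'), ('b', 'c'), ('c', 'b')]
```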

Let’s explore the different steps involved in link prediction modeling: 


Each training example for your link prediction model consists of two node vectors and a label: 1 if the pair forms an edge, 0 otherwise. There are various ways to combine the two vectors, such as concatenation. Because every pair of nodes is a potential example, the expansive Orpheus graph yields a vast number of them; the specialized graph helps narrow this down.

Two insights emerge: the classification problem is imbalanced, because most node pairs are not edges, which may bias the model toward negative predictions; and the training process will demand significant resources.
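Here is a sketch of how the training set for a small graph might be assembled, assuming the node embeddings are already computed and pair vectors are concatenated. It enumerates every unordered pair, which makes both insights visible: the number of examples grows as n(n-1)/2, and most labels are 0. The function name and toy data are hypothetical.

```python
import numpy as np

def pair_examples(embeddings, nodes, edges):
    """One example per unordered node pair: concatenated vectors, label 1 if edge."""
    edge_set = {frozenset(e) for e in edges}
    X, y = [], []
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            X.append(np.concatenate([embeddings[a], embeddings[b]]))
            y.append(1 if frozenset((a, b)) in edge_set else 0)
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
emb = {n: rng.normal(size=20) for n in "abcd"}  # toy 20-dimensional embeddings
X, y = pair_examples(emb, list("abcd"), [("a", "b"), ("c", "d")])
print(X.shape)  # (6, 40)
print(y.mean())  # share of positive pairs
```

Concatenation is order-dependent, so (a, b) and (b, a) give different inputs; a symmetric combination such as the element-wise product avoids this.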

Keep the model architecture simple: a straightforward setup with one hidden layer and a dropout layer to tackle overfitting.
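The architecture can be sketched as a NumPy forward pass. This is a minimal sketch, not the authors' implementation: the ReLU activation, the dropout rate of 0.5, the weight initialization, and the class name `PairClassifier` are assumptions. Dropout is applied only when `training=True`, using the common inverted-dropout scaling.

```python
import numpy as np

class PairClassifier:
    """One hidden layer + dropout + sigmoid output over concatenated pair vectors."""

    def __init__(self, input_dim, hidden_dim, dropout=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W1 = self.rng.normal(0.0, 0.1, (input_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.W2 = self.rng.normal(0.0, 0.1, (hidden_dim, 1))
        self.b2 = np.zeros(1)
        self.dropout = dropout

    def num_params(self):
        """Trainable parameters; dropout itself has none."""
        return sum(p.size for p in (self.W1, self.b1, self.W2, self.b2))

    def forward(self, X, training=False):
        h = np.maximum(0.0, X @ self.W1 + self.b1)  # ReLU hidden layer
        if training and self.dropout > 0:
            # Inverted dropout: zero random units and rescale the survivors.
            mask = self.rng.random(h.shape) >= self.dropout
            h = h * mask / (1.0 - self.dropout)
        logits = (h @ self.W2 + self.b2)[:, 0]
        return 1.0 / (1.0 + np.exp(-logits))  # predicted edge probability

model = PairClassifier(input_dim=40, hidden_dim=16)
probs = model.forward(np.ones((3, 40)))
```

Training itself (for example, gradient descent on binary cross-entropy) is omitted; a framework such as PyTorch or Keras would normally handle it.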

Local Model

The first model to train is the local model, which uses only the specialized graph. With a representation size of 20, the model has 8401 parameters. However, it performs poorly, even worse than random. Why? The model struggles to distinguish pairs whose edge is merely missing from the specialized graph from pairs that genuinely have no edge.
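The parameter count can be checked by hand. Assuming the two 20-dimensional node vectors are concatenated, the hidden layer is fully connected, and the network ends in a single output unit (dropout adds no parameters), the count N for a hidden width H is:

```latex
\begin{aligned}
d_{\text{in}} &= 2 \times 20 = 40 \\
N &= \underbrace{(d_{\text{in}} + 1)\,H}_{\text{hidden layer}} + \underbrace{(H + 1)}_{\text{output unit}} = 42H + 1 \\
42H + 1 &= 8401 \;\Rightarrow\; H = 200
\end{aligned}
```

So 8401 parameters is consistent with a hidden layer of 200 units; the width is inferred from the count rather than stated.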

Orpheus Model

The next step is to train on the Orpheus graph, modified to exclude the specialized edges so the comparison with the local model is clean. This expansive graph gets a larger representation size of 100. Because it has far too many node pairs to use them all, training randomly generates a new set of pairs in each epoch, unlike the specialized graph, where every pair is used. The result? This model outperforms the local model by a significant margin.
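Sampling a fresh batch of pairs each epoch can be sketched as below. Uniform sampling of node pairs is an assumption about how the pairs are generated; with a graph this large, almost every sampled pair is a non-edge, so the imbalance discussed above is even stronger here. The function name is hypothetical.

```python
import random

def sample_epoch_pairs(nodes, edges, num_pairs, seed):
    """Draw num_pairs random node pairs for one epoch, labeled 1 if they form an edge.

    nodes: list of node IDs (random.sample needs a sequence, not a set).
    """
    rng = random.Random(seed)
    edge_set = {frozenset(e) for e in edges}
    pairs = []
    for _ in range(num_pairs):
        a, b = rng.sample(nodes, 2)  # two distinct nodes
        pairs.append((a, b, 1 if frozenset((a, b)) in edge_set else 0))
    return pairs

nodes = list(range(50))
edges = [(0, 1), (2, 3)]
for epoch in range(3):
    # A different seed per epoch yields a different sample of pairs.
    batch = sample_epoch_pairs(nodes, edges, num_pairs=100, seed=epoch)
```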

This assessment highlights a crucial point: the general model classifies local edges better than the local model does. However, these findings need to be anchored in more concrete metrics.

Evaluation Metrics

In your analysis, use three metrics to gauge the model’s value. Begin with recall, the proportion of missing edges the model identifies:

recall = #{missing edges found}/#{total missing edges}

Precision is the next metric: the proportion of pairs predicted to be edges that are actually missing edges:

precision = #{missing edges found}/#{pairs predicted to be edges}

Recall and precision both require a decision threshold on the model’s scores. Treating the model predictions as predicted probabilities, use two thresholds: 0.5, and the threshold that maximizes the F1 score (the harmonic mean of precision and recall).
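A minimal NumPy sketch of precision, recall, and F1 at a given threshold, plus a search for the F1-maximizing threshold over the distinct scores. Here `labels` marks held-out missing edges with 1; the function names and toy values are hypothetical.

```python
import numpy as np

def precision_recall_f1(scores, labels, threshold):
    """Metrics when every pair scoring >= threshold is predicted to be an edge."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    predicted = scores >= threshold
    found = np.sum(predicted & (labels == 1))  # missing edges found
    precision = found / predicted.sum() if predicted.sum() else 0.0
    recall = found / labels.sum() if labels.sum() else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return float(precision), float(recall), float(f1)

def best_f1_threshold(scores, labels):
    """Try each distinct score as the threshold; keep the one with the highest F1."""
    candidates = np.unique(scores)
    f1s = [precision_recall_f1(scores, labels, t)[2] for t in candidates]
    return float(candidates[int(np.argmax(f1s))])

scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
print(precision_recall_f1(scores, labels, 0.5))  # (0.5, 0.5, 0.5)
print(best_f1_threshold(scores, labels))  # 0.4
```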

Instead of choosing a threshold for precision, you can use a variant of the information retrieval metric precision@K:

precision@K = #{missing edges found in the top K predictions}/K

This metric offers a clear picture of the model’s value. If you take the K pairs with the highest scores and pass them to human investigators, precision@K measures how much of their effort turns into confirmed edges.
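precision@K needs no threshold at all, only a ranking. A minimal sketch, reusing the convention that label 1 marks a held-out missing edge (the toy values are hypothetical):

```python
import numpy as np

def precision_at_k(scores, labels, k):
    """Fraction of the k highest-scoring candidate pairs that are real missing edges."""
    top_k = np.argsort(-np.asarray(scores), kind="stable")[:k]  # descending order
    return float(np.asarray(labels)[top_k].sum()) / k

scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
print(precision_at_k(scores, labels, 2))  # 0.5
print(precision_at_k(scores, labels, 3))  # 0.6666666666666666
```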

Specifically, focusing on precision@100, using the Orpheus model over the local model yields twice the return on investment for human investigators validating suggested edges. Remarkably, this efficiency is achieved without the Orpheus model utilizing any local information. Further improvements are seen when fine-tuning the Orpheus model.

Fine-tuning for Transfer Learning

In refining your model, fine-tuning is an essential technique for transfer learning. The premise is simple: a model initially trained on a broad dataset (the Orpheus model on the Orpheus knowledge graph) undergoes additional training on a local dataset.

The Orpheus model contains transferable information. There is a critical caveat, however: the specialized graph carries a strong bias that makes the local model underpredict. Fine-tuning the general model on that graph therefore requires striking the right balance, since training too long on it leads to overfitting.
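Here is a minimal sketch of fine-tuning with early stopping, one common way to strike that balance. To stay short it uses logistic regression as a stand-in for the pair classifier; the learning rate, patience, and function names are assumptions. The pretrained weights count as a candidate, so this procedure never returns a model with worse validation loss than the one it started from.

```python
import numpy as np

def log_loss(w, b, X, y):
    """Mean binary cross-entropy of a logistic model."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    eps = 1e-12
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

def fine_tune(w, b, X_train, y_train, X_val, y_val, lr=0.01, max_epochs=50, patience=3):
    """Continue training pretrained (w, b) on local data, stopping early on validation loss."""
    w, b = w.astype(float).copy(), float(b)
    best = (log_loss(w, b, X_val, y_val), w.copy(), b)  # pretrained model is a candidate
    bad_epochs = 0
    for _ in range(max_epochs):
        p = 1.0 / (1.0 + np.exp(-(X_train @ w + b)))
        grad = p - y_train  # gradient of the loss with respect to the logits
        w -= lr * X_train.T @ grad / len(y_train)
        b -= lr * float(grad.mean())
        val = log_loss(w, b, X_val, y_val)
        if val < best[0]:
            best, bad_epochs = (val, w.copy(), b), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # validation loss stopped improving: likely overfitting
    return best[1], best[2]
```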

With fine-tuning, both the classification metrics and the grounded metrics improve notably, further refining the model’s performance.

Applications of Graph-Based Intelligence

Let’s explore two applications of transfer learning for link prediction on knowledge graphs.

1. Lead Generation

The primary application of this experiment is lead generation. Given a specialized set of entity pairs, such as compounds and proteins, the technique leverages a more general graph to prioritize which pairs to investigate further.

Lead Generation Pipeline

Literature-based edges play an essential role because they let users gather relevant documents for investigation. The link prediction model relies on node representations derived from random walks, so paths in the general graph carry information for predicting edges in the specialized graph. If these paths pass through literature-based edges, the documents along the path between two nodes are likely to be pertinent to understanding the predicted relationship.
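One simple way to collect those documents, sketched under assumptions: find a shortest path between the two nodes of a predicted pair with breadth-first search, then gather the documents attached to each literature-based edge on it. Using a single shortest path is our choice for illustration; the graph, node names, and document IDs are hypothetical.

```python
from collections import deque

def path_documents(adj, edge_docs, source, target):
    """Shortest path between two nodes and the documents behind its literature edges.

    adj: node -> list of neighbours in the general graph.
    edge_docs: frozenset({a, b}) -> document IDs for literature-based edges;
    structured edges simply have no entry.
    """
    parent = {source: None}
    queue = deque([source])
    while queue:  # breadth-first search from the source
        node = queue.popleft()
        if node == target:
            break
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    if target not in parent:
        return None, []  # no path in the general graph
    path = [target]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    path.reverse()
    docs = []
    for a, b in zip(path, path[1:]):
        docs.extend(edge_docs.get(frozenset((a, b)), []))
    return path, docs

adj = {"C1": ["gene", "fever"], "gene": ["C1"], "fever": ["C1", "P1"], "P1": ["fever"]}
edge_docs = {frozenset(("C1", "fever")): ["doc-1"],
             frozenset(("fever", "P1")): ["doc-2", "doc-3"]}
print(path_documents(adj, edge_docs, "C1", "P1"))
# (['C1', 'fever', 'P1'], ['doc-1', 'doc-2', 'doc-3'])
```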

2. Session Search

Knowledge graphs enhance the search experience: users can query documents and identify articles for research using the general graph. They can then review documents for evidence of specific relationships and select entities in each document, forming a new specialized graph.

Session Search

By applying the lead generation approach to this specialized graph, users can expand it and receive new documents to review. This accelerates research by tapping into the entire collection of unstructured and structured data.

The Next Frontier in Specialized Graph Expansion

Leveraging insights from a larger knowledge graph to enhance smaller, specialized knowledge graphs has the potential to double the efficiency of human efforts in expanding curated knowledge. The technique discussed uses transfer learning to apply models learned on general knowledge graphs to specialized ones, streamlining the curation process.

Despite this promising advance, further research is needed to uncover the exact limitations of this approach. Notably, its fundamental processes make no domain-specific assumptions.
If you want to learn more about using knowledge graphs in your organization, schedule a demo with Wisecube today!
