The BioGateway Resource

A Semantic Systems Biology Database

BioGateway consists of a graph-based database built on Semantic Web principles, a SPARQL endpoint allowing users to query it, and a Cytoscape app which integrates the query functionality directly into your network building workflow.

What is BioGateway?

BioGateway is an initiative that enables a Semantic Systems Biology approach. It provides an entry point to a data warehouse where biological data is gathered in the form of RDF triples. The system can be queried using SPARQL. It can also be explored with the SPARQL browser, which displays query results visually as a network of resources.
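As a rough illustration of what querying over SPARQL involves, the sketch below builds an HTTP GET request following the W3C SPARQL 1.1 Protocol, using only the Python standard library. The endpoint URL is a placeholder and the query is a generic pattern; neither is taken from BioGateway, so substitute the real endpoint address and a meaningful query before use.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder: replace with the actual SPARQL endpoint URL.
ENDPOINT = "https://example.org/sparql"

# Generic query that returns any five triples; not specific to BioGateway.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 5
"""

def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

request = build_sparql_request(ENDPOINT, QUERY)
# To execute it, pass `request` to urllib.request.urlopen() and decode the
# response body with json.load().
```

Building the request separately from sending it makes it possible to inspect the encoded URL before any network traffic takes place.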


The Cytoscape App

We have developed an app for Cytoscape to allow you to directly integrate the power of our Semantic Knowledge Base into your network building workflow. With the Query Builder tool, you can formulate the topology of what you are looking for, and it will generate the SPARQL query for you.

The query result can then be imported directly into the Cytoscape network you are building – without having to deal with result file formats, incompatible column standards or identifiers.
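For readers working outside Cytoscape, the following sketch shows the kind of conversion the app spares you: turning a result set in the W3C SPARQL 1.1 Query Results JSON Format into a simple edge table. It is not the app's implementation. The variable names `tf` and `gene`, the sample URIs and both helper functions are made up for illustration.

```python
import csv
import io

# Sample response in the SPARQL 1.1 Query Results JSON Format.
# The URIs are placeholders, not real BioGateway identifiers.
sample_results = {
    "head": {"vars": ["tf", "gene"]},
    "results": {
        "bindings": [
            {"tf": {"type": "uri", "value": "http://example.org/prot/A"},
             "gene": {"type": "uri", "value": "http://example.org/gene/X"}},
            {"tf": {"type": "uri", "value": "http://example.org/prot/A"},
             "gene": {"type": "uri", "value": "http://example.org/gene/Y"}},
        ]
    },
}

def bindings_to_edges(results, source_var, target_var):
    """Return (source, target) pairs for rows that bind both variables."""
    edges = []
    for row in results["results"]["bindings"]:
        if source_var in row and target_var in row:
            edges.append((row[source_var]["value"], row[target_var]["value"]))
    return edges

def edges_to_csv(edges):
    """Serialise edges as a two-column CSV edge table with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target"])
    writer.writerows(edges)
    return buffer.getvalue()

edges = bindings_to_edges(sample_results, "tf", "gene")
edge_table = edges_to_csv(edges)
```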


The BioGateway Database

Unified Identifiers

Every entity in BioGateway has a unique identifier URI across all datasets – allowing queries across data from different sources.

High-confidence Data

The data in BioGateway is a combination of the most trusted datasets from UniProt, IntAct and other curated sources.

Semantic Web Technologies

BioGateway combines Systems Biology and Semantic Web technologies for more effective modelling of biological systems.

Explorative Data

In addition to transcription factor – target gene (TF-TG) connections obtained from curated resources, BioGateway also returns TF-TG connections obtained through text mining, and allows their validity to be checked against the original abstract.


BioGateway Data model


The BioGateway triple store provides a unified protein-centric view on biological networks. The data in BioGateway are modeled as directed multigraphs, not necessarily acyclic, which is a natural choice for representing complex networks.

There are two types of graphs in BioGateway:

A – those that define entities, e.g. proteins, genes, etc.,

B – those that define relations among entities, e.g. protein-protein interactions, protein-disease interactions, etc.

There are three types of nodes in BioGateway:
  • Classes: entities in the domain of discourse, e.g. proteins, diseases, etc. (URIs)
  • Instances: particular interpretations/views of entities conditioned on the source (URIs, only B type graphs)
  • Attributes: qualities, quantities, etc. (literals)

Nodes are connected through multiple types of edges, a.k.a. properties, semantically defined in external ontologies/taxonomies/vocabularies (URIs). Within any given graph a particular property is used within one unique semantic context.

The atomic unit of information (elementary graph) comprises a pair of nodes (subject and object) connected by a directed edge (predicate), commonly known as a triple.
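The triple and the directed multigraph described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how the triple store is implemented. The node and predicate labels are placeholders standing in for URIs.

```python
from collections import defaultdict
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

# Placeholder node and edge labels, not real BioGateway URIs.
triples = [
    Triple("proteinA", "predicate_1", "proteinB"),
    Triple("proteinA", "predicate_2", "proteinB"),  # parallel edge, same direction
    Triple("proteinB", "predicate_1", "proteinA"),  # reverse edge, forms a cycle
]

def build_multigraph(triples):
    """Index triples as graph[subject][object] -> list of predicates."""
    graph = defaultdict(lambda: defaultdict(list))
    for t in triples:
        graph[t.subject][t.obj].append(t.predicate)
    return graph

graph = build_multigraph(triples)
```

Storing a list of predicates per ordered node pair is what makes this a multigraph: two nodes may be linked by several differently typed edges, and nothing prevents cycles.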

A-type graphs

Protein entities (source: ‘’, Reference Proteome filtered). This graph forms the core of BioGateway. The entities are identified by their UniParc IDs conditioned on the biological species, chromosome and encoding gene, e.g. ‘’, thus the corresponding classes are homogeneous with respect to the amino acid sequences. Together with protein classes there are collections of all translation products encoded by a particular gene (essentially sets, but modelled as rdf:Bag due to RDF limitations), e.g. ‘’.

Gene entities (source: ‘’, Reference Proteome filtered). Semantically these entities are defined by the sets of translation products they encode and logistically by the preferred gene names (as used in ‘’) conditioned on the biological species and chromosome, e.g. ‘’. The corresponding entities are not guaranteed to be homogeneous with respect to the nucleotide sequences and are modeled as collections (rdf:Bag).
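To make the identifier scheme concrete, here is a small parsing sketch. It assumes the `!`-delimited layout that can be read off the identifier fragments in the B-type graph examples of this document (species taxon, chromosome, gene name and, for proteins, a UniParc ID). The URI prefix is not shown in this document and is therefore left out, and the `LocalId` type and `parse_local_id` function are hypothetical names.

```python
from typing import NamedTuple, Optional

class LocalId(NamedTuple):
    taxon: str              # biological species, e.g. "9606"
    chromosome: str         # e.g. "chr-17"
    gene: str               # preferred gene name, e.g. "TP53"
    uniparc: Optional[str]  # UniParc ID for proteins, None for genes

def parse_local_id(local_id: str) -> LocalId:
    """Split an assumed taxon!chromosome!gene[!uniparc] identifier."""
    parts = local_id.split("!")
    if len(parts) == 3:
        taxon, chromosome, gene = parts
        return LocalId(taxon, chromosome, gene, None)
    if len(parts) == 4:
        return LocalId(*parts)
    raise ValueError(f"unexpected identifier layout: {local_id!r}")

protein = parse_local_id("9606!chr-17!TP53!UPI000002ED67")
gene = parse_local_id("9606!chr-20!AAR2")
```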

Taxonomic entities (source: ‘’) identified by external URIs, e.g. ‘’.

Ontology term entities (source: ‘’) identified by external URIs, e.g. ‘’.

Disease entities (source: ‘’) identified by external URIs, e.g. ‘’.

B-type graphs

All entities are modeled as subclasses of rdf:Statement with instances conditioned on the source. 

Interactions between proteins and biological processes, cellular components, molecular functions (source: ‘’).


‘!chr-17!TP53!UPI000002ED67–GO_0000122’ (class)

‘!chr-17!TP53!UPI000002ED67–GO_0000122#goa’ (instance).

Defining properties:
‘’ “involved in” (biological process),
‘’ “part of” (cellular component),
‘’ “enables” (molecular function).

Protein-phenotype interactions (currently limited to diseases, source: ‘’).


‘!chr-17!TP53!UPI000002ED67–151623’ (class),

‘!chr-17!TP53!UPI000002ED67–151623#uniprot’ (instance)

Defining property: ‘’ “involved in” (disease). 

Protein-protein interactions (source: ‘’)


‘!chr-03!BHLHE40!UPI0000126923–9606!chr-17!TP53!UPI000002ED67’ (class),

‘!chr-03!BHLHE40!UPI0000126923–9606!chr-17!TP53!UPI000002ED67#intact’ (instance).

Defining property: ‘’ “molecularly interacts with” (protein). 

Interactions between transcription factors and target genes

‘!chr-17!TP53!UPI000002ED67–9606!chr-20!AAR2’ (class),

‘!chr-17!TP53!UPI000002ED67–9606!chr-20!AAR2#tfacts’ (instance).

Defining property: ‘’ “involved in regulation of” (gene).
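The class and instance identifiers shown for the B-type graphs follow a visible pattern: two entity identifiers joined by a dash, with an optional `#source` suffix marking an instance. The sketch below splits such a string into its parts. It uses the example strings exactly as printed here, including their missing leading portion. The separator is a parameter because the dash printed in this document may be a typographic rendering of a different character sequence in the actual URIs; `StatementId` and `parse_statement_id` are hypothetical names.

```python
from typing import NamedTuple, Optional

class StatementId(NamedTuple):
    subject: str
    obj: str
    source: Optional[str]  # None for a class, e.g. "goa" for an instance

def parse_statement_id(identifier: str, separator: str = "\u2013") -> StatementId:
    """Split 'subject<sep>object[#source]' into its components."""
    statement, has_source, source = identifier.partition("#")
    subject, found, obj = statement.partition(separator)
    if not found:
        raise ValueError(f"separator not found in {identifier!r}")
    return StatementId(subject, obj, source if has_source else None)

# Example strings copied from the listings in this section.
as_class = parse_statement_id("!chr-17!TP53!UPI000002ED67–GO_0000122")
as_instance = parse_statement_id("!chr-17!TP53!UPI000002ED67–9606!chr-20!AAR2#tfacts")
```

A class and its instances share the same subject and object; only the source differs, which is what "instances conditioned on the source" amounts to at the identifier level.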

External parental classes
  • ‘unordered collection’
  • ‘triple’
  • ‘entity type’
  • ‘edge type’
  • ‘gene’
  • ‘protein’

Properties used in A and B graphs

Object properties
  • ‘is subclass of’
  • ‘is subproperty of’
  • ‘has source’ # domain:
  • ‘has evidence’ # range: publications
  • ‘has evidence origin’ # range: source of metadata

Annotation properties
  • ‘has name’
  • ‘has evidence level’

Properties used in A graphs

Object properties
  • ‘is member of’
  • ‘inheres in’ # range: biological species
  • ‘has close match’ # range: external URIs for genes and proteins

Annotation properties
  • ‘has synonym’
  • ‘has definition’

Properties used in B graphs

Object properties
  • ‘is instance of’
  • ‘involved in’ # range: biological process, disease
  • ‘part of’ # range: cellular component
  • ‘enables’ # range: molecular function
  • ‘molecularly interacts with’ # range: protein
  • ‘involved in regulation of’ # range: gene
  • ‘is defined by’ # range: method

Annotation properties
  • ‘has value’ # positive/negative
  • ‘has comment’ # amino acid change
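
The ranges listed for the B-graph object properties can be captured in a small lookup table, for example to sanity-check triples before loading them. This is only a sketch built from the property labels above. The dictionary and function names are hypothetical, and real data would use the property URIs rather than these labels.

```python
# Ranges transcribed from the "Properties used in B graphs" list.
# 'is instance of' is omitted because no range is given for it there.
B_GRAPH_RANGES = {
    "involved in": {"biological process", "disease"},
    "part of": {"cellular component"},
    "enables": {"molecular function"},
    "molecularly interacts with": {"protein"},
    "involved in regulation of": {"gene"},
    "is defined by": {"method"},
}

def is_valid_object(predicate: str, object_type: str) -> bool:
    """Return True if object_type is a listed range of predicate."""
    return object_type in B_GRAPH_RANGES.get(predicate, set())
```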