BioGateway Data Model


The BioGateway triple store provides a unified protein-centric view of biological networks. The data in BioGateway are modeled as directed multi-graphs, not necessarily acyclic, which is a natural choice for representing complex networks.

There are two types of graphs in BioGateway:

A – those that define entities, e.g. proteins, genes, etc.,

B – those that define relations among entities, e.g. protein-protein interactions, protein-disease interactions, etc.

There are three types of nodes in BioGateway:
  • Classes: entities in the domain of discourse, e.g. proteins, diseases, etc. (URIs)
  • Instances: particular interpretations/views of entities conditioned on the source (URIs, only B type graphs)
  • Attributes: qualities, quantities, etc. (literals)

Nodes are connected through multiple types of edges, a.k.a. properties, semantically defined in external ontologies/taxonomies/vocabularies (URIs). Within any given graph a particular property is used within one unique semantic context.

The atomic unit of information (elementary graph) comprises a pair of nodes (subject and object) connected by a directed edge (predicate), commonly known as a triple.
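As a minimal sketch of this model, the following Python snippet represents triples and a small directed multi-graph. All identifiers (the `ex:` names and the helper `edges_between`) are hypothetical placeholders chosen for illustration, not actual BioGateway URIs or tooling.

```python
from typing import NamedTuple, Union


class URI(str):
    """A node or edge identified by a URI (classes, instances, properties)."""


class Literal(str):
    """An attribute value such as a name or a comment."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]


# Hypothetical placeholder identifiers, not real BioGateway URIs.
protein_a = URI("ex:proteinA")
protein_b = URI("ex:proteinB")
interacts = URI("ex:molecularlyInteractsWith")
has_name = URI("ex:hasName")

# A graph is simply a collection of triples.
graph = [
    Triple(protein_a, interacts, protein_b),
    Triple(protein_b, interacts, protein_a),  # cycles are allowed
    Triple(protein_a, has_name, Literal("protein A")),  # attribute (literal) node
]


def edges_between(triples, subject, obj):
    """Return every predicate linking subject to obj; a multi-graph may hold several."""
    return [t.predicate for t in triples if t.subject == subject and t.obj == obj]
```

Calling `edges_between(graph, protein_a, protein_b)` returns the list of properties that connect the two nodes in that direction.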

A-type graphs

A brief description of these graphs follows.

Protein entities (source: ‘’, Reference Proteome filtered). This graph forms the core of BioGateway. The entities are identified by their UniParc IDs conditioned on the biological species, chromosome and encoding gene, e.g. ‘’; thus the corresponding classes are homogeneous with respect to the amino acid sequences. Alongside the protein classes there are collections of all translation products encoded by a particular gene (essentially sets, but modeled as rdf:Bag due to RDF limitations), e.g. ‘’.
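The identifier scheme can be illustrated with a small parser. This is only a sketch: it assumes the local identifier has the form `taxon!chromosome!gene!UniParc`, as suggested by the string `9606!chr-17!TP53!UPI000002ED67` that appears in the B-type examples in this document. The names `ProteinId` and `parse_protein_id` are hypothetical.

```python
from typing import NamedTuple


class ProteinId(NamedTuple):
    taxon: str        # biological species, e.g. '9606'
    chromosome: str   # e.g. 'chr-17'
    gene: str         # encoding gene, e.g. 'TP53'
    uniparc: str      # UniParc ID, e.g. 'UPI000002ED67'


def parse_protein_id(local_id: str) -> ProteinId:
    """Split an assumed 'taxon!chromosome!gene!UniParc' identifier into its fields."""
    parts = local_id.split("!")
    if len(parts) != 4:
        raise ValueError(
            f"expected 4 '!'-separated fields, got {len(parts)}: {local_id!r}"
        )
    return ProteinId(*parts)


pid = parse_protein_id("9606!chr-17!TP53!UPI000002ED67")
```

Two proteins with the same UniParc ID but different genes or chromosomes yield different identifiers under this scheme, which is what keeps each class tied to a single encoding context.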

Taxonomic entities (source: ‘’) identified by external URIs, e.g. ‘’. 

Disease entities (source: ‘’) identified by external URIs, e.g. ‘’.

Gene entities (source: ‘’, Reference Proteome filtered). Semantically these entities are defined by the sets of translation products they encode, and logistically by the preferred gene names (as used in ‘’) conditioned on the biological species and chromosome, e.g. ‘’. The corresponding entities are not guaranteed to be homogeneous with respect to the nucleotide sequences and are modeled as collections (rdf:Bag).

Ontology term entities (source: ‘’) identified by external URIs, e.g. ‘’.


B-type graphs

A brief description of these graphs follows.

All entities are modeled as subclasses of rdf:Statement with instances conditioned on the source.

– interactions between proteins and biological processes, cellular components, molecular functions (source: ‘’).


‘!chr-17!TP53!UPI000002ED67–GO_0000122’ (class)

‘!chr-17!TP53!UPI000002ED67–GO_0000122#goa’ (instance).

Defining properties:
‘’ “involved in” (biological process),
‘’ “part of” (cellular component),
‘’ “enables” (molecular function).

– protein-phenotype interactions (currently limited to diseases, source: ‘’)


‘!chr-17!TP53!UPI000002ED67–151623’ (class),

‘!chr-17!TP53!UPI000002ED67–151623#uniprot’ (instance)

Defining property: ‘’ “involved in” (disease).

– protein-protein interactions (source: ‘’)


‘!chr-03!BHLHE40!UPI0000126923–9606!chr-17!TP53!UPI000002ED67’ (class),

‘!chr-03!BHLHE40!UPI0000126923–9606!chr-17!TP53!UPI000002ED67#intact’ (instance).

Defining property: ‘’ “molecularly interacts with” (protein).

– interactions between transcription factors and target genes

‘!chr-17!TP53!UPI000002ED67–9606!chr-20!AAR2’ (class),

‘!chr-17!TP53!UPI000002ED67–9606!chr-20!AAR2#tfacts’ (instance).

Defining property: ‘’ “involved in regulation of” (gene).
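The class and instance identifiers listed above follow a visible pattern: a subject, a separator, an object, and for instances a `#source` suffix. The sketch below splits such a string into its parts. It assumes the separator is the en dash shown in the examples (the actual URIs may use a different character, so it is a parameter), and the names `StatementId` and `parse_statement_id` are hypothetical.

```python
from typing import NamedTuple, Optional


class StatementId(NamedTuple):
    subject: str
    obj: str
    source: Optional[str]  # None for a class, e.g. 'tfacts' for an instance


def parse_statement_id(local_id: str, separator: str = "\u2013") -> StatementId:
    """Split '<subject><separator><object>[#source]' into its parts."""
    # Everything after '#' names the source; its absence marks a class.
    class_part, hash_mark, source = local_id.partition("#")
    # The first separator divides subject from object.
    subject, sep, obj = class_part.partition(separator)
    if not sep:
        raise ValueError(f"separator {separator!r} not found in {local_id!r}")
    return StatementId(subject, obj, source if hash_mark else None)


instance = parse_statement_id(
    "!chr-17!TP53!UPI000002ED67\u20139606!chr-20!AAR2#tfacts"
)
klass = parse_statement_id("!chr-17!TP53!UPI000002ED67\u2013GO_0000122")
```

Dropping the `#source` suffix from an instance identifier gives the identifier of the class it instantiates, so statements about the same subject and object from different sources share one class.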


External parental classes
  • ‘unordered collection’
  • ‘triple’
  • ‘entity type’
  • ‘edge type’
  • ‘gene’
  • ‘protein’


Properties used in A and B graphs

Object properties
  • ‘is subclass of’
  • ‘is subproperty of’
  • ‘has source’ # domain:
  • ‘has evidence’ # range: publications
  • ‘has evidence origin’ # range: source of metadata


Annotation properties
  • ‘has name’
  • ‘has evidence level’

Properties used in A graphs

Object properties
  • ‘is member of’
  • ‘inheres in’ # range: biological species
  • ‘has close match’ # range: external URIs for genes and proteins

Annotation properties
  • ‘has synonym’
  • ‘has definition’


Properties used in B graphs

Object properties
  • ‘is instance of’
  • ‘involved in’ # range: biological process, disease
  • ‘part of’ # range: cellular component
  • ‘enables’ # range: molecular function
  • ‘molecularly interacts with’ # range: protein
  • ‘involved in regulation of’ # range: gene
  • ‘is defined by’ # range: method

Annotation properties
  • ‘has value’ # positive/negative
  • ‘has comment’ # amino acid change
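The ranges noted for the B-graph object properties can be captured in a small lookup table, for instance to sanity-check triples before loading. This is an illustrative sketch only: the table is keyed by the human-readable labels from the list above rather than by the actual property URIs, and `B_GRAPH_RANGES` and `range_allows` are hypothetical names.

```python
# Ranges transcribed from the B-graph object property list;
# 'is instance of' has no stated range and is therefore omitted.
B_GRAPH_RANGES = {
    "involved in": {"biological process", "disease"},
    "part of": {"cellular component"},
    "enables": {"molecular function"},
    "molecularly interacts with": {"protein"},
    "involved in regulation of": {"gene"},
    "is defined by": {"method"},
}


def range_allows(property_label: str, object_type: str) -> bool:
    """Return True if object_type is a listed range of property_label."""
    # Unknown properties have no recorded range, so nothing is allowed.
    return object_type in B_GRAPH_RANGES.get(property_label, set())
```

For example, `range_allows("enables", "molecular function")` holds, while `range_allows("enables", "gene")` does not.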