BioGateway Data Model

Overview

The BioGateway triple store provides a unified protein-centric view of biological networks. The data in BioGateway are modeled as directed multi-graphs, not necessarily acyclic, which is a natural choice for representing complex networks.

There are two types of graphs in BioGateway:

A – those that define entities, e.g. proteins, genes, etc.,

B – those that define relations among entities, e.g. protein-protein interaction, protein-disease interactions, etc.

There are three types of nodes in BioGateway:
  • Classes: entities in the domain of discourse, e.g. proteins, diseases, etc. (URIs)
  • Instances: particular interpretations/views of entities conditioned on the source (URIs, only in B-type graphs)
  • Attributes: qualities, quantities, etc. (literals)

Nodes are connected through multiple types of edges, a.k.a. properties, which are semantically defined in external ontologies/taxonomies/vocabularies (URIs). Within any given graph, each property is used in exactly one semantic context.

The atomic unit of information (elementary graph) comprises a pair of nodes (subject and object) connected by a directed edge (predicate), commonly known as a triple.
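As a rough illustration of the node and edge types described above, the following Python sketch stores triples in a small directed multigraph. The names `Triple` and `MultiGraph` are hypothetical and not part of BioGateway. The 'inheres in' edge from the TP53 protein class to the human taxon is an assumption based on the property list later in this document, and the label literal is a placeholder.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """Elementary graph: subject --predicate--> object."""
    subject: str    # URI of a class or instance node
    predicate: str  # URI of the property (edge type)
    obj: str        # URI of another node, or a literal for attribute nodes


class MultiGraph:
    """Directed multigraph; several differently typed edges may leave one node."""

    def __init__(self) -> None:
        self._edges = defaultdict(list)  # subject -> [(predicate, object), ...]

    def add(self, triple: Triple) -> None:
        self._edges[triple.subject].append((triple.predicate, triple.obj))

    def objects(self, subject: str, predicate: str) -> list:
        """Return all objects reachable from `subject` via `predicate`."""
        return [o for p, o in self._edges.get(subject, []) if p == predicate]


TP53 = "http://rdf.biogateway.eu/prot/9606/chr-17/TP53/UPI000002ED67"
HUMAN = "http://purl.bioontology.org/ontology/NCBITAXON/9606"
INHERES_IN = "http://purl.obolibrary.org/obo/BFO_0000052"
PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

graph = MultiGraph()
graph.add(Triple(TP53, INHERES_IN, HUMAN))            # class -> class (assumed edge)
graph.add(Triple(TP53, PREF_LABEL, "example label"))  # class -> literal (placeholder)

print(graph.objects(TP53, INHERES_IN))
```

A dictionary of edge lists is used here only because it keeps the sketch short; a real triple store indexes subjects, predicates and objects independently.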

A-type graphs

A brief description of each A-type graph follows.

http://rdf.biogateway.eu/graph/prot

Protein entities (source: ‘http://uniprot.org/uniprot/’, Reference Proteome filtered). This graph forms the core of BioGateway. The entities are identified by their UniParc IDs conditioned on the biological species, chromosome and encoding gene, e.g. ‘http://rdf.biogateway.eu/prot/9606/chr-17/TP53/UPI000002ED67’; the corresponding classes are therefore homogeneous with respect to their amino acid sequences. Alongside the protein classes, the graph contains collections of all translation products encoded by a particular gene (essentially sets, but modeled as rdf:Bag due to RDF limitations), e.g. ‘http://rdf.biogateway.eu/prot/9606/chr-17/TP53/’.
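The URI layout described above (species, chromosome, gene, UniParc ID) can be taken apart with a few lines of Python. This is a minimal sketch that assumes every protein URI follows exactly the four-segment pattern of the TP53 example; `ProteinId`, `parse_protein_uri` and `collection_uri` are hypothetical helper names.

```python
from typing import NamedTuple

PROT_BASE = "http://rdf.biogateway.eu/prot/"


class ProteinId(NamedTuple):
    taxon: str       # NCBI Taxonomy ID, e.g. "9606"
    chromosome: str  # e.g. "chr-17"
    gene: str        # preferred gene name, e.g. "TP53"
    uniparc: str     # UniParc ID, e.g. "UPI000002ED67"


def parse_protein_uri(uri: str) -> ProteinId:
    """Split a protein class URI into its four path segments."""
    if not uri.startswith(PROT_BASE):
        raise ValueError(f"not a BioGateway protein URI: {uri}")
    parts = uri[len(PROT_BASE):].split("/")
    # A collection URI ends in '/', which leaves an empty fourth segment.
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected taxon/chromosome/gene/UniParc ID: {uri}")
    return ProteinId(*parts)


def collection_uri(pid: ProteinId) -> str:
    """URI of the collection holding all translation products of the gene."""
    return f"{PROT_BASE}{pid.taxon}/{pid.chromosome}/{pid.gene}/"


pid = parse_protein_uri("http://rdf.biogateway.eu/prot/9606/chr-17/TP53/UPI000002ED67")
print(pid.gene, pid.uniparc, collection_uri(pid))
```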

http://rdf.biogateway.eu/graph/taxon

Taxonomic entities (source: ‘http://purl.bioontology.org/ontology/NCBITAXON’) identified by external URIs, e.g. ‘http://purl.bioontology.org/ontology/NCBITAXON/9606’.

http://rdf.biogateway.eu/graph/omim 

Disease entities (source: ‘http://purl.bioontology.org/ontology/OMIM’) identified by external URIs, e.g. ‘http://purl.obolibrary.org/OMIM/151623’.

http://rdf.biogateway.eu/graph/gene

Gene entities (source: ‘http://uniprot.org/uniprot/’, Reference Proteome filtered). Semantically, these entities are defined by the sets of translation products they encode; in practice, they are identified by the preferred gene names (as used in ‘http://uniprot.org/uniprot/’) conditioned on the biological species and chromosome, e.g. ‘http://rdf.biogateway.eu/gene/9606/chr-17/TP53/’. The corresponding entities are not guaranteed to be homogeneous with respect to their nucleotide sequences and are modeled as collections (rdf:Bag).
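To make the idea of a gene as the set of its translation products concrete, the sketch below groups protein URIs under the gene URI that shares their species, chromosome and gene name. It assumes the gene URI path mirrors the protein URI path, as the two TP53 examples suggest. The BHLHE40 protein URI is also an assumption, derived from the key used in the prot2prot example further down. `gene_uri_for` and `group_by_gene` are hypothetical names.

```python
from collections import defaultdict

PROT_BASE = "http://rdf.biogateway.eu/prot/"
GENE_BASE = "http://rdf.biogateway.eu/gene/"


def gene_uri_for(protein_uri: str) -> str:
    """Map a protein URI to the gene URI with the same taxon/chromosome/name path."""
    if not protein_uri.startswith(PROT_BASE):
        raise ValueError(f"not a BioGateway protein URI: {protein_uri}")
    # Unpacking fails with ValueError unless there are exactly four segments.
    taxon, chromosome, gene, _uniparc = protein_uri[len(PROT_BASE):].split("/")
    return f"{GENE_BASE}{taxon}/{chromosome}/{gene}/"


def group_by_gene(protein_uris) -> dict:
    """Collect translation products per gene; a Python set stands in for rdf:Bag."""
    products = defaultdict(set)
    for uri in protein_uris:
        products[gene_uri_for(uri)].add(uri)
    return dict(products)


proteins = [
    "http://rdf.biogateway.eu/prot/9606/chr-17/TP53/UPI000002ED67",
    # Assumed URI, derived from the BHLHE40 key in the prot2prot example.
    "http://rdf.biogateway.eu/prot/9606/chr-03/BHLHE40/UPI0000126923",
]
genes = group_by_gene(proteins)
for gene, members in genes.items():
    print(gene, len(members))
```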

http://rdf.biogateway.eu/graph/go

Ontology term entities (source: https://bioportal.bioontology.org/ontologies/GO) identified by external URIs, e.g. ‘http://purl.obolibrary.org/obo/GO_0000122’.

 

B-type graphs

A brief description of each B-type graph follows.

All entities are modeled as subclasses of rdf:Statement with instances conditioned on the source.
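The class/instance pattern can be written out as two triples: the relation class is a subclass of rdf:Statement, and each source-specific instance is typed by that class. The sketch below prints them as N-Triples-style lines. It assumes the instance URI is formed by appending '#' and a source tag to the class URI, as in the examples in this section; `statement_triples` is a hypothetical helper, and the en dash in the example URI is written as the escape `\u2013`.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def statement_triples(class_uri: str, source_tag: str) -> list:
    """Return N-Triples-style lines for a relation class and one of its instances."""
    instance_uri = f"{class_uri}#{source_tag}"  # assumed naming pattern
    return [
        f"<{class_uri}> <{RDFS}subClassOf> <{RDF}Statement> .",
        f"<{instance_uri}> <{RDF}type> <{class_uri}> .",
    ]


# Class URI copied from the prot2onto example; \u2013 is the en dash shown there.
class_uri = (
    "http://rdf.biogateway.eu/prot-obo/"
    "9606!chr-17!TP53!UPI000002ED67\u2013GO_0000122"
)
for line in statement_triples(class_uri, "goa"):
    print(line)
```

Only the two triples that follow directly from the description are emitted; how the class is linked to its subject, predicate and object is left out because this section does not state it.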

http://rdf.biogateway.eu/graph/prot2onto – interactions between proteins and biological processes, cellular components and molecular functions (source: ‘http://identifiers.org/goa’).

e.g.:

‘http://rdf.biogateway.eu/prot-obo/9606!chr-17!TP53!UPI000002ED67–GO_0000122’ (class),

‘http://rdf.biogateway.eu/prot-obo/9606!chr-17!TP53!UPI000002ED67–GO_0000122#goa’ (instance).

Defining properties:
‘http://purl.obolibrary.org/obo/RO_0002331’ “involved in” (biological process),
‘http://purl.obolibrary.org/obo/BFO_0000050’ “part of” (cellular component),
‘http://purl.obolibrary.org/obo/RO_0002327’ “enables” (molecular function).
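The relation URIs above pack the subject key, the object key and, for instances, the source into one string. The following sketch splits them apart again. It assumes that the separator is the en dash printed in the examples (passed as a parameter, since the character stored in the triple store may differ) and that replacing '!' with '/' recovers the A-type protein URI, which holds for the TP53 example. `StatementUri`, `parse_statement_uri` and `protein_uri` are hypothetical names.

```python
from typing import NamedTuple, Optional

BASE = "http://rdf.biogateway.eu/"
EN_DASH = "\u2013"  # separator as printed in the examples (assumption)


class StatementUri(NamedTuple):
    namespace: str         # e.g. "prot-obo"
    subject_key: str       # e.g. "9606!chr-17!TP53!UPI000002ED67"
    object_key: str        # e.g. "GO_0000122"
    source: Optional[str]  # fragment such as "goa"; None for a class URI


def parse_statement_uri(uri: str, separator: str = EN_DASH) -> StatementUri:
    """Split a B-type class or instance URI into its components."""
    class_uri, _, fragment = uri.partition("#")
    if not class_uri.startswith(BASE):
        raise ValueError(f"not a BioGateway URI: {uri}")
    namespace, _, local = class_uri[len(BASE):].partition("/")
    subject_key, found, object_key = local.partition(separator)
    if not found or not subject_key or not object_key:
        raise ValueError(f"separator {separator!r} not found in: {uri}")
    return StatementUri(namespace, subject_key, object_key, fragment or None)


def protein_uri(subject_key: str) -> str:
    """Rebuild the A-type protein URI by turning '!' back into '/' (assumed mapping)."""
    return BASE + "prot/" + subject_key.replace("!", "/")


example = f"{BASE}prot-obo/9606!chr-17!TP53!UPI000002ED67{EN_DASH}GO_0000122#goa"
parsed = parse_statement_uri(example)
print(parsed.object_key, parsed.source, protein_uri(parsed.subject_key))
```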

 

http://rdf.biogateway.eu/graph/prot2phen – protein-phenotype interactions (currently limited to diseases, source: ‘http://uniprot.org/uniprot/’)

e.g.:

‘http://rdf.biogateway.eu/prot-omim/9606!chr-17!TP53!UPI000002ED67–151623’ (class),

‘http://rdf.biogateway.eu/prot-omim/9606!chr-17!TP53!UPI000002ED67–151623#uniprot’ (instance).

Defining property: ‘http://purl.obolibrary.org/obo/RO_0002331’ “involved in” (disease).

 

http://rdf.biogateway.eu/graph/prot2prot – protein-protein interactions (source: ‘http://identifiers.org/intact/’)

e.g.:

‘http://rdf.biogateway.eu/prot-prot/9606!chr-03!BHLHE40!UPI0000126923–9606!chr-17!TP53!UPI000002ED67’ (class),

‘http://rdf.biogateway.eu/prot-prot/9606!chr-03!BHLHE40!UPI0000126923–9606!chr-17!TP53!UPI000002ED67#intact’ (instance).

Defining property: ‘http://purl.obolibrary.org/obo/RO_0002436’ “molecularly interacts with” (protein).

 

http://rdf.biogateway.eu/graph/tfac2gene – interactions between transcription factors and target genes

Sources:
  • ‘http://www.tfacts.org’
  • ‘http://www.grnpedia.org/trrust/’
  • ‘http://www.lbbc.ibb.unesp.br/htri’
  • ‘http://signor.uniroma2.it’
  • ‘http://identifiers.org/intact/’
  • ‘http://identifiers.org/goa’
  • ‘http://www.extri.org’

e.g.:

‘http://rdf.biogateway.eu/prot-gene/9606!chr-17!TP53!UPI000002ED67–9606!chr-20!AAR2’ (class),

‘http://rdf.biogateway.eu/prot-gene/9606!chr-17!TP53!UPI000002ED67–9606!chr-20!AAR2#tfacts’ (instance).

Defining property: ‘http://purl.obolibrary.org/obo/RO_0002428’ “involved in regulation of” (gene).
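The four B-type graphs and their defining properties can be summarised as a lookup table keyed by the URI namespace segment used in the examples ('prot-obo', 'prot-omim', 'prot-prot', 'prot-gene'). The pairing of segment and graph is inferred from those examples rather than stated explicitly, so it should be treated as an assumption; `B_GRAPHS` and `graph_for` are hypothetical names.

```python
BASE = "http://rdf.biogateway.eu/"
GRAPH = BASE + "graph/"
OBO = "http://purl.obolibrary.org/obo/"

# URI namespace segment -> (graph URI, {property label: property URI})
B_GRAPHS = {
    "prot-obo": (GRAPH + "prot2onto", {
        "involved in": OBO + "RO_0002331",  # biological process
        "part of": OBO + "BFO_0000050",     # cellular component
        "enables": OBO + "RO_0002327",      # molecular function
    }),
    "prot-omim": (GRAPH + "prot2phen", {
        "involved in": OBO + "RO_0002331",  # disease
    }),
    "prot-prot": (GRAPH + "prot2prot", {
        "molecularly interacts with": OBO + "RO_0002436",
    }),
    "prot-gene": (GRAPH + "tfac2gene", {
        "involved in regulation of": OBO + "RO_0002428",
    }),
}


def graph_for(statement_uri: str) -> tuple:
    """Look up the graph and defining properties for a B-type URI."""
    if not statement_uri.startswith(BASE):
        raise ValueError(f"not a BioGateway URI: {statement_uri}")
    segment = statement_uri[len(BASE):].split("/", 1)[0]
    try:
        return B_GRAPHS[segment]
    except KeyError:
        raise ValueError(f"unknown relation namespace: {segment}") from None


graph, properties = graph_for(BASE + "prot-gene/9606!chr-17!TP53!UPI000002ED67")
print(graph, sorted(properties))
```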

 

External parental classes

http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag ‘unordered collection’
http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement ‘triple’
http://www.w3.org/2000/01/rdf-schema#Class ‘entity type’
http://www.w3.org/2000/01/rdf-schema#Property ‘edge type’
http://semanticscience.org/resource/SIO_010035 ‘gene’
http://semanticscience.org/resource/SIO_010043 ‘protein’

 

Properties used in A and B graphs

Object properties

http://www.w3.org/2000/01/rdf-schema#subClassOf ‘is subclass of’
http://www.w3.org/2000/01/rdf-schema#subPropertyOf ‘is subproperty of’
http://semanticscience.org/resource/SIO_000253 ‘has source’ # domain: rdf.biogateway.eu/graph/
http://semanticscience.org/resource/SIO_000772 ‘has evidence’ # range: publications
http://schema.org/evidenceOrigin ‘has evidence origin’ # range: source of metadata

 

Annotation properties

http://www.w3.org/2004/02/skos/core#prefLabel ‘has name’
http://schema.org/evidenceLevel ‘has evidence level’

Properties used in A graphs

Object properties

http://schema.org/memberOf ‘is member of’
http://purl.obolibrary.org/obo/BFO_0000052 ‘inheres in’ # range: biological species
http://www.w3.org/2004/02/skos/core#closeMatch ‘has close match’ # range: external URIs for genes and proteins


Annotation properties

http://www.w3.org/2004/02/skos/core#altLabel ‘has synonym’
http://www.w3.org/2004/02/skos/core#definition ‘has definition’

 

Properties used in B graphs

Object properties

http://www.w3.org/1999/02/22-rdf-syntax-ns#type ‘is instance of’
http://purl.obolibrary.org/obo/RO_0002331 ‘involved in’ # range: biological process, disease
http://purl.obolibrary.org/obo/BFO_0000050 ‘part of’ # range: cellular component
http://purl.obolibrary.org/obo/RO_0002327 ‘enables’ # range: molecular function
http://purl.obolibrary.org/obo/RO_0002436 ‘molecularly interacts with’ # range: protein
http://purl.obolibrary.org/obo/RO_0002428 ‘involved in regulation of’ # range: gene
http://www.w3.org/2000/01/rdf-schema#isDefinedBy ‘is defined by’ # range: method


Annotation properties

http://www.w3.org/1999/02/22-rdf-syntax-ns#value ‘has value’ # positive/negative
http://www.w3.org/2000/01/rdf-schema#comment ‘has comment’ # amino acid change