BioGateway Data Model
Overview
The BioGateway triple store provides a unified protein-centric view of biological networks. The data in BioGateway are modeled as directed multigraphs (not necessarily acyclic), a natural choice for representing complex networks.
There are two types of graphs in BioGateway:
A – those that define entities, e.g. proteins, genes, etc.;
B – those that define relations among entities, e.g. protein-protein interactions, protein-disease interactions, etc.
There are three types of nodes in BioGateway:
- Classes: entities in the domain of discourse, e.g. proteins, diseases, etc. (URIs)
- Instances: particular interpretations/views of entities conditioned on the source (URIs; B-type graphs only)
- Attributes: qualities, quantities, etc. (literals)
Nodes are connected by multiple types of edges, a.k.a. properties, which are semantically defined in external ontologies/taxonomies/vocabularies (URIs). Within any given graph, a particular property is used in only one semantic context.
The atomic unit of information (elementary graph) comprises a pair of nodes (subject and object) connected by a directed edge (predicate), commonly known as a triple.
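To make the terminology concrete, the sketch below represents each triple as a plain Python tuple and a graph as a set of such tuples, using only the standard library (no RDF toolkit). The node and edge names (`ex:A`, `ex:p1`, ...) are hypothetical placeholders, not BioGateway URIs. Because every edge carries its own predicate, the same ordered pair of nodes can be linked by several edges (a multigraph), and edges may point back to earlier nodes (cycles are allowed).

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """Elementary graph: a directed edge (predicate) from subject to object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical node and edge names, used only for illustration.
graph = {
    Triple("ex:A", "ex:p1", "ex:B"),
    Triple("ex:A", "ex:p2", "ex:B"),  # parallel edge: same nodes, different predicate
    Triple("ex:B", "ex:p1", "ex:A"),  # edge back to ex:A closes a cycle
}


def edges_between(triples, subject, obj):
    """Return the predicates linking subject -> obj (direction matters)."""
    return {t.predicate for t in triples if t.subject == subject and t.obj == obj}


def out_degree(triples):
    """Count outgoing edges per subject node."""
    counts = defaultdict(int)
    for t in triples:
        counts[t.subject] += 1
    return dict(counts)


print(sorted(edges_between(graph, "ex:A", "ex:B")))  # ['ex:p1', 'ex:p2']
print(sorted(edges_between(graph, "ex:B", "ex:A")))  # ['ex:p1']
```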
A-type graphs
A brief description of these graphs follows.
http://rdf.biogateway.eu/graph/prot
Protein entities (source: ‘http://uniprot.org/uniprot/’, Reference Proteome filtered). This graph forms the core of BioGateway. The entities are identified by their UniParc IDs conditioned on the biological species, chromosome and encoding gene, e.g. ‘http://rdf.biogateway.eu/prot/9606/chr-17/TP53/UPI000002ED67’; thus the corresponding classes are homogeneous with respect to their amino acid sequences. In addition to the protein classes, the graph contains collections of all translation products encoded by a particular gene (essentially sets, but modeled as rdf:Bag due to RDF limitations), e.g. ‘http://rdf.biogateway.eu/prot/9606/chr-17/TP53/’.
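The URI pattern of protein classes and gene-level collections can be read off the two examples above. The following is a minimal sketch that assumes the pattern generalizes exactly as shown; the function names are hypothetical helpers, not part of any BioGateway API. The chromosome label is passed through verbatim (e.g. `"03"` for chromosome 3, as in the BHLHE40 example further below), since the text does not state a padding rule.

```python
BASE = "http://rdf.biogateway.eu"  # namespace taken from the examples in the text


def protein_bag_uri(taxon_id: str, chromosome: str, gene: str) -> str:
    """Collection (rdf:Bag) of all translation products of one gene.

    Assumption: pattern inferred from the TP53 example; chromosome label used verbatim.
    """
    return f"{BASE}/prot/{taxon_id}/chr-{chromosome}/{gene}/"


def protein_class_uri(taxon_id: str, chromosome: str, gene: str, uniparc_id: str) -> str:
    """Protein class: species / chromosome / encoding gene / UniParc ID."""
    return protein_bag_uri(taxon_id, chromosome, gene) + uniparc_id


print(protein_class_uri("9606", "17", "TP53", "UPI000002ED67"))
print(protein_bag_uri("9606", "17", "TP53"))
```

In this sketch the collection URI is a prefix of every protein class URI built for the same gene, which mirrors the two examples in the text.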
http://rdf.biogateway.eu/graph/taxon
Taxonomic entities (source: ‘http://purl.bioontology.org/ontology/NCBITAXON’) identified by external URIs, e.g. ‘http://purl.bioontology.org/ontology/NCBITAXON/9606’.
http://rdf.biogateway.eu/graph/omim
Disease entities (source: ‘http://purl.bioontology.org/ontology/OMIM’) identified by external URIs, e.g. ‘http://purl.obolibrary.org/OMIM/151623’.
http://rdf.biogateway.eu/graph/gene
Gene entities (source: ‘http://uniprot.org/uniprot/’, Reference Proteome filtered). Semantically, these entities are defined by the sets of translation products they encode; in practice, they are identified by their preferred gene names (as used in ‘http://uniprot.org/uniprot/’) conditioned on the biological species and chromosome, e.g. ‘http://rdf.biogateway.eu/gene/9606/chr-17/TP53/’. The corresponding entities are not guaranteed to be homogeneous with respect to their nucleotide sequences and are modeled as collections (rdf:Bag).
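Since a gene URI encodes species, chromosome and preferred gene name, it can be split back into those parts. The sketch below uses a regular expression inferred from the single TP53 example; the pattern, the `GeneKey` type and both functions are hypothetical, and the real URI space may contain forms this pattern does not cover.

```python
import re
from typing import NamedTuple

# Assumption: pattern inferred from the TP53 example only, not from a specification.
GENE_URI = re.compile(
    r"^http://rdf\.biogateway\.eu/gene/(?P<taxon>\d+)/chr-(?P<chrom>[^/]+)/(?P<gene>[^/]+)/$"
)


class GeneKey(NamedTuple):
    taxon: str
    chromosome: str
    gene: str


def parse_gene_uri(uri: str) -> GeneKey:
    """Split a gene URI into species (NCBI taxon ID), chromosome and gene name."""
    m = GENE_URI.match(uri)
    if m is None:
        raise ValueError(f"not a gene URI in the expected form: {uri}")
    return GeneKey(m["taxon"], m["chrom"], m["gene"])


def gene_uri(key: GeneKey) -> str:
    """Inverse of parse_gene_uri: rebuild the URI from its parts."""
    return f"http://rdf.biogateway.eu/gene/{key.taxon}/chr-{key.chromosome}/{key.gene}/"


print(parse_gene_uri("http://rdf.biogateway.eu/gene/9606/chr-17/TP53/"))
```

Because identity rests on the (species, chromosome, name) triple in this scheme, the same preferred name on two different chromosomes yields two distinct gene URIs.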
http://rdf.biogateway.eu/graph/go
Gene Ontology term entities (source: ‘https://bioportal.bioontology.org/ontologies/GO’) identified by external URIs, e.g. ‘http://purl.obolibrary.org/obo/GO_0000122’.
B-type graphs
A brief description of these graphs follows.
All relation entities are modeled as subclasses of rdf:Statement, with instances conditioned on the source.
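The class/instance split can be illustrated with the prot2onto example listed below: the instance URI is the class URI plus a `#<source>` fragment. This is a minimal sketch that assumes a direct `rdfs:subClassOf` link from each relation class to `rdf:Statement` and an `rdf:type` link from each instance to its class (both properties appear in the property lists of this document); the helper names are hypothetical.

```python
# Vocabulary URIs taken from the property and parent-class lists in this document.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_STATEMENT = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def instance_uri(class_uri: str, source_tag: str) -> str:
    """Source-specific instance: the class URI plus a '#<source>' fragment."""
    return f"{class_uri}#{source_tag}"


def class_of(instance: str) -> str:
    """Recover the relation class by dropping the '#<source>' fragment."""
    return instance.split("#", 1)[0]


def typing_triples(class_uri: str, source_tag: str) -> list:
    """Assumed triples: class under rdf:Statement, instance typed by its class."""
    inst = instance_uri(class_uri, source_tag)
    return [
        (class_uri, RDFS_SUBCLASS_OF, RDF_STATEMENT),
        (inst, RDF_TYPE, class_uri),
    ]


# Class URI copied from the prot2onto example in the text.
goa_class = "http://rdf.biogateway.eu/prot-obo/9606!chr-17!TP53!UPI000002ED67–GO_0000122"
for triple in typing_triples(goa_class, "goa"):
    print(triple)
```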
http://rdf.biogateway.eu/graph/prot2onto – interactions between proteins and biological processes, cellular components and molecular functions (source: ‘http://identifiers.org/goa’).
e.g.:
‘http://rdf.biogateway.eu/prot-obo/9606!chr-17!TP53!UPI000002ED67–GO_0000122’ (class),
‘http://rdf.biogateway.eu/prot-obo/9606!chr-17!TP53!UPI000002ED67–GO_0000122#goa’ (instance).
Defining properties:
‘http://purl.obolibrary.org/obo/RO_0002331’ “involved in” (biological process),
‘http://purl.obolibrary.org/obo/BFO_0000050’ “part of” (cellular component),
‘http://purl.obolibrary.org/obo/RO_0002327’ “enables” (molecular function).
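In prot2onto the defining property depends only on which GO aspect the target term belongs to, so the choice reduces to a lookup table. The sketch below mirrors the three properties listed above; the aspect keys follow the GO namespace labels and, like the function name, are an assumption for illustration. Because each property has one meaning within the graph, the table can also be inverted to infer the aspect from a predicate.

```python
OBO = "http://purl.obolibrary.org/obo/"

# Defining property per GO aspect in prot2onto, as listed in the text.
# Assumption: keys use GO namespace labels.
GO_ASPECT_PROPERTY = {
    "biological_process": OBO + "RO_0002331",  # involved in
    "cellular_component": OBO + "BFO_0000050",  # part of
    "molecular_function": OBO + "RO_0002327",  # enables
}

# Inverse lookup: one property per aspect, so the mapping is one-to-one.
ASPECT_OF_PROPERTY = {prop: aspect for aspect, prop in GO_ASPECT_PROPERTY.items()}


def defining_property(aspect: str) -> str:
    """Return the prot2onto predicate URI for a GO aspect."""
    try:
        return GO_ASPECT_PROPERTY[aspect]
    except KeyError:
        raise ValueError(f"unknown GO aspect: {aspect!r}") from None


print(defining_property("molecular_function"))
```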
http://rdf.biogateway.eu/graph/prot2phen – protein-phenotype interactions (currently limited to diseases; source: ‘http://uniprot.org/uniprot/’).
e.g.:
‘http://rdf.biogateway.eu/prot-omim/9606!chr-17!TP53!UPI000002ED67–151623’ (class),
‘http://rdf.biogateway.eu/prot-omim/9606!chr-17!TP53!UPI000002ED67–151623#uniprot’ (instance).
Defining property: ‘http://purl.obolibrary.org/obo/RO_0002331’ “involved in” (disease).
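As an illustration of how the named graphs and defining properties might be used together, the sketch below builds (but does not send) a SPARQL query for diseases linked to one protein class. It assumes that prot2phen asserts the direct triple (protein, RO_0002331, disease) alongside the reified relation class; if it does not, the triple pattern must be adapted. No endpoint URL is implied, and the function name is hypothetical.

```python
# Characters not allowed inside a SPARQL IRI reference (control characters not checked).
_FORBIDDEN_IRI_CHARS = set('<>"{}|^`\\ ')


def diseases_for_protein_query(protein_uri: str) -> str:
    """Build a SPARQL query text for diseases linked to one protein class.

    Assumption: the direct triple (protein, RO_0002331, disease) exists in prot2phen.
    """
    if any(ch in _FORBIDDEN_IRI_CHARS for ch in protein_uri):
        raise ValueError(f"not usable as an IRI reference: {protein_uri!r}")
    return (
        "PREFIX obo: <http://purl.obolibrary.org/obo/>\n"
        "SELECT DISTINCT ?disease\n"
        "WHERE {\n"
        "  GRAPH <http://rdf.biogateway.eu/graph/prot2phen> {\n"
        f"    <{protein_uri}> obo:RO_0002331 ?disease .\n"
        "  }\n"
        "}\n"
    )


print(diseases_for_protein_query(
    "http://rdf.biogateway.eu/prot/9606/chr-17/TP53/UPI000002ED67"))
```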
http://rdf.biogateway.eu/graph/prot2prot – protein-protein interactions (source: ‘http://identifiers.org/intact/’)
e.g.:
‘http://rdf.biogateway.eu/prot-prot/9606!chr-03!BHLHE40!UPI0000126923–9606!chr-17!TP53!UPI000002ED67’ (class),
‘http://rdf.biogateway.eu/prot-prot/9606!chr-03!BHLHE40!UPI0000126923–9606!chr-17!TP53!UPI000002ED67#intact’ (instance).
Defining property: ‘http://purl.obolibrary.org/obo/RO_0002436’ “molecularly interacts with” (protein).
http://rdf.biogateway.eu/graph/tfac2gene – interactions between transcription factors and target genes.
Sources:
‘http://www.tfacts.org’,
‘http://www.grnpedia.org/trrust/’,
‘http://www.lbbc.ibb.unesp.br/htri’,
‘http://signor.uniroma2.it’,
‘http://identifiers.org/intact/’,
‘http://identifiers.org/goa’,
‘http://www.extri.org’.
e.g.:
‘http://rdf.biogateway.eu/prot-gene/9606!chr-17!TP53!UPI000002ED67–9606!chr-20!AAR2’ (class),
‘http://rdf.biogateway.eu/prot-gene/9606!chr-17!TP53!UPI000002ED67–9606!chr-20!AAR2#tfacts’ (instance).
Defining property: ‘http://purl.obolibrary.org/obo/RO_0002428’ “involved in regulation of” (gene).
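Because tfac2gene draws on several sources, one relation class can have one instance per source. Grouping instance URIs by their fragment-free part gives the set of sources backing each relation. In this sketch only the `#tfacts` tag comes from the text; `#trrust` is a hypothetical tag, and both functions are illustrative helpers.

```python
from collections import defaultdict


def sources_per_relation(instance_uris):
    """Group instance URIs by relation class; collect each class's source tags."""
    grouped = defaultdict(set)
    for uri in instance_uris:
        class_uri, sep, tag = uri.partition("#")
        if not sep or not tag:
            raise ValueError(f"expected a '#<source>' fragment: {uri}")
        grouped[class_uri].add(tag)  # duplicates of the same source collapse
    return dict(grouped)


def supported_by_at_least(grouped, n):
    """Relation classes backed by n or more distinct sources, sorted."""
    return sorted(c for c, tags in grouped.items() if len(tags) >= n)


# Class URI from the text; "trrust" is a hypothetical source tag.
tp53_aar2 = "http://rdf.biogateway.eu/prot-gene/9606!chr-17!TP53!UPI000002ED67–9606!chr-20!AAR2"
grouped = sources_per_relation([tp53_aar2 + "#tfacts", tp53_aar2 + "#trrust"])
print(supported_by_at_least(grouped, 2))
```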
External parent classes
http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag ‘unordered collection’
http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement ‘triple’
http://www.w3.org/2000/01/rdf-schema#Class ‘entity type’
http://www.w3.org/2000/01/rdf-schema#Property ‘edge type’
http://semanticscience.org/resource/SIO_010035 ‘gene’
http://semanticscience.org/resource/SIO_010043 ‘protein’
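The text abbreviates some of these URIs (rdf:Bag, rdf:Statement), and the classes and properties listed in this document share a small number of namespaces. A minimal prefix table makes the abbreviations reversible. The prefix names below are common conventions chosen for illustration (an assumption), not names defined by BioGateway.

```python
# Assumed prefix names; the namespaces are those used by the URIs in this document.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "sio": "http://semanticscience.org/resource/",
    "obo": "http://purl.obolibrary.org/obo/",
    "schema": "http://schema.org/",
}


def expand(curie: str) -> str:
    """Turn 'prefix:local' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


def compact(uri: str) -> str:
    """Abbreviate a URI if it falls in a known namespace; else return it unchanged."""
    # Longest namespace first, so a more specific namespace wins if two ever overlap.
    for prefix, ns in sorted(PREFIXES.items(), key=lambda kv: len(kv[1]), reverse=True):
        if uri.startswith(ns):
            return f"{prefix}:{uri[len(ns):]}"
    return uri


print(expand("rdf:Bag"))
print(compact("http://semanticscience.org/resource/SIO_010043"))
```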
Properties used in A and B graphs
Object properties
http://www.w3.org/2000/01/rdf-schema#subClassOf ‘is subclass of’
http://www.w3.org/2000/01/rdf-schema#subPropertyOf ‘is subproperty of’
http://semanticscience.org/resource/SIO_000253 ‘has source’ # domain: rdf.biogateway.eu/graph/
http://semanticscience.org/resource/SIO_000772 ‘has evidence’ # range: publications
http://schema.org/evidenceOrigin ‘has evidence origin’ # range: source of metadata
Annotation properties
http://www.w3.org/2004/02/skos/core#prefLabel ‘has name’
http://schema.org/evidenceLevel ‘has evidence level’
Properties used in A graphs
Object properties
http://schema.org/memberOf ‘is member of’
http://purl.obolibrary.org/obo/BFO_0000052 ‘inheres in’ # range: biological species
http://www.w3.org/2004/02/skos/core#closeMatch ‘has close match’ # range: external URIs for genes and proteins
Annotation properties
http://www.w3.org/2004/02/skos/core#altLabel ‘has synonym’
http://www.w3.org/2004/02/skos/core#definition ‘has definition’
Properties used in B graphs
Object properties
http://www.w3.org/1999/02/22-rdf-syntax-ns#type ‘is instance of’
http://purl.obolibrary.org/obo/RO_0002331 ‘involved in’ # range: biological process, disease
http://purl.obolibrary.org/obo/BFO_0000050 ‘part of’ # range: cellular component
http://purl.obolibrary.org/obo/RO_0002327 ‘enables’ # range: molecular function
http://purl.obolibrary.org/obo/RO_0002436 ‘molecularly interacts with’ # range: protein
http://purl.obolibrary.org/obo/RO_0002428 ‘involved in regulation of’ # range: gene
http://www.w3.org/2000/01/rdf-schema#isDefinedBy ‘is defined by’ # range: method
Annotation properties
http://www.w3.org/1999/02/22-rdf-syntax-ns#value ‘has value’ # positive/negative
http://www.w3.org/2000/01/rdf-schema#comment ‘has comment’ # amino acid change
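The defining properties of the B-type graphs can be collected into one lookup table, for instance to sanity-check predicates when loading or querying data. The table below mirrors the graph descriptions above; the function names are hypothetical. Note that ‘involved in’ (RO_0002331) is defining in two graphs, where its range is a biological process or a disease respectively, consistent with each property having a single meaning within a given graph.

```python
OBO = "http://purl.obolibrary.org/obo/"
GRAPH = "http://rdf.biogateway.eu/graph/"

# Defining properties of each B-type graph, as described in this document.
DEFINING_PROPERTIES = {
    GRAPH + "prot2onto": {OBO + "RO_0002331", OBO + "BFO_0000050", OBO + "RO_0002327"},
    GRAPH + "prot2phen": {OBO + "RO_0002331"},
    GRAPH + "prot2prot": {OBO + "RO_0002436"},
    GRAPH + "tfac2gene": {OBO + "RO_0002428"},
}


def check_defining_predicate(graph_uri: str, predicate: str) -> bool:
    """True if predicate is a defining property of the given B-type graph."""
    return predicate in DEFINING_PROPERTIES.get(graph_uri, set())


def graphs_using(predicate: str) -> list:
    """B-type graphs in which the predicate is a defining property, sorted."""
    return sorted(g for g, props in DEFINING_PROPERTIES.items() if predicate in props)


print(graphs_using(OBO + "RO_0002331"))
```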