Systems Biology

Part II: Meta-Networks

Prof. Patrick E. Meyer

Exploiting Networks (part II)

Examples of networks

Internet – Roads (Rome) – Metabolome – Airplanes

Degree distribution

Preferential-attachment models tend to produce degree distributions with a power-law tail: most nodes have few links, while a few hubs have many.
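A minimal sketch of the attachment mechanism, written in base R so that the rule itself is visible: each arriving node links to one existing node chosen with probability proportional to its current degree. The network size, the seed and the one-edge-per-node rule are illustrative assumptions, not values from the course.

```r
# Minimal preferential-attachment simulation (illustrative assumptions:
# n = 500 nodes, one new edge per arriving node, fixed random seed).
set.seed(42)
n <- 500
deg <- integer(n)
deg[1:2] <- 1L                      # start from a single edge 1 - 2

for (v in 3:n) {
  # pick an existing node with probability proportional to its degree
  target <- sample.int(v - 1, size = 1, prob = deg[seq_len(v - 1)])
  deg[v] <- 1L
  deg[target] <- deg[target] + 1L
}

# Empirical degree distribution: low degrees dominate, a few hubs
# reach much larger degrees (the heavy tail).
deg_freq <- table(deg) / n
print(head(deg_freq))
print(max(deg))
```

Plotting `deg_freq` on log-log axes is the usual way to inspect the tail.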

Graph motifs

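  > graph.motifs(g, 4)   # count the 4-vertex subgraphs of g per isomorphism class (NA: unconnected classes)
  [1]  NA  NA  NA  NA 296  NA 918 839 205 373  51
  > g <- graph.isocreate(4, 0, directed = FALSE)   # build the graph of isomorphism class 0
  > plot(g)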

Consequences of topology

  • Susceptible to targeted attack
    (removing 1 % of the nodes, the hubs, can disconnect the network)

  • Fragility is not only bad
    (“vaccinating” the hubs protects the network)

  • In biological networks: can we mutate this gene (node)?

  • Idea of vertex importance (also called centrality)?
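The effect of removing a hub can be illustrated on a toy network. The sketch below assumes the igraph package and a hypothetical hub-and-spoke graph, an extreme case chosen only to make the contrast obvious.

```r
library(igraph)

# Hypothetical hub-and-spoke network: vertex 1 is the hub, 2..10 are leaves.
g <- make_star(10, mode = "undirected")

# Removing a peripheral node leaves the network connected ...
after_leaf <- delete_vertices(g, 2)
print(is_connected(after_leaf))        # TRUE

# ... whereas removing the hub isolates every remaining node.
after_hub <- delete_vertices(g, 1)
print(is_connected(after_hub))         # FALSE
print(components(after_hub)$no)        # 9 isolated nodes
```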

Closeness Centrality

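Based on the average shortest-path distance between a node and all the others; the smaller the average, the more central the node.

\[ c_i = \frac{1}{n-1}\sum_{j\neq i} d_{ij} \]

where \(d_{ij}\) is the shortest-path distance between nodes \(i\) and \(j\) and \(n\) is the number of nodes. Note that igraph's closeness() returns the reciprocal of the summed distances, \(1/\sum_{j\neq i} d_{ij}\), so larger values indicate more central nodes.

> closeness(g)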
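A small worked example, assuming igraph and a hypothetical 5-vertex star, compares the average distance of the formula with the values returned by closeness().

```r
library(igraph)

# Hypothetical star network: vertex 1 is the hub, 2..5 are leaves.
g <- make_star(5, mode = "undirected")

d <- distances(g)                          # shortest-path distance matrix
avg_dist <- rowSums(d) / (vcount(g) - 1)   # the formula: mean distance to the others
print(avg_dist)                            # hub: 1, leaves: 1.75

# igraph works with reciprocals, so the hub gets the largest score
print(closeness(g))                        # hub: 1/4, leaves: 1/7
print(closeness(g, normalized = TRUE))     # equals 1 / avg_dist
```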

Betweenness centrality

Captures how often a node lies on the optimal route between any pair of nodes.

\[ B_k = \sum_{i \neq j,\; i,j \neq k} \frac{g_{ij|k}}{g_{ij}} \]

  • \(g_{ij}\): total number of shortest paths between \(i\) and \(j\)
  • \(g_{ij|k}\): number of those shortest paths that pass through vertex \(k\)

> betweenness(g)
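On a graph where every pair has a single shortest path, each term of the sum is 0 or 1 and betweenness reduces to counting pairs. The sketch below assumes igraph and a hypothetical undirected path graph.

```r
library(igraph)

# Hypothetical undirected path A - B - C - D - E.
g <- graph_from_literal(A - B - C - D - E)

b <- betweenness(g)
print(b)
# C lies on the shortest paths of (A,D), (A,E), (B,D), (B,E): score 4
# B lies on those of (A,C), (A,D), (A,E): score 3; the end points score 0
```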

Eigenvector Centrality

A vertex’s importance depends on the importance of its neighbours.

\[ x_v = \frac{1}{\lambda}\sum_{t\in N(v)} x_t = \frac{1}{\lambda}\sum_{t\in G} A_{vt}\,x_t \quad\Longleftrightarrow\quad A\,x = \lambda\,x \]

The eigenvector associated with the largest eigenvalue (\(\lambda\)) yields the centrality scores.

> evcent(g)
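The eigenvector relation can be checked directly with base R, without igraph. The adjacency matrix below is a hypothetical 5-vertex star; rescaling the scores so that the maximum is 1 is a convention chosen here for readability.

```r
# Adjacency matrix of a hypothetical star: vertex 1 linked to vertices 2..5.
A <- matrix(0, 5, 5)
A[1, 2:5] <- 1
A[2:5, 1] <- 1

e <- eigen(A, symmetric = TRUE)     # eigenvalues come sorted in decreasing order
lambda <- e$values[1]               # largest eigenvalue
x <- abs(e$vectors[, 1])            # leading eigenvector (its sign is arbitrary)
x <- x / max(x)                     # rescale so that the top score is 1

print(lambda)                       # 2 (up to rounding)
print(x)                            # hub: 1, leaves: 0.5

# Check the defining relation A x = lambda x
print(all.equal(as.vector(A %*% x), lambda * x))
```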

Eigenvector centrality (evcent) is the idea behind the PageRank algorithm.

Is our gene targetable?

  • Z-score of each measure (closeness, betweenness, evcent) \[Z_i = \frac{x_i - \mu}{\sigma}\]

  • If any of them is ≥ 2, prefer another target

  • Upstream or downstream?
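The Z-score rule above can be sketched in base R as follows. The centrality values are invented for illustration, and the cutoff of 2 is the one stated in the list.

```r
# Hypothetical centrality table: one row per gene, one column per measure.
scores <- cbind(
  closeness   = c(0.20, 0.22, 0.21, 0.19, 0.23, 0.20, 0.21, 0.22, 0.20, 0.60),
  betweenness = c(3, 5, 4, 2, 6, 3, 4, 5, 3, 40),
  evcent      = c(0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 0.11, 0.12, 0.10, 1.00)
)
rownames(scores) <- c(paste0("gene", 1:9), "hub")

z <- scale(scores)                       # column-wise (x - mean) / sd

# A gene is kept as a target only if no measure reaches the cutoff of 2.
targetable <- apply(z, 1, function(zi) all(zi < 2))
print(round(z["hub", ], 2))
print(targetable)
```

Here only the row named `hub` is rejected; the other genes stay below the cutoff on all three measures.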

Subgraph

Induced subgraph: a pre-specified set of vertices together with all edges among them.

> g <- graph.formula(A-+B, A+-C, B-+C, B+-D, B+-E, D-+E)
> h <- induced_subgraph(g, c(2,4,5))
> E(h)
Edge sequence:
[1] D->B D->E E->B
> n <- neighbors(g, "D", mode = "out")  # other strategy: start from the out-neighbours of D
> names(n)
[1] "B" "E"
> h <- induced_subgraph(g, c("D",names(n)))

Inferring Networks (part II)

Drosophila melanogaster data

  • Publicly available data
    • list of 700+ known TFs
    • list of 14 k+ genes
    • potential edges: \(700^{+} \times 14\,000^{+} = 10\,\mathrm{M}^{+}\)
    • Experimentally verified edges: 200 (REDfly data)
  • Expression data
    • 2 “big” microarray datasets (FlyAtlas + GSE6186)
    • 2 RNA-seq modENCODE datasets
    • CLR/MRNET/ARACNE to infer the transcriptional network?

ChIP‑binding experiments

Experiments for 76 TFs in D. melanogaster (full genome)

cond.  tf      chrom.  peakStart  peakEnd  intensity
t1     CG1674  chr2L   1          5954     0.9

However, many of these binding events are non-functional.

ChIP-binding improvement

  • threshold on intensity: 0.5

  • threshold on location: within ± 500 bp of txStart.

    Gene annotation file from flybase.org:

    name chrom txStart txEnd cdsStart cdsEnd
    CG1678 chr4 251355 266500 252579 266389
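A minimal base-R sketch of the two filters. The peak coordinates are invented, the column names follow the two tables above, and two simplifications are assumed: a peak counts as "within ± 500 bp" when it overlaps that window, and strand is ignored.

```r
# Hypothetical ChIP peaks (coordinates invented) and one annotated gene.
peaks <- data.frame(
  tf        = c("CG1674", "CG1674", "CG1674"),
  chrom     = c("chr4", "chr4", "chr4"),
  peakStart = c(251000, 260000, 251200),
  peakEnd   = c(251400, 260300, 251600),
  intensity = c(0.9, 0.8, 0.3),
  stringsAsFactors = FALSE
)
genes <- data.frame(name = "CG1678", chrom = "chr4", txStart = 251355,
                    stringsAsFactors = FALSE)

window <- 500          # bp around txStart
min_intensity <- 0.5   # intensity threshold

strong <- peaks[peaks$intensity >= min_intensity, ]
pairs  <- merge(strong, genes, by = "chrom")   # all peak-gene pairs per chromosome

# keep peaks that overlap [txStart - window, txStart + window]
near <- pairs$peakStart <= pairs$txStart + window &
        pairs$peakEnd   >= pairs$txStart - window
edges <- unique(pairs[near, c("tf", "name")])
print(edges)           # one edge: CG1674 -> CG1678
```

The second peak is too far from txStart and the third is too weak, so a single edge survives.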

ChIP network

For all TF → TG pairs, an edge weight is defined as

  • 0 if no binding evidence (above thresholds)
  • 0.1 if no data (“I don’t know value”)
  • 1 if binding in at least one experiment

    tf tg \(w\)
    \(X_{1}\) \(X_{2}\) 0.1
    \(X_{1}\) \(X_{3}\) 0
    \(X_{\#tf}\) \(X_{\#tg}\) 1
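The three-valued weighting can be sketched as follows in base R. The TF and gene names are placeholders, and "no data" is interpreted here as "the TF has no ChIP experiment", which is an assumption.

```r
# Hypothetical inputs: TFs with at least one ChIP experiment, and the
# TF -> TG pairs that passed the thresholds in at least one experiment.
assayed_tfs <- c("tfA", "tfB")
bound <- data.frame(tf = "tfA", tg = "g1", stringsAsFactors = FALSE)

# All candidate TF -> TG pairs
net <- expand.grid(tf = c("tfA", "tfB", "tfC"), tg = c("g1", "g2"),
                   stringsAsFactors = FALSE)

key <- function(tf, tg) paste(tf, tg, sep = "->")
has_data <- net$tf %in% assayed_tfs
is_bound <- key(net$tf, net$tg) %in% key(bound$tf, bound$tg)

net$w <- ifelse(!has_data, 0.1,       # no data: "I don't know" value
         ifelse(is_bound, 1, 0))      # binding vs. no binding evidence
print(net)
```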

Binding motifs network

  • 12 DNA sequences (from flybase.org)
  • 139 known TF‑binding motifs
  • Network built by searching (grep) for the binding motifs in the sequences
  • Problem: too many non‑functional binding motifs
  • Threshold on location: ± 500 bp of txStart.

Binding motifs improvement

Branch Length Score (BLS) (Kheradpour et al., 2007): a binding motif conserved through evolution is more likely to be functional.

tf tg \(w\)
\(X_{1}\) \(X_{2}\) 0.1
\(X_{1}\) \(X_{3}\) 0
\(X_{\#tf}\) \(X_{\#tg}\) 0.83

Chromatin similarity profiles

  • Ts: H3K4me1, H3K4me3, H3K9me3, H3K27me3, H3K27ac, H3K9ac. (modENCODE dataset)
  • Ct: H3K4me2, H4K16ac, H3K36me1, H3K36me3, H3K79me1, H3K79me2, H3K23ac, H3K18ac, H4K12ac, H4K5ac, H2BK5ac, H4K8ac. (modENCODE dataset)

Co-chromatin networks

Similarity between profiles (correlations)

gene MARK1 MARK2
tf 1 1 0 0 0 0 1 1 1 0
tg 1 0 0 0 0 0 1 1 1 1

  • Squared Spearman correlations provided
    • 2 co-chromatin nets
    • 4 co-expression nets
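A squared Spearman correlation between two profiles takes one line in base R. The two binary vectors below read the tf and tg rows of the table as ten bins each, which is an assumption about its layout.

```r
# Binary chromatin profiles of one TF and one target gene
# (ten bins each; the bin layout is assumed).
tf_profile <- c(1, 1, 0, 0, 0, 0, 1, 1, 1, 0)
tg_profile <- c(1, 0, 0, 0, 0, 0, 1, 1, 1, 1)

rho <- cor(tf_profile, tg_profile, method = "spearman")
w <- rho^2            # squared Spearman correlation used as edge weight
print(w)              # 0.36
```

Squaring removes the sign, so strongly anti-correlated profiles also receive a high weight.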

Meta‑networks Principle

\[G_{1}\;\begin{array}{c}\swarrow\\\searrow\end{array}\; \begin{array}{c}G_{2}\\\uparrow\!\downarrow\\G_{3}\end{array}\]

  • Networks from TF binding (motif and ChIP)
    • pros: physical connections (directed)
    • issue: elimination of non‑functional bindings
  • Networks from correlations (expr and chromatin)
    • pros: functional connections (but undirected)
    • issue: elimination of indirect interactions

Weight-average

  • motif

    tf tg \(w_{ij}\)
    \(X_{1}\) \(X_{2}\) 0.1
    \(X_{1}\) \(X_{3}\) 0.3
    \(X_{\#tf}\) \(X_{\#tg}\) 0.83
  • correlation

    tf tg \(w_{ij}\)
    \(X_{1}\) \(X_{2}\) 0.3
    \(X_{1}\) \(X_{3}\) 0.1
    \(X_{\#tf}\) \(X_{\#tg}\) 0.95
  • weight-average

    tf tg \(w_{ij}\)
    \(X_{1}\) \(X_{2}\) 0.2
    \(X_{1}\) \(X_{3}\) 0.2
    \(X_{\#tf}\) \(X_{\#tg}\) 0.89
  • Meta-network made from the weight sum (or weight average) of a set of sub-networks.
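The averaged table above can be reproduced with a few lines of base R. The edge labels are plain-text stand-ins for the symbols in the tables.

```r
# Edge weights of the two sub-networks shown above (same edge order).
edges <- data.frame(tf = c("X1", "X1", "X#tf"),
                    tg = c("X2", "X3", "X#tg"),
                    stringsAsFactors = FALSE)
w_motif <- c(0.10, 0.30, 0.83)
w_cor   <- c(0.30, 0.10, 0.95)

W <- cbind(motif = w_motif, correlation = w_cor)
edges$w <- rowMeans(W)        # unweighted average over the sub-networks
print(edges)                  # weights: 0.20, 0.20, 0.89
```

Averaging raw weights implicitly treats the sub-networks as being on comparable scales.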

Rank-sum

  • motif-ranked

    tf tg \(w_{ij}\)
    \(X_{1}\) \(X_{2}\) 92
    \(X_{1}\) \(X_{3}\) 51
    \(X_{\#tf}\) \(X_{\#tg}\) 1
  • cor-ranked

    tf tg \(w_{ij}\)
    \(X_{1}\) \(X_{2}\) 47
    \(X_{1}\) \(X_{3}\) 360
    \(X_{\#tf}\) \(X_{\#tg}\) 20
  • rank-sum

    tf tg \(w_{ij}\)
    \(X_{1}\) \(X_{2}\) 139
    \(X_{1}\) \(X_{3}\) 411
    \(X_{\#tf}\) \(X_{\#tg}\) 21
  • Meta-network made from the sum of ranks of a set of sub-networks (co-expression, co-chromatin, ChIP and motif)
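The rank-sum table can be checked the same way. The `to_rank` helper is a hypothetical illustration of how ranks could be derived from raw weights (largest weight gets rank 1, ties averaged); the tie rule is an assumption.

```r
# Ranks from the two tables above (rank 1 = strongest edge in that sub-network).
rank_motif <- c(92, 51, 1)
rank_cor   <- c(47, 360, 20)
rank_sum   <- rank_motif + rank_cor
print(rank_sum)                       # 139 411 21

# Assumed way of obtaining ranks from raw weights: largest weight gets rank 1.
to_rank <- function(w) rank(-w, ties.method = "average")
print(to_rank(c(0.10, 0.30, 0.83)))   # 3 2 1
```

With rank sums, a smaller value indicates a better supported edge, and no sub-network dominates because of its weight scale.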

Validating Networks (part II)

PR-curves

Topology

Similar to E. coli and S. cerevisiae

Topology II

Similar to E. coli and S. cerevisiae

Network randomization

  • Random network made from swapping names of genes
  • Fold-enrichment: average co-expression among co-regulated genes in our network, divided by the same average in the randomized network.
  • Motif: 1.08; ChIP: 1.46; Meta: 3.07
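A toy version of this validation in base R. The network, the co-expression matrix and the number of randomizations are invented; "co-regulated" is taken to mean "sharing at least one regulator", which is an assumption.

```r
set.seed(1)

# Hypothetical network: two TFs, three targets each.
genes <- paste0("g", 1:6)
edges <- data.frame(tf = rep(c("tfA", "tfB"), each = 3), tg = genes,
                    stringsAsFactors = FALSE)

# Hypothetical co-expression matrix: high within each target group.
coexpr <- matrix(0.1, 6, 6, dimnames = list(genes, genes))
coexpr[1:3, 1:3] <- 0.9
coexpr[4:6, 4:6] <- 0.9

# Mean co-expression over all pairs of genes sharing a regulator.
mean_coreg <- function(edges, coexpr) {
  vals <- unlist(lapply(split(edges$tg, edges$tf), function(tg) {
    if (length(tg) < 2) return(numeric(0))
    p <- combn(tg, 2)                     # all target pairs of this TF
    coexpr[cbind(p[1, ], p[2, ])]
  }))
  mean(vals)
}

# Randomized network: same topology, gene names swapped at random.
randomize <- function(edges) {
  perm <- setNames(sample(genes), genes)
  edges$tg <- unname(perm[edges$tg])
  edges
}

observed <- mean_coreg(edges, coexpr)
random   <- mean(replicate(200, mean_coreg(randomize(edges), coexpr)))
fold     <- observed / random
print(fold)
```

A fold-enrichment above 1 means the co-regulated genes of the real network are more co-expressed than expected by chance.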

Overfitting problem

  • NEVER use the same data for modeling and validating
  • 3 expression datasets for modeling vs 1 for validation

GO-term similarity

  • List of GO functional terms for each gene

  • Similarity between lists: Jaccard index

    \[JI=\frac{|L_{1}\cap L_{2}|}{|L_{1}\cup L_{2}|}\]
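A short base-R sketch of the Jaccard index on two GO-term lists. The annotations assigned to the two genes are hypothetical.

```r
# Jaccard index between two lists of GO terms.
jaccard <- function(l1, l2) {
  l1 <- unique(l1)
  l2 <- unique(l2)
  length(intersect(l1, l2)) / length(union(l1, l2))
}

# Hypothetical GO annotations of two genes
go_a <- c("GO:0006355", "GO:0003700", "GO:0005634")
go_b <- c("GO:0006355", "GO:0005634", "GO:0007275")
print(jaccard(go_a, go_b))     # 2 shared / 4 distinct = 0.5
```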

PPI

Protein-protein interaction (PPI) network: two proteins are linked if they bind.

Results on co-regulated genes

Fold-enrichment in co-expression, GO terms and PPI among co-regulated genes of our networks vs. a randomized version of them.

network       PPI   GO    RNAseq
motif         1.39  1.06  1.08
ChIP          1.24  1.23  1.46
unsupervised  1.53  1.44  3.07
supervised    1.58  1.55  3.62

Meta-net exploitation

[modENCODE consortium, Science 2010]

  • Prediction of GO process terms of unannotated genes (community detection).
  • Predictive models of expression of target genes from expression of regulators.
  • Hub detection