Systems Biology

Part II: Meta-Networks

Prof. Patrick E. Meyer

Exploiting Networks (part II)

Examples of networks

Internet – Roads (Rome) – Metabolome – Airplanes

Degree distribution

Networks grown by preferential attachment tend to have a degree distribution with a “power‑law tail”: most nodes have few links, while a few hubs have very many.
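To make the idea concrete, here is a minimal base-R sketch of preferential attachment: each new node links to one existing node chosen with probability proportional to its current degree. The network size and the seed are arbitrary illustrative choices, not values from the lecture; igraph users can obtain a similar graph with `sample_pa()` and inspect it with `degree_distribution()`.

```r
# Minimal preferential-attachment sketch (base R only).
# n_nodes and the seed are illustrative assumptions.
set.seed(1)
n_nodes <- 2000
deg <- integer(n_nodes)
deg[1:2] <- 1L                      # start from a single edge between nodes 1 and 2
for (v in 3:n_nodes) {
  # choose an existing node with probability proportional to its degree
  target <- sample.int(v - 1, size = 1, prob = deg[1:(v - 1)])
  deg[target] <- deg[target] + 1L
  deg[v] <- 1L                      # the new node arrives with one link
}
deg_counts <- table(deg)            # number of nodes observed for each degree
# On log-log axes a power-law tail shows up as a roughly straight line:
# plot(as.integer(names(deg_counts)), as.integer(deg_counts), log = "xy")
```

Most nodes keep degree 1 while a handful of early nodes accumulate many links, which is the hub structure discussed in the following slides.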

Graph motifs

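  > graph.motifs(g, 4)   # count of each 4-vertex motif in g; NA marks unconnected classes
  [1]  NA  NA  NA  NA 296  NA 918 839 205 373  51
  > m <- graph.isocreate(4, 0, directed=FALSE)   # graph of a given isomorphism class (here class 0)
  > plot(m)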

Consequences of topology

Susceptible to targeted attack
(removing 1 % of the nodes can disconnect the network)
Fragility is not only bad
(“vaccination” of the hubs protects the network)
In biological networks: can we mutate this gene (node)?
This leads to the idea of vertex importance (also called centrality).

Closeness Centrality

Average shortest-path distance between a node and all the others; the smaller the value, the more central the node.

\[ c_i = \frac{1}{n-1}\sum_{j\neq i} d_{ij} \]

where \(d_{ij}\) is the shortest‑path distance in the graph.

> closeness(g)   # igraph returns the reciprocal of the summed distances (larger = more central)
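As a worked example, the average distance can be computed by hand on a small graph. The sketch below uses a toy path graph of five nodes (an illustrative assumption, not lecture data) and base R only, with Floyd-Warshall for the shortest-path distances.

```r
# Toy undirected path graph 1 - 2 - 3 - 4 - 5 (illustrative)
n <- 5
A <- matrix(0, n, n)
for (i in 1:(n - 1)) A[i, i + 1] <- A[i + 1, i] <- 1

# All-pairs shortest-path distances d_ij (Floyd-Warshall)
D <- ifelse(A == 1, 1, Inf)
diag(D) <- 0
for (k in 1:n) for (i in 1:n) for (j in 1:n)
  D[i, j] <- min(D[i, j], D[i, k] + D[k, j])

avg_dist <- rowSums(D) / (n - 1)    # c_i: average distance from node i to the others
avg_dist                            # 2.50 1.75 1.50 1.75 2.50
```

The middle node has the smallest average distance (1.5) and is therefore the most central under this measure.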

Betweenness centrality

Captures how often a node lies on the optimal route between any pair of nodes.

\[ B_k = \sum_{i \neq j \neq k} \frac{g_{ij|k}}{g_{ij}} \]

\(g_{ij}\): total number of shortest paths between \(i\) and \(j\)
\(g_{ij|k}\): number of those shortest paths that pass through vertex \(k\)

> betweenness(g)
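The definition can be checked by hand on a small example. The sketch below is a base-R illustration on a toy graph (a 4-cycle with one pendant node, chosen only for illustration). It relies on the fact that, in an unweighted graph, the walks of length \(d_{ij}\) between \(i\) and \(j\) are exactly the shortest paths, so path counts can be read from powers of the adjacency matrix. Each unordered pair \((i, j)\) is counted once, which is an assumed convention for undirected graphs.

```r
# Toy undirected graph: cycle 1-2-3-4-1 plus node 5 attached to node 1
n <- 5
A <- matrix(0, n, n)
edges <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 1), c(1, 5))
for (e in seq_len(nrow(edges))) {
  A[edges[e, 1], edges[e, 2]] <- 1
  A[edges[e, 2], edges[e, 1]] <- 1
}

# D[i, j]: shortest-path length; S[i, j]: number of shortest paths g_ij
D <- matrix(Inf, n, n); diag(D) <- 0
S <- diag(n)
P <- diag(n)
for (len in 1:(n - 1)) {
  P <- P %*% A                            # P[i, j] = number of walks of length len
  first_hit <- is.infinite(D) & P > 0     # pairs reached for the first time
  D[first_hit] <- len
  S[first_hit] <- P[first_hit]
}

# B_k: fraction of shortest i-j paths through k, summed over pairs i < j
betw <- numeric(n)
for (k in 1:n) for (i in 1:n) for (j in 1:n) {
  if (i < j && i != k && j != k && D[i, k] + D[k, j] == D[i, j]) {
    betw[k] <- betw[k] + S[i, k] * S[k, j] / S[i, j]
  }
}
betw                                      # 3.5 1.0 0.5 1.0 0.0
```

Node 1 lies on most optimal routes (it is the only way to reach node 5), so it has the highest betweenness.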

Eigenvector Centrality

A vertex’s importance depends on the importance of its neighbours.

\[ x_v = \frac{1}{\lambda}\sum_{t\in N(v)} x_t \qquad x_v = \frac{1}{\lambda}\sum_{t\in G} A_{vt}\,x_t \qquad A\,x = \lambda\,x \]

The eigenvector associated with the largest eigenvalue (\(\lambda\)) yields the centrality scores.

> evcent(g)
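The eigenvector formulation can be verified directly with base R's `eigen()`. The sketch below uses a toy star graph (an illustrative assumption) and rescales the scores so that the largest one equals 1, which is a common convention rather than something stated in the slides.

```r
# Toy undirected star graph: node 1 is linked to nodes 2..5
n <- 5
A <- matrix(0, n, n)
A[1, 2:n] <- 1
A[2:n, 1] <- 1

eig <- eigen(A, symmetric = TRUE)   # eigenvalues are returned in decreasing order
lambda <- eig$values[1]             # largest eigenvalue (2 for this graph)
x <- abs(eig$vectors[, 1])          # leading eigenvector, sign made positive
x <- x / max(x)                     # rescale so that the top score is 1
x                                   # 1.0 0.5 0.5 0.5 0.5
```

The centre scores 1 and each leaf 0.5: a leaf is only as important as its single neighbour allows, which is the recursive idea behind the measure.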

evcent: the idea behind the PageRank algorithm (a variant of eigenvector centrality)

Is our gene targetable?

Compute the Z-score of each measure (closeness, betweenness, evcent): \[Z_i = \frac{x_i - \mu}{\sigma}\]
If any of them is ≥ 2, the gene is too central: prefer another target
Upstream or downstream?
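The Z-score rule can be sketched as follows. The gene names and centrality values are invented for illustration; in practice they would come from `closeness(g)`, `betweenness(g)` and `evcent(g)`. Using the sample standard deviation (`sd()`) is an assumption.

```r
# Hypothetical centrality scores for eight genes (values invented)
scores <- data.frame(
  gene        = paste0("g", 1:8),
  closeness   = c(0.20, 0.22, 0.21, 0.19, 0.23, 0.21, 0.20, 0.60),
  betweenness = c(3, 5, 4, 2, 6, 5, 4, 5),
  evcent      = c(0.10, 0.15, 0.12, 0.11, 0.14, 0.13, 0.12, 0.13)
)
measures <- c("closeness", "betweenness", "evcent")

# Z_i = (x_i - mean) / sd, computed separately for each measure
z <- sapply(scores[measures], function(x) (x - mean(x)) / sd(x))

# Flag a gene as soon as one of its Z-scores reaches 2
scores$avoid <- apply(z >= 2, 1, any)
scores[, c("gene", "avoid")]
```

Only g8, whose closeness score stands far above the rest, is flagged here; the other genes remain candidate targets under this rule.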

Subgraph

Induced subgraph from a pre‑specified set of vertices – all edges among that set.

> g <- graph.formula(A-+B, A+-C, B-+C, B+-D, B+-E, D-+E)
> h <- induced_subgraph(g, c(2,4,5))   # vertices 2, 4, 5 are B, D, E
> E(h)
Edge sequence:
[1] D->B D->E E->B
> n <- neighbors(g, "D", mode="out")   # other strategy: D and its out-neighbours
> n
[1] B E
> h <- induced_subgraph(g, c("D", names(n)))

Inferring Networks (part II)

Drosophila melanogaster data

Publicly available data
- list of 700+ known TFs
- list of 14 k+ genes
- potential edges: \(700^{+} \times 14000^{+} = 10\,\mathrm{M}^{+}\)
- Experimentally verified edges: 200 (REDfly data)
Expression data
- 2 “big” micro‑array datasets (FlyAtlas + GSE6186)
- 2 RNAseq modENCODE datasets
- CLR/MRNET/ARACNE for Transcriptional network?

ChIP‑binding experiments

Experiments for 76 TFs in D. melanogaster (full genome)

cond.	tf	chrom.	peakStart	peakEnd	intensity
t1	CG1674	chr2L	1	5954	0.9
…	…	…	…	…	…

but lots of non‑functional binding

ChIP-binding improvement

threshold on intensity: 0.5
threshold on location: within ± 500 bp of txStart.

Gene annotation file from flybase.org:

name	chrom	txStart	txEnd	cdsStart	cdsEnd
CG1678	chr4	251355	266500	252579	266389
…	…	…	…	…	…
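The two thresholds can be applied with a simple join between the peak table and the annotation table. The sketch below is only an illustration: the rows are invented (apart from the CG1678 annotation row shown above), "within ± 500 bp" is read as "the peak overlaps the window around txStart", and the intensity threshold is taken as inclusive. All of these are assumptions.

```r
# Toy tables mirroring the column layouts of the slides (peak rows invented)
peaks <- data.frame(
  cond      = c("t1", "t1", "t2"),
  tf        = c("tfA", "tfA", "tfB"),
  chrom     = c("chr4", "chr4", "chr4"),
  peakStart = c(251000, 300000, 251200),
  peakEnd   = c(251600, 300400, 251500),
  intensity = c(0.9, 0.8, 0.3)
)
genes <- data.frame(name = "CG1678", chrom = "chr4", txStart = 251355)

window <- 500           # bp on each side of txStart
min_intensity <- 0.5

# Every peak x gene combination on the same chromosome
pairs <- merge(peaks, genes, by = "chrom")
near_tss <- pairs$peakStart <= pairs$txStart + window &
            pairs$peakEnd   >= pairs$txStart - window
kept <- pairs[near_tss & pairs$intensity >= min_intensity,
              c("tf", "name", "intensity")]
kept
```

Of the three toy peaks, one is too far from the transcription start and one is too weak, so a single tfA → CG1678 binding event remains.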

ChIP network

For all TF → TG pairs, an edge weight is defined as

0 if no binding evidence (above thresholds)
0.1 if no data (“I don’t know value”)
1 if binding in at least one experiment

tf	tg	w
X₁	X₂	0.1
Xᵢ	Xₖ	0
…	…	…
X_#tf	X_#tg	1
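The three-level weighting can be sketched as below. The TF and gene names, and the two input objects (`tested_tfs`, `bound`), are hypothetical; "no data" is interpreted here as "no ChIP experiment exists for this TF", which is an assumption.

```r
# Hypothetical inputs
tested_tfs <- c("tfA", "tfB")                    # TFs with ChIP experiments
bound <- data.frame(tf = "tfA", tg = "g1")       # pairs passing both thresholds

# All candidate TF -> TG pairs
edges <- expand.grid(tf = c("tfA", "tfB", "tfC"), tg = c("g1", "g2"),
                     stringsAsFactors = FALSE)
has_data <- edges$tf %in% tested_tfs
is_bound <- paste(edges$tf, edges$tg) %in% paste(bound$tf, bound$tg)

# 0.1 = "I don't know", 1 = binding observed, 0 = tested but no binding
edges$w <- ifelse(!has_data, 0.1, ifelse(is_bound, 1, 0))
edges
```

Giving untested pairs a small non-zero weight keeps them distinguishable from pairs that were tested and found unbound.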

Binding motifs network

12 DNA sequences (from flybase.org)
139 known TF‑binding motifs
Network built by searching (grep) for each binding motif in the sequences
Problem: too many non‑functional binding motifs
Threshold on location: ± 500 bp of txStart.
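A grep-style motif scan with the location threshold can be sketched in base R. The sequence, motif and transcription start below are invented, `motif_hits` is a hypothetical helper, and only the forward strand is scanned (a simplification).

```r
# Toy promoter scan (sequence, motif and positions are invented)
sequence <- "ACGTTGACGTCATTTGACGTCAGG"
motif    <- "TGACGTCA"      # hypothetical consensus, interpreted as a regular expression

# Start positions of motif matches lying within +/- window bp of tx_start
motif_hits <- function(sequence, motif, tx_start, window = 500) {
  hits <- gregexpr(motif, sequence)[[1]]
  starts <- if (hits[1] == -1) integer(0) else as.integer(hits)
  starts[abs(starts - tx_start) <= window]
}

motif_hits(sequence, motif, tx_start = 10)              # 5 15
motif_hits(sequence, motif, tx_start = 4, window = 3)   # 5
```

A TF → TG edge would then be proposed whenever the TF's motif has at least one hit inside the window of the gene.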

Binding motifs improvement

Branch Length Score (BLS) (Kheradpour et al., 2007):

tf	tg	w
X₁	X₂	0.1
Xᵢ	Xₖ	0
…	…	…
X_#tf	X_#tg	0.83

A binding motif conserved through evolution is more likely to be functional.

Chromatin similarity profiles

Ts: H3K4me1, H3K4me3, H3K9me3, H3K27me3, H3K27ac, H3K9ac. (modENCODE dataset)
Ct: H3K4me2, H4K16ac, H3K36me1, H3K36me3, H3K79me1, H3K79me2, H3K23ac, H3K18ac, H4K12ac, H4K5ac, H2BK5ac, H4K8ac. (modENCODE dataset)

co-chromatin networks

similarity between profiles (correlations)

gene	MARK1	MARK2	…
tf	1 1 0 0 0	0 1 1 1 0	…
tg	1 0 0 0 0	0 1 1 1 1	…

Squared Spearman correlations provided
- 2 co-chromatin nets
- 4 co-expression nets
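As a worked example, the squared Spearman correlation between the two example profiles above can be computed in one line. Reading the table as two marks with five positions each is an assumption about the original slide layout.

```r
# Binary mark profiles copied from the example table (MARK1 then MARK2)
tf <- c(1, 1, 0, 0, 0,   0, 1, 1, 1, 0)
tg <- c(1, 0, 0, 0, 0,   0, 1, 1, 1, 1)

# Edge weight of the tf-tg pair in the co-chromatin network
w <- cor(tf, tg, method = "spearman")^2
w    # 0.36
```

Squaring removes the sign, so strongly anti-correlated profiles also receive a high weight.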

Meta‑networks Principle

\[G_{1}\;\begin{array}{c}\swarrow\\\searrow\end{array}\; \begin{array}{c}G_{2}\\\uparrow\!\downarrow\\G_{3}\end{array}\]

Networks from TF binding (motif and ChIP)
- pros: physical connections (directed)
- issue: elimination of non‑functional bindings
Networks from correlations (expr and chromatin)
- pros: functional connections (but undirected)
- issue: elimination of indirect interactions

Weight-average

motif

tf	tg	\(w_{ij}\)
\(X_{1}\)	\(X_{2}\)	0.1
\(X_{1}\)	\(X_{3}\)	0.3
…	…	…
\(X_{\#tf}\)	\(X_{\#tg}\)	0.83

correlation

tf	tg	\(w_{ij}\)
\(X_{1}\)	\(X_{2}\)	0.3
\(X_{1}\)	\(X_{3}\)	0.1
…	…	…
\(X_{\#tf}\)	\(X_{\#tg}\)	0.95

weight-average

tf	tg	\(w_{ij}\)
\(X_{1}\)	\(X_{2}\)	0.2
\(X_{1}\)	\(X_{3}\)	0.2
…	…	…
\(X_{\#tf}\)	\(X_{\#tg}\)	0.89

Meta-network made from the weight sum (or weight average) of a set of sub-networks.
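The weight-average can be reproduced on the first two rows of the example tables. This is a minimal sketch assuming every sub-network lists the same TF → TG pairs and that all sub-networks count equally; the data-frame and column names are illustrative.

```r
# First two rows of the motif and correlation tables
motif <- data.frame(tf = c("X1", "X1"), tg = c("X2", "X3"), w = c(0.1, 0.3))
corr  <- data.frame(tf = c("X1", "X1"), tg = c("X2", "X3"), w = c(0.3, 0.1))

# Align the sub-networks on the (tf, tg) pair, then average the weights
meta <- merge(motif, corr, by = c("tf", "tg"), suffixes = c(".motif", ".cor"))
meta$w <- rowMeans(meta[, c("w.motif", "w.cor")])
meta[, c("tf", "tg", "w")]    # both edges get 0.2
```

Averaging only makes sense when the sub-network weights live on comparable scales, which is one motivation for the rank-based alternative.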

Rank-sum

motif-ranked

tf	tg	\(w_{ij}\)
\(X_{1}\)	\(X_{2}\)	92
\(X_{1}\)	\(X_{3}\)	51
…	…	…
\(X_{\#tf}\)	\(X_{\#tg}\)	1

cor-ranked

tf	tg	\(w_{ij}\)
\(X_{1}\)	\(X_{2}\)	47
\(X_{1}\)	\(X_{3}\)	360
…	…	…
\(X_{\#tf}\)	\(X_{\#tg}\)	20

rank-sum

tf	tg	\(w_{ij}\)
\(X_{1}\)	\(X_{2}\)	139
\(X_{1}\)	\(X_{3}\)	411
…	…	…
\(X_{\#tf}\)	\(X_{\#tg}\)	21

Meta-network made from the sum of ranks of a set of sub-networks (co-expression, co-chromatin, ChIP and motif)
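A rank-sum combination can be sketched as follows. The four edges and their weights are hypothetical; rank 1 is given to the strongest edge of each sub-network (consistent with the example tables, where the highest-weight edge has rank 1), and ties share their average rank, which is an assumed convention.

```r
# Hypothetical weights of four candidate edges in two sub-networks
nets <- data.frame(
  tf    = c("X1", "X1", "X2", "X2"),
  tg    = c("X2", "X3", "X3", "X4"),
  motif = c(0.10, 0.30, 0.83, 0.00),
  cor   = c(0.30, 0.10, 0.95, 0.20)
)

# Rank 1 = strongest edge within each sub-network
ranks <- sapply(nets[c("motif", "cor")],
                function(w) rank(-w, ties.method = "average"))
nets$rank_sum <- rowSums(ranks)
nets[order(nets$rank_sum), ]    # smallest rank-sum = best-supported edge
```

Because only the ordering within each sub-network matters, the rank-sum is insensitive to differences in weight scale between data types.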

Validating Networks (part II)

PR-curves

Topology

Similar to E. coli and S. cerevisiae

Topology II

Similar to E. coli and S. cerevisiae

Network randomization

Random network obtained by swapping the names of the genes
Fold-enrichment: ratio of the average co-expression among co-regulated genes in our network to the same average in the randomized network.
Motif: 1.08; ChIP: 1.46; Meta: 3.07
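The randomization test can be sketched on simulated data. Everything below is a toy assumption: the expression matrix is random, co-expression is measured by squared Pearson correlation, "co-regulated" means "targets sharing a regulator", and `mean_coreg` is a hypothetical helper.

```r
set.seed(42)
# Toy data: 20 genes x 30 samples; genes g1..g5 share a common signal
n_genes <- 20; n_samples <- 30
expr <- matrix(rnorm(n_genes * n_samples), n_genes, n_samples,
               dimnames = list(paste0("g", 1:n_genes), NULL))
signal <- rnorm(n_samples)
expr[1:5, ] <- expr[1:5, ] + rep(3 * signal, each = 5)
coexpr <- cor(t(expr))^2                      # gene x gene co-expression

# Toy network: one regulator targeting the five correlated genes
net <- data.frame(tf = "tf1", tg = paste0("g", 1:5), stringsAsFactors = FALSE)

# Mean co-expression over all pairs of targets that share a regulator
mean_coreg <- function(net, coexpr) {
  vals <- unlist(lapply(split(net$tg, net$tf), function(tg) {
    if (length(tg) < 2) return(numeric(0))
    p <- combn(tg, 2)
    coexpr[cbind(p[1, ], p[2, ])]
  }))
  mean(vals)
}

observed <- mean_coreg(net, coexpr)
randomized <- mean(replicate(200, {
  # swap gene names: each gene label is mapped to a randomly chosen label
  relabel <- setNames(sample(rownames(expr)), rownames(expr))
  shuffled <- net
  shuffled$tg <- unname(relabel[net$tg])
  mean_coreg(shuffled, coexpr)
}))
fold_enrichment <- observed / randomized
fold_enrichment
```

Swapping names preserves the topology of the network (same degrees, same number of edges), so any enrichment above 1 reflects which genes are connected rather than how many.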

Overfitting problem

NEVER use the same data for modeling and validating
3 expression datasets for modeling vs 1 for validation

GO-term similarity

List of GO functional terms for each gene
Similarity between lists: Jaccard index

\[JI=\frac{|L_{1}\cap L_{2}|}{|L_{1}\cup L_{2}|}\]
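A minimal sketch of the Jaccard index in base R, reading the numerator and denominator as the sizes of the intersection and union of the two term lists. The GO identifiers are placeholders, and `jaccard` is a hypothetical helper name.

```r
# Jaccard index between the GO-term lists of two genes
jaccard <- function(l1, l2) {
  l1 <- unique(l1); l2 <- unique(l2)
  length(intersect(l1, l2)) / length(union(l1, l2))
}

go_gene1 <- c("GO:A", "GO:B", "GO:C")            # placeholder term IDs
go_gene2 <- c("GO:B", "GO:C", "GO:D", "GO:E")
jaccard(go_gene1, go_gene2)   # 2 shared terms / 5 distinct terms = 0.4
```

The index is undefined (0/0) when both genes are unannotated, so such pairs need to be excluded or handled explicitly.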

PPI

→ Link if two proteins bind

Results on co-regulated genes

Fold-enrichment of co-expression (RNAseq), GO-term similarity and PPI among co-regulated genes of our networks vs a randomized version of them.

network	PPI	GO	RNAseq
motif	1.39	1.06	1.08
ChIP	1.24	1.23	1.46
unsupervised	1.53	1.44	3.07
supervised	1.58	1.55	3.62

Meta-net exploitation

[modENCODE consortium, Science 2010]

Prediction of GO process terms of unannotated genes (community detection).
Predictive models of expression of target genes from expression of regulators.
Hub detection