Introduction to Biological Data Analysis


Exercises 5: Introduction to statistics in R

Prof. Patrick Meyer

BioSys Lab - Université de Liège

binom

  1. Make a dataset of 20 observations of two independent binomial random variables, each with a success probability of 50%.
  2. Redo the previous exercise with 5000 observations, then use the table() function to compare (and comment on) the probabilities obtained for the two datasets.
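
One possible starting point for the two questions above is sketched below. It assumes one trial per observation (`size = 1`), which the exercise does not specify; the helper `make_binom_data` and the fixed seed are illustrative choices, not part of the exercise.

```r
set.seed(1)  # assumption: fixed seed, used only to make the sketch reproducible

# Hypothetical helper: n observations of two independent binomial variables.
# Assumption: size = 1 (a single trial per observation).
make_binom_data <- function(n, p = 0.5) {
  data.frame(
    x = rbinom(n, size = 1, prob = p),
    y = rbinom(n, size = 1, prob = p)
  )
}

small <- make_binom_data(20)
large <- make_binom_data(5000)

# Joint relative frequencies of (x, y) in each dataset.
# Under independence with p = 0.5, each of the four cells has probability 0.25.
freq_small <- prop.table(table(small$x, small$y))
freq_large <- prop.table(table(large$x, large$y))

print(freq_small)
print(freq_large)
```

Wrapping table() in prop.table() turns the counts into proportions, which makes the two sample sizes directly comparable.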

norm

  1. Make a dataset of 200 observations with two independent normally distributed variables: $N(\mu = 10, \sigma = 10)$ and $N(\mu = 1, \sigma = 5)$.
  2. Using the quantile function, identify the observations that are in the upper quartile of the first variable and in the lower tercile of the second variable.
  3. Redo that exercise using the appropriate norm function instead of quantile. Explain the differences observed between the two strategies (norm and quantile).
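
The two strategies can be sketched side by side as follows. This assumes that qnorm() is the norm function the exercise has in mind; the column names `a` and `b` and the fixed seed are illustrative.

```r
set.seed(1)  # assumption: fixed seed, used only to make the sketch reproducible

n <- 200
d <- data.frame(
  a = rnorm(n, mean = 10, sd = 10),  # N(mu = 10, sigma = 10)
  b = rnorm(n, mean = 1,  sd = 5)    # N(mu = 1,  sigma = 5)
)

# Strategy 1: empirical thresholds estimated from the sample with quantile()
q_a_emp <- quantile(d$a, probs = 0.75)   # lower bound of the upper quartile
q_b_emp <- quantile(d$b, probs = 1 / 3)  # upper bound of the lower tercile
idx_emp <- which(d$a > q_a_emp & d$b < q_b_emp)

# Strategy 2: theoretical thresholds of the generating distributions with qnorm()
q_a_theo <- qnorm(0.75,  mean = 10, sd = 10)
q_b_theo <- qnorm(1 / 3, mean = 1,  sd = 5)
idx_theo <- which(d$a > q_a_theo & d$b < q_b_theo)

# Number of selected observations under each strategy
print(c(empirical = length(idx_emp), theoretical = length(idx_theo)))
```

The empirical thresholds depend on the particular sample drawn, whereas the theoretical ones depend only on the distribution parameters, so the two index sets need not coincide.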

hyper

Researchers built a computational model that predicts whether a patient is sick based on a picture. Installed in a hospital, the model predicted that 30 people out of 100 were sick, but only 25 of those predictions turned out to be correct. Given that, of those 100 people, 50 were actually identified as sick by a doctor...
  1. Do you think the model is good? What is the probability of being as good as this model by chance?
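
A minimal sketch of the chance calculation, assuming that "by chance" means picking 30 of the 100 people at random and counting how many of them are truly sick (a hypergeometric draw). The variable names are illustrative.

```r
total     <- 100  # people examined
sick      <- 50   # identified as sick by a doctor
predicted <- 30   # predicted sick by the model
correct   <- 25   # predicted sick and truly sick

# Probability of exactly 25 correct predictions among 30 random picks
p_exact <- dhyper(correct, m = sick, n = total - sick, k = predicted)

# Probability of 25 or more correct predictions, i.e. P(X > 24)
p_at_least <- phyper(correct - 1, m = sick, n = total - sick, k = predicted,
                     lower.tail = FALSE)

print(c(exactly = p_exact, at_least = p_at_least))

# Simple descriptive scores derived from the same counts
precision <- correct / predicted  # share of "sick" predictions that are right
recall    <- correct / sick       # share of truly sick people that are found
print(c(precision = precision, recall = recall))
```

Whether "as good as" should be read as exactly 25 or as at least 25 correct predictions is a modelling choice, so both quantities are computed.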

cor

Let X be a normally distributed random variable $N(\mu=-3, \sigma=3)$, Y a normally distributed random variable $N(\mu=0, \sigma=1)$, V = exp(X), Z = sin(X) and W = X+Y, each with 1000 observations.
  1. Compare Pearson's, Kendall's and Spearman's correlations between all those variables.
  2. Using scatterplots, explain the correlations obtained.
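
The setup for this exercise might look like the sketch below. It assumes X and Y are drawn independently; the data frame `d`, the list `cors` and the fixed seed are illustrative choices.

```r
set.seed(1)  # assumption: fixed seed, used only to make the sketch reproducible

n <- 1000
X <- rnorm(n, mean = -3, sd = 3)
Y <- rnorm(n, mean = 0,  sd = 1)  # assumption: drawn independently of X
d <- data.frame(X = X, Y = Y, V = exp(X), Z = sin(X), W = X + Y)

# One correlation matrix per method, stored in a named list
methods <- c("pearson", "kendall", "spearman")
cors <- lapply(methods, function(m) cor(d, method = m))
names(cors) <- methods

print(lapply(cors, round, digits = 2))

# Scatterplot matrix of all pairs of variables
pairs(d, pch = ".")
```

Because exp() is strictly increasing, the rank-based coefficients (Kendall and Spearman) between X and V equal 1, while Pearson's coefficient, which measures linear association only, is lower. The scatterplot matrix gives a starting point for explaining the other pairs.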