Introduction to Biological Data Analysis

TP 4: Introduction to probabilities

Prof. Patrick Meyer

BioSys Lab - Université de Liège

Data.frame

Use functions such as data(), read.table(), NROW(), order(),
write.table(), rm(), ls(), data.frame() and colnames() for the following steps.

In a variable data1, create a data.frame made of
1. the data.frame CO2, but without variables $Type and $Treatment.
2. an additional column id, a number going from 1 to the number of rows in the data, and denoting the id of the observation/line/experiment.
3. with the rows sorted by increasing uptake (the other columns must be reordered accordingly, so that each row stays intact).
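A minimal sketch of the data1 steps in base R, assuming the built-in CO2 dataset. The id column is added before sorting, so each row keeps the number it had in CO2; that number is what later allows the two files to be matched. The intermediate name co2 is an illustrative choice, and as.data.frame() is a precaution because CO2 carries extra groupedData classes.

```r
# CO2 is part of base R's 'datasets' package
data(CO2)
co2 <- as.data.frame(CO2)   # plain data.frame, without the groupedData classes

# 1. keep every column except Type and Treatment
data1 <- co2[, !(colnames(co2) %in% c("Type", "Treatment"))]

# 2. number the observations 1..n in their original order
data1$id <- 1:NROW(data1)

# 3. sort all rows by increasing uptake (whole rows move together)
data1 <- data1[order(data1$uptake), ]

head(data1)
```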
In a variable data2, create a data.frame made of
1. the data.frame CO2, but without variables $conc and $uptake.
2. an additional column id, as for data1.
3. with new names of variables: $id becomes $number, $Type becomes $origin and $Treatment becomes $change.
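The data2 steps can be sketched the same way, again assuming the built-in CO2 dataset and the illustrative name co2. Renaming by matching on the old name (rather than by column position) keeps the code correct even if the column order changes.

```r
data(CO2)
co2 <- as.data.frame(CO2)

# 1. keep every column except conc and uptake
data2 <- co2[, !(colnames(co2) %in% c("conc", "uptake"))]

# 2. same numbering as in data1: 1..n in the original CO2 order
data2$id <- 1:NROW(data2)

# 3. rename id -> number, Type -> origin, Treatment -> change
colnames(data2)[colnames(data2) == "id"] <- "number"
colnames(data2)[colnames(data2) == "Type"] <- "origin"
colnames(data2)[colnames(data2) == "Treatment"] <- "change"

head(data2)
```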
Write data1 to a text file data1.txt with comma-separated values, and data2 to data2.txt with space-separated values.
Clean the RStudio environment (remove all objects).
Imagine that two of your collaborators, working on the same set of plants but in two different labs, used different variable names, a different row order and even different value separators in their files. As the data analyst, you try to assemble their work into one integrated data.frame by loading data1.txt and data2.txt...
Merge data1 and data2 using the merge() function (be careful with the parameters by.x and by.y) and try to obtain a single data.frame similar to the original CO2.
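The write, clean, reload and merge steps can be sketched as one self-contained script. It first rebuilds data1 and data2 compactly so that it runs on its own; the names d1, d2 and merged are illustrative. Merging on both the observation number and Plant (the only column the two files share) avoids duplicated Plant.x / Plant.y columns. The final explicit order() is there because merge() with several key columns does not necessarily sort them numerically.

```r
# --- rebuild data1 and data2 (compact version of the previous steps) ---
data(CO2)
co2 <- as.data.frame(CO2)
data1 <- co2[, c("Plant", "conc", "uptake")]
data1$id <- 1:NROW(data1)
data1 <- data1[order(data1$uptake), ]
data2 <- co2[, c("Plant", "Type", "Treatment")]
data2$number <- 1:NROW(data2)                 # id already renamed to number
colnames(data2)[2:3] <- c("origin", "change")

# --- write: comma-separated vs space-separated, without row names ---
write.table(data1, "data1.txt", sep = ",", row.names = FALSE)
write.table(data2, "data2.txt", sep = " ", row.names = FALSE)

# --- clean the environment ---
rm(list = ls())

# --- reload, as if the files came from two different labs ---
d1 <- read.table("data1.txt", header = TRUE, sep = ",")
d2 <- read.table("data2.txt", header = TRUE, sep = " ")

# --- merge: id in d1 corresponds to number in d2 ---
merged <- merge(d1, d2, by.x = c("id", "Plant"), by.y = c("number", "Plant"))

# restore the original row order and column layout of CO2
merged <- merged[order(merged$id), c("Plant", "origin", "change", "conc", "uptake")]
colnames(merged) <- c("Plant", "Type", "Treatment", "conc", "uptake")
rownames(merged) <- NULL
head(merged)
```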

table

What would be displayed if you type the following (oui = yes, non = no)?

  > cancer<-c("oui","oui","non","oui","non","non","oui","non")
  > fumeur<-c("oui","non","non","oui","oui","non","oui","non")
  > z <- data.frame(cancer,fumeur)
  > table(z)

What is the empirical probability of smoking and being healthy (no cancer)? (and what R command computes it?)
What is the empirical probability of having cancer given that one smokes? (and what R command computes it?)
What is the empirical probability of smoking? (and what R command computes it?)
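One possible way to get the three probabilities, as a sketch: prop.table() turns the counts of table(z) into joint empirical frequencies, and the conditional probability is the joint frequency divided by the marginal one. The names tab, p and the p_* variables are illustrative.

```r
cancer <- c("oui","oui","non","oui","non","non","oui","non")
fumeur <- c("oui","non","non","oui","oui","non","oui","non")
z <- data.frame(cancer, fumeur)

# counts: (cancer, fumeur) = (non,non) 3, (non,oui) 1, (oui,non) 1, (oui,oui) 3
tab <- table(z)            # rows = cancer, columns = fumeur
p <- prop.table(tab)       # joint empirical probabilities, summing to 1

# P(smoker AND healthy) = P(fumeur = "oui", cancer = "non")
p_smoke_healthy <- p["non", "oui"]                      # 1/8 = 0.125

# P(cancer | smoker) = P(cancer, smoker) / P(smoker)
p_cancer_given_smoke <- p["oui", "oui"] / sum(p[, "oui"])   # 3/4 = 0.75

# P(smoker): marginal, summed over cancer status
p_smoke <- sum(p[, "oui"])                              # 4/8 = 0.5
```

prop.table(tab, 2) gives the conditional distribution of cancer within each smoking column directly, and mean(z$fumeur == "oui") is a shortcut for the marginal.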

sample

Produce a data.frame with 200 observations of three independent random variables, "light", "temp" and "pressure", each of which takes one of three values, "low", "mid" or "high", with equal probability (1/3).
Use table() to identify how often the three variables take the value "high".
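A minimal sketch using sample(). The helper draw() and the names levels3, d and counts are illustrative. Wrapping the draws in factor() with fixed levels guarantees that every value appears in the table, even if it happened never to be drawn. The code reports both the per-variable frequency of "high" and the frequency of all three being "high" together.

```r
levels3 <- c("low", "mid", "high")

# Hypothetical helper: n independent draws, each value with probability 1/3
draw <- function(n) {
  factor(sample(levels3, n, replace = TRUE, prob = rep(1/3, 3)),
         levels = levels3)
}

n <- 200
d <- data.frame(light = draw(n), temp = draw(n), pressure = draw(n))

# frequency of "high" for each variable separately (expected about 1/3)
sapply(d, function(v) table(v)["high"] / n)

# 3-way table of counts; this cell counts rows where all three are "high"
counts <- table(d)
counts["high", "high", "high"] / n   # compare with (1/3)^3 = 1/27
```

Because the draws are random, these frequencies change from one run to the next.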
Repeat the exercise, calling set.seed(100) first.
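set.seed() fixes the starting state of R's random number generator, so the same sequence of draws is produced each time. The sketch below, with an illustrative helper draw_all() (sample() with its default uniform probabilities), runs the simulation twice from the same seed and checks that both data.frames are identical.

```r
levels3 <- c("low", "mid", "high")

# Hypothetical helper: one data.frame of n rows with the three variables
draw_all <- function(n = 200) {
  draw <- function() factor(sample(levels3, n, replace = TRUE), levels = levels3)
  data.frame(light = draw(), temp = draw(), pressure = draw())
}

set.seed(100)
run1 <- draw_all()
set.seed(100)
run2 <- draw_all()

identical(run1, run2)                # TRUE: same seed, same draws
table(run1)["high", "high", "high"]  # now the same count on every run
```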

Bayes

(This exercise does not require R and can be done on paper.) Consider a test that is 99% accurate at detecting a genetic mutation carried by only 1% of the population.
Using Bayes' theorem, compute the probability that a person who tests positive really has the mutation.
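A worked version of the computation, under the assumption that "99% accurate" means both P(+ | M) = 0.99 (mutation carriers test positive) and P(- | not M) = 0.99 (non-carriers test negative), so that P(+ | not M) = 0.01. Here M denotes "has the mutation", with P(M) = 0.01.

```latex
P(M \mid +) = \frac{P(+ \mid M)\,P(M)}{P(+ \mid M)\,P(M) + P(+ \mid \neg M)\,P(\neg M)}
            = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.01 \times 0.99}
            = \frac{0.0099}{0.0198} = 0.5
```

Under these assumptions, a positive result only means a 50% chance of carrying the mutation, because the true positives among the 1% carriers are as numerous as the false positives among the 99% non-carriers.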
