Biochemistry laboratories - 201
Jean-Yves Sgro - jsgro@wisc.edu
Find this document here (short URL) today: http://go.wisc.edu/km26g1
HTML Hand-outs are at:
Note: other formats and other tutorials are at https://biochem.wisc.edu/bcrf/tutorials
First source the helper script biocLite.R (this downloads and runs it),
and then install the packages.
source("http://bioconductor.org/biocLite.R")
biocLite("GEOquery")
biocLite("limma")
biocLite("Biobase")
biocLite("affy")
*NOTE* If you were not here for the previous Bioconductor session(s), you may have to add the base Bioconductor packages with:
source("http://bioconductor.org/biocLite.R")
biocLite()
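In more recent Bioconductor releases (3.8 and later) the biocLite.R script has been retired in favor of the BiocManager package from CRAN. The sketch below shows one possible equivalent of the installation commands above. The variable names pkgs and missing_pkgs are illustrative only, and the interactive() guard is an assumption added here so that knitting a document never triggers a download.

```r
# Packages used in this session (same list as in the biocLite() calls above)
pkgs <- c("GEOquery", "limma", "Biobase", "affy")

# Keep only the packages that are not installed yet
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

# Assumption: install only from an interactive R session
if (interactive() && length(missing_pkgs) > 0) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(missing_pkgs)
}
```

Calling BiocManager::install() with no arguments plays the role of the bare biocLite() call for the base packages.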
knitr is the package that helps document computational research.
To make recalculations faster we can engage a "caching" method by inserting the following code within the .Rmd
R Markdown document:
``` {r global_options_settings, include=TRUE, echo=FALSE}
# Global options:
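# (the knitr:: prefix avoids an "object 'opts_chunk' not found" error when knitr is not attached)
knitr::opts_chunk$set(warning=FALSE, message=FALSE, comment="", cache=TRUE)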
```
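To check from the R console that the global options took effect, the current value can be read back with opts_chunk$get(). This is a minimal sketch that assumes the knitr package is installed; the variable name cache_on is illustrative only.

```r
# Assumes knitr is installed; set the same global options as in the chunk above
knitr::opts_chunk$set(warning = FALSE, message = FALSE, comment = "", cache = TRUE)

# Read the current global value of the "cache" option back
cache_on <- knitr::opts_chunk$get("cache")
cache_on
```

An individual chunk can still opt out of caching by adding cache=FALSE to its own chunk header.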
Data are microarray data from Affymetrix U133 GeneChips.
Source: http://www.oceanridgebio.com/images/system_rev_630.jpg
The Gene Expression Omnibus (GEO) is a public repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomic data submitted by the scientific community.
Data type | Description |
---|---|
GEO Platform (GPL) | These files describe a particular type of microarray. They are annotation files. |
GEO Sample (GSM) | Files that contain all the data from the use of a single chip. For each gene there will be multiple scores including the main one, held in the VALUE column. |
GEO Series (GSE) | Lists of GSM files that together form a single experiment. |
GEO Dataset (GDS) | These are curated files that hold a summarized combination of a GSE file and its GSM files. They contain normalized expression levels for each gene from each sample (i.e. just the VALUE field from the GSM file). |
Today we'll use a GSE entry that contains multiple samples (each one is a GSM).
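The accession prefix is enough to tell the four record types in the table apart. The helper below is a hypothetical illustration (geo_record_type is not part of GEOquery or any other package) that maps an accession to its record type:

```r
# Hypothetical helper: map a GEO accession to its record type via its 3-letter prefix
geo_record_type <- function(accession) {
  prefix <- toupper(substr(accession, 1, 3))
  types <- c(GPL = "Platform", GSM = "Sample", GSE = "Series", GDS = "Dataset")
  # Unknown prefixes give NA
  unname(types[prefix])
}

geo_record_type(c("GSE46268", "GPL570"))
# [1] "Series"   "Platform"
```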
Format name | Format |
---|---|
SOFT | Simple Omnibus Format in Text. |
MINiML | MIAME Notation in Markup Language (XML format). |
Matrix | Spreadsheet containing the final, normalized values that are comparable across rows and samples. |
Paper: All-Trans Retinoic Acid−Triggered Antimicrobial Activity against Mycobacterium tuberculosis Is Dependent on NPC2. Matthew Wheelwright, Elliot W. Kim, Megan S. Inkeles, Avelino De Leon, Matteo Pellegrini, Stephan R. Krutzik and Philip T. Liu. J Immunol 2014; 192:2280-2290. Prepublished online 5 February 2014. doi: 10.4049/jimmunol.1301686. http://www.jimmunol.org/content/192/5/2280
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE46268
library(GEOquery)  # provides getGEO()
gset <- getGEO("GSE46268", GSEMatrix = TRUE)
We'll explore the GEO2R script created on the web site and add a few more plots.
http://www.ncbi.nlm.nih.gov/geo/geo2r/?acc=GSE46268
if (length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]
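My challenge was: why do we write gset <- gset[[idx]]?
With GSEMatrix = TRUE, getGEO() returns a list, even when it holds a single dataset, and the double brackets extract the dataset itself rather than a one-item list.
This is why we start with a mini exercise on "lists": the command becomes gset <- gset[[1]] if there is only one dataset.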
# Create list L
L <- list(vn=c(2,3,5), vc=c("sun", "moons"))
# Print list L
L
class(L)
# First item of list L, still wrapped in a list: single brackets [1]
L[1]
class(L[1])
# First item itself (the numeric vector): double brackets [[1]]
L[[1]]
class(L[[1]])
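The same reasoning can be applied to a stand-in for gset. The sketch below uses a plain named list instead of a real download: fake_gset, its element names and its contents are all made up for illustration, and the names only imitate the kind of file names getGEO() attaches to its result.

```r
# Hypothetical stand-in for the list returned by getGEO(): two named elements
fake_gset <- list(
  "GSE00000-GPL96_series_matrix.txt.gz"  = "data for platform GPL96",
  "GSE00000-GPL570_series_matrix.txt.gz" = "data for platform GPL570"
)

# Same selection logic as in the GEO2R script
if (length(fake_gset) > 1) idx <- grep("GPL570", attr(fake_gset, "names")) else idx <- 1
idx
# [1] 2

# Single brackets keep the list wrapper; double brackets return the element itself
class(fake_gset[idx])
# [1] "list"
one_set <- fake_gset[[idx]]
class(one_set)
# [1] "character"
```

With the real gset, the element extracted by the double brackets is the dataset object used in the rest of the script, not a character string.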
After class:
The evaluation is anonymous. Click or type the short URL: http://go.wisc.edu/61c0pc
Note: Survey will be unlocked when workshops are held.