tRMA is a package for normalisation and preliminary analysis of cDNA microarray data. It uses some of the standard Bioconductor packages, and to that extent may itself be considered as a Bioconductor package.
tRMA does two things. On the one hand it provides a simple interface to some standard Bioconductor functionality, such as methods for graphical display of microarray data. The second thing tRMA does is to provide some unique functionality, including a spatial normalisation procedure.
The original version of tRMA was written by Dale Wilson in 2001/2 and had no connection with Bioconductor. The current version was written in 2002/3. Its functionality and interface (e.g. function names) were made to be as similar as possible to the original tRMA, but the current version makes use, as far as possible, of the facilities of Bioconductor.
The most important unique functionality in tRMA is the spatial normalisation function, SpatiallyNormalise described in §6 below. The remaining functionality is similar to what is available in other software packages such as Bioconductor. Indeed several tRMA functions, e.g. PlotData, operate by just calling functions in the standard Bioconductor packages.
First the tRMA package needs to be loaded: > library(tRMA)
Currently this takes some time and produces about a page of output, as tRMA depends on a number of underlying Bioconductor packages which themselves have to be loaded.
The primary source of data are .gpr files, as produced by the GenePix scanner and software. The following command loads such a file. > x <- LoadGenePixFile("mydata.gpr", 4, 4)
The first argument is the file name. The second and third arguments provide the block structure of the slide: the number of rows, then the number of columns.
There are a number of extra optional arguments to LoadGenePixFile which modify, for example, the columns of the data file that are to be used for the foreground and background spot data.
Once tRMA is loaded, the Bioconductor package marrayInput is available, and so all functions in that package may be used as well.
The function PlotData produces a standard "M versus A" plot of the spot values, with fitted loess curve: > data(swirl) > swirl1 <- swirl[, 1] > PlotData(swirl1)
This plot shows intensity-dependent bias,
a common artefact in cDNA microarray data.
The x-axis (A) is the product,
red ×green,
of the spot values,
while
The y-axis (M) is the ratio,
red / green.
Both axes are on a (base 2) log scale.
The function ImageData produces an image-style plot of the log-ratio values: > ImageData(swirl1)
Whereas
PlotData is a visual check for intensity-dependent bias,
ImageData is a visual check for location-dependent bias.
The function SlideNormalise estimates and subtracts intensity-dependent bias, such as may be seen via the PlotData function. This procedure produces a modified data object in which intensity-dependent bias is removed:
> swirl1.n <- SlideNormalise(swirl1) > PlotData(swirl1.n)
This function also performs a global scale normalisation,
setting the median absolute value to one.
This can substantially modify the scale of the log-ratios.
In this case the maximum absolute log-ratio has increased from
about 4 to about 16.
The function SpatiallyNormalise estimates and subtracts location-dependent bias, such as may be seen via the ImageData function. The method uses a 2D moving median filter - see [2]. By default a 3×3 window is used, but as the horizontal streak in this data is very narrow, we use a 9×1 window for this data: > swirl1.nn <- SpatiallyNormalise(swirl1, width = 9, height = 1) > ImageData(swirl1.nn)
It is important to note that this procedure is based purely on
foreground spot values.
Location-dependent biases in cDNA microarray data
have traditionally been adjusted for using
local estimates of background fluorescence.
This method, on the other hand,
adjusts using nearby foreground values only
- see [2].
For reasons of compatibility with the original tRMA, the function SpatiallyNormalise performs slide normalisation (calls SlideNormalise) before doing spatial normalisation. Therefore if both normalisations are required - i.e.both intensity- and location-dependent normalisations - then only SpatiallyNormalise need be called.
The function FindDiffExpGenes implements three methods for determining which genes are differentially expressed. All methods order genes by the absolute value of the log-ratio, but they have different approaches to determining the number of genes selected. The recommended method is the "BH" (Benjamini and Hochberg) method which is described in [1].
Select genes using the normalised data and the BH method only, suppressing file output:
> deg1 <- FindDiffExpGenes(swirl1.nn, do.Wilson = FALSE, do.save = FALSE)
FindDiffExpGenes: Doing BH method ... done Found 163 genes
The log-ratios are plotted against their position in the data file.
Only the first 10 genes are displayed,
but all the selected genes are written to the output object, deg.
When a number of slides are available, this process will in general produce a different list of genes for each slide. The function CompareInterestingGenes will examine the results from a number of calls to FindDiffExpGenes, and extract genes that are repeatedly identified as differentially expressed.
The swirl data set contains data from 4 slides. First normalise each of these and apply FindDiffExpGenes: > swirl2 <- swirl[, 2] > swirl3 <- swirl[, 3] > swirl4 <- swirl[, 4] > swirl.list <- list(swirl1, swirl2, swirl3, swirl4) > swirl.list.n <- lapply(swirl.list, SlideNormalise, do.scale = TRUE) > deg.list <- lapply(swirl.list.n, FindDiffExpGenes, do.save = FALSE, + do.Wilson = FALSE, do.plot = FALSE)
FindDiffExpGenes: Doing BH method ... done Found 128 genes FindDiffExpGenes: Doing BH method ... done Found 18 genes FindDiffExpGenes: Doing BH method ... done Found 13 genes FindDiffExpGenes: Doing BH method ... done Found 53 genes
Now we can call CompareInterestingGenes to find genes which appear in 2 of the 4 lists, and with consistent direction of differential expression: > deg.50 <- CompareInterestingGenes(deg.list, percent = 50) > deg.50
[1] "18-M3"
The function SummariseLabs summarises the normalised log-ratio data for these genes: > SummariseLabs(swirl.list.n, labs = deg.50, do.save = FALSE)
ID Name logratios StdDev BacktoRatio 1 18-M3 fb87a02 4.630723 3.260741 24.77346
The function SelectHouseKeepingGenes finds genes with expression levels which are both stable and low, for use in housekeeping based normalisation: > swirl.n <- SlideNormalise(swirl, do.scale = TRUE) > hk.genes <- SelectHouseKeepingGenes(swirl.n, do.save = FALSE) > hk.genes[, "gene.labels"]
[1] "18-E6" "18-A6" "25-K7" "12-O11" "26-O19" "19-I10" "12-A19" "10-K3" [9] "13-A15" "19-J22" "19-K10" "23-N2" "11-F24" "12-L2" "18-N11" "12-F11" [17] "19-O6" "19-O4" "24-N16" "13-C17" "13-F6" "20-A3" "7-O23" "6-J19" [25] "12-I12" "27-F19" "12-J24" "23-A16" "13-O9" "13-H7" "19-P9" "3-I8" [33] "24-J14" "9-C12" "27-B22" "27-G4" "9-O1" "13-J16" "24-F3" "21-L17" [41] "21-G22" "1-I7" "2-M23" "2-I19" "5-I24" "19-N19" "2-L17" "20-O20" [49] "6-P9" "13-A12" "6-D10" "10-G12" "9-B9" "9-L14" "7-M5" "1-H17" [57] "6-O7" "2-M3" "9-O12" "6-A4"
We can now annotate these as housekeeping genes, using AnnotateHK: > swirl1.hk <- AnnotateHK(swirl1, hk.genes)
Normalisation using these genes can now be carried out with SlideNormalise: > swirl1.hk.n <- SlideNormalise(swirl1.hk, housekeeping = TRUE)