CSIRO Mathematics, Informatics and Statistics

THE SPOT USER'S GUIDE


INTRODUCTION

Spot is a software package for the analysis of microarray images. At present (August 2000) Spot is supported on Linux, Solaris, DEC Alpha and Windows NT platforms.

Spot is implemented as a "package" for the free data analysis environment "R". For information about R, see the "R home page". Spot is based on VOIR, another R package, which is currently being developed by the CSIRO Biotech Imaging Group and provides a more general image analysis environment.

The first version of Spot was written by Jean Yang, working in Terry Speed's microarray data analysis group at the Department of Statistics, University of California, Berkeley, and the Walter & Eliza Hall Institute of Medical Research, Melbourne. Modifications and further development have been carried out by Michael Buckley and Ryan Lagerstrom. Other important technical contributions have been made by Kevin Cheong in porting to Windows NT, and by Richard Beare and Hugues Talbot as the authors, respectively, of VOIR and ImView.

INSTALLATION

See the appropriate installation guide for your system.

OUTLINE OF USAGE

After installation, there are five steps involved in the analysis of microarray images using Spot:

(1) Visual Assessment of Offset Variation

Estimating the amount of variation between images in global position, and choosing a template image. For details see Visual Assessment of Offset Variation.

(2) Specification of Image Names and Parameters

Constructing files to be read by Spot. For details see Specifying Image Names and Parameters.

(3) Starting R and Loading Spot

This is trivial once things are set up. For details see Starting R and Loading Spot.

(4) Specification of a Template From a Template Image

Point and click in an image display window. For details see Specifying a Template.

(5) Analysis of an Image Pair

Calling the function "Spots". For details see Analysing an Image Pair.

Spot assumes that images will be analysed in batches, as described in the next section. A batch of images is analysed by carrying out steps (1) to (4) once, then repeating step (5) for each image pair in the batch. To resume analysis of the same batch later, steps (1), (2) and (4) are in general not required unless the information provided at step (2) or (4) needs to be corrected; usually only step (3) is required before further repetitions of step (5).

Details of each of these steps are given in later sections of this manual, but first we describe what a batch is in terms of microarray analysis using Spot.

BATCHES

The design and use of Spot relies on the concept of a batch of image data. For the purposes of Spot, a batch is a collection of microarray images whose overall geometric structure is the same. These will typically correspond to slides printed by the same printer and the same print head at around the same time, and scanned in a similar manner.

The geometry of microarray images can vary in a number of ways:

Basic Structure

This refers to the array dimensions in terms of counts of spots, and can be specified by the arrangement of grids (e.g. 4 x 4) and the arrangement of spots within grids (e.g. 19 x 21).

Clearly images within a batch must be identical in terms of their "basic structure". Basic structure information for a batch is specified, along with a shift tolerance (see below), in the Parameter File.

Pin Configuration

The heads of the pins on a microarray printer are in general not perfectly regular. That is, while they are nominally in a regular array, for example, 4 x 4, in fact slight bends or other effects mean that small irregularities are usually present. Even if these irregularities are very slight, they can result in significant irregularity in the grids in the microarray slide and hence in the image.

Spot assumes that slides in the same batch have pin configurations which are very nearly the same. Pin configuration information for a batch, along with information about the distance between spots, is stored in the Template File.

Overall Shifts

Various factors, image cropping in particular, can lead to an overall shift in all spot positions from image to image.

Spot does expect such variation within batches; in fact a key component of the grid location process is an estimate of the overall shift between the current image and the template.

Rotation and Skewing

At present Spot does not expect other distortions such as rotation or skewing. Significant amounts of such distortion will therefore lead to incorrect results.

VISUAL ASSESSMENT OF OFFSET VARIATION

As described in the previous section, Spot assumes that, between images in the same batch, the primary structural variation is one of translation. Before doing any analysis, the user needs some idea of the nature of this variation, for two reasons:

  • to enable one image from the batch to be chosen as "typical" or "intermediate" and to be used as the template image;
  • to enable the user to estimate the maximum amount of offset variation between the template image and all the other images. This is used by Spot as tolerance information.

For example, inspection of the images may reveal that the range of horizontal offsets is approximately 30 pixels and the range of vertical offsets is approximately 20 pixels. If the third image in the batch is intermediate in both dimensions, it would be a good choice for the template image. In such a case it is recommended that the horizontal tolerance (tolerance.c, the variation in column position) be set to 20 pixels, a little larger than 30/2 = 15 pixels. Similarly, a suitable value for the vertical tolerance (tolerance.r) would be 15 pixels, a little larger than 20/2 = 10 pixels. Tolerance values that are too small may lead to incorrect grid location, while larger values slightly slow down the grid location process.

In another batch, horizontal and vertical variation may both be small, say 10 pixels or less. In such a case both tolerances may be set to 10 pixels and any image - the first, say - chosen as the template.
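The rule of thumb above can be written as a small R helper. This is a minimal sketch, not part of Spot: the function name suggest_tolerances, the input data frame of per-image offsets (measured by eye), and the fixed 5-pixel margin are all assumptions, chosen so that the helper reproduces the two worked examples.

```r
# Hypothetical helper (not part of Spot): suggest a template image and
# tolerances from per-image offsets measured by eye, in pixels.
# offsets: data frame with columns 'row' and 'col', one row per image pair.
suggest_tolerances <- function(offsets, margin = 5) {
  half_range <- function(x) diff(range(x)) / 2
  # Tolerance = a little more than half the observed range (assumed margin)
  tol.r <- ceiling(half_range(offsets$row)) + margin
  tol.c <- ceiling(half_range(offsets$col)) + margin
  # Template = the image closest to the middle of both offset ranges
  centre.r <- mean(range(offsets$row))
  centre.c <- mean(range(offsets$col))
  dist <- abs(offsets$row - centre.r) + abs(offsets$col - centre.c)
  list(template = which.min(dist), tolerance.r = tol.r, tolerance.c = tol.c)
}

offsets <- data.frame(row = c(0, 20, 10, 5, 15), col = c(30, 0, 14, 25, 5))
suggest_tolerances(offsets)   # template 3, tolerance.r 15, tolerance.c 20
```

With ranges of 10 pixels in both directions the same helper returns tolerances of 10, as in the second example. The results are only a starting point and can be written into the parameter file by hand.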

A technical note regarding offsets: if all images in a batch have the same dimensions (for example, 1500 x 1500 pixels), it is possible in principle to overlay any pair of images and thereby assess the difference in position of the grids of spots in the two images. However, if the images are not all of the same dimensions, this simple comparison no longer works. In that case it is important to know that Spot calculates all pixel positions relative to the top-left corner of the image, so offsets should be assessed by comparing spot positions measured from the top-left corner of each image.
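As an illustration of the note above: if the centre of one reference spot (for example, the top-left spot of the top-left grid) is located in each image, with row and column measured from that image's top-left corner, then the offsets relative to a candidate template are simple differences, whatever the image sizes. The helper offsets_from_template and the positions are hypothetical; Spot does not provide this function.

```r
# Hypothetical helper (not part of Spot).
# positions: data frame with 'row' and 'col' of the same reference spot in
# each image, in pixels from the top-left corner of that image.
# template: index of the image used as the reference.
offsets_from_template <- function(positions, template) {
  data.frame(row = positions$row - positions$row[template],
             col = positions$col - positions$col[template])
}

pos <- data.frame(row = c(52, 40, 47), col = c(61, 75, 66))
offsets_from_template(pos, 3)   # rows 5, -7, 0; cols -5, 9, 0
```

Here the offsets span 12 pixels vertically and 14 pixels horizontally; these ranges are what the tolerance rule of thumb is applied to.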

SPECIFYING IMAGE NAMES AND PARAMETERS

The Image Name File

To begin with, each batch of microarray data needs a name. We will use the name "array1".

To specify the image pairs comprising the batch "array1", use an editor or other means to create a file named

	images.<batch>,

in our case "images.array1", which looks like

	R		G
	array1.R1.tif	array1.G1.tif
	array1.R2.tif	array1.G2.tif
	array1.R3.tif	array1.G3.tif
	etc.

Each line gives the names of a pair of images. These image files, as well as the file "images.array1" itself, need to be in the directory where Spot is run. Note that under Windows it is possible to change the working directory while running Spot using the "Change dir" entry in the "File" menu. Currently only TIF image format is supported.

The first row in the image name file is a column label. Besides "R"/"G" the following label pairs may be used: "Red"/"Green", and "Cy5"/"Cy3". Lower case versions may also be used and the order of columns may be reversed. The second and subsequent rows of the file contain names of the image files, one pair per line.

The columns of the image name file should be separated by "white space": any number of spaces and TABs.
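The image name file can also be written from R itself with the standard function write.table. A minimal sketch, assuming the file names follow the array1.R1.tif pattern used above; the pattern and the number of pairs are illustrative only.

```r
# Write the image name file "images.array1" for three image pairs.
# The file-name pattern is illustrative; use the names of your own images.
batch <- "array1"
n <- 3
pairs <- data.frame(R = sprintf("%s.R%d.tif", batch, seq_len(n)),
                    G = sprintf("%s.G%d.tif", batch, seq_len(n)))
# TAB-separated, no quotes, no row names: the layout shown above
write.table(pairs, file = paste0("images.", batch),
            sep = "\t", quote = FALSE, row.names = FALSE)
```

Reading the file back with read.table("images.array1", header = TRUE) is a quick check that each line holds exactly one pair.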

Spot also supports TIF images which contain both the Red and Green channels, as produced for example by GenePix. In this case the image name file should contain one column only, with no column labels. An example image name file for this type of image follows.

	array1.combined1.tif
	array1.combined2.tif
	array1.combined3.tif
	etc.
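A one-column file of this kind can be produced with writeLines. The combined file names below are illustrative, and the call overwrites any existing "images.array1".

```r
# Write a one-column image name file (no column label) for combined
# two-channel TIF images; the file names are illustrative.
writeLines(sprintf("array1.combined%d.tif", 1:3), "images.array1")
```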

The image name file can also be created interactively within Spot using the SetImages command.

The Parameter File

This is a file named

	parameters.<batch>

which contains basic structure and shift tolerance information. A sample is as follows.

  list(nspot.r = 19,     # Number of rows of spots per grid
       nspot.c = 21,     # Number of columns of spots per grid
       ngrid.r = 4,      # Number of rows of grids per image
       ngrid.c = 4,      # Number of columns of grids per image
       tolerance.r = 40, # Top/bottom translation tolerance
       tolerance.c = 20  # Left/right translation tolerance
      )

This is R's format for ASCII representation of R objects. Spaces and TABs are unnecessary, but improve readability. The "#" character and any text following it on a line are treated as a comment and ignored.

The meanings of the entries in this file are given in the comments in the sample file. For explanation of the meaning and uses of the tolerance values, see Visual Assessment of Offset Variation.
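Because the parameter file is simply an R expression, it can be written with the standard function dput and read back with dget. A minimal sketch using the values from the sample above; note that dput does not reproduce the comments, and the derived totals are shown only as a sanity check.

```r
# Parameters for the batch "array1" (values from the sample above)
params <- list(nspot.r = 19, nspot.c = 21,       # spots per grid (rows, cols)
               ngrid.r = 4,  ngrid.c = 4,        # grids per image (rows, cols)
               tolerance.r = 40, tolerance.c = 20)
dput(params, file = "parameters.array1")         # write R's ASCII representation
p <- dget("parameters.array1")                   # parse and evaluate it again
p$ngrid.r * p$ngrid.c                            # 16 grids
p$ngrid.r * p$ngrid.c * p$nspot.r * p$nspot.c    # 6384 spot positions
```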

The parameter file can also be created interactively within Spot using the SetParameters command.

STARTING R AND LOADING SPOT

The method for starting R is platform dependent. On UNIX and Linux systems, it will usually just mean typing "R" in a shell window, possibly after modifying the command search path. Under Windows, R is started by selecting either a menu item or an icon, and it can be started with or without a GUI; the GUI is not necessary for running Spot.

To load Spot use the R function library:

library(Spot)

This will succeed if Spot has been installed in the public R library on your computer. Otherwise it may be necessary to modify R's library search path first; see the documentation for the R function library.

Note that R executes commands in a file named ".Rprofile" if it exists in the working directory. Therefore if such a file exists and contains the line

library(Spot)

then it is sufficient simply to start R.
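The .Rprofile can also be created from within R. In this sketch, write_rprofile is a hypothetical helper, and "~/Rlib" is only an example of a private library directory; the standard function .libPaths, which adds a directory to R's library search path, is needed only if Spot is not in the public R library.

```r
# Hypothetical helper: write a .Rprofile into 'dir' so that Spot is loaded
# whenever R is started there. 'lib' is an optional private library directory
# (e.g. "~/Rlib"); leave it NULL if Spot is in the public R library.
# Any existing .Rprofile in 'dir' is overwritten.
write_rprofile <- function(dir = ".", lib = NULL) {
  lines <- "library(Spot)"
  if (!is.null(lib)) lines <- c(sprintf('.libPaths("%s")', lib), lines)
  writeLines(lines, file.path(dir, ".Rprofile"))
  invisible(lines)
}
```

For example, write_rprofile(".", lib = "~/Rlib") prepares the current working directory.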

SPECIFYING A TEMPLATE

A template image is chosen as described in Visual Assessment of Offset Variation. Suppose the third image in the batch (as ordered in the Image Name File) has been selected. Specification of a template from this image involves manually identifying a defined set of features in this image. This is done by calling the function SetTemplate. This function requires two arguments: the batch name and the index of the template image. In this case we enter the following command to R:

SetTemplate("array1", 3)

The red and green images representing the Template Image are read and combined into a single image which is displayed on the screen in an ImView window. The parameter file is also read, so the software knows that there are in this case 4 x 4 = 16 grids in the image. The user is asked - by a message appearing in the R command window - to select 16 + 1 = 17 feature points in the ImView window. The following steps are then carried out:

Add-Point Mode

First go to the ImView display window. Then select the "Transform" menu, the "Pointfile" sub-menu and finally the "Add Point Mode" menu item. This can also be done by typing CTRL-G in the ImView window.

Top-Left Spots in Each Grid

Now move the cursor to the centre of the top-left spot in the top-left grid. If no such spot is visible - this may be an empty position - estimate as closely as possible where the centre of such a spot would be. An error of 2 or 3 pixels will not adversely affect the performance of the algorithm, but be as careful as possible. Select this point by clicking the left mouse button. A circular mark will appear at the selected location.

Repeat for the top-left spot in the next grid to the right, and so on to the end of the top row. Repeat for the second row, moving from left to right, then the third, and so on until top-left spots have been selected for all grids.

This process essentially captures pin configuration information.
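The required click order can be listed in R as a checklist. This is a hypothetical helper for illustration only; SetTemplate itself needs nothing but the clicks.

```r
# Hypothetical helper: grids in the order their top-left spots are clicked,
# left to right along each row of grids, rows from top to bottom.
click_order <- function(ngrid.r, ngrid.c) {
  data.frame(point    = seq_len(ngrid.r * ngrid.c),
             grid.row = rep(seq_len(ngrid.r), each = ngrid.c),
             grid.col = rep(seq_len(ngrid.c), times = ngrid.r))
}
ord <- click_order(4, 4)
head(ord, 5)    # points 1-4 cover grid row 1; point 5 starts grid row 2
nrow(ord) + 1   # 17 points in total, including the final bottom-right spot
```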

Bottom-Right Spot, Bottom-Right Grid

Finally, select one more point: the centre of the bottom-right spot in the bottom-right grid. This provides information on the size of each grid and hence on the distance between successive rows and columns of spots.
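The last top-left point and this bottom-right point both lie in the bottom-right grid, so together they span (nspot.r - 1) rows and (nspot.c - 1) columns of spots. The sketch below shows how spot spacing could be derived from them; it assumes a perfectly regular grid, uses made-up coordinates, and illustrates the geometry rather than Spot's internal computation.

```r
# Hypothetical helper: distance in pixels between adjacent spot rows and
# columns, from two points in the bottom-right grid given as c(row, col):
#   tl = centre of its top-left spot, br = centre of its bottom-right spot.
spot_spacing <- function(tl, br, nspot.r, nspot.c) {
  c(row = (br[1] - tl[1]) / (nspot.r - 1),
    col = (br[2] - tl[2]) / (nspot.c - 1))
}
spot_spacing(tl = c(1210, 1190), br = c(1516, 1550), nspot.r = 19, nspot.c = 21)
# row = 17, col = 18
```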

When all points have been selected, return to the R command window and press ENTER. The software will check that the correct number of feature points has been selected; if not, you will be asked to repeat the process.

It is always possible to re-do the template definition via SetTemplate at any later time, for example if the current template is wrong in some way. The old template will be overwritten.

ANALYSING AN IMAGE PAIR

Once image names, parameters and a template have been set up for a batch of data, any pair of images in the batch may be analysed by calling the function Spots. For example, the command

spotdata.array1.1 <- Spots("array1", 1)

analyses the first image pair from the batch "array1" and assigns the results to the R object "spotdata.array1.1". The first argument to Spots is the batch name and the second is the index of the image pair to be analysed.

Note that R provides a range of facilities, for example, command line editing, looping and lists, which means that repeated similar commands such as this don't need to be laboriously re-typed.
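For example, a whole batch can be analysed in one call by looping over the image name file. In this sketch analyse_batch is a hypothetical helper; it calls Spots(batch, i) as documented above, and takes the analysis function as an argument only so that the loop can be tried without Spot installed. It recognises the column-label pairs listed under The Image Name File.

```r
# Hypothetical helper: analyse every image pair listed in "images.<batch>"
# and return the results as a list (element i = image pair i).
analyse_batch <- function(batch, analyse = Spots) {
  lines <- readLines(paste0("images.", batch))
  lines <- lines[nzchar(trimws(lines))]              # ignore blank lines
  first <- tolower(strsplit(trimws(lines[1]), "[[:space:]]+")[[1]])
  labels <- c("r", "g", "red", "green", "cy5", "cy3")
  n <- length(lines) - all(first %in% labels)        # drop label row if present
  lapply(seq_len(n), function(i) analyse(batch, i))
}
```

After spotdata.array1 <- analyse_batch("array1"), the element spotdata.array1[[1]] holds the same result as spotdata.array1.1 above.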

The function Spots proceeds in several stages:

Formation of a Combined Image

The two images, red and green, are read and then combined into a single image. This image is "flattened", a process which reduces the grey-level difference between very bright and very dark spots.
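This manual does not specify how the channels are combined or how the flattening is done. Purely to illustrate the idea, the sketch below sums the two channels and applies log(1 + x), which compresses bright values far more than dim ones; both choices are assumptions, not Spot's method.

```r
# Illustrative only: combine two channels and flatten with log(1 + x).
combine_and_flatten <- function(red, green) {
  combined <- red + green   # assumed combination rule
  log1p(combined)           # compresses the grey-level range
}
red   <- matrix(c(60000, 200), 1)   # one very bright and one very dim pixel
green <- matrix(c(50000, 100), 1)
raw  <- red + green
flat <- combine_and_flatten(red, green)
raw[1] / raw[2]     # about 367
flat[1] / flat[2]   # about 2
```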

Grid Location

The template is used here to locate the rows and columns of each of the grids. When this is complete, the combined image is displayed with the fitted grids overlaid. The user can assess the accuracy of the grid estimation from this display, but currently there is no way to intervene. The software continues processing after displaying the overlay image.
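A key step in grid location is the estimate of the overall shift between the current image and the template (see Overall Shifts). One simple way to estimate such a shift, which also shows why larger tolerances slow the process down, is a brute-force scan over every shift within the tolerances. The sketch below is not Spot's algorithm; the function name and the scoring rule (mean intensity at the shifted template spot centres) are assumptions.

```r
# Illustrative brute-force shift search (not Spot's algorithm).
# img: numeric matrix (e.g. a combined image);
# centres: two-column matrix of template spot centres (row, col).
# A shift is scored by the mean intensity at the shifted centres.
estimate_shift <- function(img, centres, tolerance.r, tolerance.c) {
  best <- c(dr = 0, dc = 0); best_score <- -Inf
  for (dr in -tolerance.r:tolerance.r) {
    for (dc in -tolerance.c:tolerance.c) {
      rows <- centres[, 1] + dr
      cols <- centres[, 2] + dc
      inside <- rows >= 1 & rows <= nrow(img) & cols >= 1 & cols <= ncol(img)
      if (!any(inside)) next
      score <- mean(img[cbind(rows[inside], cols[inside])])
      if (score > best_score) { best_score <- score; best <- c(dr = dr, dc = dc) }
    }
  }
  best  # (2 * tolerance.r + 1) * (2 * tolerance.c + 1) shifts were scored
}
```

The number of shifts scored grows with the product of the two tolerances, so doubling both roughly quadruples the work.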

Segmentation

Using the combined image and the fitted grids, Spot estimates the extent and shape of each spot. Currently this is done using a seeded region growing technique (Adams and Bischof, 1994). The segmentation results are displayed, but again there is no way currently for the user to intervene in the process.
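The idea of seeded region growing can be shown with a deliberately naive R version: starting from labelled seed pixels, repeatedly add the unlabelled pixel, bordering some region, whose grey level is closest to that region's current mean. This is an illustration of the technique described by Adams and Bischof, written for small images; it is not Spot's implementation, and the function name srg is hypothetical.

```r
# Naive seeded region growing (after Adams and Bischof, 1994), small images only.
# img: numeric matrix. seeds: matrix of the same size, 0 = unlabelled,
# k > 0 = seed pixel(s) of region k. Returns the grown label matrix.
srg <- function(img, seeds) {
  lab <- seeds
  nr <- nrow(img); nc <- ncol(img); K <- max(seeds)
  nbrs <- rbind(c(-1, 0), c(1, 0), c(0, -1), c(0, 1))   # 4-connectivity
  repeat {
    free <- which(lab == 0)
    if (length(free) == 0) break
    # current mean grey level of each region
    means <- vapply(seq_len(K), function(k) mean(img[lab == k]), numeric(1))
    best_d <- Inf; best <- NULL
    for (idx in free) {
      r <- (idx - 1) %% nr + 1; cl <- (idx - 1) %/% nr + 1
      for (j in seq_len(nrow(nbrs))) {
        rr <- r + nbrs[j, 1]; cc <- cl + nbrs[j, 2]
        if (rr < 1 || rr > nr || cc < 1 || cc > nc || lab[rr, cc] == 0) next
        d <- abs(img[idx] - means[lab[rr, cc]])
        if (d < best_d) { best_d <- d; best <- c(idx, lab[rr, cc]) }
      }
    }
    if (is.null(best)) break          # remaining pixels touch no region
    lab[best[1]] <- best[2]           # add the most similar bordering pixel
  }
  lab
}
```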

Statistics

In this final stage, the spot segmentation is used together with the raw image data to compute a number of statistics. These fall into five groups:

  • location information,
  • spot values,
  • background values,
  • spot shape parameters and
  • other statistics formed by combining these values.

A description of each of the columns of the output may be found in Spot: Description of Output.
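To make the idea of combining spot and background values concrete, the sketch below computes a few per-spot quantities from a label matrix and the raw red and green images. It is illustrative only: the function and column names are hypothetical and do not match Spot's output, and the background is simplified to a single global median of unlabelled pixels.

```r
# Illustrative per-spot statistics (not Spot's output format).
# labels: 0 = background, k > 0 = spot k; red, green: raw images.
spot_stats <- function(red, green, labels) {
  ids <- sort(unique(labels[labels > 0]))
  bg <- labels == 0
  Rb <- median(red[bg]); Gb <- median(green[bg])   # global background
  out <- data.frame(
    spot = ids,
    area = vapply(ids, function(k) sum(labels == k), integer(1)),
    R    = vapply(ids, function(k) mean(red[labels == k]), numeric(1)),
    G    = vapply(ids, function(k) mean(green[labels == k]), numeric(1))
  )
  out$Rb <- Rb; out$Gb <- Gb
  # Background-corrected log2 ratio; NaN or infinite if a corrected
  # value is not positive
  out$M <- log2((out$R - Rb) / (out$G - Gb))
  out
}
```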

OTHER FUNCTIONS

Spot provides a selection of other tools besides those described above.

SetImages      Specify or modify the file names for images in a batch
SetParameters  Define the basic geometry of a batch of microarray images
SetTemplate    Define a template for a batch of images
Shear          Correct for shear in an image
ShowSpot       Create an image showing the input image data for a given microarray spot
SpotRGB        Create an RGB image for visualizing microarray images
Spots          Analyse microarray image data

REFERENCES

Adams, R. and Bischof, L. (1994). "Seeded region growing". IEEE Transactions on Pattern Analysis and Machine Intelligence 16, 641-647.

© Copyright 2010, CSIRO Australia