
Statistics in real-time quantitative PCR
STATISTICS AND GENE EXPRESSION ANALYSIS
by Terry Seed

Why do we measure gene expression? The most common experiment is
comparative: we want to compare the mRNA levels of one or more genes in
cells from different sources. Comparisons of interest include tumour vs
normal cells, cells from a specific organ in a mutant or genetically
modified organism vs cells from the same organ in a normal organism of
the same strain, and cells before and after an intervention such as a
drug treatment. Another important class is time-course experiments,
where cells are sampled at different times, e.g. after the
administration of a drug, or as the cell cycle or development proceeds,
and interest is in temporal patterns of gene expression. Yet other
experiments focus on spatial patterns of gene expression. There are many
other kinds of gene expression experiments, essentially as many as there
are organisms, cell types and conditions of biological interest.
How do we measure gene expression? As stated above, there are many
techniques for doing so, but most rely on DNA-RNA or DNA-DNA
hybridization. This is the process through which single-stranded DNA or
RNA molecules find and base-pair with their complementary sequences
amidst a complex mixture of many molecules of the same kind. The
terminology we adopt names the sequence representing a gene of interest
the probe, while the pool within which a complementary copy of the probe
is sought is named the target DNA or RNA. Other terminologies are the
reverse of ours.
On what scale do we measure gene expression? Much of the recent interest
by statisticians in this area stems from the availability of data sets
giving expression measurements on tens of thousands of genes, so-called
microarray gene expression data. However, nylon membrane filters with
thousands of genes spotted on them have been around for over a decade,
and smaller-scale quantitative expression data for much longer. We begin
with a discussion of the first and simplest method of quantifying RNA,
as many of the features of the high-throughput methods are already
present here.
Real-time PCR Statistics
Joshua S. Yuan and C. Neal Stewart Jr.
PCR Encyclopedia (2005): 101127-49 http://www.pcr-encyclopedia.com/
Department of Plant Sciences and Genomics Hub, University of Tennessee,
Knoxville, TN 37996, USA
Real-time quantitative RT-PCR: design, calculations, and statistics.
Rieu I, Powers SJ.
Plant Cell. 2009; 21(4): 1023

Two recent letters to the editor of The Plant Cell
(Gutierrez et al., 2008; Udvardi
et al., 2008) highlighted the importance of following correct
experimental protocol in quantitative RT-PCR (qRT-PCR). In these
letters, the authors outlined measures to allow precise estimation of
gene expression by ensuring the quality of material, refining
laboratory practice, and using a normalization of relative quantities
of transcripts of genes of interest (GOI; also called target genes)
where multiple reference genes have been analyzed appropriately. In
this letter, we build on the issues raised by considering the
statistical design of qRT-PCR experiments, the calculation of
normalized gene expression, and the statistical analysis of the
subsequent data. This letter comprises advice for taking account of, in
particular, the first and the last of these three vital issues. We
concentrate on the situation of comparing transcript levels in
different sample types (treatments) using relative quantification, but
many of the concerns, particularly those with respect to design, are
equally applicable to absolute quantification.
Statistical Selection of Maintenance Genes for Normalization of Gene Expressions.
Yifan Huang, Jason C. Hsu, Mario Peruggia, Abigail A. Scott
Statistical Applications in Genetics and Molecular Biology, Volume 5, Issue 1, 2006, Article 4

Maintenance genes can
be used for normalization in the comparison of gene expressions. Even
though the
absolute expression levels of maintenance genes may vary considerably
among different
tissues or cells, a set of maintenance genes may provide suitable
normalization if their expression levels are
relatively constant in the specific tissues or cells of interest. A
statistical procedure is proposed
to select maintenance genes for normalization of gene expression data
from tissues
or cells of interest. This procedure is based on simultaneous
confidence intervals for practical equivalence of
relative gene expressions in these tissues or cells. As an
illustration, the procedure is applied
to the maintenance gene expression data from Vandesompele et al. (2002).
The qPCR Data Statistical Analysis - Integromics White Paper
Ramon Goni, Patricia García and Sylvain Foissac
Integromics SL, Madrid Science Park, Santiago Grisolía, 28760 Tres Cantos, Spain

Abstract: Data analysis represents one of the biggest bottlenecks in
qPCR experiments, and the statistical aspects of the analysis are
sometimes considered confusing for the non-expert. In this document we
present some of the usual methods used in qPCR data analysis and a
practical example using Integromics® RealTime StatMiner®, the unique
software analysis package specialized for qPCR experiments which is
compatible with all Applied Biosystems Instruments. RealTime StatMiner®
uses a simple, step-by-step analysis workflow guide that includes
parametric, non-parametric and paired tests for relative quantification
of gene expression, as well as 2-way ANOVA for two-factor differential
expression analysis.
Statistical Significance of quantitative PCR.
Yann Karlen, Alan McNair, Sebastien Perseguers, Christian Mazza & Nicolas Mermod
BMC Bioinformatics 2007, 8: 131

Background
PCR has the potential to detect and precisely quantify specific DNA
sequences, but it is not yet often used as a fully quantitative method.
A number of data collection and processing strategies have been
described for the implementation of quantitative PCR. However, they can
be experimentally cumbersome, their relative performances have not been
evaluated systematically, and they often remain poorly validated
statistically and/or experimentally. In this study, we evaluated the
performance of known methods, and compared them with newly developed
data processing strategies in terms of sensitivity, precision and
robustness.
Results
Our results indicate that simple methods that do not rely on the
estimation of the efficiency of the PCR amplification may provide
reproducible and sensitive data, but that they do not quantify DNA with
precision. Other evaluated methods based on sigmoidal or exponential
curve fitting were generally of both poor sensitivity and precision. A
statistical analysis of the parameters that influence efficiency
indicated that it depends mostly on the selected amplicon and to a
lesser extent on the particular biological sample analyzed. Thus, we
devised various strategies based on individual or averaged efficiency
values, which were used to assess the regulated expression of several
genes in response to a growth factor.
Conclusions
Overall, qPCR data analysis methods differ significantly in their
performance, and this analysis identifies methods that provide DNA
quantification estimates of high precision, robustness and reliability.
These methods allow reliable estimations of relative expression ratio
of two-fold or higher, and our analysis provides an estimation of the
number of biological samples that have to be analyzed to achieve a
given precision.
Statistical diagnostics emerging from external quality control of real-time PCR.
Marubini E, Verderio P, Raggi CC, Pazzagli M, Orlando C; Italian Network for Quality Assessment of Tumor Biomarkers; Italian Society of Clinical Chemistry and Clinical Molecular Biology.
Institute of Medical Statistics and Biometry, Universita degli Studi di Milano, Milan, Italy.
Original paper: Int J Biol Markers. 2004 Apr-Jun; 19(2): 141-146. Erratum: Int J Biol Markers. 2004 Jul-Sep; 19(3): 256
Besides the
application of conventional qualitative PCR as a valuable tool to
enrich or identify specific sequences of nucleic acids, a new
revolutionary technique for quantitative PCR
determination has been introduced recently. It is based on real-time
detection of
PCR products revealed as a homogeneous accumulating signal generated by
specific dyes. However, as far as we know, the influence of the
variability of
this technique on the reliability of the quantitative assay has not
been
thoroughly investigated. A national program of external quality
assurance (EQA)
for real-time PCR determination involving 42 Italian laboratories
has been developed to assess the analytical performance of real-time
PCR procedures.
Participants were asked to perform a conventional experiment based on
the use of an
external reference curve (standard curve) for real-time detection of
three cDNA
samples with different concentrations of a specific target. In this
paper the
main analytical features of the standard curve have been investigated
in an
attempt to produce statistical diagnostics emerging from external
quality
control. Specific control charts were drawn to help biochemists take
technical
decisions aimed at improving the performance of their laboratories.
Overall, our
results indicated a subset of seven laboratories whose performance
appeared to be markedly outside the limits for at least one of the
standard curve
features investigated. Our findings suggest the usefulness of the
approach
presented here for monitoring the heterogeneity of results produced by
different
laboratories and for selecting those laboratories that need technical
advice on
their performance.
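The standard-curve features examined in such a quality-control exercise can be illustrated with a minimal sketch. It assumes a ten-fold dilution series with made-up Ct values (none of the numbers come from the paper), fits Ct against log10 of the starting quantity by least squares, and uses the conventional relation efficiency = 10^(-1/slope) - 1. The function names are hypothetical.

```python
import math

def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = intercept + slope * log10(quantity)."""
    x = [math.log10(q) for q in quantities]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(ct_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in ct_values)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

def efficiency_from_slope(slope):
    """Amplification efficiency implied by the slope (1.0 means perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0

def quantity_from_ct(ct, slope, intercept):
    """Read an unknown sample's starting quantity off the fitted line."""
    return 10 ** ((ct - intercept) / slope)

# Hypothetical dilution series (copies per reaction) and observed Ct values.
quantities = [1e6, 1e5, 1e4, 1e3, 1e2]
ct_values = [15.1, 18.4, 21.8, 25.1, 28.5]

slope, intercept, r_squared = fit_standard_curve(quantities, ct_values)
efficiency = efficiency_from_slope(slope)
unknown_quantity = quantity_from_ct(23.0, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.2f} R^2={r_squared:.4f}")
print(f"efficiency={efficiency:.3f} unknown={unknown_quantity:.0f} copies")
```

Slope, intercept and R^2 are the kind of per-laboratory summaries that could be placed on a control chart; which features the authors actually charted is not specified here.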
Statistical
Inference for Quantitative Polymerase Chain Reaction Using a Hidden
Markov Model: A Bayesian Approach
Nadia
Lalam, Chalmers University of Technology, Sweden
Statistical
Applications in Genetics and Molecular Biology: Vol. 6 : Iss. 1,
Article 10.
Quantitative
Polymerase Chain Reaction (Q-PCR) aims at determining the initial
quantity of specific nucleic acids from the observation of the number
of amplified DNA molecules. The most widely used technology to monitor
the number of DNA molecules as they replicate is based on fluorescence
chemistry. Considering this measurement technique, the observation of
DNA amplification by PCR contains intrinsically two kinds of
variability. On the one hand, the number of replicated DNA molecules is
random, and on the other hand, the measurement of the fluorescence
emitted by the DNA molecules is collected with some random error.
Relying on a stochastic model of these two types of variability, we aim
at providing estimators of the parameters arising in the proposed
model, and, more specifically, of the initial amount of molecules. The
theory of branching processes is classically used to model the
evolution of the number of DNA molecules at each replication cycle. The
model is a binary splitting Galton-Watson branching process. Its
unknown parameters are the initial number of DNA molecules and the
reaction efficiency of PCR, which is defined as the probability of
replication of a DNA molecule. The number of DNA molecules is
indirectly observed through noisy fluorescence measurements resulting
in a so-called Hidden Markov Model. We aim at inference of the
parameters of the underlying branching process, and the parameters of
the noise from the fluorescence measurements in a Bayesian framework.
Using simulations and experimental data, we investigate the performance
of the Bayesian estimators obtained by Markov Chain Monte Carlo methods.
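The two sources of variability described in this abstract can be simulated in a few lines. The sketch below is only an illustration: each molecule duplicates independently with probability equal to the efficiency (a binary splitting Galton-Watson process), and the fluorescence reading is assumed to be proportional to the molecule count plus additive Gaussian noise. That noise model, the scale factor and all parameter values are assumptions, not taken from the paper, and no Bayesian inference is attempted.

```python
import random

def simulate_pcr(n0, efficiency, cycles, noise_sd, scale=1.0, rng=None):
    """Simulate molecule counts and noisy fluorescence over PCR cycles.

    n0         initial number of DNA molecules
    efficiency probability that a molecule is replicated in one cycle
    noise_sd   standard deviation of the (assumed Gaussian) measurement error
    """
    rng = rng or random.Random()
    counts, fluorescence = [], []
    n = n0
    for _ in range(cycles):
        # Binary splitting: every molecule yields one extra copy with
        # probability `efficiency`, independently of the others.
        new_copies = sum(1 for _ in range(n) if rng.random() < efficiency)
        n += new_copies
        counts.append(n)
        # Hidden state n is observed only through a noisy reading.
        fluorescence.append(scale * n + rng.gauss(0.0, noise_sd))
    return counts, fluorescence

def expected_count(n0, efficiency, cycle):
    """Mean of the branching process after `cycle` cycles."""
    return n0 * (1.0 + efficiency) ** cycle

# Hypothetical run: 10 starting molecules, 90% efficiency, 12 cycles.
counts, fluorescence = simulate_pcr(10, 0.9, 12, noise_sd=50.0,
                                    rng=random.Random(42))
for cycle, (n, f) in enumerate(zip(counts, fluorescence), start=1):
    print(cycle, n, round(f, 1), round(expected_count(10, 0.9, cycle), 1))
```

Repeating the run with different seeds shows how the randomness of the first few cycles propagates into the later counts, which is the reason the initial quantity is hard to estimate from fluorescence alone.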
Common practice in
molecular biology may introduce statistical bias and misleading
biological interpretation.
Hocquette JF, Brandstetter AM.
J Nutr Biochem. 2002 Jun;13(6):370-377.

Unite de Recherches sur les Herbivores, Equipe
Croissance et Metabolismes du
Muscle, Theix, 63122, Saint-Genes-Champanelle, France
In studies on enzyme
activity or
gene expression at the protein level, data are usually analyzed by
using a
standard curve after subtracting blank values. In most cases and for
most techniques
(spectrophotometric assays, ELISA), this approach satisfies the basic
principles of linearity and specificity. In our experience, this might
be also the
case for Western-blot analysis. By contrast, mRNA data are usually
presented as
arbitrary units of the ratio of a target RNA over levels of a control
RNA
species. We here demonstrate by simple experiments and various examples
that this
data-normalization procedure may result in misleading conclusions.
Common
molecular biology techniques have never been carefully tested according
to the
basic principles of validation of quantitative techniques. We thus
prefer a regression-based approach for quantifying mRNA levels relative
to a control RNA species by Northern-blot, semi-quantitative RT-PCR or
similar techniques. This type of technique is also characterized by a
lower reproducibility for repeated assays when compared to biochemical
analyses. Therefore, we also recommend designing experiments that allow
the detection of a similar range of variance by biochemical and
molecular biology techniques. Otherwise, spurious conclusions may be
drawn regarding the control level of gene expression.
Confidence interval
estimation for DNA and mRNA concentration by real-time PCR: A new
environment for an old theorem.
Verderio P, Orlando C, Casini Raggi C, Marubini E.
Int J Biol Markers. 2004 Jan-Mar;19(1):76-9.

Operative Unit of Medical Statistics and Biometry,
Istituto Nazionale per lo Studio e la Cura dei Tumori, Milan, Italy.
Bravais-Pearson and
Spearman correlation coefficients: meaning, test of hypothesis and
confidence interval.
Artusi R, Verderio P, Marubini E.
Int J Biol Markers. 2002 Apr-Jun;17(2):148-51.

Operative Unit of Medical Statistics and Biometry,
Istituto Nazionale per lo Studio e la Cura dei Tumori, Milan, Italy.
Biostatistics and
tumor marker studies in breast cancer: design, analysis and
interpretation issues.
Biganzoli E, Boracchi P, Marubini E.
Int J Biol Markers. 2003 Jan-Mar;18(1):40-8.

Operative Unit of Medical Statistics and Biometry,
Istituto Nazionale per lo Studio e la Cura dei Tumori, Milan, Italy.
SAS programs for
real-time RT-PCR having multiple independent samples.
Cook P, Fu C, Hickey M, Han ES, Miller KS.
Biotechniques. 2004 Dec;37(6): 990-995.

University of Tulsa, Tulsa, OK 74104, USA.
Relative real-time
reverse
transcription PCR (RT-PCR) has become an important tool for quantifying
changes in
messenger RNA (mRNA) populations following differential development or
stimulation of tissues or cells. However, the best methods for
conducting such
experiments and analyzing the resultant data remain an issue of
discussion. In this
report we describe an appropriate experimental methodology and the
computer
programs necessary to generate a meaningful statistical analysis of the
combined biological and experimental variability in such experiments.
Specifically,
logarithmic transformations of raw fluorescence data from the
log-linear portion
of real-time PCR growth curves for both target and reference genes are
analyzed
using a SAS/STAT Mixed Procedure program specifically designed to give
a
point estimate of the relative expression ratio of the target gene with
associated
95% confidence interval. The program code is open-source and is printed
in the
text.
Relative Expression Software Tool (REST©) for group-wise comparison and statistical analysis of relative expression results in real-time PCR
Michael W. Pfaffl, Graham W. Horgan & Leo Dempfle
Nucleic Acids Research 2002 May 1; 30(9): E36
Real-time reverse transcription followed by polymerase chain reaction
(RT-PCR) is the most suitable method for the detection and
quantification of mRNA. It offers high sensitivity, good
reproducibility, and a wide quantification range. Today relative
expression is increasingly used, where the expression of a target gene
is standardised by a non-regulated reference gene. Several mathematical
algorithms have been developed to compute an expression ratio, based on
real-time PCR efficiency and the crossing point deviation of an unknown
sample versus a control. But all published equations and available
models for the calculation of relative expression ratio allow only for
the determination of a single transcription difference between one
control and one sample. Therefore a new software tool was established,
named REST© (Relative Expression Software Tool), which compares two
groups, with up to 16 data points in sample and 16 in control group, for
reference and up to four target genes. The mathematical model used is
based on the PCR efficiencies and the mean crossing point deviation
between sample and control group. Subsequently the expression ratio
results of the four investigated transcripts are tested for significance
by a randomisation test. Herein development and application of REST is
explained and the usefulness of relative expression in real-time PCR
using REST is discussed.
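The efficiency-corrected model described above can be sketched as follows: the ratio is E_target raised to the mean crossing point difference (control minus sample) for the target gene, divided by E_ref raised to the same difference for the reference gene, where E is the amplification factor per cycle (2 for perfect doubling). The randomisation test below is a simplified illustration that reshuffles group labels while keeping each specimen's target and reference values together; it is not REST's own code, and all data and function names are hypothetical.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def expression_ratio(e_target, e_ref, control, sample):
    """Efficiency-corrected relative expression ratio.

    control, sample: lists of (cp_target, cp_reference) pairs, one per specimen.
    e_target, e_ref: amplification factors per cycle (2.0 = perfect doubling).
    """
    d_target = mean([t for t, _ in control]) - mean([t for t, _ in sample])
    d_ref = mean([r for _, r in control]) - mean([r for _, r in sample])
    return (e_target ** d_target) / (e_ref ** d_ref)

def randomisation_p_value(e_target, e_ref, control, sample,
                          n_perm=2000, seed=0):
    """Two-sided p-value from random reallocation of specimens to groups."""
    rng = random.Random(seed)
    observed = abs(math.log(expression_ratio(e_target, e_ref, control, sample)))
    pooled = control + sample          # new list; inputs stay untouched
    n_control = len(control)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        ratio = expression_ratio(e_target, e_ref,
                                 pooled[:n_control], pooled[n_control:])
        # Small tolerance guards against floating-point ties.
        if abs(math.log(ratio)) >= observed - 1e-9:
            hits += 1
    return hits / n_perm

# Hypothetical crossing points: (target gene, reference gene) per specimen.
control = [(25.0, 18.1), (25.3, 17.9), (24.8, 18.0), (25.1, 18.2)]
sample = [(22.1, 18.0), (21.9, 18.1), (22.3, 17.9), (22.0, 18.2)]

ratio = expression_ratio(2.0, 2.0, control, sample)
p_value = randomisation_p_value(2.0, 2.0, control, sample)
print(f"ratio={ratio:.2f} p={p_value:.3f}")
```

Working on the log of the ratio makes up- and down-regulation symmetric, which is why the test statistic is the absolute log ratio rather than the ratio itself.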
Kinetic Outlier Detection (KOD) in real-time PCR.
Tzachi Bar, Anders Stahlberg, Anders Muszta and Mikael Kubista
NAR Vol 31(17) e105

Department of
Chemistry and
Bioscience, Chalmers University of Technology, Medicinargatan 7B, 405
30 Gothenburg, Sweden,
Department of Mathematical Statistics, Eklandagatan 86, 412 96,
Gothenburg, Sweden
TATAA Biocenter, Medicinargatan 7B, 405 30 Gothenburg, Sweden
Real-time PCR is becoming the method of choice for precise
quantification of minute amounts of nucleic acids. For proper comparison
of samples, almost all quantification methods assume similar PCR
efficiencies in the exponential phase of the reaction. However,
inhibition of PCR is common when working with biological samples and may
invalidate the assumed similarity of PCR efficiencies. Here we present a
statistical method, Kinetic Outlier Detection (KOD), to detect samples
with dissimilar efficiencies. KOD is based on a comparison of PCR
efficiency, estimated from the amplification curve of a test sample,
with the mean PCR efficiency of samples in a training set. KOD is
demonstrated and validated on samples with the same initial number of
template molecules, where PCR is inhibited to various degrees by
elevated concentrations of dNTP; and in detection of cDNA samples with
an aberrant ratio of two genes. Translating the dissimilarity in
efficiency to quantity, KOD identifies outliers that differ by 1.3- to
1.9-fold in their quantity from normal samples with a P-value of 0.05.
This precision is higher than the minimal 2-fold difference in number of
DNA molecules that real-time PCR usually aims to detect. Thus, KOD may
be a useful tool for outlier detection in real-time PCR.
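The idea of comparing a test sample's efficiency with a training set can be sketched as below. This is a simplified stand-in, not the published KOD procedure: efficiency is estimated from the slope of log10 fluorescence against cycle number over readings assumed to lie in the exponential phase, and a sample is flagged when its z-score against the training set exceeds 1.96 (two-sided P = 0.05 under an assumed normal distribution). All values and function names are hypothetical.

```python
import math
import statistics

def efficiency_from_curve(fluorescence):
    """Estimate efficiency from consecutive exponential-phase readings.

    Fits log10(F) = a + b * cycle by least squares; efficiency = 10**b - 1,
    so perfect doubling per cycle gives 1.0.
    """
    n = len(fluorescence)
    cycles = list(range(n))
    logs = [math.log10(f) for f in fluorescence]
    mean_x = sum(cycles) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in cycles)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cycles, logs))
    return 10 ** (sxy / sxx) - 1.0

def is_kinetic_outlier(test_efficiency, training_efficiencies, z_crit=1.96):
    """Return (flag, z) comparing one sample with the training set."""
    mu = statistics.mean(training_efficiencies)
    sd = statistics.stdev(training_efficiencies)
    z = (test_efficiency - mu) / sd
    return abs(z) > z_crit, z

# Hypothetical training set of efficiencies from uninhibited reactions.
training = [0.92, 0.95, 0.93, 0.94, 0.96, 0.93]

# Hypothetical exponential-phase readings for two test samples.
clean_curve = [1.94 ** c for c in range(5)]       # about 94% efficiency
inhibited_curve = [1.70 ** c for c in range(5)]   # about 70% efficiency

clean_flag, clean_z = is_kinetic_outlier(efficiency_from_curve(clean_curve), training)
bad_flag, bad_z = is_kinetic_outlier(efficiency_from_curve(inhibited_curve), training)
print(clean_flag, round(clean_z, 2))
print(bad_flag, round(bad_z, 2))
```

In practice the choice of which cycles count as exponential phase strongly affects the estimate, so the window selection would need the same care as the test itself.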
"The book's title suggests that he
can make biostatistics intuitive for non-statisticians (e.g.
physicians, clinicians and nurses). After reading through it he has
made a believer out of me! He introduces concepts through examples and
touches on most of the important statistical methods that are used in
the medical literature. ... My usual concern with such books is that
concepts are oversimplified and the presentation is too cook-bookish.
Amazingly that is not the case here. Motulsky
carefully explains concepts such as confidence intervals, p-values,
multiple comparison issues, Bayesian thinking and Bayesian controversy
in a way that should be understandable to his intended audience." by
Michael R.
Chernick,
PhD (review posted on amazon.com)
We created
the GraphPad library to help biologists (and other scientists) learn
about data analysis. This "library"
contains articles and manuals written by GraphPad, as well
as links to web sites and books written by others. http://www.graphpad.com/index.cfm?cmd=library.index
Applied Robust
Statistics
David J. Olive, Southern Illinois University, Department
of Mathematics, Carbondale, IL 62901-4408