Data Analysis and BioInformatics in real-time qPCR (3)
Overview: How to do successful gene expression analysis using real-time PCR
Stefaan Derveaux, Jo Vandesompele, Jan Hellemans
Methods Vol 50, Issue 4, April 2010, in The ongoing Evolution of qPCR edited by Michael W. Pfaffl, Pages 227-230
Reverse transcription
quantitative PCR (RT-qPCR) is considered today as
the gold standard for accurate, sensitive and fast measurement of gene
expression. Unfortunately, what many users fail to appreciate is that
numerous critical issues in the workflow need to be addressed before
biologically meaningful and trustworthy conclusions can be drawn. Here,
we review the entire workflow from the planning and preparation phase,
through the actual real-time PCR cycling experiments, to the data-analysis and
reporting steps. This process can be captured with the appropriate
acronym PCR: plan/prepare, cycle and report. The key message is that
quality assurance and quality control are essential throughout the
entire RT-qPCR workflow: from living cells, through extraction of nucleic
acids, storage, various enzymatic steps such as DNase treatment,
reverse transcription and PCR amplification, to data-analysis and
finally reporting.
Quantitative real-time RT-PCR data analysis: current concepts and the novel "gene expression's CT difference" formula.
Schefe JH, Lehmann KE, Buschmann IR, Unger T, Funke-Kaiser H.
Center for Cardiovascular Research (CCR)/Institute of Pharmacology and Toxicology, Charité-Universitätsmedizin Berlin, Hessische Strasse 3-4, 10115, Berlin, Germany.
J Mol Med. 2006 4(11):901-10. Epub 2006 Sep 14.
For quantification of
gene-specific mRNA, quantitative real-time RT-PCR has become one of the
most frequently used methods over the last few years. This article
focuses on the issue of real-time PCR data analysis and its
mathematical background, offering a general concept for efficient, fast
and precise data analysis superior to the commonly used comparative CT
(ΔΔCT) and the standard curve method, as it considers
individual amplification efficiencies for every PCR. This concept is
based on a novel formula for the calculation of relative gene
expression ratios, termed GED (Gene Expression's CT Difference)
formula. Prerequisites for this formula, such as real-time PCR
kinetics, the concept of PCR efficiency and its determination, are
discussed. Additionally, this article offers some technical
considerations and information on statistical analysis of real-time PCR
data.
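The abstract contrasts its new formula with the comparative CT (ΔΔCT) method and with approaches that account for amplification efficiency. The sketch below is not the GED formula, which is not reproduced here; it only shows the two conventional calculations for comparison. It assumes a single target and a single reference gene, and a Pfaffl-type efficiency correction with one efficiency per assay. Function names and CT values are hypothetical. Feeding in per-reaction efficiency estimates instead would come closer to the abstract's idea of individual efficiencies for every PCR.

```python
def delta_delta_ct_ratio(ct_target_ctrl, ct_target_treat, ct_ref_ctrl, ct_ref_treat):
    """Comparative CT (2^-ddCT) ratio, treated vs control.

    Assumes both assays amplify with ~100% efficiency (factor 2 per cycle).
    """
    d_ct_treat = ct_target_treat - ct_ref_treat   # normalise to reference gene
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** (-(d_ct_treat - d_ct_ctrl))


def efficiency_corrected_ratio(e_target, e_ref,
                               ct_target_ctrl, ct_target_treat,
                               ct_ref_ctrl, ct_ref_treat):
    """Efficiency-corrected ratio (Pfaffl-type), treated vs control.

    e_target and e_ref are amplification factors per cycle (2.0 = 100% efficiency).
    """
    target_change = e_target ** (ct_target_ctrl - ct_target_treat)
    ref_change = e_ref ** (ct_ref_ctrl - ct_ref_treat)
    return target_change / ref_change


# Hypothetical CT values: target drops 2 cycles after treatment, reference unchanged.
print(delta_delta_ct_ratio(25.0, 23.0, 20.0, 20.0))                    # 4.0
print(efficiency_corrected_ratio(1.9, 2.0, 25.0, 23.0, 20.0, 20.0))   # about 3.61
```

With both efficiencies set to 2.0 the two functions agree. An assumed target efficiency of 1.9 lowers the estimated ratio from 4.0 to about 3.61, which shows why ignoring efficiency can bias ΔΔCT results.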
Multiway real-time PCR gene expression profiling in yeast Saccharomyces cerevisiae reveals altered transcriptional response of ADH-genes to glucose stimuli.
Ståhlberg A, Elbing K, Andrade-Garda JM, Sjögreen B, Forootan A, Kubista M.
TATAA Biocenter, Odinsgatan 28, 411 03 Göteborg, Sweden. anders.stahlberg@neuro.gu.se
BMC Genomics. 2008 9:170.
BACKGROUND: The high sensitivity, high reproducibility and essentially unlimited dynamic range of real-time PCR to measure gene expression in complex samples provide the opportunity for powerful multivariate and multiway studies of biological phenomena. In multiway studies samples are characterized by their expression profiles to monitor changes over time, the effect of treatment, drug dosage, etc. Here we perform a multiway study of the temporal response of four yeast Saccharomyces cerevisiae strains with different glucose uptake rates upon altered metabolic conditions.
RESULTS: We measured the expression of 18 genes as a function of time after addition of glucose to four strains of yeast grown in ethanol. The data are analyzed by matrix-augmented PCA, which is a generalization of PCA for 3-way data, and the results are confirmed by hierarchical clustering and clustering by Kohonen self-organizing map. Our approach identifies gene groups that respond similarly to the change of nutrient, and genes that behave differently in mutant strains. Of particular interest is our finding that ADH4 and ADH6 show a behavior typical of glucose-induced genes, while ADH3 and ADH5 are repressed after glucose addition.
CONCLUSION: Multiway real-time PCR gene expression profiling is a powerful technique which can be utilized to characterize functions of new genes by, for example, comparing their temporal response after perturbation in different genetic variants of the studied subject. The technique also identifies genes that show perturbed expression in specific strains.
Statistical aspects of quantitative real-time PCR experiment design
Robert R. Kitchen, Mikael Kubista, Ales Tichopad
Methods Vol 50, Issue 4, April 2010, in The ongoing Evolution of qPCR edited by Michael W. Pfaffl, Pages 231-236
Experiments using
quantitative real-time PCR to test hypotheses are limited by technical
and biological variability; we seek to minimise sources of confounding
variability through optimum use of biological and technical replicates.
The quality of an experiment design is commonly assessed by calculating
its prospective power. Such calculations rely on knowledge of the
expected variances of the measurements of each group of samples and the
magnitude of the treatment effect, the estimation of which is often
uninformed and unreliable. Here we introduce a method that exploits a
small pilot study to estimate the biological and technical variances in
order to improve the design of a subsequent large experiment. We
measure the variance contributions at several 'levels' of the
experiment design and provide a means of using this information to
predict both the total variance and the prospective power of the assay.
A validation of the method is provided through a variance analysis of
representative genes in several bovine tissue-types. We also discuss
the effect of normalisation to a reference gene in terms of the
measured variance components of the gene of interest. Finally, we
describe a software implementation of these methods, powerNest, that
gives the user the opportunity to input data from a pilot study and
interactively modify the design of the assay. The software
automatically calculates expected variances, statistical power, and
optimal design of the larger experiment. powerNest enables the
researcher to minimise the total confounding variance and maximise
prospective power for a specified maximum cost for the large study.
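The variance bookkeeping described above can be sketched for a fully nested design: biological samples, then RT reactions per sample, then qPCR replicates per RT reaction. This is not powerNest's implementation. It uses the textbook nested-variance formula and a normal approximation for two-sided power. The variance components and function names are hypothetical.

```python
from statistics import NormalDist


def group_mean_variance(var_bio, var_rt, var_qpcr, n_bio, n_rt, n_qpcr):
    """Variance of one group's mean Cq in a fully nested design.

    n_bio biological replicates, n_rt RT reactions per biological sample,
    n_qpcr qPCR replicates per RT reaction; variances are per level (cycles^2).
    """
    return (var_bio / n_bio
            + var_rt / (n_bio * n_rt)
            + var_qpcr / (n_bio * n_rt * n_qpcr))


def prospective_power(effect, var_group_mean, alpha=0.05):
    """Approximate two-sided power to detect a difference `effect` (in cycles)
    between two groups with the same design (normal approximation)."""
    std_normal = NormalDist()
    se_diff = (2.0 * var_group_mean) ** 0.5
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z = abs(effect) / se_diff
    return std_normal.cdf(z - z_crit) + std_normal.cdf(-z - z_crit)


# Hypothetical pilot-study variance components (cycles^2).
components = dict(var_bio=0.5, var_rt=0.1, var_qpcr=0.02)
for n_bio, n_rt, n_qpcr in [(3, 1, 3), (6, 1, 1)]:
    v = group_mean_variance(n_bio=n_bio, n_rt=n_rt, n_qpcr=n_qpcr, **components)
    print(n_bio, n_rt, n_qpcr, round(v, 3), round(prospective_power(1.0, v), 2))
```

With these assumed components, six biological replicates with a single qPCR each (6 reactions) give a smaller variance and higher power than three biological replicates in triplicate (9 reactions). This reflects the dominance of biological variance. Per-level costs, which the paper also optimizes, are not modelled here.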
The Prime Technique - Real-time PCR Data Analysis
Mikael Kubista, Institute of Molecular Genetics and TATAA Biocenter, Sweden
Radek Sindelka, Institute of Molecular Genetics, Czech Republic
G.I.T. Laboratory Journal 9-10/2007, pp 33-35, GIT VERLAG GmbH & Co. KG, Darmstadt
For measuring gene expression there is only one technique: PCR. But how can it be used with maximum efficiency? This article tries to give the answer to that question.
Gene expression profiling – Clusters of possibilities
Anders Bergkvist, Vendula Rusnakova, Radek Sindelka, Jose Manuel Andrade Garda, Björn Sjögreen, Daniel Lindh, Amin Forootan, Mikael Kubista
Methods Vol 50, Issue 4, April 2010, in The ongoing Evolution of qPCR edited by Michael W. Pfaffl, Pages 323-335
Advances in qPCR
technology allow studies of increasingly large systems comprising many
genes and samples. The increasing data sizes allow expression profiling
in both the gene and the sample dimensions, while also placing higher
demands on sound statistical analysis and on the expertise to handle and
interpret the results. We distinguish between exploratory and
confirmatory statistical studies. In this paper we demonstrate several
techniques available for exploratory studies on a system of Xenopus
laevis development from egg to tadpole. Techniques include hierarchical
clustering, heatmap, principal component analysis and self-organizing
maps. We stress that even though exploratory studies are excellent for
generating hypotheses, results have not been proven statistically
significant until an independent confirmatory study has been performed.
An exploratory study may certainly be valuable in its own right, and
there are often not enough resources to report both an exploratory and
a confirmatory study at the same time. However, exploratory and
confirmatory studies are intimately connected and we would like to
raise that awareness among qPCR practitioners. We suggest that
scientific reports should always have a hypothesis focus. Reports are
either hypothesis generating, from an exploratory study, or hypothesis
validating, from a confirmatory study, or both. In either case, we
suggest the generated or validated hypotheses be specifically stated.
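Principal component analysis, one of the exploratory techniques listed above, can be sketched with a singular value decomposition of a samples × genes matrix. The sketch assumes log-scale data (for example Cq values or log2 relative quantities) and autoscaling of each gene. The function name and toy data are hypothetical. For three-way data, such as the yeast strain × time × gene study above, one simple option is to stack the strains' time × gene matrices row-wise before calling it. Whether that matches the matrix-augmented PCA used by Ståhlberg et al. in every detail is an assumption.

```python
import numpy as np


def pca(data, n_components=2, autoscale=True):
    """PCA of a samples x genes matrix via singular value decomposition.

    `data` is assumed to be on a log scale (e.g. Cq or log2 relative quantities).
    Returns sample scores, gene loadings and the fraction of total variance
    captured by each returned component.
    """
    x = np.asarray(data, dtype=float)
    x = x - x.mean(axis=0)                    # mean-centre each gene
    if autoscale:
        sd = x.std(axis=0, ddof=1)
        sd[sd == 0] = 1.0                     # avoid dividing constant genes by zero
        x = x / sd                            # give every gene unit variance
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].T
    return scores, loadings, explained[:n_components]


# Hypothetical toy data: 4 samples x 3 genes; genes 0 and 1 co-vary.
expr = [[1.0, 2.0, 5.0],
        [2.0, 4.0, 4.0],
        [3.0, 6.0, 6.0],
        [4.0, 8.0, 5.0]]
scores, loadings, explained = pca(expr)
print(explained)
```

In keeping with the exploratory framing above, a clear grouping in the score plot is a hypothesis to test in a confirmatory study, not a significant result in itself.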
Download the latest Genex version here: http://genex.gene-quantification.info/
Validation of differential gene expression algorithms: application comparing fold-change estimation to hypothesis testing.
Yanofsky CM, Bickel DR.
Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology, and Immunology, University of Ottawa, Ottawa, Ontario, Canada.
BMC Bioinformatics. 2010 Jan 28;11:63.
BACKGROUND: Sustained research on the problem of determining which genes are differentially expressed on the basis of microarray data has yielded a plethora of statistical algorithms, each justified by theory, simulation, or ad hoc validation and yet differing in practical results from equally justified algorithms. Recently, a concordance method that measures agreement among gene lists has been introduced to assess various aspects of differential gene expression detection. This method has the advantage of basing its assessment solely on the results of real data analyses, but as it requires examining gene lists of given sizes, it may be unstable.
RESULTS: Two methodologies for assessing predictive error are described: a cross-validation method and a posterior predictive method. As a nonparametric method of estimating prediction error from observed expression levels, cross validation provides an empirical approach to assessing algorithms for detecting differential gene expression that is fully justified for large numbers of biological replicates. Because it leverages the knowledge that only a small portion of genes are differentially expressed, the posterior predictive method is expected to provide more reliable estimates of algorithm performance, allaying concerns about limited biological replication. In practice, the posterior predictive method can assess when its approximations are valid and when they are inaccurate. Under conditions in which its approximations are valid, it corroborates the results of cross validation. Both comparison methodologies are applicable to both single-channel and dual-channel microarrays. For the data sets considered, estimating prediction error by cross validation demonstrates that empirical Bayes methods based on hierarchical models tend to outperform algorithms based on selecting genes by their fold changes or by non-hierarchical model-selection criteria. (The latter two approaches have comparable performance.) The posterior predictive assessment corroborates these findings.
CONCLUSIONS: Algorithms for detecting differential gene expression may be compared by estimating each algorithm's error in predicting expression ratios, whether such ratios are defined across microarray channels or between two independent groups. According to two distinct estimators of prediction error, algorithms using hierarchical models outperform the other algorithms of the study. The fact that fold-change shrinkage performed as well as conventional model selection criteria calls for investigating algorithms that combine the strengths of significance testing and fold-change estimation.
Automated validation of polymerase chain reaction amplicon melting curves.
Mann TP, Humbert R, Stamatoyannopolous JA, Noble WS.
Department of Genome Sciences, University of Washington, Seattle, WA, USA
J Bioinform Comput Biol. 2006 4(2):299-315.
The polymerase chain
reaction (PCR) is a fundamental tool of molecular biology. Quantitative
PCR is the gold-standard methodology for determination of DNA copy
numbers, quantitating transcription, and numerous other applications. A
major barrier to large-scale application of PCR for quantitative
genomic analyses is the current requirement for manual validation of
individual PCRs to ensure generation of a single product. This
typically requires visual inspection either of gel electrophoreses or
temperature dissociation ("melting") curves of individual PCRs--a
time-consuming and costly process. Here we describe a robust
computational solution to this fundamental problem. Using a training
set of 10 080 reactions comprising multiple quantitative PCRs from each
of 1728 unique human genomic amplicons, we developed a support vector
machine classifier capable of discriminating single-product PCRs with
better than 99% accuracy. This approach has broad utility, and
eliminates a major bottleneck to widespread application of PCR for
high-throughput genomic applications.
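The paper's classifier is a trained support vector machine, which is not reproduced here. As a much simpler stand-in, the sketch below applies a rule-based heuristic: compute the negative derivative -dF/dT of a melt curve and count its peaks, treating exactly one peak as a likely single product. The 0.2 relative-height threshold, the function names and the simulated curves are assumptions for illustration only.

```python
import numpy as np


def melt_peaks(temps, fluorescence, min_rel_height=0.2):
    """Temperatures of peaks in the negative derivative (-dF/dT) of a melt curve.

    A peak is a local maximum whose height is at least `min_rel_height` times
    the tallest local maximum; the 0.2 default is an arbitrary assumption.
    """
    t = np.asarray(temps, dtype=float)
    neg_deriv = -np.gradient(np.asarray(fluorescence, dtype=float), t)
    inner = neg_deriv[1:-1]
    is_max = (inner > neg_deriv[:-2]) & (inner >= neg_deriv[2:])
    idx = np.nonzero(is_max)[0] + 1           # shift back to full-array indices
    if idx.size == 0:
        return []
    threshold = min_rel_height * neg_deriv[idx].max()
    return [float(t[i]) for i in idx if neg_deriv[i] >= threshold]


def looks_single_product(temps, fluorescence, min_rel_height=0.2):
    """Flag a reaction as single-product when exactly one melt peak is found."""
    return len(melt_peaks(temps, fluorescence, min_rel_height)) == 1


def simulated_melt(temps, tm, width=0.8):
    """Sigmoid loss of fluorescence around a melting temperature `tm` (simulated)."""
    return 1.0 / (1.0 + np.exp((temps - tm) / width))


temps = np.linspace(65.0, 95.0, 301)
single = simulated_melt(temps, 82.0)
double = 0.6 * simulated_melt(temps, 78.0) + 0.4 * simulated_melt(temps, 86.0)
print(melt_peaks(temps, single), looks_single_product(temps, double))
```

A fixed threshold like this is brittle on noisy real data. A trained classifier, as in the paper, is meant to avoid hand-tuning of this kind.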
Statistical models in assessing fold change of gene expression in real-time RT-PCR experiments
Fu WJ, Hu J, Spencer T, Carroll R, Wu G.
Department of Epidemiology, Michigan State University, East Lansing, MI 48824, USA.
Comput Biol Chem. 2006 30(1): 21-6.
Real-time RT-PCR has
been frequently used in quantitative research in molecular biology and
bioinformatics. It provides remarkably useful technology to assess
expression of genes. Although mathematical models for gene
amplification process have been studied, statistical models and methods
for data analysis in real-time RT-PCR have received little attention.
In this paper, we briefly introduce current mathematical models, and
study statistical models for real-time RT-PCR data. We propose a
generalized estimation equations (GEE) model that properly reflects the
structure of repeated data in RT-PCR experiments for both
cross-sectional and longitudinal data. The GEE model takes the
correlation between observations within the same subjects into
consideration, which guards against false positives and false
negatives. We further demonstrate with a set of actual real-time RT-PCR
data that different statistical models yield different estimations of
fold change and confidence interval. The SAS program for data analysis
using the GEE model is provided to facilitate easy computation for
non-statistical professionals.
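The authors provide a SAS program, which is not reproduced here. As a minimal illustration of the GEE idea, the sketch below fits ΔCT = b0 + b1 · group under an independence working correlation with a Gaussian identity link. In that special case the point estimate equals ordinary least squares. The within-subject correlation enters through the subject-clustered "sandwich" variance. The choice of independence working correlation, the assumption of ~100% efficiency in 2^(-b1), the function name and the data are all assumptions. A full GEE library, for example the GEE class in statsmodels, would be the practical choice for real analyses and other working correlations.

```python
import numpy as np
from statistics import NormalDist


def gee_independence_fold_change(delta_ct, group, subject, alpha=0.05):
    """Fold change (group 1 vs group 0) from repeated delta-CT measurements.

    Fits delta_ct = b0 + b1 * group with an independence working correlation
    (equivalent to ordinary least squares for the point estimate) and estimates
    Var(b1) with the subject-clustered robust "sandwich" estimator.
    """
    y = np.asarray(delta_ct, dtype=float)
    g = np.asarray(group, dtype=float)
    ids = np.asarray(subject)
    x = np.column_stack([np.ones_like(g), g])
    bread = np.linalg.inv(x.T @ x)
    beta = bread @ (x.T @ y)
    resid = y - x @ beta
    meat = np.zeros((2, 2))
    for sid in np.unique(ids):
        rows = ids == sid
        u = x[rows].T @ resid[rows]            # this subject's estimating-equation sum
        meat += np.outer(u, u)
    cov = bread @ meat @ bread
    b1 = float(beta[1])
    se_b1 = float(np.sqrt(cov[1, 1]))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Lower delta-CT means higher expression; 2**(-b1) assumes ~100% efficiency.
    fold_change = 2.0 ** (-b1)
    ci = (2.0 ** (-(b1 + z * se_b1)), 2.0 ** (-(b1 - z * se_b1)))
    return fold_change, ci


# Hypothetical data: two measurements per subject, two subjects per group.
dct = [5.0, 5.2, 4.8, 5.0, 3.0, 3.2, 2.8, 3.0]
grp = [0, 0, 0, 0, 1, 1, 1, 1]
subj = ["A", "A", "B", "B", "C", "C", "D", "D"]
print(gee_independence_fold_change(dct, grp, subj))
```

The sandwich estimator is known to understate variance when there are few subjects, so the interval from this tiny example is only illustrative.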
The Importance of Quality Control During qPCR Data Analysis
Barbara D’haene, Ph.D. & Jan Hellemans, Ph.D.
Biogazelle & Ghent University
Drug Discovery - August/September 2010
Introduction
Since its introduction in 1993, qPCR has become one of the most popular techniques in modern molecular biology [1]. Despite its apparent simplicity, which makes qPCR such an attractive technology for many researchers, final results are often compromised by unsound experimental design, a lack of quality control, improper data analysis, or a combination of these. To address the concerns that have been raised about the quality of published qPCR-based research, specialists in the qPCR field have introduced the MIQE guidelines for publication of qPCR-based results [2]. The main purpose of this initiative is to make qPCR-based research transparent, but the MIQE guidelines may also serve as a practical framework to obtain high-quality results. Within the guidelines, quality control at each step of the qPCR workflow, from experimental design to data analysis, is highlighted as a necessity to ensure trustworthy results. Numerous papers have been written about assay and sample quality control [3], but less attention has been paid to quality control of post-qPCR data. This article summarizes recommendations for this latter type of quality control, including: detection of abnormal amplification, inspection of melting curves, control of PCR replicate variation, assessment of positive and negative control samples, determination of reference gene expression stability, and evaluation of deviating sample normalization factors.
Error bars in experimental biology
Geoff Cumming(1), Fiona Fidler(1), and David L. Vaux(2)
(1) School of Psychological Science and (2) Department of Biochemistry, La Trobe University, Melbourne, Victoria, Australia 3086
Error bars commonly appear in figures in publications, but experimental biologists are often unsure how they should be used and interpreted. In this article we illustrate some basic features of error bars and explain how they can help communicate data and assist correct interpretation. Error bars may show confidence intervals, standard errors, standard deviations, or other quantities. Different types of error bars give quite different information, and so figure legends must make clear what error bars represent. We suggest eight simple rules to assist with effective use and interpretation of error bars.
Automatic Genomics: a user-friendly program for the automatic designing and plate loading of medium-throughput qPCR experiments
Callejas S, Alvarez R, Dopazo A.
Genomics Unit, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain.
Biotechniques. 2011 50(1):46-50.
Quantitative PCR (qPCR) remains the method of choice for gene and microRNA (miRNA) expression studies. Many laboratories wish to automate some or all of the steps of medium-throughput qPCR experiments through the use of various types of liquid handling robots. However, it is not uncommon to find cases in which scripts provided by the robot supplier are too rigid for user-specific applications, do not include all the desired options, or are too complicated to be modified by a nonprofessional programmer. Here, we present Automatic Genomics, a program that allows users with a limited programming background to automate medium-throughput qPCR experiments by using commercially available liquid-handling robots. The user is able to optimize the plate design in terms of number of genes, number of samples, and controls.
Interactive analysis of systems biology molecular expression data
Zhang M, Ouyang Q, Stephenson A, Kane MD, Salt DE, Prabhakar S, Burgner J, Buck C, Zhang X.
Bindley Bioscience Center, Purdue University, West Lafayette, IN 47907, USA.
BMC Syst Biol. 2008 2:23.
BACKGROUND: Systems biology aims to understand biological systems on a comprehensive scale, such that the components that make up the whole are connected to one another and work through dependent interactions. Molecular correlations and comparative studies of molecular expression are crucial to establishing interdependent connections in systems biology. The existing software packages provide limited data mining capability: the user must first generate visualization data with a preferred data mining algorithm and then upload the resulting data into the visualization package for graphic visualization of molecular relations.
RESULTS: Presented is a novel interactive visual data mining application, SysNet, that provides an interactive environment for the analysis of high-volume molecular expression data of almost any type from biological systems. It integrates interactive graphic visualization and statistical data mining into a single package. SysNet interactively presents intermolecular correlation information with circular and heatmap layouts. It is also applicable to comparative analysis of molecular expression data, such as time course data.
CONCLUSION: The SysNet program has been utilized to analyze elemental profile changes in response to an increasing concentration of iron (Fe) in growth media (an ionomics dataset). This case study demonstrates that the SysNet software is an effective platform for interactive analysis of molecular expression information in systems biology.
Roadmap for developing and validating therapeutically relevant genomic classifiers
Simon R.
National Cancer Institute, 9000 Rockville Pike, MSC 7434, Bethesda, MD 20892, USA
J Clin Oncol. 2005 23(29):7332-41. Epub 2005 Sep 6.
Oncologists need
improved tools for selecting treatments for individual patients. The
development of therapeutically relevant prognostic markers has
traditionally been slowed by poor study design, inconsistent findings,
and lack of proper validation studies. Microarray expression profiling
provides an exciting new technology for relating tumor gene expression
to patient outcome, but it also provides increased challenges for
translating initial research findings into robust diagnostics that
benefit patients and physicians in therapeutic decision making. This
article attempts to clarify some of the misconceptions about the
development and validation of multigene expression signature
classifiers and highlights the steps needed to move genomic signatures
into clinical application as therapeutically relevant and robust
diagnostics.
Cluster analysis and display of genome-wide expression patterns
Eisen MB, Spellman PT, Brown PO, Botstein D.
Department of Genetics, Stanford University School of Medicine, 300 Pasteur Avenue, Stanford, CA 94305, USA.
Proc Natl Acad Sci U S A. 1998 95(25):14863-8.
A system of cluster
analysis for genome-wide expression data from DNA microarray
hybridization is described that uses standard statistical algorithms to
arrange genes according to similarity in pattern of gene expression.
The output is displayed graphically, conveying the clustering and the
underlying expression data simultaneously in a form intuitive for
biologists. We have found in the budding yeast Saccharomyces cerevisiae
that clustering gene expression data efficiently groups together genes
of known similar function, and we find a similar tendency in human
data. Thus patterns seen in genome-wide expression experiments can be
interpreted as indications of the status of cellular processes. Also,
coexpression of genes of known function with poorly characterized or
novel genes may provide a simple means of gaining leads to the
functions of many genes for which information is not available
currently.
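The clustering idea described above, grouping genes by similarity of their expression patterns, can be sketched as agglomerative clustering with average linkage. This sketch uses 1 - Pearson correlation as the distance. That is a stand-in: Eisen et al. describe their own correlation-based similarity metric, so treat it as an approximation rather than their exact method. The function name and profiles are hypothetical. The naive loop is only practical for tens of genes. For larger data, a library routine such as scipy.cluster.hierarchy.linkage with method="average" and metric="correlation" would be the usual choice.

```python
import numpy as np


def average_linkage_clusters(profiles, n_clusters):
    """Agglomerative clustering of expression profiles (one row per gene).

    Distance between genes is 1 - Pearson correlation; the distance between two
    clusters is the mean of all gene-to-gene distances (average linkage).
    Merging stops when `n_clusters` clusters remain.
    """
    x = np.asarray(profiles, dtype=float)
    dist = 1.0 - np.corrcoef(x)               # rows must not be constant
    clusters = [[i] for i in range(len(x))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge the closest pair
        del clusters[b]                          # b > a, so index a stays valid
    return [sorted(c) for c in clusters]


# Hypothetical time courses for four genes (log-scale expression at 4 time points).
genes = [[1.0, 2.0, 3.0, 4.0],
         [2.0, 4.0, 6.0, 8.5],
         [4.0, 3.0, 2.0, 1.0],
         [8.0, 6.0, 4.0, 2.5]]
print(average_linkage_clusters(genes, 2))   # [[0, 1], [2, 3]]
```

The two rising genes and the two falling genes end up in separate clusters, because the correlation distance ignores differences in absolute level and responds only to the shape of each profile.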