Abstract
Recent developments in the statistical analysis of genome-wide studies are reviewed. Genome-wide analyses are becoming increasingly common in areas such as scans for disease-associated markers and gene expression profiling. The data generated by these studies present new problems for statistical analysis, owing to the large number of hypothesis tests, comparatively small sample size and modest number of true gene effects. In this review, strategies are described for optimising the genotyping cost by discarding unpromising genes at an earlier stage, saving resources for the genes that show a trend of association. In addition, there is a review of new methods of analysis that combine evidence across genes to increase sensitivity to multiple true associations in the presence of many non-associated genes. Some methods achieve this by including only the most significant results, whereas others model the overall distribution of results as a mixture of distributions from true and null effects. Because genes are correlated even when having no effect, permutation testing is often necessary to estimate the overall significance, but this can be very time consuming. Efficiency can be improved by fitting a parametric distribution to permutation replicates, which can be reused in subsequent analyses. Methods are also available to generate random draws from the permutation distribution. The review also includes discussion of new error measures that give a more reasonable interpretation of genome-wide studies, together with improved sensitivity. The false discovery rate allows a controlled proportion of positive results to be false, while detecting more true positives; and the local false discovery rate and false-positive report probability give clarity on whether or not a statistically significant test represents a real discovery.
Keywords: genome scan; genetic association; multiple testing; permutation test; false discovery rate

Introduction
Recent technological advances allow the rapid generation of vast quantities of molecular biological data [1,2]. At the same time, the sequencing of the human genome and subsequent efforts to catalogue the variation within it [3] have created opportunities for testing thousands of sequence variations for association with disease, behavioural traits and physiological markers. Such applications are appealing because of the relative lack of success, to date, of positional cloning strategies that start with family-based linkage mapping [4], most likely due to insufficient sample sizes to detect genes of modest effect [5]. The whole-genome association scan is an increasingly feasible study design in which the genotyped markers are sufficiently closely spaced to detect linkage disequilibrium (LD) with all aetiological variants, and well-powered sample sizes are more attainable [6]. Some initial studies have been performed in special populations [7,8] and in small samples of outbred populations [9,10]; genome-wide admixture scans are imminent [11,12] and, ultimately, routine scans will be performed for common diseases in large cohorts of outbred populations [13].
Array experiments measuring large numbers of transcription or expression levels are another form of genome-wide analysis that has become widespread [14]. Although the effect sizes expected in these studies are large by comparison with disease association studies, the sample sizes are constrained by cost to be relatively small, so that both types of study encounter problems of statistical power (Table 1). Expression levels can be regarded as quantitative traits under genetic control, so that both kinds of large-scale exploration can occur in genome scans for loci influencing expression levels [15], or phenome scans demarcating the influence of genetic pathways [16,17].
Table 1. Determinants of power in genomewide association and expression studies.
The analysis of large exploratory studies creates new problems for methodology and interpretation. Primarily, there is the multiple testing problem, whereby the chance of an exceptional result increases with the number of tests performed, even when there is no true association. To alleviate this problem, two broad strategies have emerged: first, to devise more sensitive tests, so that the penalty for multiple testing is less severe; and, secondly, to propose different measures of experimental error for which the impact of multiple testing is less serious. Furthermore, genome-wide analysis creates problems of computational and cost efficiency, on account of the large volume of data to be generated and analysed.
Here, some recent work addressing these problems is reviewed. For the study design, work is summarised that minimises the cost of a study, while maintaining its power. For the analysis, methods are reviewed for improving sensitivity in the presence of multiple gene effects, by combining evidence across tests, and some methods for reducing the computational burden of permutation tests are discussed. The review concludes with a discussion of alternative error measures including false discovery rates.
This review is mainly concerned with a whole-genome association scan, using single nucleotide polymorphisms (SNPs), for a dichotomous disease status. It will be clear, however, that many of the methods apply in other situations, in particular to array expression studies. Although there are important differences between these two applications, including the number of expected true associations, sample size and effect size (Table 1), their common exploratory character suggests that further advances may arise from cross-application of ideas between these areas. For this reason, some methods developed for expression studies are reviewed; there is also a discussion on whether they may be suitable for genetic association scans. The objects of inference used will be 'genes' [18], with the understanding that, in this context, this can mean SNPs, whole genes, haplotype blocks, transcript levels or other features.
Study design
Large samples of unrelated individuals have become the design of choice for genome-wide association scans, because earlier concerns about population stratification have been largely allayed by empirical methods [19]. Estimates of the total sample size are in the order of thousands [20]. Because the majority of genes are not associated with disease, it is uneconomical to genotype the whole sample for all genes. Sequential study designs, in particular a two-stage block design, have been proposed for reducing the total cost of a genome-wide experiment, which remains the main limiting factor preventing large-scale application. In a two-stage design, all of the genes are typed in a subset of the sample, with only the genes showing a trend of association being taken forward for genotyping in the remainder. This directs resources towards true associations at an earlier stage, so that the available sample size is larger for genes with true effects.
The design parameters for a two-stage study include the total cost, total sample size, size of the first and second subsamples and rejection criterion at the end of the first stage. Studies with only two stages are considered here, although more could be performed. Some of these parameters are constrained in advance, with the others then chosen to optimise some objective. One approach is to consider the genotyping cost as fixed and then find parameters that give the most power [21,22]. A general rule of thumb, considering a number of disease models and correlation structures between markers, is to allocate 75 per cent of resources to the first stage and then carry the most promising 10 per cent of markers forward to the second [22]. Here, the sample size is a function of the genotype unit cost and the number of markers, within the overall cost constraint.
It is more likely that the sample size is fixed (say, to provide sufficient power to detect a single association) and the goal is to minimise costs while achieving power close to that of the one-stage design [23]. In many situations, the cost can be halved while keeping power within 1 per cent of the one-stage design; thus, the total sample size can be calculated to achieve a certain power (say 81 per cent) in the one-stage design and parameters then optimised for a two-stage design. Considering a range of genetic models, a general guideline is to set the sample size of the first stage to have 97 per cent power for individual tests and carry forward all markers with nominal p-values less than 0.15. The sample size for the first stage cannot be calculated without knowledge of the true effects, however, so a more practical approach is to consider the ranks of test statistics of the true effects [24]. Here, it is shown that similar information to the one-stage design is obtained by genotyping all markers on 50 per cent of the sample and then genotyping the 10 per cent most promising on the remainder, resulting in a decrease of about 45 per cent in the number of genotypes. Again, the total sample size can be calculated for a one-stage design; this last guideline is currently the most practical available and applies over a wide range of genetic models and correlation structures between markers.
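The genotyping saved by this last guideline can be checked with back-of-envelope arithmetic. The sketch below is illustrative only; the sample and marker counts are assumptions, not figures from the cited studies.

```python
# Genotype counts for a one-stage design versus the two-stage guideline:
# type all markers on a first-stage fraction of the sample, then only the
# most promising fraction of markers on the remaining subjects.
def two_stage_genotypes(n_subjects, n_markers,
                        stage1_fraction=0.5, carried_fraction=0.1):
    """Return (one-stage, two-stage) genotype counts."""
    stage1_n = int(n_subjects * stage1_fraction)
    one_stage = n_subjects * n_markers
    two_stage = (stage1_n * n_markers
                 + (n_subjects - stage1_n) * int(n_markers * carried_fraction))
    return one_stage, two_stage

one, two = two_stage_genotypes(2000, 100_000)
print(round(1 - two / one, 2))  # fraction of genotypes saved
```

With half the sample in the first stage and 10 per cent of markers carried forward, the saving is 45 per cent of genotypes, matching the figure quoted above.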
An application of this strategy has been reported in which the primary constraint is the quantity of DNA available for study subjects [25]. About 44 per cent of the sample had sufficient DNA to be typed for all markers, with the remaining 56 per cent used for the second stage. An important feature of this study is that the test statistics are calculated over the full sample, with adjustment made for the interim test. This is in contrast to the simpler approach used in the simulation studies [21,23], in which test statistics were calculated separately for the two stages and their p-values combined into an overall significance. Analysing both stages at once [25] makes more efficient use of information and will be the more powerful method for computing significance in the whole sample.
Formal sequential designs have also been proposed for genetic association studies [26]. These can result in substantial cost savings, on average, but have yet to become widely adopted, owing mainly to logistical difficulties. For example, the stopping criteria must be applied to each gene separately, but genotypes are often obtained in bulk in array format, which makes it difficult to apply sequential designs efficiently across many genes. The two-stage designs are a compromise solution using frequentist inference, which also avoid the uncertainty in actual sample size that occurs with sequential inference. Future studies may introduce further design variables. For example, different genotyping technologies may be used in the two stages, with different unit costs, perhaps using DNA pooling [27]. Optimal study designs can be derived for these conditions following current principles.
Analysis methods for multiple associations
Many analysis methods are available for genetic data, but a first pass through a genome-wide scan will normally consist of single-locus tests for trend, perhaps additionally with two-locus interaction tests [28]. Several methods are now available that exploit the important feature that the majority of tested genes are not associated, but there are a small number of true, but weak, associations to be found. These methods are useful both for establishing statistical significance more strongly than single-locus tests, and for informally suggesting sets of genes for follow-up study.
In the traditional hypothesis-testing framework, each gene is tested individually and then a stepwise adjustment procedure is applied both to control the family-wise type-1 error rate (FWER) and to declare individual genes associated [29]. This approach, related to the Bonferroni correction, achieves strong control of the FWER: the probability of at least one false positive is kept within the desired rate whatever the number of true positives. This is generally considered to be too conservative for genome-wide studies, however, because we can tolerate a small number of false positives if most true positives are detected. Preferable is weak control of FWER, which ensures that the probability of at least one false positive is within the desired rate only when there are no true positives. This is desirable, because we must defend against the possibility of there being no true associations in the sample, but it allows us to tolerate some false positives if some true positives are present.
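One well-known stepwise adjustment with strong FWER control is the Holm step-down procedure; the minimal sketch below illustrates the idea (it is an example of this class of procedure, not necessarily the specific method of the cited work).

```python
# Holm step-down adjustment: sort p-values, multiply the i-th smallest by
# (m - i), cap at 1 and enforce monotonicity; strong FWER control results.
def holm_adjust(pvalues):
    """Return Holm-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        adj = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, adj)  # adjusted values never decrease
        adjusted[i] = running_max
    return adjusted

print(holm_adjust([0.01, 0.04, 0.03, 0.005]))
```

Each gene whose adjusted value falls below the target rate is declared associated.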
A joint test of multiple genes can maintain weak control of FWER and should reveal greater evidence for association from a set of genes, although perhaps with less specificity for individual genes. This argument motivates the partial sum statistics [30], which are formed by obtaining test statistics (typically χ^2 tests from a contingency table) for each individual gene and then forming the sum of the K largest statistics, where K is a fixed number called the length. The significance of the sum can be assessed by a permutation test and an overall significance estimated over a range of lengths.
A more flexible alternative to the sum statistic is the truncated product of p-values. Here, the product is formed of all the p-values lower than a preset threshold [31], or of the K smallest p-values [32]. When the individual tests have the same distribution, the rank truncated product has equivalent power to the sum statistic, but is more balanced when the tests have different distributions. This will occur, for example, when conducting haplotype-based tests on regions of different sizes, leading to tests with different degrees of freedom. Analytic distributions are known for independent tests, which have been used in simulation studies to show improved power for combined evidence methods compared with traditional corrections [31,32]. The present authors prefer the truncated product to the sum statistic on account of its balanced combination of different tests, and also prefer to truncate on rank rather than threshold because the number of true gene effects is fixed across studies, whereas their p-values are random [32].
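The rank truncated product is straightforward to compute; the sketch below calibrates it by simulation under independence, purely for illustration. With correlated genes, the null reference would instead come from permuting trait labels, as discussed later in this review.

```python
# Rank truncated product: combine the K smallest p-values on the -2*log
# scale and estimate significance by simulating null (uniform) p-values.
import math
import random

def rank_truncated_product(pvalues, k):
    """-2 * log of the product of the K smallest p-values (larger = stronger)."""
    return -2.0 * sum(math.log(p) for p in sorted(pvalues)[:k])

def simulated_pvalue(observed, m, k, n_sim=2000, seed=1):
    """Monte Carlo p-value for the statistic under m independent null tests."""
    rng = random.Random(seed)
    exceed = sum(
        rank_truncated_product([rng.random() for _ in range(m)], k) >= observed
        for _ in range(n_sim)
    )
    return (exceed + 1) / (n_sim + 1)
```

For example, `simulated_pvalue(rank_truncated_product(p, 5), m=len(p), k=5)` gives the overall significance of the five strongest results among `p`.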
The length K should be close to the actual number of true associations, but this is generally unknown. A range of lengths could be tested, with the most significant length used to select genes for follow-up analysis; but there is no formal basis for this strategy, and simulation studies show that it is capable of grossly over- or under-estimating the number of true associations [33]. A judicious choice of a fixed length, say K < 20 for a genome-wide association scan, is generally advisable, provided that the tests are reasonably independent. When there is strong dependency between tests, such as in single-marker analysis of a dense genome-wide scan, the variable-length approach can be used to establish statistical significance, but not to estimate the number of follow-up genes. Informally, genes would be followed up in rank order of significance; and if the prior power is high, this will tend to identify the true associations [32]. In fact, formal adjustments based on the closure principle are available for individual tests, which allow strong control of FWER [34], but the primary use of truncated products is to show that the strongest associations indeed arise from true effects.
In working with the summary p-value rather than the complete data, some information is lost, and a single analysis of the data may be more efficient. A natural approach is to estimate all gene effects together in a regression model. On the genome-wide scale, a fixed-effects regression is impractical, requiring estimation of many more parameters than there are observations. Therefore, several methods proposed for microarrays regard a gene as having a random effect, and model the distribution of gene effects by parametric forms that can be estimated. A simple model is to assume a normally distributed effect around zero [35], although this may lack power when most genes have no effect. The model can be extended by assuming that the effect variability comes from a mixture of small and stronger effects, with inference based only on the stronger effects [36]. Another alternative is a mixture of a zero-centred normal and a point mass at zero [37] or, more generally, a mixture of three normals with respectively positive, zero and negative means [38]. Here, the zero-centred distribution is regarded as the null distribution, which allows for small non-zero effects to be regarded as uninteresting if there is sufficient evidence for stronger effects.
These approaches reduce the dimensionality of the inference while modelling the complete data, rather than summarising each gene before combining evidence. These methods offer promise for genome-wide association scans, an important open question being the precision in estimating the random effects distribution when the number and size of true associations are small. For example, a method for testing whether the overall distribution of p-values is uniform [39] has very little power compared with the Bonferroni correction when the number of true effects is small (authors' unpublished data). Another important issue is the choice of random effects distribution: current methods assume hierarchical or mixture normal distributions, but experimental geneticists have favoured gamma distributions [40,41]. A useful feature of the mixture distribution models is that they generate maximum-likelihood probabilities of membership to each of the mixture components, for each gene, which can be interpreted informally as posterior probabilities of association, allowing individual genes to be selected for follow-up study.
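The membership probabilities from a two-component mixture can be sketched as follows. The parameter values are assumptions for illustration (in practice they would be estimated, for example by EM), not estimates from any study cited here.

```python
# Posterior membership under an assumed two-component mixture of normals
# for per-gene z-scores: pi0 * N(0, s0) for null genes plus
# (1 - pi0) * N(mu, s1) for associated genes.
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_association(z, pi0=0.99, null_sd=1.0, alt_mean=3.0, alt_sd=1.5):
    """P(gene belongs to the associated component | z) for assumed parameters."""
    null_part = pi0 * normal_pdf(z, 0.0, null_sd)
    alt_part = (1.0 - pi0) * normal_pdf(z, alt_mean, alt_sd)
    return alt_part / (null_part + alt_part)
```

A gene with z near zero receives a posterior probability of association close to zero, whereas a strong z-score yields a probability near one, so genes can be ranked for follow-up directly on this quantity.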
Permutation testing
When the assumptions underlying analytical distributions are not met, permutation tests are a popular method for computing significance levels. In a genome-wide association study, the problem is that genotypes are correlated due to LD; indeed, the correlations are necessary for the design to be successful. The standard procedure is to reassign trait values among study subjects, while keeping their genotypes fixed, thereby preserving the correlation structure across the multiple genes and realising the exchangeability conditions for a valid test [42]. When performing thousands of tests on thousands of subjects, however, a permutation procedure using thousands of replicates becomes extremely time-consuming, with possible running times of days or weeks. Therefore, more efficient approaches to permutation testing have recently been proposed.
The accuracy of the permutation test can be improved by noting that the minimum p-value, sum statistic and truncated product can all be regarded as the extreme value of a large number of observations [33]. Therefore, they should follow the extreme value distribution [43], and by fitting the parameters of the distribution to the values observed in permutation replicates, more accurate significance levels are obtained. Equivalently, fewer replicates are needed to reach a given accuracy. The efficiency gain depends upon a number of factors, including the true significance level and the number of tests, and it is difficult to compute standard errors for the empirical p-values. Nevertheless, this approach has the advantage of being generally applicable and, importantly, the fitted distribution can be reused in subsequent tests of the same genes in the same population. This will be useful for studies based on a standard genome-wide marker panel [3], leading to substantial time savings over the long term.
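A minimal sketch of this idea, assuming SciPy is available: fit a Gumbel (extreme value) distribution to the permutation distribution of -log10(minimum p-value) and read the genome-wide significance from the fitted tail. The data here are simulated independent tests purely for illustration; in a real scan the replicates would come from trait-label permutations.

```python
# Fit an extreme value distribution to permutation replicates of the
# minimum p-value, then evaluate an observed result against the fitted tail.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m = 1000                                   # number of tests (illustrative)
n_perm = 500                               # permutation replicates
min_p = rng.uniform(size=(n_perm, m)).min(axis=1)   # min p per replicate
stat = -np.log10(min_p)                    # larger = more extreme

loc, scale = stats.gumbel_r.fit(stat)      # fitted extreme value parameters
observed = 5.0                             # -log10 of the observed minimum p
p_genomewide = stats.gumbel_r.sf(observed, loc, scale)
print(loc, scale, p_genomewide)
```

Because the fitted tail extends beyond the largest replicate, significance levels far smaller than 1/n_perm can be estimated, and the fitted (loc, scale) pair can be stored and reused for later studies of the same panel.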
A complementary approach is to reduce the computation within each replicate. Lin [44] considered score statistics from regression models, showing that it is sufficient to multiply the score contributions of each subject by a normal random deviate to generate a realisation from the null distribution. Alternatively, Seaman and Müller-Myhsok [45] suggest sampling directly from the multivariate distribution for all the genes. The distribution can be estimated by considering the score test from a regression model that includes all the genes as predictors. This estimation may be difficult when the number of genes exceeds the number of subjects, in which case the procedure may need to be applied piecewise to subsets of genes. The approach of Lin also requires the sample size to exceed the number of genes, but preliminary results suggest that it would be more robust than that of Seaman and Müller-Myhsok when applied across the whole genome [44]. Both of these approaches require the analysis to be expressed as a score statistic from a regression model, which can be done in most situations but may require additional work by the user. Currently, Lin's method seems better suited to genome-wide analysis, whereas that of Seaman and Müller-Myhsok is more applicable and efficient in smaller-scale candidate gene studies [46].
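The score-perturbation idea can be sketched in a few lines. The function and data below are illustrative assumptions (toy per-subject contributions), not the cited implementation: the observed score for each gene is a sum of per-subject terms, and multiplying each subject's terms by one standard normal deviate yields a joint null draw across all genes at once.

```python
# Score perturbation: one normal deviate per subject scales that subject's
# contribution to every gene, preserving between-gene correlation while
# generating draws from the joint null distribution.
import numpy as np

def perturbed_scores(contributions, n_draws=2000, seed=0):
    """contributions: (n_subjects, n_genes) per-subject score terms.
    Returns (n_draws, n_genes) null realisations of the score sums."""
    rng = np.random.default_rng(seed)
    deviates = rng.standard_normal((n_draws, contributions.shape[0]))
    return deviates @ contributions

rng = np.random.default_rng(1)
contrib = rng.standard_normal((50, 5))     # toy contributions, 50 subjects
null_scores = perturbed_scores(contrib)
```

Each row of `null_scores` plays the role of one permutation replicate, but costs only a matrix multiplication rather than a re-analysis of the data.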
A further approach is to assume that the sampled markers are representative of an 'effective number' of independent tests [47-50]. After estimating this number (for example, from the singular value decomposition of the genotype correlation matrix [50]), asymptotic formulae can be applied. There is no formal basis for this approach, however, and studies based on real data indicate that the results are not always accurate [51]; indeed, there may be no such effective number after all [33]. This approach is not recommended; if it is used, however, all significant results should be confirmed by a permutation test.
False discovery rates
Another perspective on the multiple testing problem is that the family-wise error rate is not the most appropriate measure, and that other measures should be used that have better sensitivity and specificity in genome-wide studies. Although weak control of FWER for the overall significance has been advocated, some error control for the single tests is also desirable. Here, two prominent alternatives are discussed: false discovery rates (FDRs) [52,53] and posterior error rates [54,55].
The original FDR of Benjamini and Hochberg [52] is the expected proportion of false positives among all positive results, with the proportion defined as zero if there are no positives. That is, if R is the number of positive results in a study and V is the number of these that are false (that is, do not arise from true gene effects), then:

FDR = E[V/R], where V/R is taken to be zero when R = 0.
Subsequently, Storey and colleagues [53,56] have argued that the choice of the appropriate rate depends on how many positive results there are and, furthermore, that the rate is only meaningful when there is at least one positive. This motivates the positive FDR (pFDR), defined as the expected proportion of false positives among all positive results, conditional on at least one positive at a given significance level [56]:

pFDR = E[V/R | R > 0].
Rather than setting a fixed pFDR rate to control, Storey and colleagues suggest giving a value to each test that indicates what pFDR would result from declaring that test significant. The follow-up tests can then be chosen based on joint consideration of the number of tests selected and the pFDR associated with them. Formally, the q-value associated with an individual test is defined as the minimum pFDR achieved when declaring all tests significant at the level of that test's p-value. A q-value can be estimated for each test in a genome-wide experiment and follow-up tests selected from those with the lowest q-values. This last stage is somewhat informal and may be driven by logistic and financial constraints.
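A minimal q-value sketch in the Benjamini-Hochberg spirit follows, using the conservative assumption π_0 = 1 (Storey's estimator would scale the numerator by an estimate of π_0 instead).

```python
# q-values with pi0 = 1: the q-value of a test is the smallest FDR at
# which that test would be declared significant, made monotone by a
# running minimum taken from the largest p-value downwards.
def q_values(pvalues):
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):     # step down from the largest p-value
        i = order[rank]
        running_min = min(running_min, pvalues[i] * m / (rank + 1))
        q[i] = running_min
    return q

print(q_values([0.001, 0.01, 0.04, 0.5]))
```

Selecting all tests with q-value below, say, 0.05 then controls the FDR of the selected set at roughly that level.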
A difficulty with FDR methods is that they control an expected proportion, whereas an investigator will be more concerned with the actual proportion of false positives within a study. Some insight is gained by considering the variation in the within-study false discovery proportion, or false discovery variance. Let i be an integer, with p_{(i)} the ith smallest p-value from a set of m tests. If the i most significant tests are declared positive, then mp_{(i)} estimates the maximum number of false positives. The associated variance is mp_{(i)}(1 - p_{(i)}) (because the truth of a positive test is a binomial outcome), and the coefficient of variation of the within-study false discovery proportion is therefore √[(1 - p_{(i)})/(mp_{(i)})]. This is greatest when p_{(i)} is small, so, for a fixed set of p-values, this coefficient of variation is greatest when the fewest tests are declared significant. This will occur when a low error rate is set, or when there are few true associations, or when the power is low. In genome-wide association scans, the number of true associations is expected to be small by comparison with the number of tests, so that the false discovery variance is relatively high in relation to the target rate, and the FDR approach may not be reliable for controlling the error rate within studies. In gene expression experiments, however, the number of true associations is somewhat higher and FDR methods are more appropriate for those studies.
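The coefficient of variation argument is easy to verify numerically; the test counts below are illustrative assumptions.

```python
# Coefficient of variation of the within-study false discovery proportion:
# mean m*p false positives, binomial variance m*p*(1-p), hence
# CV = sqrt((1 - p) / (m * p)), which grows as p shrinks.
import math

def false_discovery_cv(p, m):
    return math.sqrt((1 - p) / (m * p))

# Tightening the threshold from 1e-3 to 1e-6 over 10,000 tests inflates
# the CV from about 0.32 to about 10:
print(false_discovery_cv(1e-3, 10_000))
print(false_discovery_cv(1e-6, 10_000))
```

This makes concrete why the realised false discovery proportion is unstable precisely in the stringent-threshold regime typical of association scans.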
Korn et al. study the within-study proportion of false discoveries and give procedures that keep the number (or proportion) of false discoveries within an upper bound with given probability [57]. The attraction of this approach is that one can limit the number of false positives with reasonable confidence, with the main disadvantage being increased computation. It is uncertain how the false discovery proportion behaves when it falls outside the upper bound and, although this approach is attractive, further operating characteristics may be needed before it becomes more widely used.
A further difficulty with FDR is that it says little about the individual tests. The most significant tests are most likely to be the true positives, but FDR and q-values ignore this in favour of averaging the error rate across all significant tests. Efron and colleagues [58,59] propose the local FDR as the posterior probability that a null hypothesis is true, given an observed statistic. The local FDR is calculated as

local FDR = π_0 f_0(T) / [π_0 f_0(T) + (1 - π_0) f_1(T)]
where π_0 is the prior probability that the null hypothesis is true, T is a test statistic and f_0 and f_1 are the probability densities of T under the null and alternative hypotheses, respectively. π_0 and f_1 may be unknown but could be estimated from the data [58,60,61]. Note, however, that when the true value of π_0 is near one, as is likely in disease association scans, empirical estimates of π_0 may be greater than one, which leads to a downward bias if these estimates are truncated at one. Thus, it is better to fix a prior estimate of π_0 from genomic considerations, such as the number of expected disease genes (O(10^1)) and the number of genes in the genome (O(10^4)) [62].
Both the local FDR and the q-value are calculated for individual tests. The q-value should be preferred if all positive tests will be followed up with roughly equal priority, which may be the case for a moderately powered study in which true and false positives are not well separated. The local FDR is preferable if decisions to follow up positive tests are taken on a case-by-case basis, because it is a property of single tests rather than of the whole set of positive tests. This applies if there are a few very strong associations, together with some moderate ones, or if additional sources of evidence, such as biological plausibility, are taken into account together with the statistical association.
A related quantity is the false-positive report probability (FPRP) [55,63]. This is the posterior probability that a null hypothesis is true, given a statistic at least as extreme as that observed. It is calculated as

FPRP = π_0 [1 - F_0(T)] / {π_0 [1 - F_0(T)] + (1 - π_0)[1 - F_1(T)]}

where now F_0 and F_1 are the cumulative distributions. For known π_0 and F_1 and a large number of multiple tests, the FPRP is the same as the q-value [56], the main difference being one of context. FPRP is intended to be applied across multiple studies and calculated from prior models, whereas q-values are motivated by the within-study FDR and are usually estimated from data. FPRP is also mathematically complementary to the positive predictive value of a discriminant [64], again differing in context. Because FPRP is a property of a range of test statistics, it is appropriate for setting guidelines for the reporting of significant results, based on assumed models for π_0 and F_1. This means that results can continue to be reported according to their p-values, but with modified thresholds of significance. A known proportion of reported results will then be false; for the assessment of specific tests for follow-up, however, the local FDR is more relevant to investigators.
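As a hedged numerical sketch: the FPRP can be computed from a prior null probability π_0, a significance threshold (the null tail probability) and the power (the alternative tail probability). The numbers below are illustrative assumptions in the spirit of the genomic priors discussed above.

```python
# FPRP from prior null probability, significance threshold and power:
# FPRP = pi0 * alpha / (pi0 * alpha + (1 - pi0) * power).
def fprp(pi0, alpha, power):
    numerator = pi0 * alpha
    return numerator / (numerator + (1 - pi0) * power)

# With ~10 expected disease genes among ~10,000 tested (pi0 ~ 0.999),
# even p = 0.0001 at 80 per cent power leaves a sizeable FPRP:
print(round(fprp(0.999, 1e-4, 0.8), 3))
```

This illustrates why nominally impressive p-values can still correspond to reports that are probably false when the prior probability of association is very small.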
Posterior error rates such as local FDR and FPRP are gaining support because informed proposals can now be made for the prior probability of the null being true, based on genomic considerations [55,62]. Which of the various measures to use depends on the context. Some of the determining factors are summarised in Table 2.
Table 2. Comparison of different error rates and analysis methods.
Concluding remarks
Several aspects of the analysis of genome-wide studies have been discussed, including study design, analysis method and error control, all of which bear on the likelihood of successfully identifying gene effects. There are some key aspects that have not been considered here, including selection and grouping of markers to be tested, population choice and data quality control. To some extent, these issues are specific to the type of study; this review has focused on the more general statistical issues that apply to most studies.
The field will continue to develop rapidly as more studies are completed and there is much scope for new methodology. In particular, combinations of the current methods may prove to be fruitful; for example, including combined evidence tests within a two-stage design. There is no best method for all studies, because of their differing properties and aims, but this review has identified some of the questions that should guide the choice of analysis method. Another important area for development, which has not been discussed here, will be the incorporation of evidence from several sources, including association studies, gene ontology annotation, information from model organisms and structural bioinformatics, to give a holistic appraisal of the effects of genetic variation.
Acknowledgements
F.D. and A.G. are supported by the EU Bloodomics project (LSHM-CT-2004-503485). B.P.C.K. is supported by the Dutch Diabetes Research Foundation, the Netherlands Organisation for Health Research and Development and the Juvenile Diabetes Research Foundation International (2001.10.004).
References

1. Syvanen AC: Toward genome-wide SNP genotyping. Nat Genet 2005, 37:S5-S10.
2. Lockhart DJ, Winzeler EA: Genomics, gene expression and DNA arrays. Nature 2000, 405:827-836.
3. International HapMap Consortium: The International HapMap Project. Nature 2003, 426:789-796.
4. Glazier AM, Nadeau JH, Aitman T: Finding genes that underlie complex traits. Science 2002, 298:2345-2349.
5. Boehnke M: Limits of resolution of genetic linkage studies: Implications for the positional cloning of human disease genes. Am J Hum Genet 1994, 55:379-390.
6. Risch NJ: Searching for genetic determinants in the new millennium. Nature 2000, 405:847-856.
7. Ophoff RA, Escamilla MA, Service SK, et al.: Genome-wide linkage disequilibrium mapping of severe bipolar disorder in a population isolate. Am J Hum Genet 2002, 71:565-574.
8. Ozaki K, Ohnishi Y, Iida A, et al.: Functional SNPs in the lymphotoxin-alpha gene that are associated with susceptibility to myocardial infarction. Nat Genet 2002, 32:650-654.
9. Klein RJ, Zeiss C, Chew EY, et al.: Complement factor H polymorphism in age-related macular degeneration. Science 2005, 308:385-389.
10. Hu N, Wang C, Hu Y, et al.: Genome-wide association study in esophageal cancer using GeneChip mapping 10 K array. Cancer Res 2005, 65:2542-2546.
11. Halder I, Shriver MD: Measuring and using admixture to study the genetics of complex diseases. Hum Genomics 2003, 1:52-62.
12. Patterson N, Hattangadi N, Lane B, et al.: Methods for high-density admixture mapping of disease genes. Am J Hum Genet 2004, 74:979-1000.
13. Austin MA, Harding S, McElroy C: Genebanks: A comparison of eight proposed international genetic databases. Community Genet 2003, 6:37-45.
14. Heller MJ: DNA microarray technology: Devices, systems, and applications. Annu Rev Biomed Eng 2002, 4:129-153.
15. Schadt EE, Monks SA, Drake TA, et al.: Genetics of gene expression surveyed in maize, mouse and man. Nature 2003, 422:297-302.
16. Freimer N, Sabatti C: The human phenome project. Nat Genet 2003, 34:15-21.
17. Weiss KM, Terwilliger JD: How many diseases does it take to map a gene with SNPs? Nat Genet 2000, 26:151-157.
18. Neale BM, Sham PC: The future of association studies: Gene-based analysis and replication. Am J Hum Genet 2004, 75:353-362.
19. Cardon LR, Palmer LJ: Population stratification and spurious allelic association. Lancet 2003, 361:598-604.
20. Wang WY, Barratt BJ, Clayton DG, Todd JA: Genome-wide association studies: Theoretical and practical concerns. Nat Rev Genet 2005, 6:109-118.
21. Saito A, Kamatani N: Strategies for genome-wide association studies: Optimization of study designs by the stepwise focusing method.
J Hum Genet 2002, 47:360365. PubMed Abstract  Publisher Full Text

Sagatopan JM, Verbel DA, Venkatraman ES, et al.: Twostage designs for genetic association studies.
Biometrics 2002, 58:163170. PubMed Abstract  Publisher Full Text

Sagatopan JM, Elston RC: Optimal twostage genotyping in populationbased association studies.
Genet Epidemiol 2003, 25:149157. PubMed Abstract  Publisher Full Text

Sagatopan JM, Venkatraman ES, Begg CB: Twostage designs for genedisease association studies with sample size constraints.
Biometrics 2004, 60:589597. PubMed Abstract  Publisher Full Text

Lowe CE, Cooper JD, Chapman JM, et al.: Costeffective analysis of candidate genes using htSNPS: A staged approach.
Genes Immun 2004, 5:301305. PubMed Abstract  Publisher Full Text

Konig IR, Ziegler A: Group sequential study designs in geneticepidemiological casecontrol studies.
Hum Hered 2003, 56:6372. PubMed Abstract  Publisher Full Text

Sham P, Bader JS, Craig I, et al.: DNA pooling: A tool for largescale association studies.
Nat Rev Genet 2002, 3:862871. PubMed Abstract  Publisher Full Text

Marchini J, Donnelly P, Cardon LR: Genomewide strategies for detecting multiple loci that influence complex diseases.
Nat Genet 2005, 37:413417. PubMed Abstract  Publisher Full Text

Holm S: A simple sequentially rejective multiple test procedure.

Hoh J, Ott J: Mathematical multilocus approaches to localizing complex human trait genes.
Nat Rev Genet 2003, 4:701709. PubMed Abstract  Publisher Full Text

Zaykin DV, Zhivotovsky LA, Westfall PH, Weir BS: Truncated product method for combining Pvalues.
Genet Epidemiol 2002, 22:170185. PubMed Abstract  Publisher Full Text

Dudbridge F, Koeleman BP: Rank truncated product of Pvalues, with application to genomewide association scans.
Genet Epidemiol 2003, 25:360366. PubMed Abstract  Publisher Full Text

Dudbridge F, Koeleman BP: Efficient computation of significance levels for multiple associations in large studies of correlated data, including genomewide association studies.
Am J Hum Genet 2004, 75:424435. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Zaykin DV: Statistical Analysis of Genetic Associations. PhD thesis, North Carolina State University Raleigh, NC; 1999.

Goeman JJ, van de Geer SA, de Kort F, van Houwelingen HC: A global test for groups of genes: Testing association with a clinical outcome.
Bioinformatics 2004, 20:9399. PubMed Abstract  Publisher Full Text

Ishwaran H, Rao JS: Detecting differentially expressed genes in microarrays using Bayesian model selection.
J Am Stat Assoc 2003, 98:438455. Publisher Full Text

Kauermann G, Eilers P: Modeling microarray data using a threshold mixture model.
Biometrics 2004, 60:376387. PubMed Abstract  Publisher Full Text

Gusnanto A, Ploner A, Pawitan Y: Foldchange estimation of differentially expressed genes using mixture mixedmodel.

Allison DB, Gadbury GL, Heo M, et al.: A mixture model approach for the analysis of microarray gene expression data.

Hayes B, Goddard ME: The distribution of the effects of genes affecting quantitative traits in livestock.
Genet Sel Evol 2001, 33:209229. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Barton NH, Keightley PD: Understanding quantitative genetic variation.
Nat Rev Genet 2002, 3:1121. PubMed Abstract  Publisher Full Text

Pesarin F: Multivariate Permutation Tests With Applications in Biostatistics. Wiley, Chichester, UK; 2001.

Coles S: An Introduction to Statistical Modelling of Extreme Values. Springer, London, UK; 2001.

Lin DY: An efficient Monte Carlo approach to assesssing statistical significance in genomic studies.
Bioinformatics 2005, 21:781787. PubMed Abstract  Publisher Full Text

Seaman SR, MüllerMyhsok B: Rapid simulation of P values for product methods and multipletesting adjustment in association studies.

Seaman SR, MüllerMyhsok B: Reply to Lin.
Am J Hum Genet 2005, 77:514515. Publisher Full Text

Bailey TL, Grundy WN: Classifying proteins by family using the product of correlated pvalues.

Camp NJ, Farnham JM: Correcting for multiple analyses in genomewide linkage studies.
Ann Hum Genet 2001, 65:577582. PubMed Abstract  Publisher Full Text

Cheverud JM: A simple correction for multiple comparisons in interval mapping genome scans.
Heredity 2001, 87:5258. PubMed Abstract  Publisher Full Text

Nyholt DR: A simple correction for multiple testing for singlenucleotide polymorphisms in linkage disequilibrium with each other.
Am J Hum Genet 2004, 74:765769. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Salyakina D, Seaman SR, Browning BL, et al.: Evaluation of Nyholt's procedure for multiple testing correction.
Hum Hered 2005, 60:1925. PubMed Abstract  Publisher Full Text

Benjamini Y, Hochberg Y: Controlling the false discovery rate  A practical and powerful approach to multiple testing.

Storey JD, Tibshirani R: Statistical significance for genomewide studies.
Proc Natl Acad Sci USA 2003, 100:94409445. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Manly KF, Nettleton D, Hwang JT: Genomics, prior probability, and statistical tests of multiple hypotheses.
Genome Res 2004, 14:9971001. PubMed Abstract  Publisher Full Text

Wacholder S, Chanock S, GarciaClosas M, et al.: Assessing the probability that a positive report is false: an approach for molecular epidemiology studies.
J Nat Cancer Inst 2004, 96:434442. PubMed Abstract  Publisher Full Text

Storey JD: A direct approach to false discovery rates.
JR Stat Soc B 2002, 64:479498. Publisher Full Text

Korn EL, Troendle JF, McShane LM, Simon R: Controlling the number of false discoveries: Application to highdimensional genomic data.
J Stat Plan Inference 2004, 124:379398. Publisher Full Text

Efron B, Tibshirani R, Storey JD, Tusher V: Empirical Bayes analysis of a microarray experiment.
J Am Stat Assoc 2001, 96:11511160. Publisher Full Text

Efron B, Tibshirani R: Empirical Bayes methods and false discovery rates for microarrays.
Genet Epidemiol 2002, 23:7086. PubMed Abstract  Publisher Full Text

Pounds S, Morris SW: Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of Pvalues.
Bioinformatics 2003, 19:12361242. PubMed Abstract  Publisher Full Text

Liao JG, Lin Y, Selvanayagam ZE, Shih WJ: A mixture model for estimating the local false discovery rate in DNA microarray analysis.
Bioinformatics 2004, 20:26942701. PubMed Abstract  Publisher Full Text

Thomas DC, Clayton DG: Betting odds and genetic associations.
J Nat Cancer Inst 2004, 96:421423. PubMed Abstract  Publisher Full Text

Manly KF, Nettleton D, Hwang JT: Genomics, prior probability, and statistical tests of multiple hypotheses.
Genome Res 2004, 14:9971001. PubMed Abstract  Publisher Full Text

Ioannidis JP: Why most published research findings are false.