A general test of association for quantitative traits in nuclear families.

Abecasis GR; Cardon LR; Cookson WO

doi:10.1086/302698

Affiliations

1. The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, United Kingdom.

ORCIDs linked to this article

Abecasis GR | 0000-0003-1509-1825

American Journal of Human Genetics, 01 Jan 2000, 66(1):279-292
https://doi.org/10.1086/302698 PMID: 10631157 PMCID: PMC1288332

Abstract

High-resolution mapping is an important step in the identification of complex disease genes. In outbred populations, linkage disequilibrium is expected to operate over short distances and could provide a powerful fine-mapping tool. Here we build on recently developed methods for linkage-disequilibrium mapping of quantitative traits to construct a general approach that can accommodate nuclear families of any size, with or without parental information. Variance components are used to construct a test that utilizes information from all available offspring but that is not biased in the presence of linkage or familiality. A permutation test is described for situations in which maximum-likelihood estimates of the variance components are biased. Simulation studies are used to investigate power and error rates of this approach and to highlight situations in which violations of multivariate normality assumptions warrant the permutation test. The relationship between power and the level of linkage disequilibrium for this test suggests that the method is well suited to the analysis of dense maps. The relationship between power and family structure is investigated, and these results are applicable to study design in complex disease, especially for late-onset conditions for which parents are usually not available. When parental genotypes are available, power does not depend greatly on the number of offspring in each family. Power decreases when parental genotypes are not available, but the loss in power is negligible when four or more offspring per family are genotyped. Finally, it is shown that, when siblings are available, the total number of genotypes required in order to achieve comparable power is smaller if parents are not genotyped.

Published online 1999 Dec 22. https://doi.org/10.1086/302698

Introduction

Increasingly large numbers of single-nucleotide polymorphisms are available in public and private databases (Collins et al. 1997). The emergence of high-throughput methods for their analysis holds promise for saturation mapping of human complex-disease loci (Risch and Merikangas 1996; Chakravarti 1998; Lander 1999). Whereas allele-sharing methods of linkage analysis can localize disease genes to broad chromosomal regions, in complex diseases their resolution is often poor. Accordingly, the effort spent in generating an increasingly fine map provides rapidly diminishing returns when these conventional methods of linkage analysis are used (Kruglyak 1997).

In outbred samples, allelic association due to linkage disequilibrium is expected to operate over very short distances. Appropriately designed tests of association that use family-based controls to account for population substructure can provide direct tests of linkage disequilibrium and efficient fine-mapping tools. These tests should have much greater power when a fine map is available, and their high resolution should be well suited to identification of candidate genes.

The most popular of these family-based tests of association is the transmission/disequilibrium test (TDT), which was introduced by Spielman et al. (1993) as a test of linkage in the presence of allelic association. When either a single affected child is tested in each family or, with appropriate adjustments (Martin et al. 1997), multiple children are tested, it is often used as a test of linkage disequilibrium (i.e., a test of the joint hypothesis of linkage and association). The TDT was designed for the analysis of dichotomous traits, and a number of refinements have been proposed to allow, for example, the use of siblings as controls (Curtis 1997) and increased power in the presence of imprinting or dominance (Weinberg et al. 1998).

It has been shown that tests of transmission disequilibrium require a larger number of families in order to achieve comparable power when siblings, rather than parents, are used to construct controls (Curtis 1997). However, it is not always practical to collect parents, and attempting to deduce parental genotypes is fraught with pitfalls (Curtis and Sham 1995). Ideally, TDT-like tests should use parental genotypes when available and sibling genotypes otherwise, to consider all available information in the most efficient manner possible.

For many complex diseases, quantitative phenotype scores contain more information than is provided by dichotomous traits. Quantitative traits can provide effective descriptions of conditions as varied as asthma, type II diabetes, learning difficulties, and osteoporosis. The use of quantitative traits is well established in linkage studies, and these traits should be equally effective in family-based tests of association. Allison (1997) and Rabinowitz (1997) introduced family-based tests for quantitative traits; like the TDT, these tests use parental genotypes to construct well-matched controls and are tests of linkage disequilibrium in simplex families. Fulker et al. (1999) described an analogous test for sib-pair data that does not use parental genotypes.

Here we present a general linkage-disequilibrium test that is applicable to the analysis of quantitative traits in nuclear families of any size and that optionally uses parental genotypes. The method builds on the recent approach of Fulker et al. (1999), in that association effects are partitioned into between- and within-family components. The model also makes use of the powerful and flexible variance-components framework, to construct tests of linkage, linkage disequilibrium, and population admixture that use information from all available offspring. In addition to extending this model to accommodate nuclear families of any size, we derive the expectations of the model parameters and show that a test of the within-family component is indeed free of confounding population-substructure effects, regardless of the composition of nuclear families. We also show that admixture impacts the between-family–component estimate when samples from a number of population strata are combined. This general model encompasses the specific test and study design of Fulker et al. (1999), as well as that of Rabinowitz (1997). The properties of the model in terms of power and error rates are explored in a number of situations, including moderate sample sizes and violation of the multivariate normality assumption underlying variance-components methods. Power and optimal study designs in terms of parental information and family size are also examined.

Maximum-Likelihood Tests of Association

Consider a candidate diallelic marker, M, with alleles arbitrarily designated as “1” (with frequency p) and “2” (with frequency q=1-p) and an additive genetic value a. Note that our usage of the additive genetic value refers to the observed marker, not the trait locus, and that a≠0 only when the marker locus is either the trait locus or in disequilibrium with it. Also, as long as the phase of the association is the same in all subpopulations, a=0 only when there is no linkage disequilibrium. Given a set of i=1…K nuclear families, each with n_i children so that the total number of offspring is N=Σ_in_i, define the marker phenotype m_ij and the genotype score g_ij for the jth offspring (j=1…n_i) in the ith family as m_ij=number of “1” alleles at locus M, and g_ij=m_ij-1. If both parental genotypes are known, label their analogous genotype scores g_iM and g_iF for the male and female parent, respectively.

Following the usual biometric model (Falconer 1989), we assume that the phenotype scores for the trait of interest are defined by a major-gene effect, familial effects (which include the effects of shared environment and half the additive polygenic variance), and a residual environmental component. The expected means of the residual sibling resemblance and unique environmental effects are assumed to be 0, so that

$$E(y_{ij}) = \mu + a\,g_{ij} \tag{1}$$

and, for the offspring in each family, the n_i×n_i variance-covariance matrix, Ω_i, has elements

$$\Omega_{ijk} = \begin{cases} \sigma_a^2 + \sigma_s^2 + \sigma_e^2 & \text{if } j = k \\ \pi_{ijk}\,\sigma_a^2 + \sigma_s^2 & \text{if } j \neq k \end{cases} \tag{2}$$

where π_ijk denotes the proportion of alleles shared identical by descent (IBD) between siblings j and k in family i, σ²_a is the additive genetic variance of the major gene, σ²_s is the residual sibling resemblance, and σ²_e is the residual environmental variance component. Note that these expectations do not include any dominance variance, although the general method can easily accommodate such effects (Fulker et al. 1999; also discussed below).
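As an illustration only, the covariance matrix for one sibship can be assembled from a matrix of IBD proportions. The sketch below assumes the standard variance-components form suggested by the definitions above (σ²_a+σ²_s+σ²_e on the diagonal and π_ijk σ²_a+σ²_s off the diagonal); the function name `sibship_covariance` and the example values are hypothetical and are not taken from the article.

```python
import numpy as np

def sibship_covariance(pi_hat, var_a, var_s, var_e):
    """Return the n x n covariance matrix for one sibship.

    pi_hat : (n, n) symmetric array of IBD proportions between sibs
             (diagonal entries are ignored).
    Assumed form: var_a + var_s + var_e on the diagonal,
                  pi_jk * var_a + var_s off the diagonal.
    """
    pi_hat = np.asarray(pi_hat, dtype=float)
    omega = pi_hat * var_a + var_s               # off-diagonal elements
    np.fill_diagonal(omega, var_a + var_s + var_e)
    return omega

# Three sibs with IBD sharing of 1/2 for every pair (hypothetical values).
pi_hat = np.full((3, 3), 0.5)
omega = sibship_covariance(pi_hat, var_a=10.0, var_s=30.0, var_e=60.0)
print(omega)
```

Dominance variance is not modeled here, in keeping with the expectations given in the text.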

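Variance-components approaches allow simultaneous modeling of the means and variances, so that all the information in a set of related individuals can be used to construct a test of association. For a particular means model, such as

$$\hat{y}_{ij} = \mu + \beta_a\,g_{ij} \tag{3}$$

and for estimates of all of the variances in Ω_i, the likelihood of the data for the complete set of parameters, θ=[μ,β_a,σ²_a,σ²_s,σ²_e], is

$$L = \prod_{i=1}^{K} (2\pi)^{-n_i/2}\,|\Omega_i|^{-1/2} \exp\left[-\tfrac{1}{2}\,(y_i - \hat{y}_i)'\,\Omega_i^{-1}\,(y_i - \hat{y}_i)\right] \tag{4}$$

where y_i and ŷ_i are the vectors of observed and expected phenotypes for the offspring in family i.
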
Evidence for association can be evaluated by maximization of L with the constraint β_a=0 (null-hypothesis likelihood, L₀) and without constraints on the parameters (alternative-hypothesis likelihood, L₁). Asymptotically, the quantity 2[ln(L₁)-ln(L₀)] is distributed as χ², with df equal to the difference in number of parameters estimated. Similar likelihood-ratio tests have been proposed as tests of association between marker and phenotype (e.g., see George and Elston 1987), and, in the absence of population admixture, they are valid tests of linkage disequilibrium, because E(β_a)=a. For families with exactly two siblings, this model is the same as the general model of Fulker et al. (1999).
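The likelihood-ratio criterion can be sketched in a few lines. The code below is a minimal illustration, assuming a multivariate-normal likelihood for each sibship and a test with 1 df (one constrained parameter, β_a); the function names are hypothetical, and the numerical maximization of the likelihood under each hypothesis is not shown.

```python
import math
import numpy as np

def family_loglik(y, mean, omega):
    """Multivariate-normal log-likelihood contribution of one sibship."""
    y = np.asarray(y, dtype=float)
    resid = y - np.asarray(mean, dtype=float)
    n = y.size
    _, logdet = np.linalg.slogdet(omega)           # log|omega|
    quad = resid @ np.linalg.solve(omega, resid)   # resid' omega^-1 resid
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + quad)

def lrt_pvalue_1df(lnL1, lnL0):
    """Return 2[ln(L1) - ln(L0)] and its asymptotic P value for 1 df."""
    stat = max(2.0 * (lnL1 - lnL0), 0.0)
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Hypothetical maximized log-likelihoods under the alternative and the null.
stat, p_value = lrt_pvalue_1df(-100.0, -101.9207)
print(stat, p_value)
```

The total log-likelihood of a data set is the sum of `family_loglik` over families, which is the quantity that would be maximized with and without the constraint β_a=0.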

Population Admixture

We allow for the most extreme form of population admixture, in which each family is drawn from a different stratum. Define μ_i, p_i, and q_i, the phenotypic mean and allele frequencies for the stratum from which family i was drawn. Assume that within each subpopulation there is random mating and random transmission of parental alleles to offspring and that the total sample of N individuals is centered on mean 0; that is, μ=Σ_in_iμ_i=0.

In this situation, the expectation given in equation (1) can be expressed as E(y_ij)=E(μ_i+g_ija)=μ_i+(p_i-q_i)a, and the null hypothesis of no linkage disequilibrium is no longer equivalent to β_a=0, so a test of β_a≠0 is no longer a valid test of linkage disequilibrium. As shown in Appendix A, in this case, E(β_a)=Σ_in_i(p_i-q_i)μ_i/(NV_g)+a, where V_g is the variance of the genotypic scores. The numerator of the first term in this expression represents “spurious” association, at the population level, that is independent of linkage.
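To make the size of the “spurious” term concrete, the expectation of β_a can be evaluated for a small set of strata. This is a sketch under stated assumptions: V_g is taken as the pooled variance of genotype scores under random mating within each stratum (2p_iq_i within strata plus the variance of the stratum means), and the stratum means of ±5 are hypothetical values chosen only for illustration.

```python
def expected_beta_a(strata, a):
    """Expected regression coefficient of phenotype on genotype score.

    strata : list of (n_i, p_i, mu_i) tuples; the phenotype means mu_i are
             assumed to be centred so that sum(n_i * mu_i) = 0.
    a      : additive genetic value at the marker.
    """
    N = sum(n for n, _, _ in strata)
    # Pooled variance of genotype scores (assumption: random mating within
    # strata): within-stratum 2pq plus between-stratum variance of (p - q).
    mean_g = sum(n * (2 * p - 1) for n, p, _ in strata) / N
    V_g = sum(n * (2 * p * (1 - p) + (2 * p - 1 - mean_g) ** 2)
              for n, p, _ in strata) / N
    spurious = sum(n * (2 * p - 1) * mu for n, p, mu in strata) / (N * V_g)
    return spurious + a

# Two equally sized strata with different allele frequencies and means, a = 0.
admixed = [(100, 0.7, 5.0), (100, 0.3, -5.0)]
print(expected_beta_a(admixed, a=0.0))        # nonzero despite a = 0
print(expected_beta_a([(200, 0.5, 0.0)], 0.0))  # homogeneous sample: 0
```

In the admixed example the expected coefficient is nonzero even though the marker has no effect, which is the confounding that the orthogonal decomposition is designed to remove.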

Orthogonal Decomposition of the Genotype Scores

Fulker et al. (1999) proposed that, for sib-pair data, the genotype score could be decomposed into orthogonal between-family (b) and within-family (w) components, in which only the former is sensitive to population structure and in which the latter is significant only in the presence of linkage disequilibrium. The means model under this specification is

$$\hat{y}_{ij} = \mu + \beta_b\,b_i + \beta_w\,w_{ij} \tag{5}$$

where b_i and w_ij are orthogonal between- and within-family components of g_ij. We extend this model to accommodate any number of offspring, with or without parental genotypes, as

$$b_i = \begin{cases} \sum_j g_{ij}/n_i & \text{if parental genotypes are unknown} \\ (g_{iF} + g_{iM})/2 & \text{otherwise} \end{cases} \qquad w_{ij} = g_{ij} - b_i \tag{6}$$

so that b_i is the expectation of each g_ij conditional on family data and w_ij is the deviation from this expectation for offspring j. Note that, when b_i=(Σ_jg_ij)/n_i, all possible siblings are considered and parental data are ignored, whereas, when b_i=(g_iF+g_iM)/2, only parental data are used (for an example of how b_i and w_ij are scored, with and without parental data, see table 1). Positive values of w_ij indicate that a child inherits more copies of allele “1” than would be expected, whereas negative values refer to inheritance of an excess of allele “2.” For sib-pair data for which parental genotypes are not available, which is the situation described by Fulker et al. (1999), w_i1=-w_i2 for any i, so that different alleles are tested in each member of a sib pair and the distribution of β_w is unaffected by linkage in the absence of association.

Table 1

Example Scoring of b_i and w_ij in a Sib Pair^[Note]

Offspring			Parental Information			Sibs only
g_i1	g_i2	P(g)	b_i	w_i1	w_i2	b_i	w_i1	w_i2
One Heterozygous Parent (g_iF, g_iM) = (1, 0)
0	0	p³q	1/2	− 1/2	− 1/2	0	0	0
0	1	2p³q	1/2	− 1/2	1/2	1/2	− 1/2	1/2
1	1	p³q	1/2	1/2	1/2	1	0	0
Two Parents Heterozygous (g_iF, g_iM) = (0, 0)
−1	−1	1/4p²q²	0	−1	−1	−1	0	0
1	1	1/4p²q²	0	1	1	1	0	0
0	0	p²q²	0	0	0	0	0	0
−1	0	p²q²	0	−1	0	− 1/2	− 1/2	1/2
0	1	p²q²	0	0	1	1/2	− 1/2	1/2
−1	1	1/2p²q²	0	−1	1	0	−1	1

Note.— For the unordered pairs of parental genotype scores (g_iF, g_iM)=(1, 0) and (g_iF, g_iM)=(0, 0), the table lists possible offspring genotype scores, and their frequency at the population level. The scoring of b_i and w_ij is illustrated for these cases by use of either parental genotypes or sibling genotypes only. Note that the frequency of these mating types is 4p³q and 4p²q², respectively.
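The scoring illustrated in table 1 can be sketched as follows. The function name `score_family` is hypothetical; the two branches correspond to the two definitions of b_i described in the text (mean of the parental scores when parents are used, mean of the sibship scores otherwise), and exact fractions are used so that the output can be compared directly with the table.

```python
from fractions import Fraction

def score_family(offspring, parents=None):
    """Split offspring genotype scores into between- and within-family parts.

    offspring : list of genotype scores g_ij in {-1, 0, 1}
    parents   : optional (g_iF, g_iM) pair; if None, the sibship mean is used.
    Returns (b_i, [w_i1, ..., w_in]) as exact fractions.
    """
    if parents is not None:
        b = Fraction(parents[0] + parents[1], 2)
    else:
        b = Fraction(sum(offspring), len(offspring))
    return b, [g - b for g in offspring]

# Sib pair with scores (0, 1) and one heterozygous parent, scored both ways.
print(score_family([0, 1], parents=(1, 0)))
print(score_family([0, 1]))
# Sib pair with identical scores: uninformative when parents are ignored.
print(score_family([1, 1], parents=(1, 0)))
print(score_family([1, 1]))
```

When parents are ignored, the within-family components of a sibship always sum to 0, so a sibship in which all offspring share a genotype contributes nothing to β_w.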

Fulker et al. (1999) suggested that the β_w regression coefficient, when estimated in the context of a variance-components model and in the absence of population admixture, provides a direct estimate of the additive genetic value a. Allowing for population admixture and extending the model to allow both for sibships of any size and for the inclusion of parental genotype data, we derive the expectation of β_b and β_w for each of these two alternative definitions and show that all the “spurious” association between genotype score and phenotype is accounted for by β_b and that the β_w regression coefficient remains a direct estimate of the additive genetic value a (at the marker).

By use of the normal equations, equation (5) can be expressed as y=Xb+e, so that b̂=(X′X)⁻¹X′y, where X is the design matrix and b̂ is the vector of estimated regression coefficients, which includes β_b and β_w. X′X and X′y are asymptotically the covariance matrices among the independent variables and between the independent and dependent variables, respectively. To solve these equations, only the expectations for the variances, V_b and V_w, and covariances, C_b,y and C_w,y, are required, since b_i and w_ij are orthogonal by design. These quantities are derived in Appendices B and C.
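A small numerical illustration of the normal equations may help. The scores below are hypothetical values for three sib pairs scored without parents, and the phenotypes are generated without noise so that the least-squares solution recovers the coefficients exactly; none of these numbers come from the article.

```python
import numpy as np

# Hypothetical between- and within-family scores for three sib pairs.
b = np.array([0.5, 0.5, 0.0, 0.0, -0.5, -0.5])
w = np.array([-0.5, 0.5, -1.0, 1.0, 0.5, -0.5])
y = 2.0 + 3.0 * b + 5.0 * w            # noise-free phenotypes, for illustration

X = np.column_stack([np.ones_like(b), b, w])    # design matrix
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)        # solves (X'X) beta = X'y

print(XtX)        # diagonal, because the columns are mutually orthogonal
print(beta_hat)   # intercept, beta_b, beta_w
```

Because X′X is diagonal in this example, each coefficient depends only on the variance of its own predictor and that predictor's covariance with y, which is why only V_b, V_w, C_b,y, and C_w,y are needed.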

Parental Genotypes Available

As shown in Appendix B,

and

so that

Note that all the population-substructure effects in the means, μ_i, that are apparent in the general specification of E(β_a) are included in the expectation of β_b exclusively. Whereas β_b provides a direct estimate of a only if there is no “spurious” association between g and y (e.g., when all μ_i are 0 or, since Σ_in_iμ_i=0, when p and q are constant), β_w is independent of any “spurious” effects and remains a valid estimate of a even when there is admixture.

Parental Genotypes Unobserved or Unused

As shown in Appendix C,

and

so that, as n_i→∞, the contribution of each family to V_b, V_w, C_b,y, and C_w,y approximates that of the case in which parents are available (given in the previous section). On solution of the normal equations,

so that β_w remains a direct estimate of a, independent of population admixture and the number of offspring observed in each family. In contrast, β_b is again a direct estimate of a only when there is no “spurious” association between genotype scores, g, and phenotype, y. These expectations show that the admixture test of β_b=β_w proposed by Fulker et al. (1999) is valid even when parental genotypes are included in analysis and when sibships of size >2 are evaluated.

Simulations

A number of simulation studies were conducted to explore the properties of this orthogonal decomposition of genotype scores as a test of linkage disequilibrium. Data were simulated in nuclear families each having one to eight offspring. Trait values were constructed as the sum of a major-gene effect (with variance σ²_a) generated by an additive quantitative-trait locus (QTL), Q, having two equifrequent alleles, a residual sibling correlation (σ²_s), and an environmental effect (σ²_e), each assigned independently from a normal distribution with mean 0. Except where noted, a diallelic marker locus M, with allele frequencies p=q= 1/2, was simulated with a recombination fraction (θ) of 0. For convenience, the total trait variance σ²=σ²_a+σ²_s+σ²_e was fixed at 100 in all simulations.
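A data set of this kind could be generated roughly as follows. This is a sketch, not the authors' simulation code: it assumes that the marker is the QTL itself, that the residual sibling resemblance is a single normal deviate shared by all offspring in a family, and that the additive value is chosen so that 2pqa²=σ²_a; the function name and the seed are hypothetical.

```python
import numpy as np

def simulate_sibships(n_families, n_sibs, var_a, var_s, var_e, p=0.5, seed=0):
    """Simulate genotype scores and trait values for nuclear families."""
    rng = np.random.default_rng(seed)
    a = np.sqrt(var_a / (2.0 * p * (1.0 - p)))   # additive value of allele "1"
    # Parental alleles (1 = allele "1", 0 = allele "2"): family x parent x 2.
    parents = rng.binomial(1, p, size=(n_families, 2, 2))
    # Each child receives one randomly chosen allele from each parent.
    pick = rng.integers(0, 2, size=(n_families, n_sibs, 2))
    fam = np.arange(n_families)[:, None, None]
    par = np.arange(2)[None, None, :]
    alleles = parents[fam, par, pick]             # family x sib x parent
    g = alleles.sum(axis=2) - 1                   # offspring genotype scores
    g_parents = parents.sum(axis=2) - 1           # parental genotype scores
    shared = rng.normal(0.0, np.sqrt(var_s), size=(n_families, 1))
    unique = rng.normal(0.0, np.sqrt(var_e), size=(n_families, n_sibs))
    y = a * g + shared + unique
    return g, g_parents, y

g, g_parents, y = simulate_sibships(2000, 2, var_a=10.0, var_s=30.0,
                                    var_e=60.0, seed=1)
print(g.shape, g_parents.shape, round(float(y.var()), 1))
```

With these variance components the total trait variance is 100, matching the convention used in the simulations described above.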

Linkage disequilibrium between the trait and marker locus was introduced in the parental chromosomes. When appropriate, disequilibrium was modeled in the usual fashion as D=p_M₁Q₁-p_M₁p_Q₁ (p_M₁Q₁ is the frequency of the haplotype with alleles M₁ and Q₁, and p_M₁ and p_Q₁ are the frequencies of alleles M₁ and Q₁; Lewontin and Kojima 1960), so that D_max=min(p_M₁,p_Q₁)-p_M₁p_Q₁, and the standardized disequilibrium coefficient is D/D_max.
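The disequilibrium coefficients defined above can be computed directly from haplotype and allele frequencies. The sketch below follows the definitions in the text for nonnegative D; the function names are hypothetical, and the second function simply inverts the definition to obtain haplotype frequencies for a chosen value of D/D_max.

```python
def disequilibrium(p_m1q1, p_m1, p_q1):
    """Return D, D_max, and D/D_max for haplotype M1Q1 (nonnegative D)."""
    d = p_m1q1 - p_m1 * p_q1
    d_max = min(p_m1, p_q1) - p_m1 * p_q1
    return d, d_max, d / d_max

def haplotype_freqs(p_m1, p_q1, d_ratio):
    """Haplotype frequencies for a given D/D_max between 0 and 1."""
    d = d_ratio * (min(p_m1, p_q1) - p_m1 * p_q1)
    return {
        "M1Q1": p_m1 * p_q1 + d,
        "M1Q2": p_m1 * (1 - p_q1) - d,
        "M2Q1": (1 - p_m1) * p_q1 - d,
        "M2Q2": (1 - p_m1) * (1 - p_q1) + d,
    }

# Equifrequent marker and QTL alleles with D/D_max = 50%.
freqs = haplotype_freqs(0.5, 0.5, 0.5)
print(freqs)
print(disequilibrium(freqs["M1Q1"], 0.5, 0.5))
```

Parental chromosomes can then be sampled from these four haplotype frequencies to introduce the desired level of disequilibrium.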

Where noted, population admixture was generated by the mixing of families drawn from one of two populations (A and B) with different phenotypic means (μ_A and μ_B) and marker allele frequencies (p_A=.7 and p_B=.3) in equal sampling proportions. μ_A and μ_B were selected such that admixture accounted for 20% of the total phenotypic variance in the combined population; that is, equation M5

Except where noted, parental genotypes were used to estimate π by use of information available from the single-marker locus (Haseman and Elston 1972). By use of the variance-components model (eq. [2]) and the orthogonal model for the means (eq. [5]), the likelihood was maximized under the null (β_w=0) and alternative hypotheses, to calculate L₀ and L₁, respectively (eq. [4]). As suggested by Searle et al. (1992), the variance-component estimates were constrained to be positive. To examine the benefits of parental information, each simulated data set was examined under the two alternative definitions of b_i and w_ij (see eq. [6]). Only families having at least one heterozygous parent (when parental genotypes were used) or at least two different types of offspring (when parental genotypes were ignored) were included in the likelihood calculations, since other families do not contribute to estimates of β_w. For the purpose of our simulations, power and error rates were defined as the proportion of simulations exceeding nominal significance levels for the χ² distribution under the likelihood-ratio criteria.

It is well known that segregation of a major locus violates the multivariate normality assumption and that maximum-likelihood estimates of the variance components can be biased in small samples (Amos et al. 1996). To characterize the effect that that bias has on the present test, error rates of β_w were examined for a range of sample sizes (i.e., 120–1,920 offspring) and family structures (1–8 children), in the following situations: (1) no sibling correlation or major-gene effect (σ²_s=σ²_a=0) in homogeneous and admixed populations, (2) a large major-gene effect (σ²_a=30) or a large residual sibling correlation (σ²_s=50), and (3) both residual sibling correlation and major-gene effect (σ²_a=20, σ²_s=30). Five thousand simulated data sets were examined in each test case. No disequilibrium was modeled in any of these simulations.

The effects that family structure and linkage disequilibrium have on power were examined in a sample of 480 total offspring when σ²_a=10 and σ²_s=30, by varying D between 0 and D_max and varying, from one to eight, the number of offspring in each family. In these assessments, the total number of offspring sampled was fixed, so that the number of families varied according to sibship size. Finally, the sensitivity of the test to linkage disequilibrium was estimated for a variety of sample sizes and family structures. Sensitivity was defined as the most stringent significance level that could be obtained with 80% power in simulated data sets in which the trait and marker loci were identical. When power and test sensitivity were examined, 1,000 simulated data sets were analyzed in each case.

Permutation Test

For each family, the vector w_i denotes the observed pattern of allelic transmission. In the absence of linkage disequilibrium, the vectors w_i and −w_i are equally likely, as long as there is no segregation distortion. Construct a random permutation of any set of K families by replacing each w_i with either itself or −w_i, with equal probability, so that, for any given data set, there are 2^K different permutations of the data. The distribution of the maximum likelihood of the data, L₁, under the hypothesis of no linkage disequilibrium is the distribution of the maximum likelihoods of these 2^K permuted data sets. When the distribution of 2[ln(L₁)-ln(L₀)] is not well approximated by the χ² distribution, the distribution of L₁ can be estimated by a sampling of a large number of permutations and their respective likelihoods. These empirical P values and the likelihood-ratio criterion were compared in a small sample (120 total offspring) in which the major-gene effect was introduced by simulation of a dominant QTL with equally frequent alleles. In this situation, empirical significance levels should be desirable because both dominance (which both induces nonnormality into the trait distribution and makes the variances model [eq. {2}] incomplete) and the small sample size are expected to reduce the accuracy of asymptotic significance levels. For dichotomous traits, the increased error rates of likelihood-ratio tests in small samples, as well as their spurious effect on power, have been described by Whittaker and Thompson (1999). The rationale for permutation tests in linkage studies has been discussed by Wan et al. (1997).
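The sign-flipping scheme can be sketched as below. To keep the example short and self-contained, a simple squared cross-product between w and the centred phenotypes is used as the test statistic in place of the maximized likelihood; this substitution, the function names, and the (count+1)/(permutations+1) form of the empirical P value are assumptions of the sketch rather than details taken from the article.

```python
import numpy as np

def within_family_stat(w_list, y_list):
    """Squared cross-product of w and centred y (stand-in for the LRT)."""
    w = np.concatenate(w_list)
    y = np.concatenate(y_list)
    return float(w @ (y - y.mean())) ** 2

def permutation_pvalue(w_list, y_list, n_perm=1000, seed=0):
    """Empirical P value from random per-family sign flips of w_i."""
    rng = np.random.default_rng(seed)
    observed = within_family_stat(w_list, y_list)
    exceed = 0
    for _ in range(n_perm):
        # Replace each w_i with itself or -w_i, with equal probability.
        signs = rng.choice([-1.0, 1.0], size=len(w_list))
        permuted = [s * w for s, w in zip(signs, w_list)]
        if within_family_stat(permuted, y_list) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Hypothetical data: 40 informative sib pairs with a strong within-family effect.
rng = np.random.default_rng(1)
w_list = [np.array([-0.5, 0.5]) for _ in range(40)]
y_list = [4.0 * w + rng.normal(0.0, 1.0, size=2) for w in w_list]
p_emp = permutation_pvalue(w_list, y_list, n_perm=500, seed=2)
print(p_emp)
```

In an analysis that follows the text more closely, `within_family_stat` would be replaced by a routine that refits the variance-components model to each permuted data set.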

Type 1–Error Rates

Error rates in estimates of the β_w parameter in various test cases are presented in tables 2 and 3. Additional family structures were examined, but the results were intermediate between those tabulated and are not shown. A large major gene and sibling resemblance were selected for description of error rates, to make any possible biases obvious. When one child from each family was considered, error rates were not influenced by population admixture or by the effects of the linked major locus or additional sibling resemblance, so that only a summary error rate over all test cases is reported.

Table 2

Error Rates When Parents Are Available

	Error Rate^b When Total No. of Offspring Is
Offspring per Family and Test Case^a	120	240	480	960	1,920
1:
Overall	5.5	5.3	5.4	5.1	5.1
2:
Null	5.4	5.0	5.3	4.8	5.0
Admixture	5.6	5.4	5.4	4.6	5.7
Sibling resemblance	4.8	5.0	4.7	4.4	5.2
Linkage	7.0	5.4	5.7	5.0	5.3
Composite	5.6	4.8	4.9	5.4	5.1
4:
Null	5.1	5.1	4.9	4.5	4.2
Admixture	5.0	4.9	4.5	5.2	4.7
Sibling resemblance	4.8	5.0	4.7	4.4	4.8
Linkage	7.0	5.4	5.7	5.0	5.2
Composite	5.6	4.8	4.9	5.4	5.3
8:
Null	4.9	5.3	5.7	4.8	5.5
Admixture	4.5	4.4	4.9	4.2	5.2
Sibling resemblance	4.4	4.0	4.5	4.7	4.8
Linkage	6.5	6.2	5.9	5.8	6.1
Composite	5.2	5.3	5.0	5.2	4.7

^aNull = no sibling resemblance or major-gene effect (s²=h²=0); Admixture = no sibling resemblance or major-gene effect (s²=h²=0), with population admixture; Sibling resemblance = large sibling resemblance (s²=.5); Linkage = linked major gene of large effect (h²=.3, θ=0); Composite = composite test case (s²=.3 and h²=.2, θ=0). There was no linkage disequilibrium between QTL and marker locus (D=0). For single-child families, only the overall error rate is reported. Each estimate is based on 5,000 replicates.

^bShown as the proportion of simulations exceeding the nominal significance level, .05.

Table 3

Error Rates When Parents Are Unavailable or Unused^[Note]

	Error Rate^b When Total No. of Offspring Is
Offspring per Family and Test Case^a	120	240	480	960	1,920
2:
Null	6.5	6.1	5.7	5.3	5.1
Admixture	6.6	6.0	4.6	4.2	4.8
Sibling resemblance	5.4	5.5	5.5	4.7	5.1
Linkage	7.1	5.7	6.0	5.7	5.3
Composite	5.7	5.2	5.2	5.2	4.9
4:
Null	5.3	5.4	4.5	4.6	4.8
Admixture	5.1	4.1	4.4	4.7	4.6
Sibling resemblance	4.3	4.4	4.6	4.6	4.7
Linkage	6.2	6.1	5.7	5.7	5.3
Composite	5.5	6.1	5.5	5.2	4.8
8:
Null	5.2	4.9	5.4	4.8	5.2
Admixture	4.2	4.2	4.6	4.2	5.0
Sibling resemblance	3.5	4.3	4.8	3.9	4.8
Linkage	7.7	6.9	5.9	5.6	5.5
Composite	6.1	5.7	5.2	5.2	5.4

Note.— The simulated data sets are the same as those used in table 2, but parental genotypes were not considered during the analysis.

^aTest cases are as defined in table 2.

^bShown as the proportion of simulations exceeding the nominal significance level, .05.

The error rates for the rows labeled “Null” in tables 2 and 3 should be considered as baseline error rates for the other test cases. Not surprisingly, error rates were closer to nominal significance levels in larger samples, where the likelihood-ratio criterion is more accurate. For very large samples, the asymptotic criteria seem to be appropriate regardless of model or sibship size (tables 2 and 3). In smaller samples, error rates are slightly high for the linkage test case and are slightly low for the admixture and sibling-resemblance test cases, an effect that is more pronounced for larger sibships. When only a small number of observations are considered, estimates of the variance components may be biased (see Hopper and Mathews 1982; Amos et al. 1996), so that these error rates are not a specific feature of the present model but may reflect violations of multivariate normality.

As the sample size increases, bias in the maximum-likelihood estimates of variance components should be reduced, and the error rate of asymptotic significance tests should approach its nominal level (Amos et al. 1996). These small-sample biases decrease for more-realistic values of σ²_a and σ²_s. It is interesting to note that, although some bias remains when the variance components are estimated by maximum likelihood, these biases are much larger if only σ²_e is considered: for example, if the linkage information is ignored and a traditional least-squares regression framework is adopted, modeling only σ²_e and the association parameters, the error rates exceed 11% at the nominal .05 significance level, for all sample sizes examined in the eight-sib linkage-test case in tables 2 and 3.

When parental genotypes are not available for analysis (table 3), it may appear counterintuitive that, although the biases follow the general trends described above, they appear to be greater in the two-sib case. The reason for this increased bias is that, when parental genotypes are not available, w_i is nonzero (and therefore informative) in only a small proportion of sib pairs, so that the effective sample size is much smaller than the number of individuals actually genotyped.
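The reduction in effective sample size can be quantified with a short calculation. The sketch below assumes random mating and Mendelian transmission at a diallelic marker, and counts a family as informative when at least one parent is heterozygous (parents genotyped) or when the sibship contains at least two different genotypes (parents ignored); the function name is hypothetical.

```python
def informative_fraction(n_sibs, p=0.5, parents_genotyped=False):
    """Expected fraction of offspring that fall in informative families."""
    q = 1.0 - p
    het = 2.0 * p * q                    # probability a parent is heterozygous
    if parents_genotyped:
        # Informative if at least one parent is heterozygous.
        return 1.0 - (1.0 - het) ** 2
    one_het = 2.0 * het * (1.0 - het)    # exactly one heterozygous parent
    two_het = het ** 2                   # both parents heterozygous
    # Probability that all sibs share one genotype, by mating type.
    same_one = 2.0 * 0.5 ** n_sibs
    same_two = 2.0 * 0.25 ** n_sibs + 0.5 ** n_sibs
    return one_het * (1.0 - same_one) + two_het * (1.0 - same_two)

total_offspring = 480
print(total_offspring * informative_fraction(2, parents_genotyped=True))
for n_sibs in (2, 3, 4, 8):
    print(n_sibs, total_offspring * informative_fraction(n_sibs))
```

For equifrequent alleles and 480 offspring, this gives 360 informative offspring when parents are genotyped and 195 for sib pairs without parents, in line with the E(N_w) values reported in table 4.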

Power Estimates

Results of the power calculations are presented in table 4 and figure 1. When parental genotypes are available, power depends mostly on the amount of disequilibrium between the trait and marker loci and is largely independent of the number of offspring in each family (table 4). In contrast, when parental genotypes are not available, power depends both on family size and on the level of disequilibrium: larger sibships allow a greater proportion of segregating alleles to be identified (and scored to be not 0 in the respective w_ij), and, thus, power increases with the number of siblings in each family. As might be expected, for any family size, power is always greater when parental genotypes are available for analysis, since all pairs of segregating alleles are evident in this situation. However, for larger sibships (four or more offspring), it appears that ≤5% power is lost when parental genotypes are unavailable.

Figure 1

Sensitivity of the orthogonal test to association. Sensitivity is defined as −log₁₀(α), where α is the significance level exceeded by 80% of simulated data sets. The total number of offspring varied between 240 and 1,920 (in increments of 240 children). Results were plotted for sib-pair families in which parental genotypes were available for analysis (squares) and for sib-pair (diamonds), sib-triad (triangles), and sib-quad (circles) families in which parental genotypes were not available for analysis. The proportion of phenotypic variance attributable to residual sibling resemblance (s²) was .30. The major-gene effect (h²) was .10 in panel A, .05 in panel B, and .025 in panel C. Each plotted data point corresponds to 1,000 simulated data sets. For convenience, a least-squares straight line has been plotted through each set of data points.

Table 4

Effects of Family Structure and Linkage Disequilibrium on Power^[Note]

D/D_max^a (%)	One Sib per Family	Two Sibs per Family	Three Sibs per Family	Four Sibs per Family	Five Sibs per Family	Six Sibs per Family	Eight Sibs per Family	Estimate of β_w
	No. of Families When Total No. of Sibs Is 480
	480	240	160	120	96	80	60
	Power When Parental Genotypes Are Included in Analysis, for E(N_w) = 360^b
0	.2	.0	.1	.1	.2	.1	.0	.01
25	2.1	1.8	2.0	2.6	3.0	2.1	2.1	1.11
50	19.5	22.9	24.8	26.7	26.7	27.2	23.9	2.24
75	69.3	72.6	74.2	76.9	76.0	76.5	75.4	3.35
100	97.4	97.7	98.3	98.4	98.2	98.4	98.5	4.46
	Power When Parental Genotypes Are Not Included in Analysis, for E(N_w) =^b
	…	195	281	322	341	350	358
0	…	.1	.1	.2	.2	.0	.0	.02
25	…	1.2	2.3	2.7	2.3	1.6	1.8	1.11
50	…	12.8	18.7	22.6	23.5	25.2	23.1	2.25
75	…	44.3	64.1	69.5	71.5	73.5	73.9	3.34
100	…	83.3	94.6	97.3	97.7	98.0	98.3	4.45

Note.— Each simulated data set consisted of 480 total offspring, so that the total number of families decreased with increasing sibship size.

^aBetween QTL and marker locus (θ=0). The total trait variance was set at 100, so that the true additive genetic value at the QTL was 4.47.

^bPower is shown as the proportion of simulations exceeding the nominal significance level, .001. The simulation parameters specified a QTL locus with h²=.1 and a residual sibling correlation s²=.3. N_w = effective sample size, which corresponds to offspring in families in which allelic transmission could be scored.

When D/D_max>50%, it is possible to achieve considerable power for loci accounting for 10% of the phenotypic variance, and estimates of β_w are unbiased and essentially exact when there is complete disequilibrium (table 4). The relationship between the apparent additive genetic value of the marker locus, β_w, the disequilibrium coefficient, D, and additive genetic effect at the QTL is very close to a_marker=a_QTL·D/(p_QTL q_QTL) (Falconer 1989; Fulker et al. 1999). This implies that the proportion of variance explained by the association parameter is a function of the squared disequilibrium coefficient D² and the additive genetic variance of the trait locus, so that it is not surprising that power is very sensitive to D. When D=D_max and the QTL and marker-allele frequencies are the same, all of the genetic variance attributable to the QTL will be encompassed in the allelic-association parameter, and estimates of the residual additive genetic variance (σ²_a) will equal 0.
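The relationship between D and the apparent additive value at the marker can be checked numerically against table 4. The sketch assumes equifrequent alleles at both loci (as in the simulations), the standard additive-variance identity σ²_a=2pqa², and the relationship quoted above; the function names are hypothetical.

```python
import math

def additive_value(var_a, p):
    """Additive genetic value a such that 2 * p * q * a**2 equals var_a."""
    return math.sqrt(var_a / (2.0 * p * (1.0 - p)))

def marker_additive_value(a_qtl, d, p):
    """Apparent additive value at the marker, a_QTL * D / (p * q)."""
    return a_qtl * d / (p * (1.0 - p))

p = 0.5
a_qtl = additive_value(10.0, p)       # QTL explaining 10% of a variance of 100
d_max = min(p, p) - p * p             # D_max for equifrequent alleles
for ratio in (0.0, 0.25, 0.5, 0.75, 1.0):
    expected = marker_additive_value(a_qtl, ratio * d_max, p)
    print(f"D/D_max = {ratio:.2f}: expected beta_w = {expected:.2f}")
```

The printed values can be compared with the β_w estimates in the final column of table 4, which scale approximately linearly with D/D_max.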

Figure 1 shows the most stringent significance level, α, that can be applied when 80% power is achieved, as a function of both the total number of offspring sampled and the additive genetic variance of the QTL, when the trait locus is observed. As noted previously, when parental genotypes are available, power does not depend greatly on the number of offspring in each family, so that only the sib-pair case has been plotted. When parents are unavailable, it is clear that achievable significance levels rapidly approach that of the case in which parents are available, as the number of children in each family increases. When these plots are used, an arbitrary significance level may be selected for a given sample size and study design. Alternatively, for a desired number of independent tests and correspondingly adjusted significance level, an appropriate sample size and study design may be selected. For example, for the hypothetical genome screens proposed by Risch and Merikangas (1996), α=5×10^-8 and −log₁₀(α)=7.3, so that, as can be seen in figure 1A, either ~350 sib pairs with parents (700 total offspring but 1,400 genotypes) or ~500 sib pairs without parental information (1,000 total offspring/genotypes) or ~260 sib trios without parents (800 total offspring/genotypes) are needed for 80% power and a locus that accounts for 10% of phenotypic variance. It is noteworthy that the total number of genotypes required is smaller when parental information is not used. In practice, the marker locus and trait locus will often not be identical, so these represent best-case estimates of power.
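The genotyping cost of the designs compared above can be tallied with simple arithmetic. In this sketch the sensitivity scale is assumed to be −log₁₀(α), and the family counts are the approximate values quoted in the text; the function names are hypothetical.

```python
import math

def sensitivity(alpha):
    """Significance level expressed on a -log10 scale (assumed scale)."""
    return -math.log10(alpha)

def genotypes_required(n_families, n_sibs, parents_genotyped):
    """Total number of individuals genotyped for a given design."""
    per_family = n_sibs + (2 if parents_genotyped else 0)
    return n_families * per_family

print(round(sensitivity(5e-8), 2))
designs = {
    "sib pairs with parents": genotypes_required(350, 2, True),
    "sib pairs without parents": genotypes_required(500, 2, False),
    "sib trios without parents": genotypes_required(260, 3, False),
}
for name, count in designs.items():
    print(name, count)
```

The trio design comes to 780 genotypes, which the text rounds to roughly 800; in both designs without parents the total is smaller than for sib pairs with parents.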

Evaluation of the Permutation Test

To assess the performance of the permutation test, a series of simulations was undertaken in which dominance variance was introduced to skew the phenotypic distribution and to violate multivariate normality. In an attempt to introduce further departures from asymptotic expectations, we omitted the appropriate dominance-variance parameter from the variance-components portion of the model. The results of these simulations are presented in table 5. The performance of the empirical permutation test was examined in 5,000 samples of 120 offspring (in families with 1 to 8 children) when a dominant major gene (H²=.3, s²=.3) was segregating (θ = 0). When D=0, the overall error rate at the .05 significance level was .06 when asymptotic expectations were used but was .05 when empirical significance levels were calculated from 1,000 permutations of each data set. When D=D_max, power at the .05 significance level was 56% for asymptotic significance levels but was 52% when empirical significance levels were calculated from 1,000 permutations of each data set. For larger samples in which the model is well specified, such as those in table 4, we find no significant advantages in this permutation test.
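
How an empirical significance level is obtained from permuted data sets can be sketched as follows. This is an illustration under stated assumptions, not the procedure implemented in QTDT: it assumes a permutation scheme in which the within-family scores w of each family are jointly negated with probability 1/2, it uses a simple cross-product statistic |Σwy| in place of the likelihood-ratio statistic from the variance-components model, and it uses the common (k+1)/(n+1) convention for the empirical P value. The Permutation Test section remains the authority on the actual scheme; all names below are hypothetical.

```python
import random


def empirical_p_value(families, n_perm=1000, seed=0):
    """Empirical P value from family-wise sign flips of the within-family score.

    families: list of families, each a list of (w, y) pairs, where w is the
    within-family genotype component and y is the (centered) phenotype.
    """
    rng = random.Random(seed)
    # Per-family contribution to the cross-product statistic sum(w * y).
    contributions = [sum(w * y for w, y in fam) for fam in families]
    observed = abs(sum(contributions))

    exceed = 0
    for _ in range(n_perm):
        # Assumed scheme: flip the sign of w for a whole family with prob. 1/2.
        permuted = sum(c if rng.random() < 0.5 else -c for c in contributions)
        if abs(permuted) >= observed:
            exceed += 1
    # Add-one convention keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: 30 sib pairs in which phenotype tracks the within-family score.
    associated = [[(0.5, 0.5), (-0.5, -0.5)] for _ in range(30)]
    print(empirical_p_value(associated))
```

Because each permuted data set keeps the phenotypes and family structure intact, the reference distribution reflects the actual trait distribution rather than asymptotic normal theory.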

Table 5

Error Rates When the Variances Model for Ω_ijk Is Inappropriate

                        Error Rate (%) When Total No. of Offspring Is^a
Offspring per Family    120     240     480     960     1,920
When Parental Genotypes Are Available
1                       5.9     5.6     5.2     5.1     5.2
2                       6.3     5.2     5.5     5.3     5.8
4                       6.3     5.2     5.5     5.3     5.6
8                       6.9     6.6     6.2     5.9     5.5
When Parental Genotypes Are Not Available
2                       7.0     5.7     5.8     5.2     5.7
4                       6.5     6.6     6.1     6.3     6.2
8                       8.6     7.0     6.2     6.1     5.5

^aProportion of simulations exceeding the nominal .05 significance level when there is a linked dominant major gene (θ=0, h²=.3) with equally frequent alleles. The model for variances in Ω_j did not include a dominance variance component, so that in larger families the error rate is high. This major gene also introduced skewness in the phenotype distribution, violating multivariate normality.

Discussion

The orthogonal model that we propose is a generalization of the one described by Fulker et al. (1999) for sib-pair data. It allows for the optional inclusion of parental data, which greatly increases power, and for the analysis of larger sibships, in which identification of segregating alleles is more efficient. For large sample sizes or when empirical significance levels are calculated by the permutation method described, the test is robust to a variety of biases, including linkage, background familiality, and population stratification. Also, when parental data are used, the test of the within-family–association parameter β_w is asymptotically a test of E(wy)=0 and is equivalent to that described by Rabinowitz (1997) without the benefit of the variance-components framework. Other linear models, such as that proposed by Allison (1997), could be used with the variance-components approach, but the orthogonal model is attractive because it both provides direct estimates of the additive genetic value of the marker alleles and can be used when parental genotypes are unavailable. It is important to emphasize that the present approach treats linkage and association separately. Consequently, in contrast to other methods, which provide tests of disequilibrium only in minimal family configurations (Spielman et al. 1993; Allison 1997; Rabinowitz 1997; Allison et al. 1999), the orthogonal method does not detect linkage in the absence of disequilibrium in nuclear families of any configuration.

Although, in our simulations, power depended mostly on the major-gene–effect size and on the total number of offspring available for analysis, in practice it is undesirable to rely on a small number of large families, because they might represent very few alleles. However, moderate-size families (i.e., three to four sibs) might be more attractive than sib pairs, because they provide much greater power when parents are not available and require less genotyping effort when parents are available. The observation that, for multiplex families, the number of individuals that need to be genotyped in order to achieve comparable power is smaller when parental genotypes are not used is important in situations in which genotyping capacity is limited.

Power is very sensitive to disequilibrium, so this test and other, related approaches are well suited to the analysis of dense maps. In practice, it may not be feasible to use these dense maps for genome screens, but they can be used to follow up suggestive linkages that have been identified by allele-sharing methods on a sparser map. When a dense marker map becomes available, it can be used to produce multipoint estimates of IBD. In the variance-components side of the model, better IBD estimates allow the fitted variance-covariance matrix to better approximate the true variances and covariances, improving the performance of the model in terms of both power and error rates.

The model can be easily extended to allow for multiallelic markers with up to X alleles, by inclusion of a separate between- and within-family component for alleles 1 through X-1, and, in this situation, no changes to the variance model should be required. In other situations, it might be appropriate to define either dominance genotype scores or, when imprinting is suspected, separate paternal and maternal genotype scores. These alternative genotype scores can be decomposed into orthogonal components by taking either the sibling average or its asymptotic expectation derived from the parental genotypes. However, these modifications require either changes to the variance model or calculation of empirical P values, by analogous permutation tests.
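
The decomposition of a genotype score into between- and within-family components can be sketched as below. This is a minimal illustration with hypothetical names. It assumes the additive coding g = (number of copies of allele 1) − 1, which is not restated in this section, and it takes the expectation from parental genotypes to be the mean of the two parental scores under Mendelian transmission.

```python
def orthogonal_components(sib_scores, parent_scores=None):
    """Split sibling genotype scores into between (b) and within (w) components.

    sib_scores: genotype scores g_ij for the offspring of one family
        (assumed coding: copies of allele 1, minus 1).
    parent_scores: optional pair of parental genotype scores; when given,
        b is the expected offspring score, otherwise the sibling average.
    """
    if parent_scores is not None:
        b = sum(parent_scores) / len(parent_scores)
    else:
        b = sum(sib_scores) / len(sib_scores)
    w = [g - b for g in sib_scores]
    return b, w


if __name__ == "__main__":
    # Siblings only: the within-family deviations sum to zero by construction.
    print(orthogonal_components([1, 0, -1]))
    # Parents available: b comes from the parental genotypes instead.
    print(orthogonal_components([1, 1], parent_scores=(1, 0)))
```

For a multiallelic marker, the same function could be applied once per allele for alleles 1 through X-1, with each allele scored against all others.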

Hopper and Mathews (1982) have described a number of methods for verifying that multivariate normality assumptions are not grossly violated. The permutation test that we describe should allow this orthogonal model to be applied in situations in which multivariate normality is violated—for example, when the sample size is small or the trait distribution has been skewed by selection (e.g., see Allison 1997) or when the model for variances may be inappropriate. However, asymptotic significance levels are still appropriate in most situations examined here and can be a useful tool in prescreening, to conserve computing resources.

Acknowledgments

We wish to thank Dr. Dan Weeks for helpful comments on an early version of the manuscript. This work has been funded by the Wellcome Trust. G.R.A. is supported by a Wellcome Trust Prize Studentship. W.O.C.C. is a Wellcome Trust Senior Clinical Research Fellow. QTDT, a computer program implementing the orthogonal test of association described herein, is available at the Wellcome Trust Centre for Human Genetics Website.

Appendix A: E(β_a) with Allowance for Population Admixture

Consider population admixture by defining μ_i and p_i and q_i as the phenotypic mean and the marker-allele frequencies for the population from which family i was drawn (allowing for up to K different subpopulations). Assume that within each subpopulation there is random mating and random transmission of parental alleles to offspring and that the total sample of N individuals is centered on mean 0, so that μ=Σn_iμ_i=0. In this situation,

and

so that, for model (3), when the standard expectations V_x=E(x²)-E(x)² and C_x,y=E(xy)-E(x)E(y) are used, for any x and y,

where

These expectations extend those of Cardon (in press).

Appendix B: E(β_b) and E(β_w) with Use of Parental Genotypes

For the orthogonal model in equation (5), the expectations for V_b, V_w, C_b,y, and C_w,y are required for solution of the normal equations and to obtain expectations for the regression parameters β_w and β_b.

Let Σ_z denote the sum over all possible mating types z and note that E(y), E(g), E(g²), and E(gy) are as given in Appendix A. Then, when the mating type frequencies given by Haseman and Elston (1972) are used,

and

These are all the quantities required in order to determine

and

So, on inversion and multiplication,

Appendix C: E(β_b) and E(β_w) with Use of Sibling Genotypes Only

When one parent is heterozygous, consider allelic transmission t ~ binomial(n_i, 1/2), so that E(t²) = (1/4)(n_i² + n_i). When the heterozygous and homozygous parents transmit the same allele t times, |b| = t/n_i. Thus, for any family i,

where C₁ denotes the condition that exactly one parent is heterozygous.

To calculate E(by), it is convenient to separate y into its orthogonal-mean, μ, and major-gene, ga, components. Recall that b_i = (1/n_i) Σ_j g_ij, so that, when one parent is heterozygous, the major-gene component is

When two parents are heterozygous, consider allelic transmission t′ ~ binomial(2n_i, 1/2), so that E(t′) = n_i and E(t′²) = (1/4)(4n_i² + 2n_i). When allele 1 (or 2) is transmitted t′ times, |b| = |t′/n_i − 1|. So, for any family i,

and the major-gene component of E(by) is

where C₂ denotes the condition that both parents are heterozygous.
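
The binomial moments used in this appendix can be checked by exact enumeration. The sketch below is only a verification aid with hypothetical function names; it confirms E(t²) = (1/4)(n_i² + n_i) for one heterozygous parent, and E(t′) = n_i and E(t′²) = (1/4)(4n_i² + 2n_i) for two heterozygous parents.

```python
from fractions import Fraction
from math import comb


def binomial_moment(n_trials: int, power: int) -> Fraction:
    """Exact E(t ** power) for t ~ Binomial(n_trials, 1/2)."""
    weight = Fraction(1, 2) ** n_trials
    return sum(comb(n_trials, t) * weight * t**power for t in range(n_trials + 1))


def moments_match(n_i: int) -> bool:
    """Compare enumerated moments with the closed forms for sibship size n_i."""
    one_het = binomial_moment(n_i, 2) == Fraction(n_i**2 + n_i, 4)
    two_het_mean = binomial_moment(2 * n_i, 1) == n_i
    two_het_square = binomial_moment(2 * n_i, 2) == Fraction(4 * n_i**2 + 2 * n_i, 4)
    return one_het and two_het_mean and two_het_square


if __name__ == "__main__":
    # Sibship sizes 1 through 8, the range used in the simulations.
    print(all(moments_match(n) for n in range(1, 9)))
```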

When C₁ and C₂ are considered, together with E(y), E(g), E(g²), and E(gy) given in Appendix A, and the mating-type frequencies

and

and, when the component derivations of E(by) are used,

and

Thus,

and

so that

Electronic-Database Information

The URL for data in this article is as follows:

The Wellcome Trust Centre for Human Genetics, http://well.ox.ac.uk (for QTDT computer program)

References

Allison DB (1997) Transmission-disequilibrium tests for quantitative traits. Am J Hum Genet 60:676–690

Allison DB, Heo M, Kaplan N, Martin ER (1999) Sibling-based tests of linkage and association for quantitative traits. Am J Hum Genet 64:1754–1763

Amos CI, Zhu DK, Boerwinkle E (1996) Assessing genetic linkage and association with robust components of variance approaches. Ann Hum Genet 60:143–160

Cardon LR. A sib pair regression model of linkage disequilibrium for quantitative traits. Hum Hered (in press)

Chakravarti A (1998) It's raining SNPs, hallelujah? Nat Genet 19:216–217

Collins FS, Guyer MS, Chakravarti A (1997) Variations on a theme: cataloging human DNA sequence variation. Science 278:1580–1581

Curtis D (1997) Use of siblings as controls in case-control association studies. Ann Hum Genet 61:319–333

Curtis D, Sham PC (1995) A note on the application of the transmission disequilibrium test when a parent is missing. Am J Hum Genet 56:811–812

Falconer DS (1989) Introduction to quantitative genetics. Longman Scientific & Technical, London

Fulker DW, Cherny SS, Sham PC, Hewitt JK (1999) Combined linkage and association analysis for quantitative traits. Am J Hum Genet 64:259–267

George VT, Elston RC (1987) Testing of association between polymorphic markers and quantitative traits in pedigrees. Genet Epidemiol 4:193–201

Haseman JK, Elston RC (1972) The investigation of linkage between a quantitative trait and a marker locus. Behav Genet 2:3–19

Hopper JL, Mathews JD (1982) Extensions to multivariate normal models for pedigree analysis. Ann Hum Genet 46:373–383

Kruglyak L (1997) The use of a genetic map of biallelic markers in linkage studies. Nat Genet 17:21–24

Lander E (1999) Array of hope. Nat Genet 21:3–4

Lewontin RC, Kojima K (1960) The evolutionary dynamics of complex polymorphisms. Evolution 14:450–472

Martin ER, Kaplan NL, Weir BS (1997) Tests for linkage and association in nuclear families. Am J Hum Genet 61:439–448

Rabinowitz D (1997) A transmission disequilibrium test for quantitative trait loci. Hum Hered 47:342–350

Risch N, Merikangas K (1996) The future of genetic studies of complex human diseases. Science 273:1516–1517

Searle SR, Casella G, McCulloch CE (1992) Variance components. In: Wiley series in probability and mathematical statistics. John Wiley & Sons, New York

Spielman RS, McGinnis RE, Ewens WJ (1993) Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 52:506–516

Wan Y, Cohen J, Guerra R (1997) A permutation test for the robust sib-pair linkage method. Ann Hum Genet 61:79–87

Weinberg CR, Wilcox AJ, Lie RT (1998) A log-linear approach to case-parent-triad data: assessing the effects of genes that act either directly or through maternal effects and that may be subject to parental imprinting. Am J Hum Genet 62:969–978

Whittaker JC, Thompson DJ (1999) Finite-sample properties of family-based association tests. Am J Hum Genet 64:910–915

Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics
