 OJS  Vol.11 No.5 , October 2021
Transmission Based Conditional Logistic Model for Testing Main and Interaction Effects
Abstract: The transmission disequilibrium test (TDT) is a popular family-based genetic association method. Under a multiplicative assumption, a conditional logistic regression model was built for matched pairs, each consisting of an affected offspring carrying the allele transmitted from a parent and a pseudo-offspring (control) carrying the non-transmitted allele, to detect the main effects of genes and gene-covariate interactions. When genotypes are uncertain, the expectation-maximization (EM) algorithm is used to estimate the coefficients. The transmission model was applied to test the association between the M235T polymorphism in the AGT gene and essential hypertension (ESH) in 126 families from the Hong Kong Chinese population, most of which lack parental genotypes. The results show that M235T is associated with hypertension and that M235T interacts with the sex of the case: allele T carries a higher risk for males than for females.

1. Introduction

To avoid false positive results caused by confounding, several pedigree-based genetic association methods have been proposed. The transmission disequilibrium test (TDT) introduced by Spielman et al. [1] is a family-based test that requires only trios, each consisting of two parents and one affected offspring. The TDT was later generalized to multi-allelic markers [2] [3].

For many diseases, especially those with a late age of onset, parental information is not available, and the classical TDT for triad data cannot be applied. Several methods for incomplete families have been proposed, e.g. S-TDT [4] and Sibass [5] for siblings, and the pedigree disequilibrium test (PDT) for general pedigrees [6].

Single nucleotide polymorphisms (SNPs) are highly abundant, stable genetic markers in humans. TDT methods for haplotype transmission using multiple tightly linked loci were proposed [7] [8]. In these approaches, the individuals’ underlying haplotypes must be reconstructed using observed genotypes even if there are no missing genotype data.

Genotype relative risk may vary across levels of environmental exposure; that is, there may be interactions between genes and covariates. Taub et al. [9] derived an extension of the genotypic TDT to assess gene-environment interactions for binary environmental variables. However, the joint effects of genotype and exposure or environmental covariates are not considered in the classical allelic TDT. In the present paper, an allele or haplotype transmission based model is built within a conditional logistic regression framework to detect and assess main effects and gene-environment interactions.

2. Method

2.1. Transmission Based Model

Let $H$ denote the number of alleles or haplotypes for one or multiple tightly linked loci. The collection of all $H(H+1)/2$ possible genotypes is $G = \{1/1, 1/2, \ldots, 1/H, 2/2, 2/3, \ldots, 2/H, \ldots, (H-1)/H, H/H\}$.

For an affected offspring with genotype $g$ and covariate vector $X = (X_1, X_2, \ldots, X_k)$, the joint effects of gene and covariates are expressed through the genotype risk relative to the reference genotype $g_0 = H/H$, i.e.

$$R(g \mid X) = \frac{P(A \mid g, X)}{P(A \mid g_0, X)}, \tag{1}$$

where $A$ denotes the event of being affected. Let $g_f$, $g_m$ and $g_c$ denote the genotypes of the father, the mother and the affected offspring, respectively. Under the conditional independence assumptions $P(g_c \mid g_f, g_m, X) = P(g_c \mid g_f, g_m)$ and $P(A \mid g_c, g_f, g_m, X) = P(A \mid g_c, X)$, we have

$$\begin{aligned}
P(g_c \mid g_f, g_m, A, X) &= \frac{P(g_c, g_f, g_m, A \mid X)}{\sum_{g \in G_c} P(g, g_f, g_m, A \mid X)} \\
&= \frac{P(g_f, g_m \mid X)\, P(g_c \mid g_f, g_m, X)\, P(A \mid g_c, g_f, g_m, X)}{\sum_{g \in G_c} P(g_f, g_m \mid X)\, P(g \mid g_f, g_m, X)\, P(A \mid g, g_f, g_m, X)} \\
&= \frac{P(g_c \mid g_f, g_m)\, P(A \mid g_c, g_f, g_m, X)}{\sum_{g \in G_c} P(g \mid g_f, g_m)\, P(A \mid g, g_f, g_m, X)} \\
&= \frac{P(A \mid g_c, X)}{\sum_{g \in G_c} P(A \mid g, X)} = \frac{R(g_c \mid X)}{\sum_{g \in G_c} R(g \mid X)},
\end{aligned} \tag{2}$$

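where $G_c$ is the collection of all possible genotypes of a child given both parents' genotypes. The fourth equality holds because, under Mendelian transmission, every element of $G_c$ is equally likely given the parental genotypes, so $P(g \mid g_f, g_m)$ cancels. Let $(g_f, g_m) = (i/j, k/l)$. Then

$$G_c = \begin{cases}
\{i/k,\ i/l,\ j/k,\ j/l\}, & i \neq j,\ k \neq l, \\
\{i/k,\ i/l\}, & i = j,\ k \neq l, \\
\{i/k,\ j/k\}, & i \neq j,\ k = l, \\
\{i/k\}, & i = j,\ k = l.
\end{cases} \tag{3}$$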
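The case distinction in Equation (3) can be sketched in a few lines of Python. This is only an illustration: the function name `child_genotypes` and the tuple encoding of genotypes are assumptions, not part of the paper. Genotypes are kept as ordered (paternal, maternal) pairs so that the four transmissions from two heterozygous parents remain distinct.

```python
def child_genotypes(father, mother):
    """Return the set G_c of possible child genotypes, as in Equation (3).

    father and mother are 2-tuples of allele (or haplotype) labels, e.g. (1, 2).
    Each child genotype is a (paternal allele, maternal allele) tuple.
    Using a set removes duplicates that arise when a parent is homozygous.
    """
    return {(p, m) for p in father for m in mother}


# Both parents heterozygous: four possible transmissions.
print(sorted(child_genotypes((1, 2), (3, 4))))
# Homozygous father: only the maternal transmission varies.
print(sorted(child_genotypes((1, 1), (3, 4))))
```

A homozygous parent contributes a single allele to the set, which is why such parents later drop out of the likelihood.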

Suppose that the genotype relative risk satisfies the robust multiplicative model

$$R^2(i/j \mid X) = R(i/i \mid X)\, R(j/j \mid X) \tag{4}$$

with $R(i/i \mid X) = \exp(2\alpha_i + 2\beta_i X)$ and $\beta_i = (\beta_{i1}, \beta_{i2}, \ldots, \beta_{ik})$. Because $R(H/H \mid X) = 1$ for the reference genotype, $\alpha_H = 0$ and $\beta_H = 0$. Then

$$R(i/j \mid X) = \exp(\alpha_i + \beta_i X)\, \exp(\alpha_j + \beta_j X). \tag{5}$$

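Therefore, the joint transmission probability for both parents is

$$\begin{aligned}
P(i, k \mid i/j, k/l, A, X) &= P(g_c = i/k \mid g_f = i/j, g_m = k/l, A, X) \\
&= \frac{R(i/k \mid X)}{R(i/k \mid X) + (1-\delta_{kl}) R(i/l \mid X) + (1-\delta_{ij}) R(j/k \mid X) + (1-\delta_{ij})(1-\delta_{kl}) R(j/l \mid X)} \\
&= \frac{\exp(\alpha_i + \beta_i X)}{\exp(\alpha_i + \beta_i X) + (1-\delta_{ij}) \exp(\alpha_j + \beta_j X)} \cdot \frac{\exp(\alpha_k + \beta_k X)}{\exp(\alpha_k + \beta_k X) + (1-\delta_{kl}) \exp(\alpha_l + \beta_l X)},
\end{aligned} \tag{6}$$

where $\delta$ is the Kronecker delta,

$$\delta_{ij} = \begin{cases} 1, & i = j, \\ 0, & i \neq j, \end{cases} \qquad \delta_{kl} = \begin{cases} 1, & k = l, \\ 0, & k \neq l. \end{cases} \tag{7}$$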

Equation (6) shows that paternal and maternal transmissions are independent, and that the transmission probability for a parent (father or mother) with genotype $i/j$ is

$$P(i \mid i/j, A, X) = \frac{\exp(\alpha_i + \beta_i X)}{\exp(\alpha_i + \beta_i X) + (1-\delta_{ij}) \exp(\alpha_j + \beta_j X)}. \tag{8}$$
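As a numerical check on the factorisation claimed for Equation (6), the sketch below computes the joint transmission probability directly from the genotype relative risks of Equation (5) and compares it with the product of two single-parent probabilities from Equation (8). All function names and all coefficient values are hypothetical and chosen only for illustration.

```python
import math


def allele_score(allele, alpha, beta, x):
    """exp(alpha_i + beta_i . x) for one allele; x is the covariate vector."""
    return math.exp(alpha[allele] + sum(b * v for b, v in zip(beta[allele], x)))


def transmission_prob(i, j, alpha, beta, x):
    """P(i | i/j, A, X) as in Equation (8); equals 1 for a homozygous parent."""
    if i == j:
        return 1.0
    s_i = allele_score(i, alpha, beta, x)
    s_j = allele_score(j, alpha, beta, x)
    return s_i / (s_i + s_j)


def joint_transmission_prob(father, mother, alpha, beta, x):
    """P(i, k | i/j, k/l, A, X) from genotype relative risks (first form of Eq. 6)."""
    def rr(a, b):
        # R(a/b | X) under the multiplicative model, Equation (5)
        return allele_score(a, alpha, beta, x) * allele_score(b, alpha, beta, x)

    i, k = father[0], mother[0]
    children = {(p, m) for p in father for m in mother}  # the set G_c
    return rr(i, k) / sum(rr(p, m) for p, m in children)


# Hypothetical coefficients: allele 2 is the reference (all coefficients zero).
alpha = {1: 1.0, 2: 0.0}
beta = {1: [-0.5], 2: [0.0]}
x = [1.0]

joint = joint_transmission_prob((1, 2), (1, 2), alpha, beta, x)
product = transmission_prob(1, 2, alpha, beta, x) ** 2
print(abs(joint - product) < 1e-12)
```

The agreement follows from the multiplicative form of Equation (5): the denominator of the joint probability is a product of one paternal and one maternal sum.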

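For a homozygous parent, the transmission probability (8) is 1. For a heterozygous parent with genotype $i/j$ ($i \neq j$), we introduce the dummy variables

$$Z_h^{c} = \begin{cases} 1, & \text{allele } h \text{ is transmitted}, \\ 0, & \text{otherwise}, \end{cases} \qquad Z_h^{pc} = \begin{cases} 1, & \text{allele } h \text{ is non-transmitted}, \\ 0, & \text{otherwise}. \end{cases} \tag{9}$$

Then the transmission probability for a heterozygous parent with genotype $i/j$ ($i \neq j$) is

$$P(i \mid i/j, A, X) = \frac{\exp\left[\sum_{h=1}^{H-1} \left(\alpha_h Z_h^{c} + \beta_h Z_h^{c} X\right)\right]}{\exp\left[\sum_{h=1}^{H-1} \left(\alpha_h Z_h^{c} + \beta_h Z_h^{c} X\right)\right] + \exp\left[\sum_{h=1}^{H-1} \left(\alpha_h Z_h^{pc} + \beta_h Z_h^{pc} X\right)\right]}. \tag{10}$$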

Equation (10) can be regarded as a conditional logistic model for $n_{het}$ matched pairs, where $n_{het}$ is the number of heterozygous parents. Homozygous parents are excluded because they make no contribution to the likelihood. In such matched data, the affected offspring with predictors $(Z_1^{c}, Z_2^{c}, \ldots, Z_H^{c}, X)$ is taken as the case, and the pseudo-offspring carrying the non-transmitted allele, with predictors $(Z_1^{pc}, Z_2^{pc}, \ldots, Z_H^{pc}, X)$, is taken as the matched control. The parameters $\alpha_i$ and $\beta_i$ ($i = 1, 2, \ldots, H-1$) measure the main effects of alleles and the gene-covariate interaction effects, respectively. However, the main effects of the covariates $X$ cannot be estimated, since the case and its matched control share the same $X$ values.

The maximum likelihood estimates (MLEs) of the parameters can be obtained with standard software for conditional logistic regression or for the stratified Cox proportional hazards model, such as the PHREG (proportional hazards regression) procedure in SAS or coxph in the R package “survival”.
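For readers without access to those packages, the fit can be sketched in plain Python for the simplest case of one biallelic marker and one covariate. With 1:1 matching, the conditional likelihood of Equation (10) reduces to a logistic model without intercept on the case-minus-control difference of predictors, which is what the Newton-Raphson loop below maximises. The function name `fit_matched_pairs` and the toy counts are hypothetical and are not the study data.

```python
import math


def fit_matched_pairs(transmitted, x, n_iter=50, tol=1e-10):
    """Newton-Raphson for a 1:1 conditional logistic model (one marker, one covariate).

    transmitted[r] is 1 if heterozygous parent r passed the risk allele to the
    affected child, else 0; x[r] is the child's covariate value.
    Returns (alpha, beta, log_likelihood).
    """
    alpha, beta = 0.0, 0.0
    # d = Z^c - Z^pc for the risk allele: +1 if transmitted, -1 otherwise
    d = [1.0 if t else -1.0 for t in transmitted]
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for d_r, x_r in zip(d, x):
            p = 1.0 / (1.0 + math.exp(-d_r * (alpha + beta * x_r)))
            g0 += (1.0 - p) * d_r            # score for alpha
            g1 += (1.0 - p) * d_r * x_r      # score for beta
            w = p * (1.0 - p)                # information weight (d_r^2 = 1)
            h00 += w
            h01 += w * x_r
            h11 += w * x_r * x_r
        det = h00 * h11 - h01 * h01
        step_a = (h11 * g0 - h01 * g1) / det
        step_b = (h00 * g1 - h01 * g0) / det
        alpha += step_a
        beta += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    loglik = sum(-math.log1p(math.exp(-d_r * (alpha + beta * x_r)))
                 for d_r, x_r in zip(d, x))
    return alpha, beta, loglik


# Toy data (hypothetical): x = 0 for 40 pairs (30 transmissions of the risk allele),
# x = 1 for 40 pairs (24 transmissions).
transmitted = [1] * 30 + [0] * 10 + [1] * 24 + [0] * 16
x = [0.0] * 40 + [1.0] * 40
alpha_hat, beta_hat, loglik = fit_matched_pairs(transmitted, x)
print(alpha_hat, beta_hat, loglik)
```

Because this toy model is saturated, the estimates can be checked by hand: $\hat\alpha = \ln(30/10)$ and $\hat\alpha + \hat\beta = \ln(24/16)$.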

2.2. EM Algorithm for Dealing with Ambiguities in Allele Transmission

Haplotype phase is often uncertain for linked multi-locus genotypes: several haplotype pairs may be compatible with an observed genotype. In addition, even when only one locus is considered, parental genotypes might be missing, especially for late-onset diseases. It can therefore be ambiguous which allele or haplotype was transmitted from a parent.

Suppose there are $N$ parents-case trios, and hence $2N$ parents in all. The genotypes of the $r$-th parent and his/her offspring are denoted by $g_r$ and $g_r^{c}$, and the covariate vector for the offspring is $X_r = (X_{r1}, X_{r2}, \ldots, X_{rk})$. The log-likelihood is

$$\begin{aligned}
\ln L(\alpha, \beta) &= \sum_{r=1}^{2N} \ln \left\{ \sum_{(i_r, j_r) \in \tilde{G}_r} P(i_r \mid i_r/j_r, A, X_r) \right\} \\
&= \sum_{r=1}^{2N} \ln \left[ \sum_{(i_r, j_r) \in \tilde{G}_r} \frac{\exp(\alpha_{i_r} + \beta_{i_r} X_r)}{\exp(\alpha_{i_r} + \beta_{i_r} X_r) + (1 - \delta_{i_r j_r}) \exp(\alpha_{j_r} + \beta_{j_r} X_r)} \right],
\end{aligned} \tag{11}$$

where $\tilde{G}_r$ is the set of haplotype pairs $(i_r, j_r)$ that are compatible with the parental genotype $g_r$.

It is difficult to find the MLEs of the parameters $(\alpha, \beta)$ directly. However, if the underlying haplotype pairs are treated as “missing data”, the expectation-maximization (EM) algorithm provides an iterative procedure for finding the MLEs. Given the current estimates, the expected complete-data log-likelihood in the E (expectation) step is

$$Q(\alpha, \beta \mid \alpha^{(t)}, \beta^{(t)}) = \sum_{r=1}^{2N} \left\{ \sum_{i,j=1}^{H} \omega_r^{(t)}(i, j) \ln P(i \mid i/j, A, X_r) \right\}, \tag{12}$$

where

$$\omega_r^{(t)}(i, j) = \begin{cases} \dfrac{F_i F_j \exp(\alpha_i^{(t)} + \beta_i^{(t)} X_r)}{\sum_{(k, l) \in \tilde{G}_r} F_k F_l \exp(\alpha_k^{(t)} + \beta_k^{(t)} X_r)}, & (i, j) \in \tilde{G}_r, \\ 0, & (i, j) \notin \tilde{G}_r. \end{cases} \tag{13}$$

$Q(\alpha, \beta \mid \alpha^{(t)}, \beta^{(t)})$ can be regarded as the log-likelihood of a weighted conditional logistic model for matched cases and controls. However, the haplotype frequencies $(F_1, F_2, \ldots, F_H)$ are often unknown and must be estimated as well. Therefore, starting with initial values $(\alpha^{(0)}, \beta^{(0)})$ and $(F_1^{(0)}, F_2^{(0)}, \ldots, F_H^{(0)})$, the $(t+1)$-th iteration of the EM algorithm consists of two steps.

Step 1: Calculate the weights $\omega_r^{(t)}(i, j)$ ($i, j = 1, 2, \ldots, H$), and obtain the MLE $(\alpha^{(t+1)}, \beta^{(t+1)})$ from the weighted conditional logistic model for matched case-pseudocontrol pairs.

Step 2: Update the haplotype frequencies

$$F_i^{(t+1)} = \frac{\sum_{r=1}^{2N} \sum_{u=1}^{H} \omega_r^{(t)}(u, i)}{2N}, \quad i = 1, 2, \ldots, H. \tag{14}$$
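The weight calculation of Equation (13) and the frequency update of Equation (14) can be sketched as follows. This is a partial illustration only: the weighted conditional logistic fit that updates $(\alpha, \beta)$ in Step 1 is omitted, a single scalar covariate is assumed, and the normalising sum is assumed to use the same exponent as the numerator. The function names, the allele labels and the three example parents are all hypothetical.

```python
import math


def e_step_weights(compatible, F, alpha, beta, x):
    """Weights of Equation (13) for one parent.

    compatible: list of pairs (i, j) compatible with the observed genotypes
                (the set G~_r); i is the first index used in Equation (13).
    F: dict of current haplotype frequencies.
    alpha, beta: dicts of current coefficients (one scalar covariate x).
    """
    raw = {(i, j): F[i] * F[j] * math.exp(alpha[i] + beta[i] * x)
           for i, j in compatible}
    total = sum(raw.values())
    return {pair: value / total for pair, value in raw.items()}


def update_frequencies(weights_per_parent, haplotypes):
    """Equation (14): F_i = sum_r sum_u w_r(u, i) / (number of parents)."""
    n_parents = len(weights_per_parent)  # 2N in the paper's notation
    F_new = {h: 0.0 for h in haplotypes}
    for w in weights_per_parent:
        for (u, i), value in w.items():
            F_new[i] += value  # accumulate on the second index, as in Eq. (14)
    return {h: total / n_parents for h, total in F_new.items()}


haplotypes = ["T", "M"]
F = {"T": 0.5, "M": 0.5}
alpha = {"T": 1.0, "M": 0.0}   # hypothetical current estimates
beta = {"T": 0.0, "M": 0.0}
compatible_sets = [
    [("T", "M"), ("M", "T")],  # heterozygous parent, transmission ambiguous
    [("T", "M")],              # transmission fully observed
    [("T", "T"), ("T", "M")],  # parental genotype missing, child received T
]
weights = [e_step_weights(c, F, alpha, beta, x=0.0) for c in compatible_sets]
F_next = update_frequencies(weights, haplotypes)
print(weights[0])
print(F_next)
```

Since the weights for each parent sum to one, the updated frequencies also sum to one.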

The likelihood ratio test (LRT) can be used for model selection and to test the gene effect and gene-covariate interactions. For example, with one SNP and one covariate, three models can be constructed: the null model with $\alpha = \beta = 0$, the model without interaction with $\beta = 0$, and the full model with interaction. Then $\Lambda_1 = 2 \ln L(\hat\alpha, 0) - 2 \ln L(0, 0)$ tests the gene effect and $\Lambda_2 = 2 \ln L(\hat\alpha, \hat\beta) - 2 \ln L(\hat\alpha, 0)$ tests the gene-covariate interaction. In this example, each statistic is compared with a chi-square distribution with one degree of freedom.

3. Application

Essential hypertension is a multifactorial disorder influenced by genetic and environmental factors. The angiotensinogen (AGT) gene of the renin-angiotensin system (RAS) is considered an important element in blood pressure regulation. Some studies have related the M/T polymorphism at position 235 in exon 2 of the AGT gene (M235T) to essential hypertension, although the findings in white Europeans remain controversial [10] [11].

In our study, 126 families with at least one hypertensive sibling, comprising a total of 434 siblings from the Hong Kong Chinese population, are included in the analysis. As shown in Table 1, 59.5% of the families had two or three siblings and a further 33.4% had four or five siblings; parental genotypes are not available in most of the families (86.5%). The sibling information is very useful for reducing the uncertainty about transmission from a parent with unknown genotype.

Table 1. Nuclear families in the analysis.

The AGT gene polymorphism M235T and the covariate gender are introduced into the proposed model. We set the initial values $(\alpha^{(0)}, \beta^{(0)}) = (0, 0)$ and $F^{(0)} = (0.5, 0.5)$ and the precision $\varepsilon = 10^{-5}$. After the EM iterative procedure (Section 2.2) stops, the MLEs of the parameters are $\hat\alpha = 1.2876$ and $\hat\beta = -0.7378$, where allele M is the reference allele. To detect the effect of M235T and the M235T*sex interaction, we perform likelihood ratio tests (LRTs) with the statistics $\Lambda_1 = 2 \ln L(\hat\alpha, 0) - 2 \ln L(0, 0)$ and $\Lambda_2 = 2 \ln L(\hat\alpha, \hat\beta) - 2 \ln L(\hat\alpha, 0)$, respectively. The log-likelihoods $\ln L(\hat\alpha, \hat\beta) = -85.477$, $\ln L(\hat\alpha, 0) = -87.471$ and $\ln L(0, 0) = -100.937$ yield $\Lambda_1 = 26.932$ (p < 0.001) and $\Lambda_2 = 3.987$ (p = 0.046).
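The two test statistics can be recomputed from the reported log-likelihoods with a short Python sketch. Two assumptions are made here: the log-likelihoods are negative (a log-likelihood of a probability model cannot be positive, so the minus signs are taken to be lost in the text), and each statistic is referred to a chi-square distribution with one degree of freedom. The helper name `lrt` is hypothetical.

```python
import math


def lrt(loglik_full, loglik_reduced):
    """Likelihood ratio statistic and its chi-square p-value with 1 degree of freedom."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    # For 1 df, P(chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Reported log-likelihoods, assumed negative (see the note above).
ll_full, ll_main, ll_null = -85.477, -87.471, -100.937

stat1, p1 = lrt(ll_main, ll_null)   # gene effect
stat2, p2 = lrt(ll_full, ll_main)   # gene-by-sex interaction
print(round(stat1, 3), p1 < 0.001)
print(round(stat2, 3), round(p2, 3))
```

From the rounded log-likelihoods the second statistic comes out as 3.988 rather than the reported 3.987; the difference is presumably due to rounding of the inputs.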

The results show that M235T is associated with hypertension and that there is an interaction between M235T and gender. The relative risk for allele T is $\exp(\hat\alpha) = 3.624$ for males and $\exp(\hat\alpha + \hat\beta) = 1.732$ for females. This finding is consistent with several other reports of gene-by-sex interaction, for example for insulin-related traits, and demonstrates the importance of considering interactions in the search for disease-related genes [12] [13].
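The sex-specific relative risks follow directly from the two coefficients, as the arithmetic sketch below shows. It assumes that sex is coded 0 for males and 1 for females, which is what the expressions $\exp(\hat\alpha)$ and $\exp(\hat\alpha + \hat\beta)$ imply, and that $\hat\beta$ is negative, as required for the female risk to be the smaller one. The function name is hypothetical.

```python
import math

alpha_hat = 1.2876    # main effect of allele T, as reported
beta_hat = -0.7378    # T-by-sex interaction; negative sign is an assumption


def allele_relative_risk(alpha, beta, x):
    """Relative risk of the risk allele versus the reference allele at covariate x."""
    return math.exp(alpha + beta * x)


rr_male = allele_relative_risk(alpha_hat, beta_hat, 0)     # assumed coding: male = 0
rr_female = allele_relative_risk(alpha_hat, beta_hat, 1)   # assumed coding: female = 1
print(round(rr_male, 3), round(rr_female, 3), round(rr_male / rr_female, 3))
```

The ratio of the two relative risks equals $\exp(-\hat\beta)$, so the interaction coefficient alone determines how much larger the allelic effect is in males.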

4. Discussions

In this transmission model, gene-covariate interactions enter the allele/haplotype relative risk and, through it, the transmission probability. Missing parental genotypes and multiple tightly linked loci are allowed. For missing-genotype or multi-locus genotype data, the underlying haplotypes or alleles are treated as missing data, and the weighted conditional logistic models are fitted via the EM algorithm.

As an application, data on 126 nuclear families from the Hong Kong Chinese population are used in the haplotype-based model to test the association between M235T in the angiotensinogen gene and essential hypertension (ESH). The results suggest that 235T is a risk allele for ESH in Hong Kong Chinese people and that it confers a higher risk in men than in women: the 235T allele was transmitted preferentially from heterozygous parents to male ESH patients compared with female patients.

Acknowledgements

This work was supported by Guangdong Basic and Applied Basic Research Foundation (2020B1515310007), and Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University (2020B1212060032).

Cite this paper: Li, C. and Li, P. (2021) Transmission Based Conditional Logistic Model for Testing Main and Interaction Effects. Open Journal of Statistics, 11, 713-719. doi: 10.4236/ojs.2021.115042.
References

[1]   Spielman, R.S., Mcginnis, R.E. and Ewens, W.J. (1993) Transmission Test for Linkage Disequilibrium: The Insulin Gene Region and Insulin-Dependent Diabetes Mellitus (IDDM). The American Journal of Human Genetics, 52, 506-516.

[2]   Spielman, R.S. and Ewens, W.J. (1996) The TDT and Other Family-Based Tests for Linkage Disequilibrium and Association. American Journal of Human Genetics, 59, 983-989.

[3]   Sham, P. and Curtis, D. (2012) An Extended Transmission/Disequilibrium Test (TDT) for Multi-Allele Marker Loci. Annals of Human Genetics, 59, 323-336.
https://doi.org/10.1111/j.1469-1809.1995.tb00751.x

[4]   Spielman, R.S. and Ewens, W.J. (1998) A Sibship Test for Linkage in the Presence of Association: The Sib Transmission/Disequilibrium Test. The American Journal of Human Genetics, 62, 450-458.
https://doi.org/10.1086/301714

[5]   Curtis, D. (1998) Use of Siblings as Controls in Case-Control Association Studies. Annals of Human Genetics, 61, 319-333.
https://doi.org/10.1017/S000348009700626X

[6]   Martin, E.R., Monks, S.A., Warren, L.L. and Kaplan, N.L. (2000) A Test for Linkage and Association in General Pedigrees: The Pedigree Disequilibrium Test. GeneScreen, 1, 65-67.
https://doi.org/10.1086/302957

[7]   Clayton, D. (1999) A Generalization of the Transmission/Disequilibrium Test for Uncertain-Haplotype Transmission. The American Journal of Human Genetics, 65, 1170-1177.
https://doi.org/10.1086/302577

[8]   Zhao, H.Y., Zhang, S.L., et al. (2000) Transmission/Disequilibrium Tests Using Multiple Tightly Linked Markers. The American Journal of Human Genetics, 67, 936-946.
https://doi.org/10.1086/303073

[9]   Taub, M.A., Schwender, H., Beaty, T.H., Louis, T.A. and Ruczinski, I. (2012) Incorporating Genotype Uncertainties into the Genotypic TDT for Main Effects and Gene-Environment Interactions. Genetic Epidemiology, 36, 225-234.
https://doi.org/10.1002/gepi.21615

[10]   Jeunemaitre, X., Inoue, I., Williams, C., Charru, A., Tichet, J., Powers, M., et al. (1997) Haplotypes of Angiotensinogen in Essential Hypertension. American Journal of Human Genetics, 60, 1448–1460.
https://doi.org/10.1086/515452

[11]   Barley, J., Blackwood, A., Sagnella, G., Markandu, N. and Carter, N. (1994) Angiotensinogen met235-->thr Polymorphism in a London Normotensive and Hypertensive Black and White Population. Journal of Human Hypertension, 8, 639-640.

[12]   Freire, M., Ji, L., Onuma, T., Orban, T., Warram, J.H. and Krolewski, A.S. (1998) Gender-Specific Association of m235t Polymorphism in Angiotensinogen Gene and Diabetic Nephropathy in NIDDM. Hypertension, 31, 896-899.
https://doi.org/10.1161/01.HYP.31.4.896

[13]   North, K.E., Franceschini, N., Borecki, I.B., Gu, C.C., Heiss, G., Province, M.A., et al. (2007) Genotype-by-Sex Interaction on Fasting Insulin Concentration: The Hypergen Study. Diabetes, 56, 137-142.
https://doi.org/10.2337/db06-0624

 
 