xlink

Wei Xu, Meiling Hao, Yi Zhu

2019-08-20

1. Introduction

xlink is an R package, available on GitHub, that implements a unified partial likelihood approach for X-chromosome association with time-to-event, continuous, or binary outcomes. X-chromosome genes can be expressed under three possible biological processes: X-chromosome inactivation (XCI), escape from X-chromosome inactivation (XCI-E), and skewed X-chromosome inactivation (XCI-S). Although X-chromosome variants are included on many predesigned genotyping chip platforms, the X chromosome has generally been excluded from most genome-wide association study (GWAS) analyses, most likely because there is no standardized method for handling X-chromosomal genotype data. To analyze X-linked genetic association with time-to-event outcomes when the underlying biological process is unknown, we propose a unified approach that maximizes the partial likelihood over all of the potential biological processes. The method can be used to infer the true biological process and to derive unbiased estimates of the genetic association parameters.
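To make the three processes concrete, here is a minimal sketch of how one SNP might be coded under each of them. It assumes the coding commonly used for these models (and, we believe, in Xu and Hao, 2018): under XCI, females are coded 0/1/2 and males 0/2; under XCI-E, males are coded 0/1; under XCI-S, a heterozygous female is coded γ with 0 ≤ γ ≤ 2, so XCI is the special case γ = 1. The helper `code_x_snp()` and its `male` argument are hypothetical and not part of xlink; the package source is the authority on its exact coding.

```r
# code_x_snp() is a hypothetical helper, not an xlink function.
# g:     minor-allele counts (females 0/1/2, males 0/1)
# male:  logical vector, TRUE for male subjects
# gamma: heterozygous-female code under XCI-S, assumed to lie in [0, 2]
code_x_snp <- function(g, male, type = c("XCI", "XCI-E", "XCI-S"), gamma = 1) {
  type <- match.arg(type)
  if (type == "XCI-E") {
    # Escape: raw allele counts for everyone (males 0/1, females 0/1/2)
    return(as.numeric(g))
  }
  if (type == "XCI") gamma <- 1    # random inactivation: heterozygote halfway
  stopifnot(gamma >= 0, gamma <= 2)
  out <- numeric(length(g))
  out[male] <- 2 * g[male]          # one active X: males coded like homozygous females
  fem <- !male
  out[fem & g == 1] <- gamma        # heterozygous females
  out[fem & g == 2] <- 2            # homozygous minor-allele females
  out
}

g    <- c(0, 1, 2, 0, 1)
male <- c(FALSE, FALSE, FALSE, TRUE, TRUE)
code_x_snp(g, male, "XCI")                  # 0 1 2 0 2
code_x_snp(g, male, "XCI-E")                # 0 1 2 0 1
code_x_snp(g, male, "XCI-S", gamma = 0.5)   # 0.0 0.5 2.0 0.0 2.0
```

Only the male codes distinguish XCI from XCI-E, and only the heterozygous-female code distinguishes XCI from XCI-S.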

References:

Xu, W., & Hao, M. (2018). A unified partial likelihood approach for X-chromosome association on time-to-event outcomes. Genetic Epidemiology, 42(1), 80-94.

Han, D., Hao, M., Qu, L., & Xu, W. (2019). A novel model for the X-chromosome inactivation association on survival data. Statistical Methods in Medical Research.

2. Installation

You can install xlink from GitHub (https://github.com/qiuanzhu/xlink):

# install.packages("devtools")  # run once if devtools is not yet installed
library("devtools")
install_github("qiuanzhu/xlink")

3. Examples

The following examples use model = "survival", which fits a Cox proportional hazards model to a time-to-event outcome. The same function can also fit model = "linear" for a continuous response and model = "binary" for logistic regression on a binary response.

3.1 Select significant SNPs under the XCI or XCI-E model type:

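The sample data set Rdata contains 10 SNPs and 4 clinical covariates (gender, Age, Smoking, Treatment), together with the event indicator OS and the follow-up time OS_time: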

library("xlink")
head(Rdata)
ID OS OS_time gender Age Smoking Treatment snp_1 snp_2 snp_3 snp_4 snp_5 snp_6 snp_7 snp_8 snp_9 snp_10
1 0 0.0335 1 44.3 0 0 0 0 0 0 0 0 0 0 0 0
2 1 0.0424 1 76.9 1 0 0 0 1 0 0 0 0 0 0 0
3 0 0.6435 1 53.7 0 1 0 0 0 0 1 0 0 0 1 0
4 1 0.3548 0 63.1 0 1 0 0 1 1 0 0 0 0 1 0
5 1 0.0306 0 29.2 0 0 0 1 0 1 1 0 0 0 0 0
6 1 0.3050 1 77.5 0 1 1 0 0 0 0 0 0 1 0 0
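The examples below filter SNPs with a minor allele frequency (MAF) threshold, MAF_v. Because males carry one X chromosome and females two, an X-linked MAF is naturally computed over 2n_F + n_M alleles. The sketch below shows that calculation and how such a threshold could then be applied; it is an assumption about, not a copy of, what xlink does internally. It also assumes males are coded 0/1 and identified by a logical `male` vector, since which value of gender marks males in Rdata is not stated here. The helper `x_maf()` is hypothetical.

```r
# x_maf() is a hypothetical helper, not an xlink function.
# g:    minor-allele counts (females 0/1/2, males 0/1); no missing values assumed
# male: logical vector, TRUE for male subjects
x_maf <- function(g, male) {
  n_alleles <- 2 * sum(!male) + sum(male)  # two X alleles per female, one per male
  freq <- sum(g) / n_alleles               # frequency of the counted allele
  min(freq, 1 - freq)                      # frequency of the less common allele
}

# Toy genotypes made up for illustration (not taken from Rdata)
g    <- c(0, 1, 0, 0, 1, 0)
male <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
maf  <- x_maf(g, male)    # 2 / 9
maf >= 0.05               # would this SNP pass a MAF_v = 0.05 filter?
```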

With the model type set to XCI and the MAF threshold MAF_v set to 0.05, the output for snp_1 reports the hazard ratio with its 95% confidence interval, the P value, the MAF, and the log-likelihood information:

Covars<-c("Age","Smoking","Treatment")
SNPs<-c("snp_1","snp_2")
output <- xlink_fit(os = "OS", ostime = "OS_time", snps = SNPs, gender = "gender",
                    covars = Covars, option = list(type = "XCI", MAF_v = 0.05),
                    model = "survival", data = Rdata)
          Hazard Ratio  Confidence Interval (95%)  P Value    MAF
snp_1     1.6392        [1.3895,1.9339]            0.0000000  0.2062
gender    0.9529        [0.7557,1.2015]            0.6834041  NA
Age       1.0228        [1.0155,1.0301]            0.0000000  NA
Smoking   1.3001        [1.0172,1.6617]            0.0360414  NA
Treatment 1.2356        [0.9781,1.561]             0.0760446  NA
Baseline   Full model  Loglik ratio
-1493.808  -1478.774   15.03336
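The printed columns are related in the usual way for a Cox model. As a hedged check, the sketch below recovers the log-scale coefficient and standard error of snp_1 from the reported hazard ratio and confidence interval, assuming a Wald interval exp(β ± 1.96·SE) (the interval is symmetric on the log scale, which is consistent with that assumption). It also confirms that the Loglik ratio column is the full-model log-likelihood minus the baseline one.

```r
# Values copied from the XCI output for snp_1 above
hr          <- 1.6392
ci          <- c(1.3895, 1.9339)
loglik_base <- -1493.808
loglik_full <- -1478.774

beta <- log(hr)                              # Cox coefficient on the log scale
se   <- diff(log(ci)) / (2 * qnorm(0.975))   # assumes a Wald interval exp(beta +/- 1.96 se)
z    <- beta / se
p    <- 2 * pnorm(-abs(z))                   # two-sided Wald P value
llr  <- loglik_full - loglik_base            # compare with the "Loglik ratio" column

round(c(beta = beta, se = se, z = z, p = p, llr = llr), 4)
```

Here llr is 15.034, matching the printed 15.03336 up to rounding of the two log-likelihoods; twice this difference would be the usual likelihood-ratio statistic.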

3.2 Select significant SNPs under all model types:

With the model type set to "all" and MAF_v set to 0.05 (as in the code below), xlink_fit() fits the XCI-E, XCI and XCI-S models, and the output for snp_1 reports the hazard ratio, P value and log-likelihood information under each of them:

Covars<-c("Age","Smoking","Treatment")
SNPs<-c("snp_1","snp_2")
output <- xlink_fit(os = "OS", ostime = "OS_time", snps = SNPs, gender = "gender",
                    covars = Covars, option = list(type = "all", MAF_v = 0.05),
                    model = "survival", data = Rdata)

Under the XCI-E model, the results for snp_1 (hazard ratio, P value and log-likelihood information) are:

          Hazard Ratio  Confidence Interval (95%)  P Value    MAF
snp_1     1.8293        [1.4411,2.322]             0.0000007  0.2062
gender    1.0885        [0.8477,1.3978]            0.5061374  NA
Age       1.0232        [1.0157,1.0306]            0.0000000  NA
Smoking   1.3058        [1.0211,1.67]              0.0334504  NA
Treatment 1.2063        [0.9543,1.5249]            0.1167569  NA
Baseline   Full model  Loglik ratio
-1493.808  -1482.332   11.47576

Under the XCI model, the results for snp_1 (hazard ratio, P value and log-likelihood information) are:

          Hazard Ratio  Confidence Interval (95%)  P Value    MAF
snp_1     1.6392        [1.3895,1.9339]            0.0000000  0.2062
gender    0.9529        [0.7557,1.2015]            0.6834041  NA
Age       1.0228        [1.0155,1.0301]            0.0000000  NA
Smoking   1.3001        [1.0172,1.6617]            0.0360414  NA
Treatment 1.2356        [0.9781,1.561]             0.0760446  NA
Baseline   Full model  Loglik ratio
-1493.808  -1478.774   15.03336

Under the XCI-S model, the results for snp_1 (hazard ratio, P value, log-likelihood information and the estimated gamma) are:

          Hazard Ratio  Confidence Interval (95%)  P Value    MAF
snp_1     1.6596        [1.4031,1.9629]            0.0000000  0.2062
gender    0.9277        [0.7361,1.1692]            0.5250786  NA
Age       1.0228        [1.0155,1.0301]            0.0000000  NA
Smoking   1.2989        [1.0162,1.6602]            0.0367447  NA
Treatment 1.2374        [0.9795,1.5632]            0.0741144  NA
Baseline   Full model  Loglik ratio
-1493.808  -1478.709   15.09833
Gamma
0.8707407
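Gamma is chosen by maximizing the partial likelihood over the XCI-S coding. The sketch below illustrates that idea with a plain grid search over γ in [0, 2] on simulated data, using `survival::coxph()` (the survival package ships with standard R installations). It assumes the common XCI-S coding (males 0/2, females 0/γ/2); the helper `code_xcis()` and the simulated data are hypothetical, and how xlink actually optimizes γ may differ.

```r
library(survival)  # coxph() and Surv()

# Assumed XCI-S coding: males 0/2, females 0/gamma/2 (hypothetical helper)
code_xcis <- function(g, male, gamma) {
  ifelse(male, 2 * g, ifelse(g == 1, gamma, g))
}

# Simulated toy data, not Rdata; the true gamma used here is 0.5
set.seed(1)
n      <- 400
male   <- rbinom(n, 1, 0.5) == 1
g      <- ifelse(male, rbinom(n, 1, 0.3), rbinom(n, 2, 0.3))
time   <- rexp(n, rate = 0.1 * exp(0.4 * code_xcis(g, male, 0.5)))
status <- rbinom(n, 1, 0.8)            # random censoring indicator

# Profile log partial likelihood over a grid of gamma values
gammas <- seq(0, 2, by = 0.01)
prof <- sapply(gammas, function(gm) {
  x   <- code_xcis(g, male, gm)
  fit <- coxph(Surv(time, status) ~ x + male)
  fit$loglik[2]                        # log partial likelihood at the fitted coefficients
})
gamma_hat <- gammas[which.max(prof)]
gamma_hat
```

The grid point γ = 1 reproduces the XCI coding, so the gap between `max(prof)` and the profile value at γ = 1 shows how much the skewed model gains in log-likelihood.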

Among the XCI-E, XCI and XCI-S models, the best model for snp_1 according to AIC is:

Best model by AIC
XCI
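AIC = 2k − 2 log L trades fit against the number of parameters k. The sketch below recomputes the comparison from the printed full-model log-likelihoods, assuming k counts the five regression coefficients (snp_1, gender, Age, Smoking, Treatment) plus γ for XCI-S. Under that assumption it reproduces the reported choice of XCI, even though XCI-S has a slightly higher log-likelihood.

```r
# Full-model log partial likelihoods for snp_1, copied from the output above
loglik <- c("XCI-E" = -1482.332, "XCI" = -1478.774, "XCI-S" = -1478.709)

# Assumed parameter counts: 5 coefficients, plus gamma for XCI-S
k <- c("XCI-E" = 5, "XCI" = 5, "XCI-S" = 6)

aic  <- 2 * k - 2 * loglik
best <- names(which.min(aic))
aic
best
```

With these numbers the AIC values are 2974.664 (XCI-E), 2967.548 (XCI) and 2969.418 (XCI-S). Because XCI and XCI-E have the same k, AIC ranks those two by log-likelihood alone.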

3.3 Output for the significant SNPs by P value:

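Passing the fitted result to select_output() with a P value threshold pv_thold keeps only the SNPs that reach it. With pv_thold = 10^-5, the selected output, which also reports the best model and the estimated gamma for each SNP, becomes: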

Covars<-c("Age","Smoking","Treatment")
SNPs<-c("snp_1","snp_2","snp_3")
result<-xlink_fit(os="OS",ostime ="OS_time",snps=SNPs,gender ="gender",covars=Covars, 
                   option =list(type="all",MAF_v=0.05), model="survival", data = Rdata)
select_output(input=result,pv_thold=10^-5)
SNP    Hazard Ratio  Confidence Interval (95%)  P Value  MAF     Best model  Gamma
snp_1  1.6392        [1.3895,1.9339]            0        0.2062  XCI         NA
snp_3  1.5596        [1.3661,1.7805]            0        0.3638  XCI-S       1.538163