小枫炮炮's blog http://blog.sciencenet.cn/u/maplesword Breathing the air of freedom and swimming freely in the ocean of computational biology

[Paper Excerpt]Transcriptome genetics using second generation sequencing in a Ca

3898 views 2010-8-28 23:58 | Personal category: Uncategorized | System category: Paper exchange | ngs, ASE

Outline:
I. Data
1) Data source: the authors sequenced the mRNA fraction of the transcriptome of lymphoblastoid cell lines from 60 CEU individuals using 37-bp paired-end Illumina sequencing. Each individual yielded 16.9 ± 5.9 million reads that mapped to the NCBI36 assembly of the human genome using MAQ; 86% of the filtered reads mapped to known exons in Ensembl v54.
2) Data quantification: read counts for each individual were scaled to a theoretical yield of 10 million reads and corrected for peak insert size across the corresponding libraries. The authors developed a new method, FluxCapacitor, to assign reads to specific isoforms. (Q: Can we use Cufflinks or Scripture to do this?)
3) Quality evaluation: the authors compared whole-gene read counts to array intensities generated with Illumina HG-6 v2 microarrays.
4) Attempt to infer abundance values for exons that were not screened: this follows the same principle as using the correlation structure (LD) of genetic variants to impute variants from a reference into any population sample of interest.
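
The scaling step in 2) can be sketched as follows. This is only a minimal illustration, assuming a plain proportional rescaling of each library to 10 million reads; the function name and the exon counts are hypothetical, and the insert-size correction mentioned in the paper is not covered here.

```python
def scale_counts(counts, target_yield=10_000_000):
    """Rescale one library's per-feature read counts to a common theoretical yield.

    Assumption: simple proportional scaling; the paper's exact procedure may differ.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no mapped reads")
    factor = target_yield / total
    return {feature: n * factor for feature, n in counts.items()}


# Hypothetical exon counts for one individual (not data from the paper).
library = {"exon_A": 1200, "exon_B": 300, "exon_C": 0}
scaled = scale_counts(library)
```

After scaling, counts from libraries of different depth are on the same footing, so an exon with the same share of reads gets the same value in every individual.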

II. SNP-Expression Association
1) Association of gene expression measured by RNA-Seq with genetic variation: see reference [22], Population genomics of human gene expression.
2) Evaluation of association: through permutation (see reference [23]), in exons, transcripts, and genes.
    * Example of permutation provided by Xie Gangcai: after mapping, shuffle the nucleotides of each read and map again to see how many reads still map, as a background or control; then use a test to obtain a p-value for the significance of the original mapping.
3) A problem of RNA-Seq: exon eQTLs from RNA-Seq are under-represented in low-abundance genes, indicating that rare transcripts are not well quantified at this level of coverage (consistent with the other paper).
4) Replication of the eQTL discoveries: the authors compared associations from this study with those obtained from sequencing the transcriptomes of an African population. They assessed the P-value distribution of the matching CEU associations given the top associated SNP for 500 genes from the African population. ~33% of these signals were shared (P<0.0001, assessed by permutations).
5) Enrichment of eQTLs given an exon's location: eQTLs are enriched around the TSS. The authors also identified an increased number of discoveries for the first, second and last exons compared to any middle exon. When assessing the distribution of significant eQTLs around the 5' end of the exon of interest, they found that significant eQTLs associated with the last exon lie closer to that exon than eQTLs associated with any other exon do.
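
For the permutation in 2), one common form for association testing shuffles expression values across individuals and recomputes the statistic each time. The sketch below assumes that form and uses Pearson correlation as a stand-in statistic; the paper's actual statistic and procedure (reference [23]) may differ, and the genotype and expression values are made up.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def permutation_pvalue(genotypes, expression, n_perm=1000, seed=0):
    """Empirical p-value: how often a shuffled data set is at least as correlated."""
    rng = random.Random(seed)
    observed = abs(pearson_r(genotypes, expression))
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-expression link
        if abs(pearson_r(genotypes, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: genotype coded as minor-allele count, expression as scaled reads.
genotypes = [0, 0, 0, 1, 1, 1, 2, 2, 2, 1, 0, 2]
expression = [1.0, 1.2, 0.9, 2.1, 1.9, 2.2, 3.0, 3.1, 2.9, 2.0, 1.1, 3.2]
p = permutation_pvalue(genotypes, expression)
```

Because the null distribution comes from the data themselves, no distributional assumption about expression values is needed; the price is that the smallest reachable p-value is bounded by the number of permutations.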

III. Quantification of allele-specific-expression (ASE)
1) Transcriptome sequencing allows the quantification of ASE.
2) SNPs used to assess ASE: an average of 4000 confirmed heterozygous HapMap3 SNP positions per individual.
3) The proportion of sites where both SNP alleles were detected: the authors assessed it as a function of mapping quality using SAMtools. 72% of heterozygous sites have both alleles detected at least once at MAQ mapping quality 10, and the number decreases slightly with increasing mapping quality. 41% of the heterozygotes have more than 6 reads.
4) ASE assessment: the authors first corrected for reference versus non-reference differential mapping in each library, because the reference allele tends to be overrepresented in pileups over a heterozygote. Using this reference-allele frequency as the success rate in a binomial test, they then tested for ASE.
5) Relationship between known eQTLs and ASE: the authors first phased double heterozygotes for both eQTL and ASE. As coverage increases, the correlation between eQTL significance and ASE ratio improves. Reads were then summed across individuals to assess the one-sided ASE binomial P-value distribution with respect to eQTL phasing. For eQTLs significant at 0.01 and 0.001, the tail of the ASE P-value distribution was enriched, while for exons without eQTLs, both tails of this distribution were enriched. (Q1: How is the phasing done? Q2: What does the one-sided ASE p-value mean? What is the test testing for?)
    * Phasing: for heterozygotes, phasing determines which alleles lie on each chromosome. Unphased data: genotypes; phased data: haplotypes.
6) Relationship between rare eQTLs and ASE: the authors selected SNPs heterozygous in six or more individuals in exons without evidence for an eQTL, and compared patterns of haplotype homozygosity between individuals that shared a significant ASE signal (at P<0.05) and those that did not.
    * Haplotype homozygosity: the probability of selecting two identical haplotypes at random from a population.
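
The binomial ASE test in 4) can be sketched as below. This is a minimal illustration under assumptions: `ref_rate` stands for the library-wide reference-allele fraction described above, and its value (0.52) and the read counts are hypothetical. The one-sided helper reflects one possible reading of Q2 in 5): it asks only whether a chosen allele is over-represented, rather than whether the two alleles differ in either direction.

```python
from math import comb


def binomial_pmf(i, n, p):
    """Probability of exactly i successes in n trials with success rate p."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def binomial_two_sided(k, n, p):
    """Exact two-sided p-value: total probability of outcomes no more likely than k."""
    pmf = [binomial_pmf(i, n, p) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


def binomial_upper_tail(k, n, p):
    """One-sided p-value: probability of k or more successes."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))


def ase_pvalue(ref_reads, alt_reads, ref_rate):
    """Test allelic imbalance at one heterozygous site against the library's ref_rate."""
    return binomial_two_sided(ref_reads, ref_reads + alt_reads, ref_rate)


ref_rate = 0.52  # hypothetical library-wide reference-allele fraction
balanced = ase_pvalue(26, 24, ref_rate)  # close to expectation
skewed = ase_pvalue(45, 5, ref_rate)     # strong excess of the reference allele
one_sided = binomial_upper_tail(45, 50, ref_rate)
```

Using the library's own reference fraction instead of 0.5 keeps sites from being called significant merely because reads carrying the reference allele map more easily.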

IV. Genetic basis of alternative splicing
1) The authors tested for association between known variants affecting splicing signals and their respective genes and exons; in total, 963 variants for 788 genes were tested.
2) Stratification: splice variants were stratified into donor and acceptor variants and tested against the abundance of the exons 5' and 3' to the intron in which they reside. Donor variants associate more with the 5' exon than with the 3' exon, while acceptor variants associate more with the 3' exon than with the 5' exon.
3) Further assumption: if genetic variants are affecting transcript-specific expression, it should be possible to detect heterogeneity in the transcript distribution between chromosomes within an individual.
4) Measurement of the degree to which genetics influences transcript-specific expression: look at the insert size distribution of paired-end reads over each heterozygote. The expectation is that the heterogeneity of insert sizes between the two alleles is greater for significant ASE heterozygotes than for non-significant ASE heterozygotes (if one haplotype increases the expression of a particular transcript relative to the other, the insert size distribution would change). Heterozygotes with a minimum of 50 reads for both alleles were tested, which comprised 901 positions. For each heterozygote in an individual, a bootstrapped Kolmogorov-Smirnov test was run on the respective insert size distributions (Q: Why use a bootstrapped KS test instead of an ordinary KS test?), and the p-values were then separated according to whether the heterozygote was significant for ASE or not. Of the 901 heterozygotes, 235 were significant for ASE and 105 had significant transcript distribution heterogeneity; this corresponded to 72 genes that contained an ASE-significant heterozygote.
5) Effect of genetic variants on events that contribute to alternative isoforms (e.g. inclusion/exclusion of exons): derived from the authors' method, FluxCapacitor. Of 6600 quantified events, 110 are significant at the 0.01 permutation threshold.
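
A resampling-based two-sample KS test for the insert size comparison in 4) might look like the sketch below. It is an assumption about what "bootstrapped" means here (resampling both groups from the pooled reads), not the authors' implementation, and the insert sizes are invented. One possible answer to the question in 4), offered as a guess rather than the authors' stated rationale: insert sizes are integers with many ties, and the ordinary KS p-value assumes continuous data, so a resampled null distribution is safer.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest gap between the two empirical cumulative distribution functions."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in set(sa) | set(sb):
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d


def bootstrap_ks_pvalue(a, b, n_boot=1000, seed=0):
    """Empirical p-value from resampling both groups out of the pooled sample."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_boot):
        ra = rng.choices(pooled, k=len(a))  # sampling with replacement
        rb = rng.choices(pooled, k=len(b))
        if ks_statistic(ra, rb) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_boot + 1)


# Hypothetical insert sizes (bp) for reads carrying each allele, 50 reads per allele.
ref_inserts = [180 + (i % 10) for i in range(50)]
alt_inserts = [200 + (i % 10) for i in range(50)]
same_inserts = [180 + ((i * 3) % 10) for i in range(50)]

p_different = bootstrap_ks_pvalue(ref_inserts, alt_inserts)
p_same = bootstrap_ks_pvalue(ref_inserts, same_inserts)
```

A small `p_different` would suggest that the two alleles draw reads from different transcript mixtures, which is the heterogeneity the authors look for at ASE-significant heterozygotes.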

V. Advanced Methods

https://blog.sciencenet.cn/blog-457844-357178.html
