
A Survey of Computational Tools to Analyze and Interpret Whole Exome Sequencing Data
Posted: 2017-06-09 09:18   Source: unknown   Author: admin
       Abstract: Whole Exome Sequencing (WES) is the application of next-generation sequencing technology to determine the variations in the exome, and it is becoming a standard approach for studying genetic variants in disease. Understanding the exomes of individuals at single-base resolution allows the identification of actionable mutations for disease treatment and management. WES technologies have shifted the bottleneck from experimental data production to computationally intensive, informatics-based data analysis. Novel computational tools and methods have been developed to analyze and interpret WES data. Here, we review some of the tools currently used to analyze WES data. These tools cover the full range of tasks, from the alignment of raw sequencing reads to linking variants to actionable therapeutics. The strengths and weaknesses of each tool are discussed to help researchers make more informed decisions when selecting tools to analyze their WES data.
1. Introduction
      Recent advances in next-generation sequencing technologies provide revolutionary opportunities to characterize the genomic landscapes of individuals at single-base resolution and to identify actionable mutations for disease treatment and management [1, 2]. Whole Exome Sequencing (WES) is the application of next-generation sequencing technology to determine the variations in the exome, that is, all coding regions of known genes in a genome. More than 85% of disease-causing mutations in Mendelian diseases are found in the exome, and WES provides an unbiased approach to detect these variants in the era of personalized and precision medicine. Next-generation sequencing technologies have shifted the bottleneck from experimental data production to computationally intensive, informatics-based data analysis. For example, the Exome Aggregation Consortium (ExAC) has assembled and reanalyzed WES data from 60,706 unrelated individuals drawn from various disease-specific and population genetic studies [3]. To gain insights from WES data, novel computational algorithms and bioinformatics methods have become a critical component of modern biomedical research for analyzing and interpreting these massive datasets.
       The number of genomic studies that employ WES has increased over the years, and new bioinformatics methods and computational tools have been developed to assist the analysis and interpretation of these data (Figure 1). The majority of WES computational tools are centered on the generation of a Variant Call Format (VCF) file from raw sequencing data. Once the VCF files have been generated, further downstream analyses can be performed by other computational methods. Therefore, in this review we classify bioinformatics methods and computational tools into Pre-VCF and Post-VCF categories. Pre-VCF workflows include tools for aligning the raw sequencing reads to a reference genome, variant detection, and annotation. Post-VCF workflows include methods for somatic mutation detection, pathway analysis, copy number alterations, INDEL identification, and driver prediction. Depending on the nature of the hypothesis, Post-VCF analysis can also include methods that link variants to clinical data as well as to potential therapeutics (Figure 2).
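To make the role of the VCF file more concrete, the following is a minimal sketch of how a Post-VCF tool might read variant records and separate SNVs from INDELs. It relies only on the fixed leading columns of a VCF data line (CHROM, POS, ID, REF, ALT). The `Variant` type, the function names, and the example records are hypothetical and made up for illustration; real pipelines would normally use an established VCF library rather than hand-written parsing.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    """One alternate allele at one position (illustrative type, not a standard)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_vcf(text: str) -> List[Variant]:
    """Return one Variant per ALT allele, skipping header and blank lines."""
    variants = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # '##' meta lines and the '#CHROM' column header
        chrom, pos, _id, ref, alts = line.split("\t")[:5]
        for alt in alts.split(","):  # multi-allelic sites list several ALTs
            variants.append(Variant(chrom, int(pos), ref, alt))
    return variants


def classify(v: Variant) -> str:
    """Label a variant by comparing the lengths of its REF and ALT alleles."""
    if v.alt.startswith("<") or v.alt in ("*", ".") or "[" in v.alt or "]" in v.alt:
        return "other"  # symbolic, missing, or breakend alleles
    if len(v.ref) == 1 and len(v.alt) == 1:
        return "SNV"
    if len(v.alt) > len(v.ref):
        return "insertion"
    if len(v.alt) < len(v.ref):
        return "deletion"
    return "other"  # e.g. a multi-nucleotide substitution


# Made-up records, for illustration only.
EXAMPLE = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t12345\t.\tA\tG\t50\tPASS\t.",
    "1\t22345\t.\tC\tCT,G\t60\tPASS\t.",
    "2\t555\t.\tGTA\tG\t45\tPASS\t.",
])

if __name__ == "__main__":
    for variant in parse_vcf(EXAMPLE):
        print(variant.chrom, variant.pos, variant.ref, variant.alt, classify(variant))
```

Splitting multi-allelic sites into one record per alternate allele keeps the classification step simple, at the cost of losing the link between alleles that share a site.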
       Computational tools that take raw sequencing data through to an annotated VCF file are well established. Most studies follow workflows associated with GATK [4–6], SAMtools [7], or a combination of the two. In general, workflows start by aligning WES reads to a reference genome and noting the positions where the reads differ from the reference. The most common of these variants are single nucleotide variants (SNVs), but they also include insertions, deletions, and rearrangements. The genomic location of each variant is used to assign it to a specific gene. After annotation, the SNVs can be compared with databases of SNVs found in other studies, which allows the frequency of a particular SNV in a given population to be determined. In some studies, such as those relating to cancer, rare somatic mutations are of interest. In Mendelian studies, by contrast, germline mutations are of greater interest than somatic mutations. Before a final VCF file is produced for a given sample, software can be used to predict whether a variant is functionally damaging to the encoded protein, which helps prioritize candidate genes for further study.
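The population-frequency comparison described above can be sketched as a simple filter on the INFO column of an annotated VCF record. This is an illustration under stated assumptions, not the method of any particular tool: it assumes the annotation step has written a population allele frequency under the INFO key `AF` (the key name varies between annotation tools), and the 1% cutoff and the function names `parse_info` and `is_rare` are hypothetical choices made for this example.

```python
from typing import Dict


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string such as 'DP=40;AF=0.01;DB' into a dict."""
    parsed = {}
    for item in info.split(";"):
        if item in ("", "."):
            continue  # empty or missing INFO field
        key, _, value = item.partition("=")
        parsed[key] = value  # flags without '=' get an empty string value
    return parsed


def is_rare(info: str, max_af: float = 0.01, key: str = "AF") -> bool:
    """Return True if any alternate allele is at or below max_af.

    Variants with no frequency annotation are treated as rare, on the
    assumption that they were not observed in the reference population.
    """
    value = parse_info(info).get(key)
    if value is None:
        return True
    freqs = [float(x) for x in value.split(",") if x not in ("", ".")]
    return not freqs or min(freqs) <= max_af


if __name__ == "__main__":
    # Made-up INFO strings, for illustration only.
    for info in ["DP=40;AF=0.0005", "AF=0.25", "DP=12", "AF=0.3,0.001"]:
        print(info, "->", "rare" if is_rare(info) else "common")
```

Treating unannotated variants as rare is a deliberate, conservative choice for candidate discovery; a study focused on well-characterized variants might instead exclude them.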

