StrPhaser Constructs Tandem Repeat Alleles from InDels and SNPs in VCF Data

Jun 6, 2024, 2:06 PM - 3:06 PM, Xuewen Wang, Biological Sciences, Section Presentation

Variant calling is a ubiquitous genomic technique that underpins many scientific disciplines. From a computational perspective, variant calling is a form of logical compression: neglecting large variation, a whole genome can be described losslessly as a set of differences (SNP and small InDel alleles) relative to a reference sequence. Another common genomic technique is haplotype phasing, in which alleles are partitioned into their paternal and maternal components (haplotypes). Some classes of alleles are more difficult to describe than others. For example, short tandem repeats (STRs) are highly repetitive and prone to length polymorphism. Owing primarily to their elevated mutation rate, STRs are also critical markers for many genetic assays. However, most genomic workflows do not report STRs explicitly. Here, we introduce a novel algorithm that constructs STR alleles from SNPs and InDels in phased VCF datasets. The algorithm is implemented in StrPhaser. We demonstrate its application to approximately 10,000 STR alleles from 284 humans with cross-validation data, focusing on core forensic STR sites, where an average accuracy of 91% was obtained. StrPhaser is user-friendly, runs quickly on different computing platforms, and provides a colorful visualization of tandem repeat (TR) alleles. Because the TR alleles are phased, they can be linked directly to other phased variants, enriching the available genetic information. StrPhaser thereby accelerates the analysis and comparison of TR alleles.
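The general idea of rebuilding a repeat allele from phased variants can be illustrated with a minimal Python sketch. This is not StrPhaser's implementation. The function name `build_haplotype`, the tuple layout of a variant, and the toy reference region are all assumptions made for illustration, and real VCF parsing, multi-allelic sites, and overlapping variants are left out.

```python
from typing import List, Tuple

# Hypothetical simplified record: (1-based position, REF, ALT, phased genotype "a|b")
Variant = Tuple[int, str, str, str]


def build_haplotype(ref_seq: str, region_start: int,
                    variants: List[Variant], hap_index: int) -> str:
    """Apply the ALT alleles carried by one haplotype to a reference slice.

    ref_seq       reference bases covering the repeat region
    region_start  1-based genomic coordinate of ref_seq[0]
    hap_index     0 for the allele left of '|', 1 for the allele right of it
    """
    pieces = []
    cursor = 0  # offset in ref_seq of the next reference base not yet copied
    for pos, ref, alt, gt in sorted(variants):
        if gt.split("|")[hap_index] != "1":
            continue  # this haplotype carries the reference allele here
        offset = pos - region_start
        if offset < cursor or offset + len(ref) > len(ref_seq):
            continue  # outside the region or overlapping; skipped in this sketch
        if ref_seq[offset:offset + len(ref)] != ref:
            raise ValueError(f"REF allele does not match reference at {pos}")
        pieces.append(ref_seq[cursor:offset])  # unchanged reference bases
        pieces.append(alt)                     # substituted allele
        cursor = offset + len(ref)
    pieces.append(ref_seq[cursor:])
    return "".join(pieces)


# Toy region (assumed data): 3 flanking bases, five GATA units, 3 flanking bases.
region_start = 101
ref_seq = "TTC" + "GATA" * 5 + "CAG"

variants = [
    (103, "CGATA", "C", "1|0"),  # deletion of one GATA unit on the first haplotype
    (113, "A", "C", "0|1"),      # SNP inside the third unit on the second haplotype
]

hap_a = build_haplotype(ref_seq, region_start, variants, 0)
hap_b = build_haplotype(ref_seq, region_start, variants, 1)
print(hap_a)
print(hap_b)
```

In this toy case the first haplotype loses one repeat unit while the second keeps the reference length but carries an interrupted unit, which shows why both InDels and SNPs are needed to describe a repeat allele fully.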
Availability
StrPhaser is publicly available at https://github.com/XuewenWangUGA/StrPhaser.

a Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, USA
b Department of Microbiology, Immunology and Genetics, University of North Texas Health Science Center, Fort Worth, TX, USA
*Corresponding author at: Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, USA. E-mail address: august.woerner@unthsc.edu (A.E. Woerner).