Anwica Kashfeen
University of North Carolina at Chapel Hill, NC, USA
Frontier: Finding the Boundaries of Novel Transposable Element Insertions in Genomes
We developed a novel template-free approach for finding transposable element insertions (TEis) in a given genome. We exploited the repetitive nature of TEs to identify all repeats (i.e., potential TEs) from short reads. From these repeats, we used a deep learning classifier to pick out the TEi boundaries, each of which contains a partial TE segment and a non-TE segment. We also classified the TE types present in each TEi boundary using a second classifier. Using our method, we found a large fraction of TEis that are not annotated in the reference genome.
ELITE: Efficiently Locating Insertions of Transposable Elements
We developed a tool called ELITE to identify and characterize TE insertions (TEis) in a given genome. We use an msBWT-based data structure to store and index all the reads from a high-throughput sequencing dataset and leverage a sampled FM-index to efficiently search for TEs and their nearby genomic contexts. ELITE also finds unannotated TEs that are distantly related to the target TE. Given a population of closely related individuals, ELITE also detects TEis that are polymorphic within subpopulations and those due to recent activity.
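To illustrate the kind of query an FM-index supports, here is a minimal sketch of backward search over a single string. It is not ELITE's implementation: the function names `build_fm_index` and `count_occurrences` are hypothetical, the index covers one text rather than a multi-string BWT of many reads, and occurrence counts are computed by scanning the BWT instead of using the sampled tables a real FM-index would keep.

```python
def build_fm_index(text):
    """Build the BWT and first-column offsets for text (a '$' sentinel is appended).

    Assumption: the suffix array is built by naive sorting, which is only
    practical for short illustrative inputs.
    """
    text += "$"
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT character for each suffix is the character that precedes it.
    bwt = "".join(text[i - 1] for i in suffix_array)
    # first[c] = number of characters in the text that sort before c.
    first = {}
    total = 0
    for c in sorted(set(text)):
        first[c] = total
        total += text.count(c)
    return bwt, first


def count_occurrences(pattern, bwt, first):
    """Count exact occurrences of pattern using backward search."""
    lo, hi = 0, len(bwt)  # half-open range of matching suffixes
    for c in reversed(pattern):
        if c not in first:
            return 0
        # Naive rank query; a sampled FM-index would answer this from checkpoints.
        lo = first[c] + bwt[:lo].count(c)
        hi = first[c] + bwt[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


bwt, first = build_fm_index("GATTACAGATTACA")
print(count_occurrences("GATTACA", bwt, first))
```

The pattern is consumed right to left, and each step narrows the range of suffixes that start with the portion matched so far, so the number of rank queries grows with the pattern length rather than with the size of the indexed text.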
MiniMUGA: Mini Mouse Universal Genotype Array
We designed 10k SNP markers for assessing genomic variation in commercially available inbred mouse strains. MiniMUGA contains 4,902 maximally informative SNPs distributed across the autosomes and X chromosome, 89 SNPs on the mitochondria, and 77 SNPs on chromosome Y. Although MiniMUGA has far fewer markers than other genotyping arrays, it still discriminates effectively between strains and their substrains. It is a very low-cost array, commercially available through Neogen.
Mitochondrial genome assembly of 76 mice in Collaborative Cross
We developed a new approach to genome assembly that uses an msBWT built from short-read data produced by current sequencing technologies such as Illumina. First, we divide the reference genome sequence into non-overlapping 45-mers. For each 45-mer, we query the msBWT to count its supporting reads. These counts anchor local genomic assemblies, which attempt to resolve the subsequences where queries return “near-zero” counts. Using this technique, we assembled the mitochondrial genomes of 76 Collaborative Cross (CC) mice. We verified the assemblies against the known lineages of the CC, because mitochondria are inherited directly from the mother.
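The anchoring step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual pipeline: the helper names (`split_kmers`, `count_support`, `near_zero_regions`) and the `threshold` parameter are hypothetical, a naive substring scan stands in for the msBWT query, and reverse-complement matches are ignored.

```python
def split_kmers(reference, k=45):
    """Return (start, kmer) pairs for non-overlapping k-mers.

    A trailing fragment shorter than k is dropped.
    """
    return [(i, reference[i:i + k]) for i in range(0, len(reference) - k + 1, k)]


def count_support(kmer, reads):
    """Count reads containing kmer (naive stand-in for an msBWT query)."""
    return sum(1 for read in reads if kmer in read)


def near_zero_regions(reference, reads, k=45, threshold=1):
    """Merge consecutive k-mers with support <= threshold into (start, end) intervals.

    Intervals are half-open reference coordinates; each one marks a stretch
    that a local assembly would need to resolve.
    """
    regions = []
    current = None
    for start, kmer in split_kmers(reference, k):
        if count_support(kmer, reads) <= threshold:
            if current is None:
                current = [start, start + k]  # open a new low-support region
            else:
                current[1] = start + k        # extend the open region
        elif current is not None:
            regions.append(tuple(current))    # a well-supported k-mer closes it
            current = None
    if current is not None:
        regions.append(tuple(current))
    return regions


# Toy example with k=4 so the behaviour is easy to follow by hand.
reference = "AAAACCCCGGGGTTTT"
reads = ["AAAACC", "TTTTA", "GTTTT"]
print(near_zero_regions(reference, reads, k=4, threshold=0))
```

In the toy example, the well-supported k-mers on either side of a low-support interval play the role of anchors, and the interval between them is what a local assembly would try to fill in from the reads.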