Code Examples

In addition to command-line utilities, TRTools has a Python library which can be imported in custom scripts.

TR Utilities

The module trtools.utils.utils contains various helpful functions, e.g. for inferring a repeat sequence motif from a string, converting repeat motifs to canonical representations, or computing functions like heterozygosity from allele frequency distributions:

import trtools.utils.utils as trutils
trutils.InferRepeatSequence('ATATATATATATA', 2) # returns 'AT'
trutils.GetCanonicalMotif('CAG') # returns 'ACG'
afreqs = {10:0.25, 11:0.5, 12: 0.25} # frequency of each length
trutils.GetHeterozygosity(afreqs) # returns 0.625

See here for a complete list of utility functions.

TR Harmonization

Note: the interfaces provided by this library are still under active development. While their rough shape will likely stay the same, particular details are likely to change in future releases.

The module trtools.utils.tr_harmonizer module is responsible for providing a genotyper agnostic view of a VCF containing TR records. The class that provides this functionality is trtools.utils.tr_harmonizer.TRRecord. There are two coding paradigms for accessing this API. If you just want to iterate through the TRRecords in a vcf, use the TRRecordHarmonizer:

import cyvcf2
import trtools.utils.tr_harmonizer as trh

invcf = cyvcf2.VCF("path/to/my.vcf")
harmonizer = trh.TRRecordHarmonizer(invcf)
for trrecord in harmonizer:
      # do something with the trrecord
      # here, we print out an array of length genotypes
      # containing entries for each haplotype of each sample
      print(trrecord.GetLengthGenotypes())

If you want to work with the cyvcf2 VCF object yourself and then later convert it to a TRRecord, use the module method HarmonizeRecord:

import cyvcf2
import trtools.utils.tr_harmonizer as trh

invcf = cyvcf2.VCF("path/to/my.vcf")
#make sure to grab the vcf's filetype
vcftype = trh.InferVCFType(invcf)
for record in invcf:
  # do some filtering on the record
  passesFilters = filter(record)

  # later on, convert the record to a trrecord
  if not passesFilters:
     continue

  trrecord = trh.HarmonizeRecord(vcftype, record)
  # do something with the tr record
  # here, we print out an array of string genotypes
  # containing entries for each haplotype of each sample
  print(trrecord.GetStringGenotypes())

See trtools.utils.tr_harmonizer for a complete list of all functions used for inspecting VCFs and TRRecords.