trtools.annotaTR module

trtools.annotaTR.CheckAlleleCompatibility(record_ref, record_alt, panel_ref, panel_alt)

Check if the REF and ALT alleles of the record and reference panel are compatible.

When imputation is run with Beagle and the output is then combined with bcftools merge, allele order is maintained, but bcftools may trim the allele sequences themselves. This causes problems when harmonizing HipSTR records, since the START/END coordinates are not updated accordingly. The annotaTR option --update-ref-alt can restore the original allele sequences from the reference panel. This function performs basic checks to make sure the REF/ALT alleles of the panel and the target VCF are compatible. In particular, it checks whether:

  • the number of ALT alleles is the same

  • all alleles are offset by the same number of bp

  • all the ALTs in the target VCF are substrings of the ref panel ALTs

If any of these checks fail, the alleles are deemed incompatible.

Parameters
  • record_ref (str) – REF allele of the target VCF

  • record_alt (list of str) – ALT alleles of the target VCF

  • panel_ref (str) – REF allele of the ref panel

  • panel_alt (list of str) – ALT alleles of the ref panel

Returns

is_compatible – True if all checks pass, otherwise False

Return type

bool
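The three checks listed above can be sketched in plain Python as follows. This is an illustrative re-implementation, not the trtools source: the function name check_allele_compatibility_sketch is hypothetical, the offset is assumed to be the length difference between each panel allele and the corresponding target allele, and ALTs are assumed to be compared pairwise in order.

```python
from typing import List


def check_allele_compatibility_sketch(
    record_ref: str,
    record_alt: List[str],
    panel_ref: str,
    panel_alt: List[str],
) -> bool:
    """Hypothetical re-implementation of the three documented checks."""
    # Check 1: same number of ALT alleles
    if len(record_alt) != len(panel_alt):
        return False
    # Check 2: every allele is offset by the same number of bp
    # (assumption: offset = panel allele length minus target allele length)
    offset = len(panel_ref) - len(record_ref)
    for rec, pan in zip(record_alt, panel_alt):
        if len(pan) - len(rec) != offset:
            return False
    # Check 3: each target ALT is a substring of the ref panel ALT
    # (assumption: ALTs are compared pairwise, in order)
    for rec, pan in zip(record_alt, panel_alt):
        if rec not in pan:
            return False
    return True
```

For example, a target REF/ALT of AT/ATT against a panel REF/ALT of CAT/CATT passes all three checks (every allele is offset by 1 bp), whereas a panel with an extra ALT allele fails the first check.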

trtools.annotaTR.GetLocusKey(record, match_on=RefMatchTypes.locid)

Get the key used to match refpanel loci to the target VCF

Options to match on:

  • RefMatchTypes.locid: use the ID from the VCF file

  • RefMatchTypes.rawalleles: use chrom:pos:ref:alt where ref/alt are exactly those in the reference VCF

  • RefMatchTypes.trimmedalleles: use chrom:pos:ref:alt where ref/alt are trimmed to discard extra sequence, as is done in bcftools merge (see: https://github.com/samtools/bcftools/issues/726)

Parameters
  • record (cyvcf2.Variant) – Record to get the locus key for

  • match_on (RefMatchTypes) – way to generate the key (Default: locid)

Returns

locuskey – String of the key

Return type

str
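A minimal sketch of how the three kinds of key might be built is shown below. Record, MatchOn, _trim and locus_key_sketch are hypothetical stand-ins (Record mimics only the CHROM, POS, ID, REF and ALT attributes of a cyvcf2.Variant). Joining multiple ALTs with commas, the trimming order, and leaving POS unchanged after trimming are assumptions, not documented annotaTR behavior.

```python
from collections import namedtuple
from enum import Enum
from typing import List, Tuple

# Hypothetical stand-in for cyvcf2.Variant, exposing only the attributes used here
Record = namedtuple("Record", ["CHROM", "POS", "ID", "REF", "ALT"])


class MatchOn(Enum):
    """Mirrors the documented RefMatchTypes members."""
    locid = "locid"
    rawalleles = "rawalleles"
    trimmedalleles = "trimmedalleles"


def _trim(ref: str, alts: List[str]) -> Tuple[str, List[str]]:
    # Drop bases shared by all alleles at the end, then at the start,
    # always keeping at least one base per allele (assumption)
    alleles = [ref] + list(alts)
    while all(len(a) > 1 for a in alleles) and len({a[-1] for a in alleles}) == 1:
        alleles = [a[:-1] for a in alleles]
    while all(len(a) > 1 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
    return alleles[0], alleles[1:]


def locus_key_sketch(record: Record, match_on: MatchOn = MatchOn.locid) -> str:
    if match_on == MatchOn.locid:
        return record.ID
    ref, alts = record.REF, list(record.ALT)
    if match_on == MatchOn.trimmedalleles:
        # Note: POS is not adjusted when leading bases are trimmed (assumption)
        ref, alts = _trim(ref, alts)
    # Assumption: multiple ALT alleles are joined with commas
    return f"{record.CHROM}:{record.POS}:{ref}:{','.join(alts)}"
```

For a record at chr1:100 with REF CAGAG and ALT CAG, the raw key is chr1:100:CAGAG:CAG, while the trimmed key drops the shared trailing AG and becomes chr1:100:CAG:C.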

trtools.annotaTR.GetPGenPvarWriter(reader, outprefix, variant_ct)

Generate a PGEN writer and the corresponding PVAR writer. For PGEN, we return a pgenlib.PgenWriter instance. For PVAR, we create a file object to which we write the info for each variant as we go. When it is initialized here, we add the DSLEN INFO header and also the header columns: "#CHROM", "POS", "ID", "REF", "ALT", "INFO".

In addition to the PGEN/PVAR writers, this function writes $outprefix.psam with sample information.

Parameters
  • reader (cyvcf2.VCF) – Reader for the input VCF file

  • outprefix (str) – Prefix to name output files. Will generate $outprefix.pgen and $outprefix.pvar (plus $outprefix.psam)

  • variant_ct (int) – Number of variants to be written to the PGEN output

Returns

  • pgen_writer (pgenlib.PgenWriter) – PGEN writer object

  • pvar_writer (file object) – Writer for the PVAR file
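The PVAR header and PSAM file described above are plain text, so they can be sketched with the standard library alone. In this sketch the exact DSLEN meta-line (its Number, Type and Description) is hypothetical, the PSAM file is assumed to contain only an #IID column, and creating the pgenlib.PgenWriter itself is left out. With cyvcf2, the sample IDs would come from reader.samples.

```python
import io
from typing import List, TextIO

# Hypothetical DSLEN meta-line; annotaTR's actual Number/Type/Description may differ
DSLEN_HEADER = ('##INFO=<ID=DSLEN,Number=2,Type=Float,'
                'Description="Minimum and maximum TR allele length">')
PVAR_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "INFO"]


def write_pvar_header(pvar: TextIO) -> None:
    # INFO meta-line first, then the tab-separated column header
    pvar.write(DSLEN_HEADER + "\n")
    pvar.write("\t".join(PVAR_COLUMNS) + "\n")


def write_psam(psam: TextIO, samples: List[str]) -> None:
    # Minimal PLINK 2 .psam: an #IID header, then one sample ID per line
    psam.write("#IID\n")
    for sample in samples:
        psam.write(sample + "\n")


if __name__ == "__main__":
    # In-memory buffers stand in for $outprefix.pvar and $outprefix.psam
    pvar, psam = io.StringIO(), io.StringIO()
    write_pvar_header(pvar)
    write_psam(psam, ["SAMPLE1", "SAMPLE2"])
    print(pvar.getvalue() + psam.getvalue(), end="")
```

In real use the two buffers would be files opened for writing under the output prefix.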

trtools.annotaTR.LoadMetadataFromRefPanel(refreader, vcftype, match_on=RefMatchTypes.locid, ignore_duplicates=False)

Load required INFO fields from the ref panel we will use to annotate the target VCF. The specific INFO fields loaded depend on the vcftype and are specified in INFOFIELDS.

The value of match_on determines what to use as the key in the returned dictionary. Options:

  • RefMatchTypes.locid: use the ID from the VCF file

  • RefMatchTypes.rawalleles: use chrom:pos:ref:alt where ref/alt are exactly those in the reference VCF

  • RefMatchTypes.trimmedalleles: use chrom:pos:ref:alt where ref/alt are trimmed to discard extra sequence, as is done in bcftools merge (see: https://github.com/samtools/bcftools/issues/726)

Parameters
  • refreader (cyvcf2.VCF) – Reader for the reference panel

  • vcftype (trh.VcfTypes) – Based on the TR genotyper used to generate the reference panel

  • match_on (RefMatchTypes (Optional)) – What to use as the locus key to match target to ref panel loci. Default: RefMatchTypes.locid

  • ignore_duplicates (bool) – If True, just output a warning about duplicates rather than raising an error

Returns

  • metadata (Dict[str, Dict[str, str]]) – The key depends on the match_on parameter (see above). Each value is a Dict[str, str] with key=infofield and value=value of that INFO field in the reference panel. Also includes REF/ALT to check against alleles in the imputed VCF.

  • variant_ct (int) – Total number of variants

Raises

ValueError – If a duplicate locus is found in the reference panel and ignore_duplicates=False
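The loading loop can be sketched as a single pass that builds the keyed dictionary and counts records. PanelRecord and load_metadata_sketch are hypothetical. The sketch only implements locid matching, keeps the first record when duplicates are ignored, and counts every record read toward variant_ct; all three are assumptions about the real function.

```python
import warnings
from collections import namedtuple
from typing import Dict, Iterable, List, Tuple

# Hypothetical minimal panel record (cyvcf2.Variant also exposes ID/REF/ALT/INFO)
PanelRecord = namedtuple("PanelRecord", ["ID", "REF", "ALT", "INFO"])


def load_metadata_sketch(
    records: Iterable[PanelRecord],
    infofields: List[str],
    ignore_duplicates: bool = False,
) -> Tuple[Dict[str, Dict[str, str]], int]:
    metadata: Dict[str, Dict[str, str]] = {}
    variant_ct = 0
    for rec in records:
        variant_ct += 1  # assumption: every record read is counted
        key = rec.ID  # locid matching only in this sketch
        if key in metadata:
            msg = f"Duplicate locus detected in reference panel: {key}"
            if not ignore_duplicates:
                raise ValueError(msg)
            warnings.warn(msg)
            continue  # assumption: keep the first occurrence
        entry = {field: str(rec.INFO[field]) for field in infofields}
        # Keep REF/ALT so imputed alleles can be checked against them later
        entry["REF"] = rec.REF
        entry["ALT"] = ",".join(rec.ALT)
        metadata[key] = entry
    return metadata, variant_ct
```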

class trtools.annotaTR.OutputFileTypes(value)

Bases: enum.Enum

Different supported output file types.

pgen = 'pgen'
vcf = 'vcf'
class trtools.annotaTR.PgenWriter

Bases: object

append_alleles()
append_alleles_batch()
append_biallelic()
append_biallelic_batch()
append_dosages()
append_dosages_batch()
append_partially_phased()
append_partially_phased_batch()
close()
class trtools.annotaTR.RefMatchTypes(value)

Bases: enum.Enum

Different supported ways to match target VCF loci to reference panel loci.

locid = 'locid'
rawalleles = 'rawalleles'
trimmedalleles = 'trimmedalleles'
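Both RefMatchTypes and OutputFileTypes are enum.Enum subclasses with string values, so a string (for example, the value of a command-line option) can be converted to a member with standard value lookup, which raises ValueError for unsupported strings. The mirror classes below copy the documented members so the snippet runs without trtools installed; whether annotaTR parses its own options this way is not stated here.

```python
from enum import Enum
from typing import Type, TypeVar

E = TypeVar("E", bound=Enum)


class RefMatchTypesMirror(Enum):
    """Local stand-in mirroring the documented RefMatchTypes members."""
    locid = "locid"
    rawalleles = "rawalleles"
    trimmedalleles = "trimmedalleles"


class OutputFileTypesMirror(Enum):
    """Local stand-in mirroring the documented OutputFileTypes members."""
    pgen = "pgen"
    vcf = "vcf"


def parse_choice(enum_cls: Type[E], value: str) -> E:
    # Lookup by value; raises ValueError if `value` is not a member's value
    return enum_cls(value)


if __name__ == "__main__":
    print(parse_choice(RefMatchTypesMirror, "rawalleles"))
    print([m.value for m in OutputFileTypesMirror])
```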
trtools.annotaTR.TrimAlleles(ref_allele, alt_alleles)

Trim ref and alt alleles to remove common prefixes and suffixes present in all alleles

Parameters
  • ref_allele (str) – Reference allele

  • alt_alleles (list of str) – List of alternate alleles

Returns

  • new_ref_allele (str) – Trimmed reference allele

  • new_alt_allele (list of str) – List of trimmed alternate alleles
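One way to implement the trimming described above is sketched below. trim_alleles_sketch is hypothetical: it assumes the common suffix is removed before the common prefix and that every allele keeps at least one base, since VCF alleles cannot be empty. The actual TrimAlleles may differ in either respect.

```python
from typing import List, Tuple


def trim_alleles_sketch(ref_allele: str, alt_alleles: List[str]) -> Tuple[str, List[str]]:
    """Hypothetical trimming of bases shared by every allele at either end."""
    alleles = [ref_allele] + list(alt_alleles)

    def can_trim(index: int) -> bool:
        # Every allele has a spare base and all agree at position `index`
        return all(len(a) > 1 for a in alleles) and len({a[index] for a in alleles}) == 1

    # Common suffix first (assumption), then common prefix
    while can_trim(-1):
        alleles = [a[:-1] for a in alleles]
    while can_trim(0):
        alleles = [a[1:] for a in alleles]
    return alleles[0], alleles[1:]
```

For example, REF CAGAG with ALT CAG trims to REF CAG with ALT C, and REF GTTA with ALT GTCA trims to REF T with ALT C.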

trtools.annotaTR.UpdateVCFHeader(reader, command, vcftype, dosage_type=None, refreader=None)

Update the VCF header of the reader to include:

  • The annotaTR command used

  • New INFO and FORMAT fields we will add if annotating dosages (INFO/DSLEN and FORMAT/TRDS)

  • New INFO fields we will add if using refpanel annotation. The fields added depend on the vcftype and are listed in INFOFIELDS

Note that this function gets called even if we are not producing VCF output, since in some cases we still might need to add these fields to the record during processing, and cyvcf2 will throw an error if the proper headers are not present.

Parameters
  • reader (cyvcf2.VCF) – Reader for the input VCF file

  • command (str) – The annotaTR command used

  • vcftype (trh.VcfTypes) – Which type of TR VCF file the reader/refreader are

  • dosage_type (trh.TRDosageType) – The type of dosages to be annotated. None if not computing dosages

  • refreader (cyvcf2.VCF) – Reader for the reference panel

Returns

success – True if adding header fields was successful, otherwise False

Return type

bool
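The header changes listed above map onto cyvcf2's VCF.add_to_header, VCF.add_info_to_header and VCF.add_format_to_header methods. The sketch below is hypothetical: the ##command meta-line key and the Number/Type/Description values for DSLEN and TRDS are placeholders, ref panel INFO fields are passed in as ready-made dictionaries, and the success flag returned by the real function is not modeled. It accepts any object exposing those three methods, so it can be exercised with the small recording stub included.

```python
from typing import Dict, List, Optional


def update_header_sketch(reader, command: str, add_dosages: bool,
                         refpanel_infofields: Optional[List[Dict]] = None) -> None:
    """Hypothetical sketch; `reader` must provide the three cyvcf2.VCF methods used."""
    # Record the command line (hypothetical meta-line key)
    reader.add_to_header(f"##command={command}")
    if add_dosages:
        # Placeholder Number/Type/Description values
        reader.add_info_to_header({"ID": "DSLEN", "Number": "2", "Type": "Float",
                                   "Description": "Minimum and maximum TR allele length"})
        reader.add_format_to_header({"ID": "TRDS", "Number": "1", "Type": "Float",
                                     "Description": "TR dosage"})
    # INFO fields copied from the reference panel (e.g. those listed in INFOFIELDS)
    for spec in refpanel_infofields or []:
        reader.add_info_to_header(spec)


class RecordingReader:
    """Test double that records header additions instead of editing a VCF."""

    def __init__(self) -> None:
        self.added: List[tuple] = []

    def add_to_header(self, line: str) -> None:
        self.added.append(("LINE", line))

    def add_info_to_header(self, spec: Dict) -> None:
        self.added.append(("INFO", spec["ID"]))

    def add_format_to_header(self, spec: Dict) -> None:
        self.added.append(("FORMAT", spec["ID"]))
```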

trtools.annotaTR.WritePvarVariant(pvar_writer, record, minlen, maxlen)

Write variant metadata to a PVAR file. Outputs CHROM, POS, ID, REF, ALT, INFO. REF and ALT are set to dummy values (DUMMY_REF and DUMMY_ALT). INFO contains the DSLEN field with the min/max allele lengths observed.

Parameters
  • pvar_writer (file object) – Writer for the PVAR file

  • record (cyvcf2.Variant) – Record object for the variant

  • minlen (float) – Minimum TR allele length at this locus (in rpt. units)

  • maxlen (float) – Maximum TR allele length at this locus (in rpt. units)
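Writing one PVAR line can be sketched as follows. Variant stands in for cyvcf2.Variant, DUMMY_REF and DUMMY_ALT are placeholder values (annotaTR's actual constants may differ), and formatting DSLEN as the two lengths joined by a comma is an assumption.

```python
import io
from collections import namedtuple
from typing import TextIO

# Stand-in for cyvcf2.Variant with only the attributes used here
Variant = namedtuple("Variant", ["CHROM", "POS", "ID"])

# Hypothetical placeholders; annotaTR's actual DUMMY_REF/DUMMY_ALT may differ
DUMMY_REF = "A"
DUMMY_ALT = "T"


def write_pvar_variant_sketch(pvar: TextIO, record, minlen: float, maxlen: float) -> None:
    # Assumption: DSLEN holds the min and max lengths separated by a comma
    info = f"DSLEN={minlen},{maxlen}"
    # Columns follow the PVAR header: #CHROM POS ID REF ALT INFO (tab-separated)
    fields = [record.CHROM, str(record.POS), record.ID, DUMMY_REF, DUMMY_ALT, info]
    pvar.write("\t".join(fields) + "\n")


if __name__ == "__main__":
    buf = io.StringIO()
    write_pvar_variant_sketch(buf, Variant("chr1", 1000, "STR_1"), 10.0, 13.5)
    print(buf.getvalue(), end="")
```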

trtools.annotaTR.getargs()
trtools.annotaTR.main(args)
trtools.annotaTR.run()