trtools.annotaTR module
- trtools.annotaTR.CheckAlleleCompatibility(record_ref, record_alt, panel_ref, panel_alt)
Check if the REF and ALT alleles of the record and reference panel are compatible.
In the case of running imputation with Beagle followed by bcftools merge, allele order is maintained but the sequences themselves may be trimmed by bcftools. This causes problems when harmonizing HipSTR records, since the START/END coordinates are not updated accordingly. The annotaTR option --update-ref-alt can restore the original allele sequences from the ref panel. This function provides basic checks to make sure the REF/ALT alleles of the panel and target VCF are compatible. In particular, it checks:
- whether the number of ALT alleles is the same
- whether all alleles are offset by the same number of bp
- whether all ALTs in the target VCF are substrings of the ref panel ALTs
If any of these checks fail, the alleles are deemed incompatible.
- Parameters
record_ref (str) – REF allele of the target VCF
record_alt (list of str) – ALT alleles of the target VCF
panel_ref (str) – REF allele of the ref panel
panel_alt (list of str) – ALT alleles of the ref panel
- Returns
is_compatible – True if all checks pass, otherwise False
- Return type
bool
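The three checks listed above can be sketched in plain Python as follows. This is a minimal illustration, not the trtools implementation: the function name check_allele_compatibility_sketch is hypothetical, and it assumes that "offset by the same number of bp" means every panel allele is longer than its target counterpart by the same length difference as the REF pair.

```python
from typing import List


def check_allele_compatibility_sketch(record_ref: str, record_alt: List[str],
                                      panel_ref: str, panel_alt: List[str]) -> bool:
    """Hypothetical re-implementation of the documented checks (not trtools code)."""
    # Check 1: the target VCF and ref panel must have the same number of ALTs
    if len(record_alt) != len(panel_alt):
        return False
    # Check 2: every ALT must differ in length from its panel counterpart by the
    # same number of bp as the REF alleles do (assumed length-difference test)
    offset = len(panel_ref) - len(record_ref)
    if any(len(pan) - len(rec) != offset for rec, pan in zip(record_alt, panel_alt)):
        return False
    # Check 3: every target ALT must be a substring of the matching panel ALT
    return all(rec in pan for rec, pan in zip(record_alt, panel_alt))
```

For example, a target record with REF=A, ALT=[AT] that was trimmed from a panel record with REF=CA, ALT=[CAT] passes all three checks, while a panel ALT of CCAT would fail the offset check.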
- trtools.annotaTR.GetLocusKey(record, match_on=RefMatchTypes.locid)
Get the key used to match ref panel loci to the target VCF.
Options to match on:
- RefMatchTypes.locid: use the ID from the VCF file
- RefMatchTypes.rawalleles: use chrom:pos:ref:alt, where ref/alt are exactly those in the reference VCF
- RefMatchTypes.trimmedalleles: use chrom:pos:ref:alt, where ref/alt are trimmed to discard extra sequence, as is done in bcftools merge (see: https://github.com/samtools/bcftools/issues/726)
- Parameters
record (cyvcf2.Variant) – Record to get the locus key for
match_on (RefMatchTypes) – way to generate the key (Default: locid)
- Returns
locuskey – String of the key
- Return type
str
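A standalone sketch of how the three key types could be built from plain fields rather than a cyvcf2.Variant. Everything here is hypothetical: the MatchType enum mirrors the documented RefMatchTypes values, multiple ALTs are assumed to be joined with commas, trimming is assumed to keep at least one base per allele, and POS is assumed to be left unchanged after trimming. The real trtools key format may differ.

```python
from enum import Enum
from typing import List, Tuple


class MatchType(Enum):
    # Hypothetical mirror of the documented RefMatchTypes values
    locid = "locid"
    rawalleles = "rawalleles"
    trimmedalleles = "trimmedalleles"


def trim_alleles(ref: str, alts: List[str]) -> Tuple[str, List[str]]:
    # Drop bases shared by all alleles (suffix first, then prefix),
    # always keeping at least one base in every allele (assumption)
    if not alts:
        return ref, []
    alleles = [ref] + list(alts)
    while all(len(a) > 1 for a in alleles) and len({a[-1] for a in alleles}) == 1:
        alleles = [a[:-1] for a in alleles]
    while all(len(a) > 1 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
    return alleles[0], alleles[1:]


def get_locus_key_sketch(chrom: str, pos: int, locid: str, ref: str,
                         alts: List[str], match_on: MatchType = MatchType.locid) -> str:
    if match_on is MatchType.locid:
        return locid
    if match_on is MatchType.trimmedalleles:
        # Assumption: POS is kept as-is after trimming
        ref, alts = trim_alleles(ref, alts)
    # Assumption: multiple ALTs are joined with commas, as in the VCF ALT column
    return f"{chrom}:{pos}:{ref}:{','.join(alts)}"
```

With REF=CAGAG and ALT=[CAG] at chr1:100, the raw key is chr1:100:CAGAG:CAG and the trimmed key is chr1:100:CAG:C, which is why a trimmed target VCF would not match a raw-allele key from the panel.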
- trtools.annotaTR.GetPGenPvarWriter(reader, outprefix, variant_ct)
Generate a PGEN writer and a corresponding PVAR writer. For PGEN, we return a pgenlib.PgenWriter instance. For PVAR, we create a file object to which we write info for each variant as we go. When the PVAR file is initialized here, we add the DSLEN INFO header and also the header columns: "#CHROM", "POS", "ID", "REF", "ALT", "INFO".
In addition to the PGEN/PVAR writers, this function writes $outprefix.psam with sample information.
- Parameters
reader (cyvcf2.VCF) – Reader for the input VCF file
outprefix (str) – Prefix to name output files. Will generate $outprefix.pgen and $outprefix.pvar
variant_ct (int) – Number of variants to be written to the PGEN output
- Returns
pgen_writer (pgenlib.PgenWriter) – PGEN writer object
pvar_writer (file object) – Writer for the PVAR file
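The text-file side of this setup (the PSAM file and the PVAR header) can be sketched with the standard library alone; the pgenlib.PgenWriter part is omitted. The function names are hypothetical, and the exact DSLEN header definition (Number=2, Type=Float, and its description) is an assumption based on DSLEN holding the min/max allele lengths, not the string trtools actually writes.

```python
import io
from typing import List, TextIO


def write_psam_sketch(handle: TextIO, samples: List[str]) -> None:
    # Minimal PLINK 2 .psam: a header line starting with #IID, then one sample per line
    handle.write("#IID\n")
    for sample in samples:
        handle.write(sample + "\n")


def write_pvar_header_sketch(handle: TextIO) -> None:
    # Assumed DSLEN definition: two floats holding the min/max TR allele lengths
    handle.write('##INFO=<ID=DSLEN,Number=2,Type=Float,'
                 'Description="Minimum and maximum TR allele lengths">\n')
    # Column header line for the PVAR body
    handle.write("\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "INFO"]) + "\n")


if __name__ == "__main__":
    buf = io.StringIO()
    write_pvar_header_sketch(buf)
    print(buf.getvalue(), end="")
```

Writing the header once up front lets each variant line be appended as it is processed, which matches the "write info for each variant as we go" approach described above.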
- trtools.annotaTR.LoadMetadataFromRefPanel(refreader, vcftype, match_on=RefMatchTypes.locid, ignore_duplicates=False)
Load required INFO fields from the ref panel we will use to annotate the target VCF. The specific INFO fields loaded depend on the vcftype and are specified in INFOFIELDS.
The value of match_on determines what to use as the key in the returned dictionary. Options:
- RefMatchTypes.locid: use the ID from the VCF file
- RefMatchTypes.rawalleles: use chrom:pos:ref:alt, where ref/alt are exactly those in the reference VCF
- RefMatchTypes.trimmedalleles: use chrom:pos:ref:alt, where ref/alt are trimmed to discard extra sequence, as is done in bcftools merge (see: https://github.com/samtools/bcftools/issues/726)
- Parameters
refreader (cyvcf2.VCF) – Reader for the reference panel
vcftype (trh.VcfTypes) – Based on the TR genotyper used to generate the reference panel
match_on (RefMatchTypes (Optional)) – What to use as the locus key to match target to ref panel loci. Default: RefMatchTypes.locid
ignore_duplicates (bool) – If True, just output a warning about duplicates rather than giving up
- Returns
metadata (Dict[str, Dict[str, str]]) – The key depends on the match_on parameter (see above). Each value is a Dict[str, str] with key=infofield and value=value of that INFO field in the reference panel. Also includes REF/ALT to check against alleles in the imputed VCF
variant_ct (int) – Total number of variants
- Raises
ValueError – If a duplicate locus is found in the reference panel and ignore_duplicates=False
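The duplicate handling described above can be sketched over pre-extracted (key, INFO fields) pairs instead of a cyvcf2 reader. This is a hypothetical illustration: load_metadata_sketch is not a trtools function, and two behaviors are assumptions rather than documented facts: when duplicates are ignored, the first occurrence is kept, and variant_ct counts every record read, duplicates included.

```python
import warnings
from typing import Dict, Iterable, Tuple


def load_metadata_sketch(records: Iterable[Tuple[str, Dict[str, str]]],
                         ignore_duplicates: bool = False
                         ) -> Tuple[Dict[str, Dict[str, str]], int]:
    metadata: Dict[str, Dict[str, str]] = {}
    variant_ct = 0
    for key, fields in records:
        # Assumption: every record read is counted, including duplicates
        variant_ct += 1
        if key in metadata:
            if not ignore_duplicates:
                raise ValueError(f"Duplicate locus detected in ref panel: {key}")
            # Assumption: keep the first occurrence and only warn
            warnings.warn(f"Duplicate locus {key}; keeping first occurrence")
            continue
        metadata[key] = dict(fields)
    return metadata, variant_ct
```

Raising by default is the safer choice here, since silently picking one of two panel records could attach the wrong INFO fields to an imputed locus.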
- class trtools.annotaTR.OutputFileTypes(value)
Bases:
enum.Enum
Different supported output file types.
- pgen = 'pgen'
- vcf = 'vcf'
- class trtools.annotaTR.PgenWriter
Bases:
object
- append_alleles()
- append_alleles_batch()
- append_biallelic()
- append_biallelic_batch()
- append_dosages()
- append_dosages_batch()
- append_partially_phased()
- append_partially_phased_batch()
- close()
- class trtools.annotaTR.RefMatchTypes(value)
Bases:
enum.Enum
Different supported ways to match target VCF loci to reference panel loci.
- locid = 'locid'
- rawalleles = 'rawalleles'
- trimmedalleles = 'trimmedalleles'
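Because OutputFileTypes and RefMatchTypes are plain enum.Enum subclasses with string values, a string such as a command-line argument can be converted to a member by calling the class with that value. The standalone mirror below redeclares the documented members so it runs without trtools; how annotaTR's own argument parsing performs this conversion is not shown in this reference, and the comma-separated "vcf,pgen" input is a hypothetical example.

```python
from enum import Enum


class OutputFileTypes(Enum):
    # Mirror of the documented members
    pgen = "pgen"
    vcf = "vcf"


class RefMatchTypes(Enum):
    # Mirror of the documented members
    locid = "locid"
    rawalleles = "rawalleles"
    trimmedalleles = "trimmedalleles"


# Look up members by their string value, e.g. from a command-line argument
match_on = RefMatchTypes("trimmedalleles")
output_types = [OutputFileTypes(v) for v in "vcf,pgen".split(",")]
```

An unrecognized string, such as RefMatchTypes("id"), raises ValueError, so invalid options fail early.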
- trtools.annotaTR.TrimAlleles(ref_allele, alt_alleles)
Trim ref and alt alleles to remove common prefixes and suffixes present in all alleles.
- Parameters
ref_allele (str) – Reference allele
alt_alleles (list of str) – List of alternate alleles
- Returns
new_ref_allele (str) – Trimmed reference allele
new_alt_allele (list of str) – List of trimmed alternate alleles
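A minimal sketch of this kind of trimming, assuming (not confirmed by this reference) that the shared suffix is removed before the shared prefix and that trimming stops as soon as any allele would become empty, as in standard VCF normalization. The function name trim_alleles_sketch is hypothetical.

```python
from typing import List, Tuple


def trim_alleles_sketch(ref_allele: str,
                        alt_alleles: List[str]) -> Tuple[str, List[str]]:
    # Nothing to compare against if there are no ALTs
    if not alt_alleles:
        return ref_allele, []
    alleles = [ref_allele] + list(alt_alleles)
    # Remove the last base while every allele has >1 base and all share it
    while all(len(a) > 1 for a in alleles) and len({a[-1] for a in alleles}) == 1:
        alleles = [a[:-1] for a in alleles]
    # Then remove the first base under the same condition
    while all(len(a) > 1 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
    return alleles[0], alleles[1:]
```

For example, REF=ACGT with ALT=[AGGT] trims to REF=C, ALT=[G]. The order matters for repeats: trimming the suffix first turns REF=CAGAG, ALT=[CAG] into REF=CAG, ALT=[C].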
- trtools.annotaTR.UpdateVCFHeader(reader, command, vcftype, dosage_type=None, refreader=None)
Update the VCF header of the reader to include:
- the annotaTR command used
- new INFO and FORMAT fields we will add if annotating dosages (INFO/DSLEN and FORMAT/TRDS)
- new INFO fields we will add if using ref panel annotation. The fields added depend on the vcftype and are listed in INFOFIELDS
Note that this function is called even if we are not producing VCF output, since in some cases we might still need to add these fields to the record during processing, and cyvcf2 will throw an error if the proper headers are not present.
- Parameters
reader (cyvcf2.VCF) – Reader for the input VCF file
command (str) – The annotaTR command used
vcftype (trh.VcfTypes) – Which type of TR VCF file the reader/refreader are
dosage_type (trh.TRDosageType) – The type of dosages to be annotated. None if not computing dosages
refreader (cyvcf2.VCF) – Reader for the reference panel
- Returns
success – True if adding header fields was successful, otherwise False
- Return type
bool
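The header lines this function adds can be sketched as plain VCF meta-information strings. This is a hypothetical illustration, not the trtools code: the ##command key, the Number/Type/Description values for DSLEN and TRDS, and the refpanel_fields mapping (standing in for INFOFIELDS) are all assumptions. In cyvcf2, the VCF object offers add_to_header, add_info_to_header and add_format_to_header for registering such lines.

```python
from typing import Dict, List, Optional, Tuple


def header_lines_sketch(command: str, add_dosages: bool,
                        refpanel_fields: Optional[Dict[str, Tuple[str, str, str]]] = None
                        ) -> List[str]:
    # Assumption: the command is recorded under a "command" meta key
    lines = [f"##command={command}"]
    if add_dosages:
        # Assumed definitions for the dosage fields
        lines.append('##INFO=<ID=DSLEN,Number=2,Type=Float,'
                     'Description="Minimum and maximum TR allele lengths">')
        lines.append('##FORMAT=<ID=TRDS,Number=1,Type=Float,'
                     'Description="TR dosage">')
    # Hypothetical stand-in for INFOFIELDS: ID -> (Number, Type, Description)
    for field_id, (number, vtype, desc) in (refpanel_fields or {}).items():
        lines.append(f'##INFO=<ID={field_id},Number={number},Type={vtype},'
                     f'Description="{desc}">')
    return lines
```

Building every definition before any record is touched reflects the note above: cyvcf2 rejects INFO/FORMAT values whose IDs are not declared in the header.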
- trtools.annotaTR.WritePvarVariant(pvar_writer, record, minlen, maxlen)
Write variant metadata to a PVAR file. Outputs CHROM, POS, ID, REF, ALT, INFO. REF and ALT are set to dummy values (DUMMY_REF and DUMMY_ALT). INFO contains the DSLEN field with the minimum and maximum allele lengths observed.
- Parameters
pvar_writer (file object) – Writer for the PVAR file
record (cyvcf2.Variant) – Record object for the variant
minlen (float) – Minimum TR allele length at this locus (in rpt. units)
maxlen (float) – Maximum TR allele length at this locus (in rpt. units)
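A sketch of the line format described above, written to any text handle. The function name is hypothetical, it takes plain fields instead of a cyvcf2.Variant, and the DUMMY_REF/DUMMY_ALT values below are placeholders, since the actual constants are not given in this reference.

```python
import io
from typing import TextIO

# Hypothetical placeholders; the real DUMMY_REF/DUMMY_ALT values may differ
DUMMY_REF = "A"
DUMMY_ALT = "T"


def write_pvar_variant_sketch(handle: TextIO, chrom: str, pos: int, varid: str,
                              minlen: float, maxlen: float) -> None:
    # DSLEN stores the min and max TR allele lengths (in repeat units)
    info = f"DSLEN={minlen},{maxlen}"
    fields = [chrom, str(pos), varid, DUMMY_REF, DUMMY_ALT, info]
    handle.write("\t".join(fields) + "\n")


if __name__ == "__main__":
    buf = io.StringIO()
    write_pvar_variant_sketch(buf, "chr1", 100, "STR_1", 5.0, 12.5)
    print(buf.getvalue(), end="")
```

Using fixed dummy REF/ALT keeps the PVAR biallelic from PLINK's point of view, while DSLEN preserves the length range needed to interpret the stored dosages.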
- trtools.annotaTR.getargs()
- trtools.annotaTR.main(args)
- trtools.annotaTR.run()