trtools.dumpSTR.filters module

Locus-level and Call-level VCF filters

class trtools.dumpSTR.filters.CallFilterMaxValue(name, field, threshold)

Bases: trtools.dumpSTR.filters.Reason

Generic call-level filter based on maximum allowed value for a field.

Extends Reason class. For any call-level value, such as DP, this class can be used to make a filter based on the maximum allowed value for that field.
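The thresholding described above can be sketched in plain Python. This is a standalone illustration, not the TRTools implementation: the helper name `max_value_filter` and the sample values are hypothetical, and the strict `>` comparison is an assumption. The nan convention for unfiltered samples follows the Reason class description in this module.

```python
import math

def max_value_filter(values, threshold):
    """Hypothetical helper: one entry per sample, nan if the call passes,
    otherwise the value that exceeded the threshold."""
    return [v if v > threshold else math.nan for v in values]

# Made-up per-sample DP values for a single locus
dp_values = [35.0, 1200.0, 980.0]
flags = max_value_filter(dp_values, 1000)

# A sample counts as filtered when its entry is anything other than nan
is_filtered = [not math.isnan(f) for f in flags]
```

Only the second sample exceeds the limit here, so it is the only one flagged.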

Parameters
  • name (str) – The name of the filter to put in the FORMAT:FILTER field of filtered calls.

  • field (str) – The FORMAT field to filter on

  • threshold (float) – The maximum allowed value for the field.

name

The name of the filter to put in the FORMAT:FILTER field of filtered calls.

Type

str

field

The FORMAT field to filter on

Type

str

threshold

The maximum allowed value for the field.

Type

float

Examples

>>> max_dp_filt = CallFilterMaxValue("HIGHDP","DP",1000)

class trtools.dumpSTR.filters.CallFilterMinValue(name, field, threshold)

Bases: trtools.dumpSTR.filters.Reason

Generic call-level filter based on minimum allowed value for a field.

Extends Reason class. For any call-level value, such as DP, this class can be used to make a filter based on the minimum allowed value for that field.

Parameters
  • name (str) – The name of the filter to put in the FORMAT:FILTER field of filtered calls.

  • field (str) – The FORMAT field to filter on

  • threshold (float) – The minimum allowed value for the field.

name

The name of the filter to put in the FORMAT:FILTER field of filtered calls.

Type

str

field

The FORMAT field to filter on

Type

str

threshold

The minimum allowed value for the field.

Type

float

Examples

>>> min_dp_filt = CallFilterMinValue("LOWDP","DP",10)

class trtools.dumpSTR.filters.FilterBase

Bases: object

Base class for locus-level filters. It only defines the interface.

description()
filter_name()
name = 'NotYetImplemented'

class trtools.dumpSTR.filters.Filter_LocusHrun

Bases: trtools.dumpSTR.filters.FilterBase

Class to filter VCF records for penta- or hexanucleotide STRs with long homopolymer runs

This only works on HipSTR or LongTR VCFs. STRs with long homopolymer runs have been shown to be difficult for HipSTR to call and may be challenging with LongTR. This filter removes 5-mers with homopolymer runs of length >= 5 and 6-mers with homopolymer runs of length >= 6.
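A minimal sketch of the rule above, in plain Python. The function names are hypothetical, and which sequence is scanned for the run (for example the reference allele) is an assumption, since it is not stated here.

```python
from itertools import groupby

def longest_homopolymer_run(seq):
    """Length of the longest stretch of a single repeated base in seq."""
    return max((sum(1 for _ in grp) for _, grp in groupby(seq.upper())), default=0)

def hrun_filtered(seq, period):
    """Apply the stated rule: only periods 5 and 6, run length >= period."""
    return period in (5, 6) and longest_homopolymer_run(seq) >= period

# Made-up sequences: a pentanucleotide locus with a run of five A's is removed,
# a hexanucleotide locus whose longest run is five is kept
penta_removed = hrun_filtered("TAAAAATAAAAA", 5)
hexa_removed = hrun_filtered("AAAAAGAAAAAG", 6)
```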

filter_name()
name = 'HRUN'

The name of the filter

class trtools.dumpSTR.filters.Filter_MaxLocusHet(max_locus_het, uselength=False)

Bases: trtools.dumpSTR.filters.FilterBase

Class to filter VCF records by maximum heterozygosity

This class extends FilterBase.

Parameters
  • max_locus_het (float) – Filters calls with heterozygosity greater than this

  • vcftype (trh.VCFTYPES) – the type of the VCF we’re working with

  • uselength (bool, optional) – If set to true, consider all alleles with the same length as the same
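To illustrate the effect of `uselength`, here is a small sketch that computes heterozygosity as 1 minus the sum of squared allele frequencies. That formula is the usual definition of expected heterozygosity and is assumed here; the function name and allele sequences are hypothetical and not taken from TRTools.

```python
import math
from collections import Counter

def heterozygosity(alleles, uselength=False):
    """Expected heterozygosity, 1 - sum(p_i^2), over the observed allele copies.

    With uselength=True, alleles of equal length are pooled into one class.
    """
    keys = [len(a) for a in alleles] if uselength else list(alleles)
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0:
        return math.nan
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

# Four allele copies: two share a sequence, two share a length but differ in sequence
alleles = ["ACAC", "ACAC", "ACACAC", "AGAGAG"]
het_by_sequence = heterozygosity(alleles)
het_by_length = heterozygosity(alleles, uselength=True)
```

Pooling alleles by length can only merge classes, so the length-based value is never higher than the sequence-based one.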

threshold

Filters calls with heterozygosity greater than this

Type

float

vcftype

the type of the VCF we’re working with

Type

trh.VCFTYPES

uselength

If set to true, consider all alleles with the same length as the same

Type

bool, optional

filter_name()
name = 'HETHIGH'

The name of the filter

class trtools.dumpSTR.filters.Filter_MinLocusCallrate(min_locus_callrate)

Bases: trtools.dumpSTR.filters.FilterBase

Class to filter VCF records by call rate

This class extends FilterBase.

Parameters

min_locus_callrate (float) – Filters loci whose call rate (the fraction of samples called) is lower than this
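As a rough illustration of the quantity being thresholded, call rate can be computed as the fraction of samples with a genotype call. The function name and the use of `None` for a no-call are assumptions made for this sketch.

```python
def call_rate(genotypes):
    """Fraction of samples with a genotype call; None marks a no-call."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)

# Four samples, one of them uncalled
genotypes = [(0, 1), None, (1, 1), (0, 0)]
rate = call_rate(genotypes)

# With a minimum call rate of 0.8, this locus would be filtered
locus_filtered = rate < 0.8
```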

threshold

Filters loci whose call rate (the fraction of samples called) is lower than this. Derived from the input min_locus_callrate.

Type

float

filter_name()
name = 'CALLRATE'

The name of the filter

class trtools.dumpSTR.filters.Filter_MinLocusHWEP(min_locus_hwep, uselength=False)

Bases: trtools.dumpSTR.filters.FilterBase

Class to filter VCF records by minimum HWE p-value

This class extends FilterBase.

Parameters
  • min_locus_hwep (float) – Filters calls with HWE p-value lower than this

  • vcftype (trh.VCFTYPES) – the type of the VCF we’re working with

  • uselength (bool, optional) – If set to true, consider all alleles with the same length as the same
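The test statistic behind the HWE p-value is not described here. One simple possibility, shown purely as an assumption, is a two-sided exact binomial test that compares the observed number of homozygous samples with the number expected from the allele frequencies. All names below are hypothetical.

```python
from collections import Counter
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def hwe_binomial_pvalue(genotypes):
    """Two-sided exact binomial test on the observed homozygote count.

    genotypes: one tuple of allele labels per called sample.
    """
    n = len(genotypes)
    allele_counts = Counter(a for gt in genotypes for a in gt)
    total = sum(allele_counts.values())
    # Under HWE the expected homozygous fraction is sum(p_i^2)
    exp_hom = sum((c / total) ** 2 for c in allele_counts.values())
    obs_hom = sum(1 for gt in genotypes if len(set(gt)) == 1)
    p_obs = binom_pmf(obs_hom, n, exp_hom)
    # Add up every outcome that is no more likely than the observed one
    tail = sum(
        pk
        for pk in (binom_pmf(k, n, exp_hom) for k in range(n + 1))
        if pk <= p_obs * (1 + 1e-7)
    )
    return min(1.0, tail)

# Ten samples, all homozygous, two alleles at equal frequency: a strong excess of homozygotes
all_hom = [(1, 1)] * 5 + [(2, 2)] * 5
pval = hwe_binomial_pvalue(all_hom)
```

A locus like this one would fail any reasonable min_locus_hwep threshold, whereas a locus with roughly the expected mix of heterozygotes and homozygotes yields a p-value near 1.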

threshold

Filters calls with HWE p-value lower than this

Type

float

vcftype

the type of the VCF we’re working with

Type

trh.VCFTYPES

uselength

If set to true, consider all alleles with the same length as the same

Type

bool, optional

filter_name()
name = 'HWE'

The name of the filter

class trtools.dumpSTR.filters.Filter_MinLocusHet(min_locus_het, uselength=False)

Bases: trtools.dumpSTR.filters.FilterBase

Class to filter VCF records by minimum heterozygosity

This class extends FilterBase.

Parameters
  • min_locus_het (float) – Filters calls with heterozygosity lower than this

  • vcftype (trh.VCFTYPES) – the type of the VCF we’re working with

  • uselength (bool, optional) – If set to true, consider all alleles with the same length as the same

threshold

Filters calls with heterozygosity lower than this

Type

float

vcftype

the type of the VCF we’re working with

Type

trh.VCFTYPES

uselength

If set to true, consider all alleles with the same length as the same

Type

bool, optional

filter_name()
name = 'HETLOW'

The name of the filter

class trtools.dumpSTR.filters.GangSTRCallBadCI

Bases: trtools.dumpSTR.filters.Reason

Filter GangSTR calls where the ML genotype estimate is outside of CI.

If 95% confidence interval does not include the maximum likelihood genotype call, the call is filtered.

Extends Reason class. Based on REPCI and REPCN fields.
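The check can be sketched as follows. The string formats assumed for REPCN (comma-separated repeat counts) and REPCI (comma-separated `low-high` intervals) are illustrative assumptions, and the function name is hypothetical.

```python
def ml_outside_ci(repcn, repci):
    """True if any ML repeat count lies outside its confidence interval.

    Assumed formats: repcn like "10,12", repci like "9-11,12-15".
    """
    copies = [int(x) for x in repcn.split(",")]
    intervals = [tuple(int(b) for b in ci.split("-")) for ci in repci.split(",")]
    return any(not (low <= cn <= high) for cn, (low, high) in zip(copies, intervals))

# First allele (10) falls inside 9-11, second (12) inside 12-15: call is kept
kept = not ml_outside_ci("10,12", "9-11,12-15")
# First allele (10) falls outside 11-13: call is filtered
dropped = ml_outside_ci("10,12", "11-13,12-15")
```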

name = 'GangSTRCallBadCI'

The name of the filter

class trtools.dumpSTR.filters.GangSTRCallExpansionProbHet(threshold)

Bases: trtools.dumpSTR.filters.Reason

Filter GangSTR calls with low probability for heterozygous expansion.

Extends Reason class. Based on the QEXP field. Filter if QEXP[:, 1] (prob het expansion above INFO:THRESHOLD) is less than the threshold.

Parameters

threshold (float) – Minimum heterozygous expansion probability

threshold

Minimum heterozygous expansion probability

Type

float

name = 'GangSTRCallExpansionProbHet'

The name of the filter

class trtools.dumpSTR.filters.GangSTRCallExpansionProbHom(threshold)

Bases: trtools.dumpSTR.filters.Reason

Filter GangSTR calls with low probability for homozygous expansion.

Extends Reason class. Based on the QEXP field. Filter if QEXP[:, 2] (prob hom expansion above INFO:THRESHOLD) is less than the threshold.

Parameters

threshold (float) – Minimum homozygous expansion probability

threshold

Minimum homozygous expansion probability

Type

float

name = 'GangSTRCallExpansionProbHom'

The name of the filter

class trtools.dumpSTR.filters.GangSTRCallExpansionProbTotal(threshold)

Bases: trtools.dumpSTR.filters.Reason

Filter GangSTR calls with low probability for expansion (heterozygous or homozygous)

Extends Reason class. Based on the QEXP field. Filter if QEXP[1]+QEXP[2] (prob het or hom expansion above INFO:THRESHOLD) is less than the threshold.

Parameters

threshold (float) – Minimum expansion probability
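The three GangSTR expansion-probability filters differ only in which QEXP entries they read. The sketch below makes that explicit. It assumes QEXP holds three probabilities per sample, with index 1 for heterozygous and index 2 for homozygous expansion as stated in the descriptions; treating index 0 as "no expansion" is an assumption, and the function names are hypothetical.

```python
def expansion_prob(qexp, mode="total"):
    """Pick the expansion probability for one sample from its QEXP triple."""
    if mode == "het":
        return qexp[1]
    if mode == "hom":
        return qexp[2]
    return qexp[1] + qexp[2]  # het or hom expansion

def low_expansion_prob(qexp, threshold, mode="total"):
    """True if the call would be filtered for low expansion probability."""
    return expansion_prob(qexp, mode) < threshold

# Made-up sample: likely expanded, most probably heterozygous
qexp = (0.1, 0.7, 0.2)
filtered_total = low_expansion_prob(qexp, 0.8, "total")
filtered_het = low_expansion_prob(qexp, 0.8, "het")
filtered_hom = low_expansion_prob(qexp, 0.8, "hom")
```

With a threshold of 0.8, this sample passes the total-probability filter but fails both the heterozygous-only and homozygous-only filters.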

threshold

Minimum expansion probability

Type

float

name = 'GangSTRCallExpansionProbTotal'

The name of the filter

class trtools.dumpSTR.filters.GangSTRCallSpanBoundOnly

Bases: trtools.dumpSTR.filters.Reason

Filter GangSTR calls where only spanning or flanking reads were identified.

Extends Reason class. Based on RC field.

name = 'GangSTRCallSpanBoundOnly'

The name of the filter

class trtools.dumpSTR.filters.GangSTRCallSpanOnly

Bases: trtools.dumpSTR.filters.Reason

Filter GangSTR calls where only spanning reads were identified.

Extends Reason class. Based on RC field.

name = 'GangSTRCallSpanOnly'

The name of the filter to put in the FORMAT:FILTER field of filtered calls.

class trtools.dumpSTR.filters.HipSTRCallFlankIndels(threshold, rename=None)

Bases: trtools.dumpSTR.filters.Reason

Filter HipSTR calls with many indels in flanks. This filter may also be used on LongTR, which has similar fields.

Extends Reason class. Filters on the percentage of reads with indels in flanks. Based on FORMAT:DFLANKINDEL and FORMAT:DP fields.

Parameters
  • threshold (float) – Maximum percent of reads that can have indels in their flanks

  • rename (str (optional)) – Use a different name for this filter
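A rough sketch of the ratio this filter is based on, using the DFLANKINDEL and DP values of a single sample. The function names are hypothetical. Expressing the threshold as a fraction between 0 and 1, the strict `>` comparison, and leaving zero-depth calls unfiltered are all assumptions.

```python
import math

def flank_indel_fraction(dflankindel, dp):
    """Share of reads with indels in the flanks; nan when there are no reads."""
    if dp <= 0:
        return math.nan
    return dflankindel / dp

def too_many_flank_indels(dflankindel, dp, threshold):
    """True if the call would be filtered for excess flank indels."""
    frac = flank_indel_fraction(dflankindel, dp)
    return (not math.isnan(frac)) and frac > threshold

# 5 of 20 reads carry flank indels (0.25), above a 0.15 limit
noisy_call = too_many_flank_indels(5, 20, 0.15)
# 1 of 20 reads (0.05) stays below the limit
clean_call = too_many_flank_indels(1, 20, 0.15)
```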

threshold

Maximum percent of reads that can have indels in their flanks

Type

float

name = 'HipSTRCallFlankIndels'

The name of the filter

class trtools.dumpSTR.filters.HipSTRCallMinSuppReads(threshold, rename=None)

Bases: trtools.dumpSTR.filters.Reason

Filter HipSTR calls for which alleles are supported by too few reads. This filter may also be used on LongTR, which has similar fields.

Extends Reason class. Filters on the number of reads supporting each called allele. Based on FORMAT:ALLREADS and FORMAT:GB fields. Assumes that the number of supporting reads is zero if:

  • that read length is not present in ALLREADS

  • or ALLREADS is unset for that sample (‘.’)

  • or ALLREADS is not present at that locus

Parameters
  • threshold (int) – Minimum number of reads supporting each allele

  • rename (str (optional)) – Use a different name for this filter
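The zero-support rules above can be sketched like this. The field layouts are assumptions made for illustration: ALLREADS as `bpdiff|count` pairs separated by semicolons, and GB as the base-pair difference of each called allele separated by `|` or `/`. The function name is hypothetical.

```python
def min_supporting_reads(allreads, gb):
    """Smallest read support among the called alleles of one sample.

    Assumed formats: allreads like "-4|3;0|12", gb like "-4|0".
    Missing ALLREADS ("." or None) counts as zero support.
    """
    support = {}
    if allreads not in (None, "."):
        for item in allreads.split(";"):
            diff, count = item.split("|")
            support[int(diff)] = int(count)
    called = [int(x) for x in gb.replace("/", "|").split("|")]
    # A called length that never appears in ALLREADS has zero supporting reads
    return min(support.get(diff, 0) for diff in called)

threshold = 2
well_supported = min_supporting_reads("-4|3;0|12", "-4|0") >= threshold
unsupported = min_supporting_reads("0|12", "-4|0") < threshold
```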

threshold

Minimum number of reads supporting each allele

Type

int

name = 'HipSTRMinSuppReads'

The name of the filter

class trtools.dumpSTR.filters.HipSTRCallStutter(threshold, rename=None)

Bases: trtools.dumpSTR.filters.Reason

Filter HipSTR calls with many stutter reads

Extends Reason class. Filters on the percentage of reads with stutter errors. Based on FORMAT:DSTUTTER and FORMAT:DP fields.

Parameters
  • threshold (float) – Maximum percent of reads that can have stutter errors

  • rename (str (optional)) – Use a different name for this filter

threshold

Maximum percent of reads that can have stutter errors

Type

float

name = 'HipSTRCallStutter'

The name of the filter

class trtools.dumpSTR.filters.PopSTRCallRequireSupport(threshold)

Bases: trtools.dumpSTR.filters.Reason

Filter PopSTR calls not supported by enough reads

Extends Reason class. Relies on FORMAT:AD field.

Parameters

threshold (int) – Require this many reads supporting each called allele.

threshold

Require this many reads supporting each called allele.

Type

int

name = 'PopSTRCallRequireSupport'

The name of the filter

class trtools.dumpSTR.filters.Reason

Bases: object

Base call-level filter class.

Other classes extend this for each different call-level filter. Classes that extend this must implement a __call__ function that gets applied to each call. The __call__ function returns a 1D array of values, one per sample. For numeric arrays, nan values indicate the sample wasn’t filtered, and any other value indicates that the sample was filtered (where the value indicates why).
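The contract described above can be illustrated with stand-in classes. Nothing below is the TRTools code: `SketchReason` only mimics the interface, `__call__` takes a plain list of per-sample depths instead of a VCF record, and the behaviour of `GetReason`, the "PASS" label and the comma used to join filter names are all assumptions.

```python
import math

class SketchReason:
    """Stand-in for the interface described above; not the TRTools class."""
    name = ""

    def GetReason(self):
        # Assumption: GetReason simply reports the filter name
        return self.name

class SketchDepthBelow(SketchReason):
    name = "LOWDP"

    def __init__(self, threshold):
        self.threshold = threshold

    def __call__(self, depths):
        # nan = sample not filtered; otherwise keep the value that caused filtering
        return [d if d < self.threshold else math.nan for d in depths]

class SketchDepthAbove(SketchReason):
    name = "HIGHDP"

    def __init__(self, threshold):
        self.threshold = threshold

    def __call__(self, depths):
        return [d if d > self.threshold else math.nan for d in depths]

def label_samples(filters, depths):
    """Turn per-filter arrays into one label per sample."""
    results = [(f.GetReason(), f(depths)) for f in filters]
    labels = []
    for i in range(len(depths)):
        reasons = [name for name, values in results if not math.isnan(values[i])]
        labels.append(",".join(reasons) if reasons else "PASS")
    return labels

labels = label_samples([SketchDepthBelow(10), SketchDepthAbove(1000)], [5, 50, 2000])
```

Each filter reports independently, so a sample can collect more than one reason when several filters flag it.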

GetReason()
name = ''

The name of the filter to put in the FORMAT:FILTER field of filtered calls.

trtools.dumpSTR.filters.create_region_filter(name, filename)

Creates a locus-level filter based on a file of regions.

Builds and returns a class extending Base that can be used to filter any records overlapping intervals in the input BED file.

Parameters
  • name (str) – Name of the region filter to create. This will go in the FILTER field of the output VCF

  • filename (str) – BED file containing the regions. Must be sorted by chrom, start. If it’s not bgzipped and tabixed, we’ll attempt to do that.

Returns

filter_regions (Base object) – Returns None if we fail to load the regions
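The overlap test that a region filter performs can be sketched without any indexing. This illustration reads BED lines into memory and does a linear scan; it does not use bgzip or tabix as the function above does, and both helper names are hypothetical. Coordinates are treated as 0-based and half-open, following the BED convention.

```python
def parse_bed(lines):
    """Collect (start, end) intervals per chromosome from BED-formatted lines."""
    regions = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions

def overlaps_region(regions, chrom, start, end):
    """True if [start, end) overlaps any interval on the same chromosome."""
    return any(s < end and start < e for s, e in regions.get(chrom, []))

# Made-up regions and records
bed_lines = ["chr1\t100\t200", "chr2\t50\t60"]
regions = parse_bed(bed_lines)
inside = overlaps_region(regions, "chr1", 150, 160)
adjacent = overlaps_region(regions, "chr1", 200, 210)
```

Because the intervals are half-open, a record starting exactly at the end of a region does not count as overlapping.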