trtools.dumpSTR.filters module
Locus-level and Call-level VCF filters
- class trtools.dumpSTR.filters.CallFilterMaxValue(name, field, threshold)
Bases:
trtools.dumpSTR.filters.Reason
Generic call-level filter based on maximum allowed value for a field.
Extends Reason class. For any call-level value, such as DP, this class can be used to make a filter based on the maximum allowed value for that field.
- Parameters
name (str) – The name of the filter to put in the FORMAT:FILTER field of filtered calls.
field (str) – The FORMAT field to filter on
threshold (float) – The maximum allowed value for the field.
- name
The name of the filter to put in the FORMAT:FILTER field of filtered calls.
- Type
str
- field
The FORMAT field to filter on
- Type
str
- threshold
The maximum allowed value for the field.
- Type
float
Examples
>>> from trtools.dumpSTR.filters import CallFilterMaxValue
>>> max_dp_filt = CallFilterMaxValue("HIGHDP", "DP", 1000)
- class trtools.dumpSTR.filters.CallFilterMinValue(name, field, threshold)
Bases:
trtools.dumpSTR.filters.Reason
Generic call-level filter based on minimum allowed value for a field.
Extends Reason class. For any call-level value, such as DP, this class can be used to make a filter based on the minimum allowed value for that field.
- Parameters
name (str) – The name of the filter to put in the FORMAT:FILTER field of filtered calls.
field (str) – The FORMAT field to filter on
threshold (float) – The minimum allowed value for the field.
- name
The name of the filter to put in the FORMAT:FILTER field of filtered calls.
- Type
str
- field
The FORMAT field to filter on
- Type
str
- threshold
The minimum allowed value for the field.
- Type
float
Examples
>>> from trtools.dumpSTR.filters import CallFilterMinValue
>>> min_dp_filt = CallFilterMinValue("LOWDP", "DP", 10)
- class trtools.dumpSTR.filters.FilterBase
Bases:
object
Base class for locus-level filters. It only defines the interface.
- description()
- filter_name()
- name = 'NotYetImplemented'
- class trtools.dumpSTR.filters.Filter_LocusHrun
Bases:
trtools.dumpSTR.filters.FilterBase
Class to filter VCF records for penta- or hexanucleotide STRs with long homopolymer runs
This only works on HipSTR or LongTR VCFs. STRs with long homopolymer runs have been shown to be difficult for HipSTR to call and may be challenging with LongTR. This filter removes 5-mers with homopolymer runs of length >= 5 and 6-mers with homopolymer runs of length >= 6.
- filter_name()
- name = 'HRUN'
The name of the filter
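The homopolymer rule above can be sketched in plain Python. This is a minimal illustration and not the trtools implementation: the helper names `longest_homopolymer_run` and `fails_hrun` are hypothetical, and the sketch assumes the run is measured on the locus's reference sequence with the motif length (period) already known.

```python
def longest_homopolymer_run(seq):
    """Return the length of the longest run of one repeated base in seq."""
    best = run = 0
    prev = None
    for base in seq.upper():
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best


def fails_hrun(seq, period):
    """Apply the rule described above: 5-mers with runs >= 5, 6-mers with runs >= 6."""
    if period not in (5, 6):
        return False  # other motif lengths are not affected by this rule
    return longest_homopolymer_run(seq) >= period
```

Under these assumptions, a hexanucleotide locus whose longest run is five bases passes, while a pentanucleotide locus with the same run is flagged.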
- class trtools.dumpSTR.filters.Filter_MaxLocusHet(max_locus_het, uselength=False)
Bases:
trtools.dumpSTR.filters.FilterBase
Class to filter VCF records by maximum heterozygosity
This class extends FilterBase.
- Parameters
max_locus_het (float) – Filters calls with heterozygosity greater than this
vcftype (trh.VCFTYPES) – the type of the VCF we’re working with
uselength (bool, optional) – If set to true, consider all alleles with the same length as the same
- threshold
Filters calls with heterozygosity greater than this
- Type
float
- vcftype
the type of the VCF we’re working with
- Type
trh.VCFTYPES
- uselength
If set to true, consider all alleles with the same length as the same
- Type
bool, optional
- filter_name()
- name = 'HETHIGH'
The name of the filter
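To illustrate what `uselength` changes, here is a minimal sketch that computes heterozygosity as 1 - sum(p_i^2) over allele frequencies. That formula is a common definition and is assumed here; the function `locus_heterozygosity` is hypothetical and does not reproduce how trtools derives the value from a VCF record.

```python
from collections import Counter


def locus_heterozygosity(alleles, uselength=False):
    """Expected heterozygosity, 1 - sum(p_i^2), over a list of allele sequences.

    alleles: one string per called allele copy (two per diploid sample).
    uselength: if True, alleles of equal length are treated as the same allele.
    """
    keys = [len(a) for a in alleles] if uselength else list(alleles)
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0:
        return float("nan")  # no called alleles at this locus
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


alleles = ["ACAC", "ACAC", "ACAT", "ACACAC"]
het_seq = locus_heterozygosity(alleles)                  # three distinct sequences
het_len = locus_heterozygosity(alleles, uselength=True)  # two distinct lengths
```

With `uselength=True`, `ACAC` and `ACAT` collapse into one allele, so the computed heterozygosity drops. A maximum-heterozygosity check in this style would flag the locus when the value exceeds `max_locus_het`.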
- class trtools.dumpSTR.filters.Filter_MinLocusCallrate(min_locus_callrate)
Bases:
trtools.dumpSTR.filters.FilterBase
Class to filter VCF records by call rate
This class extends FilterBase.
- Parameters
min_locus_callrate (float) – Filters calls with lower than this fraction called
- threshold
Filters calls with lower than this fraction called. Derived from the input min_locus_callrate.
- Type
float
- filter_name()
- name = 'CALLRATE'
The name of the filter
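As a rough sketch of the call-rate idea, the fraction of called samples can be compared against `min_locus_callrate`. The helpers below are hypothetical and assume genotypes are given as a plain list in which `None` marks a no-call; the real filter works on VCF records.

```python
def locus_callrate(genotypes):
    """Fraction of samples with a called genotype; None marks a no-call."""
    if not genotypes:
        return 0.0
    called = sum(1 for gt in genotypes if gt is not None)
    return called / len(genotypes)


def fails_min_callrate(genotypes, min_locus_callrate):
    """True if the locus would be flagged for having too few called samples."""
    return locus_callrate(genotypes) < min_locus_callrate
```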
- class trtools.dumpSTR.filters.Filter_MinLocusHWEP(min_locus_hwep, uselength=False)
Bases:
trtools.dumpSTR.filters.FilterBase
Class to filter VCF records by minimum HWE p-value
This class extends FilterBase.
- Parameters
min_locus_hwep (float) – Filters calls with HWE p-value lower than this
vcftype (trh.VCFTYPES) – the type of the VCF we’re working with
uselength (bool, optional) – If set to true, consider all alleles with the same length as the same
- threshold
Filters calls with HWE p-value lower than this
- Type
float
- vcftype
the type of the VCF we’re working with
- Type
trh.VCFTYPES
- uselength
If set to true, consider all alleles with the same length as the same
- Type
bool, optional
- filter_name()
- name = 'HWE'
The name of the filter
- class trtools.dumpSTR.filters.Filter_MinLocusHet(min_locus_het, uselength=False)
Bases:
trtools.dumpSTR.filters.FilterBase
Class to filter VCF records by minimum heterozygosity
This class extends FilterBase.
- Parameters
min_locus_het (float) – Filters calls with heterozygosity lower than this
vcftype (trh.VCFTYPES) – the type of the VCF we’re working with
uselength (bool, optional) – If set to true, consider all alleles with the same length as the same
- threshold
Filters calls with heterozygosity lower than this
- Type
float
- vcftype
the type of the VCF we’re working with
- Type
trh.VCFTYPES
- uselength
If set to true, consider all alleles with the same length as the same
- Type
bool, optional
- filter_name()
- name = 'HETLOW'
The name of the filter
- class trtools.dumpSTR.filters.GangSTRCallBadCI
Bases:
trtools.dumpSTR.filters.Reason
Filter GangSTR calls where the ML genotype estimate is outside of CI.
If 95% confidence interval does not include the maximum likelihood genotype call, the call is filtered.
Extends Reason class. Based on REPCI and REPCN fields.
- name = 'GangSTRCallBadCI'
The name of the filter
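The confidence-interval check can be sketched as follows. This is an illustration under assumptions: the function `call_has_bad_ci` is hypothetical, REPCN is taken as a pair of integer copy numbers, and REPCI is assumed to be a string with one `low-high` interval per allele, separated by commas.

```python
def call_has_bad_ci(repcn, repci):
    """Return True if any ML allele estimate lies outside its confidence interval.

    repcn: ML repeat copy numbers for the call, e.g. (10, 12).
    repci: interval string with one "low-high" pair per allele, e.g. "10-10,11-13".
    """
    intervals = [tuple(int(x) for x in pair.split("-")) for pair in repci.split(",")]
    return any(not (low <= cn <= high) for cn, (low, high) in zip(repcn, intervals))
```

A call with copy numbers (10, 14) and intervals "10-10,11-13" would be flagged in this sketch, because 14 falls outside 11-13.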
- class trtools.dumpSTR.filters.GangSTRCallExpansionProbHet(threshold)
Bases:
trtools.dumpSTR.filters.Reason
Filter GangSTR calls with low probability for heterozygous expansion.
Extends Reason class. Based on the QEXP field. Filter if QEXP[:, 1] (prob het expansion above INFO:THRESHOLD) is less than the threshold.
- Parameters
threshold (float) – Minimum heterozygous expansion probability
- threshold
Minimum heterozygous expansion probability
- Type
float
- name = 'GangSTRCallExpansionProbHet'
The name of the filter
- class trtools.dumpSTR.filters.GangSTRCallExpansionProbHom(threshold)
Bases:
trtools.dumpSTR.filters.Reason
Filter GangSTR calls with low probability for homozygous expansion.
Extends Reason class. Based on the QEXP field. Filter if QEXP[:, 2] (prob hom expansion above INFO:THRESHOLD) is less than the threshold.
- Parameters
threshold (float) – Minimum homozygous expansion probability
- threshold
Minimum homozygous expansion probability
- Type
float
- name = 'GangSTRCallExpansionProbHom'
The name of the filter
- class trtools.dumpSTR.filters.GangSTRCallExpansionProbTotal(threshold)
Bases:
trtools.dumpSTR.filters.Reason
Filter GangSTR calls with low probability for expansion (heterozygous or homozygous)
Extends Reason class. Based on the QEXP field. Filter if QEXP[1]+QEXP[2] (prob het or hom expansion above INFO:THRESHOLD) is less than the threshold.
- Parameters
threshold (float) – Minimum expansion probability
- threshold
Minimum expansion probability
- Type
float
- name = 'GangSTRCallExpansionProbTotal'
The name of the filter
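The three expansion-probability filters differ only in which QEXP entries they compare with the threshold. The sketch below shows that difference on plain lists; it is not the trtools code. The function `expansion_prob_reasons` and its `mode` argument are hypothetical, and treating index 0 as the probability of no expansion is an assumption.

```python
import math


def expansion_prob_reasons(qexp, threshold, mode="total"):
    """Per-sample filter values from QEXP rows.

    qexp: per-sample rows [p_no_expansion, p_het_expansion, p_hom_expansion].
    Returns the probability for filtered samples and nan for samples that pass.
    """
    reasons = []
    for _, p_het, p_hom in qexp:
        prob = {"het": p_het, "hom": p_hom, "total": p_het + p_hom}[mode]
        reasons.append(prob if prob < threshold else math.nan)
    return reasons
```

The nan-for-pass convention follows the description of the Reason class in this module.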
- class trtools.dumpSTR.filters.GangSTRCallSpanBoundOnly
Bases:
trtools.dumpSTR.filters.Reason
Filter GangSTR calls where only spanning or flanking reads were identified.
Extends Reason class. Based on RC field.
- name = 'GangSTRCallSpanBoundOnly'
The name of the filter
- class trtools.dumpSTR.filters.GangSTRCallSpanOnly
Bases:
trtools.dumpSTR.filters.Reason
Filter GangSTR calls where only spanning reads were identified.
Extends Reason class. Based on RC field.
- name = 'GangSTRCallSpanOnly'
The name of the filter to put in the FORMAT:FILTER field of filtered calls.
- class trtools.dumpSTR.filters.HipSTRCallFlankIndels(threshold, rename=None)
Bases:
trtools.dumpSTR.filters.Reason
Filter HipSTR calls with many indels in flanks. This filter may also be used on LongTR which has similar fields.
Extends Reason class. Filters on the percentage of reads with indels in flanks. Based on FORMAT:DFLANKINDEL and FORMAT:DP fields.
- Parameters
threshold (float) – Maximum percent of reads that can have indels in their flanks
rename (str (optional)) – Use a different name for this filter
- threshold
Maximum percent of reads that can have indels in their flanks
- Type
float
- name = 'HipSTRCallFlankIndels'
The name of the filter
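A minimal sketch of the ratio test described above, using per-sample DFLANKINDEL and DP values given as plain lists. The function `flank_indel_reasons` is hypothetical; the sketch treats the threshold as a fraction between 0 and 1 and skips samples with no reads, both of which are assumptions.

```python
import math


def flank_indel_reasons(dflankindel, dp, threshold):
    """Per-sample filter values: the indel fraction if filtered, nan otherwise."""
    reasons = []
    for n_indel, depth in zip(dflankindel, dp):
        if not depth:  # no reads (0 or missing): nothing to evaluate in this sketch
            reasons.append(math.nan)
            continue
        frac = n_indel / depth
        reasons.append(frac if frac > threshold else math.nan)
    return reasons
```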
- class trtools.dumpSTR.filters.HipSTRCallMinSuppReads(threshold, rename=None)
Bases:
trtools.dumpSTR.filters.Reason
Filter HipSTR calls for which alleles are supported by too few reads. This filter may also be used on LongTR which has similar fields.
Extends Reason class. Filters on the number of reads supporting each called allele. Based on FORMAT:ALLREADS and FORMAT:GB fields. Assumes that number of supporting reads is zero if:
* that read length is not present in ALLREADS
* or ALLREADS is unset for that sample (‘.’)
* or ALLREADS is not present at that locus
- Parameters
threshold (int) – Minimum number of reads supporting each allele
rename (str (optional)) – Use a different name for this filter
- threshold
Minimum number of reads supporting each allele
- Type
int
- name = 'HipSTRMinSuppReads'
The name of the filter
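The supporting-read lookup can be sketched as below. The function `min_supporting_reads` is hypothetical, and the field layouts are assumptions: ALLREADS as `diff|count` pairs separated by `;`, and GB as the base-pair difference of each called allele separated by `|` or `/`.

```python
def min_supporting_reads(allreads, gb):
    """Smallest read count supporting either called allele.

    allreads: e.g. "-2|5;0|10" (bp difference | read count), or "." / None if unset.
    gb: e.g. "0|-2", the bp difference of each called allele from the reference.
    """
    counts = {}
    if allreads not in (None, "."):
        for item in allreads.split(";"):
            diff, n = item.split("|")
            counts[int(diff)] = int(n)
    called = [int(x) for x in gb.replace("/", "|").split("|")]
    # A length absent from ALLREADS counts as zero supporting reads.
    return min(counts.get(diff, 0) for diff in called)
```

A call would then be filtered when this value is below `threshold`. The zero defaults mirror the three cases listed in the class description.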
- class trtools.dumpSTR.filters.HipSTRCallStutter(threshold, rename=None)
Bases:
trtools.dumpSTR.filters.Reason
Filter HipSTR calls with many stutter reads
Extends Reason class. Filters on the percentage of reads with stutter errors. Based on FORMAT:DSTUTTER and FORMAT:DP fields.
- Parameters
threshold (float) – Maximum percent of reads that can have stutter errors
rename (str (optional)) – Use a different name for this filter
- threshold
Maximum percent of reads that can have stutter errors
- Type
float
- name = 'HipSTRCallStutter'
The name of the filter
- class trtools.dumpSTR.filters.PopSTRCallRequireSupport(threshold)
Bases:
trtools.dumpSTR.filters.Reason
Filter PopSTR calls not supported by enough reads
Extends Reason class. Relies on FORMAT:AD field.
- Parameters
threshold (int) – Require this many reads supporting each called allele.
- threshold
Require this many reads supporting each called allele.
- Type
int
- name = 'PopSTRCallRequireSupport'
The name of the filter
- class trtools.dumpSTR.filters.Reason
Bases:
object
Base call-level filter class.
Other classes extend this for each different call-level filter. Classes that extend this must implement a __call__ function that gets applied to each call. The __call__ function returns a 1D array of values, one per sample. For numeric arrays, nan values indicate the sample wasn’t filtered, and any other value indicates that the sample was filtered (where the value indicates why).
- GetReason()
- name = ''
The name of the filter to put in the FORMAT:FILTER field of filtered calls.
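The `__call__` contract described for this class can be sketched with a stand-in base class. Everything here is illustrative: the local `Reason` is a reduced stand-in and not the trtools class, `GetReason` returning the name is assumed, `MaxValueSketch` is hypothetical, and `__call__` takes a plain list of per-sample values rather than a VCF record.

```python
import math


class Reason:
    """Stand-in for the base class, reduced to what this sketch needs."""
    name = ""

    def GetReason(self):
        return self.name


class MaxValueSketch(Reason):
    """Hypothetical subclass: flags samples whose value exceeds a threshold."""

    def __init__(self, name, threshold):
        self.name = name
        self.threshold = threshold

    def __call__(self, values):
        # nan = sample passes; any other value = sample filtered, and why
        return [v if v > self.threshold else math.nan for v in values]


filt = MaxValueSketch("HIGHDP", 1000)
result = filt([30, 2500, 999])
filtered = [not math.isnan(r) for r in result]
```

Here only the second sample is filtered, and the returned value (2500) records why.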
- trtools.dumpSTR.filters.create_region_filter(name, filename)
Creates a locus-level filter based on a file of regions.
Builds and returns a class extending FilterBase that can be used to filter any records overlapping intervals in the input BED file.
- Parameters
name (str) – Name of the region filter to create. This will go in the FILTER field of the output VCF
filename (str) – BED file containing the regions. Must be sorted by chrom, start. If it’s not bgzipped and tabixed, we’ll attempt to do that.
- Returns
filter_regions (FilterBase object) – The region filter. Returns None if we fail to load the regions.
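The overlap test behind a region filter can be sketched with a linear scan over in-memory intervals. This is only an illustration of the overlap logic: the function `overlaps_any` is hypothetical, coordinates are assumed to be 0-based and half-open as in BED, and the indexed lookup implied by the bgzip and tabix requirement is not reproduced.

```python
def overlaps_any(chrom, start, end, regions):
    """True if [start, end) on chrom overlaps any (chrom, start, end) region.

    Coordinates are treated as 0-based, half-open, as in BED.
    """
    return any(
        r_chrom == chrom and start < r_end and r_start < end
        for r_chrom, r_start, r_end in regions
    )


regions = [("chr1", 100, 200), ("chr2", 50, 60)]
```

A record for which this returns True would receive the region filter's name in its FILTER field.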