trtools.statSTR module
- trtools.statSTR.GetAFreq(trrecord, sample_indexes=[None], count=False, uselength=True)
Return allele frequency for a TR
- Parameters
trrecord (trtools.utils.tr_harmonizer.TRRecord) – The record that we are computing the statistic for
sample_indexes (List[Any]) – A list of indexes into the numpy rows array to extract subsets of genotypes to stratify over. (e.g. [[True, False, False], [False, True, True]] or [[0], [1,2]] to split three samples into two strata - the first sample and the last two) Can contain None for all samples.
count (bool) – If True, return allele counts rather than allele frequencies
uselength (bool) – Whether we should collapse alleles by length
- Returns
allele_freqs_strs – Format: allele1:freq1,allele2:freq2,etc. for each sample group. Only alleles with more than one call in a group are reported for that group. Groups with no called alleles are reported as ‘.’.
- Return type
list of str
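The output format and the stratification by sample_indexes can be illustrated with a small standalone sketch. It does not call trtools: allele_freq_strings is a hypothetical helper, genotypes are plain tuples of allele lengths with None standing in for a no-call, and the three-decimal formatting is an assumption. Unlike the description above, this sketch reports every allele seen at least once.

```python
from collections import Counter

def allele_freq_strings(genotypes, sample_indexes=(None,), count=False):
    """Hypothetical helper: one 'allele:freq,...' string per sample group.

    genotypes is a list with one entry per sample: a tuple of allele
    lengths, or None for a sample with no call.
    """
    results = []
    for index in sample_indexes:
        if index is None:
            group = genotypes                      # all samples
        elif index and isinstance(index[0], bool):
            group = [gt for gt, keep in zip(genotypes, index) if keep]  # boolean mask
        else:
            group = [genotypes[i] for i in index]  # integer positions
        counts = Counter(a for gt in group if gt is not None for a in gt)
        total = sum(counts.values())
        if total == 0:
            results.append(".")                    # no called alleles in this group
            continue
        items = []
        for allele in sorted(counts):
            if count:
                items.append(f"{allele}:{counts[allele]}")
            else:
                items.append(f"{allele}:{counts[allele] / total:.3f}")
        results.append(",".join(items))
    return results

genotypes = [(10, 12), (12, 12), None]
print(allele_freq_strings(genotypes, [None, [0], [2]]))
# ['10:0.250,12:0.750', '10:0.500,12:0.500', '.']
```

The boolean-mask and integer-list forms select the same samples, so [[True, False, False]] and [[0]] give identical output here.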
- trtools.statSTR.GetEntropy(trrecord, sample_indexes=[None], uselength=True)
Compute the entropy of a locus
This is the entropy, in bits, of the distribution of alleles called at that locus. See Wikipedia for the definition of entropy.
- Parameters
trrecord (trtools.utils.tr_harmonizer.TRRecord) – The record that we are computing the statistic for
sample_indexes (List[Any]) – A list of indexes into the numpy rows array to extract subsets of genotypes to stratify over. (e.g. [[True, False, False], [False, True, True]] or [[0], [1,2]] to split three samples into two strata - the first sample and the last two) Can contain None for all samples.
uselength (bool) – Whether we should collapse alleles by length
- Returns
entropy – For each sample list, the entropy of the calls for those samples, or np.nan if no such calls
- Return type
List[float]
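For allele frequencies p_i, the bit entropy is H = -sum(p_i * log2(p_i)). A minimal sketch of that formula follows; bit_entropy is a hypothetical name, and the input is assumed to be a dictionary mapping each allele to its frequency rather than a TRRecord.

```python
import math

def bit_entropy(allele_freqs):
    """Entropy in bits of an allele frequency distribution; nan if empty."""
    if not allele_freqs:
        return math.nan
    # Alleles with zero frequency contribute nothing and are skipped.
    return sum(-p * math.log2(p) for p in allele_freqs.values() if p > 0)

print(bit_entropy({10: 0.5, 12: 0.5}))                          # 1.0
print(bit_entropy({10: 0.25, 11: 0.25, 12: 0.25, 13: 0.25}))    # 2.0
```

Two equally frequent alleles carry one bit; four equally frequent alleles carry two.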
- trtools.statSTR.GetHWEP(trrecord, sample_indexes=[None], uselength=True)
Compute Hardy Weinberg p-value
Tests whether the observed number of heterozygous vs. homozygous individuals differs from that expected under Hardy Weinberg Equilibrium given the observed allele frequencies, based on a binomial test.
- Parameters
trrecord (trtools.utils.tr_harmonizer.TRRecord) – The record that we are computing the statistic for
sample_indexes (List[Any]) – A list of indexes into the numpy rows array to extract subsets of genotypes to stratify over. (e.g. [[True, False, False], [False, True, True]] or [[0], [1,2]] to split three samples into two strata - the first sample and the last two) Can contain None for all samples.
uselength (bool) – Whether we should collapse alleles by length
- Returns
p-value – The two-sided p-value returned by a binomial test (scipy.stats.binom_test). If there are no calls, return np.nan. If the genotype alleles are not included in the frequencies dictionary, return np.nan. One value is returned for each sample_index.
- Return type
list of float
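The idea of the test can be sketched without SciPy. The code below is an illustration under stated assumptions, not the library's implementation: it assumes the test compares the observed number of homozygous samples with a binomial whose success probability is the sum of squared allele frequencies, and it computes the exact two-sided p-value by summing the probabilities of all outcomes no more likely than the observed one. Both function names are hypothetical.

```python
import math

def binom_two_sided(k, n, p):
    """Exact two-sided binomial p-value for k successes in n trials."""
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)   # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))

def hwe_pvalue(genotypes):
    """genotypes: list of diploid (a1, a2) tuples for called samples only."""
    if not genotypes:
        return math.nan            # no calls
    alleles = [a for gt in genotypes for a in gt]
    freqs = {a: alleles.count(a) / len(alleles) for a in set(alleles)}
    # Assumption: expected homozygous fraction under HWE is sum of p_i squared.
    exp_hom = sum(f * f for f in freqs.values())
    obs_hom = sum(1 for a1, a2 in genotypes if a1 == a2)
    return binom_two_sided(obs_hom, len(genotypes), exp_hom)

# Four samples, all homozygous, two alleles at frequency 0.5 each.
print(hwe_pvalue([(10, 10), (12, 12), (10, 10), (12, 12)]))  # 0.125
```

With only four samples even a complete absence of heterozygotes is not significant, which shows how strongly the p-value depends on the number of calls in each sample group.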
- trtools.statSTR.GetHeader(header, sample_prefixes)
Return header items for a column
- Parameters
header (str) – Header item
sample_prefixes (list of str) – List of sample prefixes. Empty if no sample groups are used.
- Returns
header_items – List of header items
- Return type
list of str
- trtools.statSTR.GetHet(trrecord, sample_indexes=[None], uselength=True)
Compute heterozygosity of a locus
Heterozygosity is defined as the probability that two randomly drawn alleles are different.
- Parameters
trrecord (trtools.utils.tr_harmonizer.TRRecord) – The record that we are computing the statistic for
sample_indexes (List[Any]) – A list of indexes into the numpy rows array to extract subsets of genotypes to stratify over. (e.g. [[True, False, False], [False, True, True]] or [[0], [1,2]] to split three samples into two strata - the first sample and the last two) Can contain None for all samples.
uselength (bool) – Whether we should collapse alleles by length
- Returns
heterozygosity – For each sample list, the heterozygosity of the calls for those samples, or np.nan if no such calls
- Return type
List[float]
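The probability that two independently drawn alleles differ is 1 - sum(p_i^2), where p_i are the allele frequencies. A minimal sketch of that definition, assuming a frequency dictionary as input (heterozygosity here is a hypothetical standalone function, not the library call):

```python
import math

def heterozygosity(allele_freqs):
    """Probability that two alleles drawn with replacement differ; nan if empty."""
    if not allele_freqs:
        return math.nan
    return 1.0 - sum(p * p for p in allele_freqs.values())

print(heterozygosity({10: 0.5, 12: 0.5}))    # 0.5
print(heterozygosity({10: 0.25, 12: 0.75}))  # 0.375
```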
- trtools.statSTR.GetMean(trrecord, sample_indexes=[None], uselength=True)
Compute the mean allele length
- Parameters
trrecord (trtools.utils.tr_harmonizer.TRRecord) – The record that we are computing the statistic for
sample_indexes (List[Any]) – A list of indexes into the numpy rows array to extract subsets of genotypes to stratify over. (e.g. [[True, False, False], [False, True, True]] or [[0], [1,2]] to split three samples into two strata - the first sample and the last two) Can contain None for all samples.
uselength (bool) – Whether we should collapse alleles by length
- Returns
mean – For each sample list, the mean allele length, or np.nan if no calls for that sample
- Return type
List[float]
- trtools.statSTR.GetMode(trrecord, sample_indexes=[None], uselength=True)
Compute the mode of the allele lengths
- Parameters
trrecord (trtools.utils.tr_harmonizer.TRRecord) – The record that we are computing the statistic for
sample_indexes (List[Any]) – A list of indexes into the numpy rows array to extract subsets of genotypes to stratify over. (e.g. [[True, False, False], [False, True, True]] or [[0], [1,2]] to split three samples into two strata - the first sample and the last two) Can contain None for all samples.
uselength (bool) – Whether we should collapse alleles by length
- Returns
mode – For each sample list, the mode allele length, or np.nan if no calls for that sample
- Return type
List[float]
- trtools.statSTR.GetNAlleles(trrecord, sample_indexes=[None], nalleles_thresh=0.01, uselength=True)
Return the number of alleles called at a TR locus
- Parameters
trrecord (trtools.utils.tr_harmonizer.TRRecord) – The record that we are computing the statistic for
sample_indexes (List[Any]) – A list of indexes into the numpy rows array to extract subsets of genotypes to stratify over. (e.g. [[True, False, False], [False, True, True]] or [[0], [1,2]] to split three samples into two strata - the first sample and the last two) Can contain None for all samples.
nalleles_thresh (float) – The threshold which an allele’s frequency must exceed to be counted
uselength (bool) – Whether we should collapse alleles by length
- Returns
Number of called alleles at this locus per sample index. Zero if no alleles were called.
- Return type
List[int]
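The role of nalleles_thresh can be shown with a short sketch: an allele is counted only if its frequency strictly exceeds the threshold. count_alleles is a hypothetical helper operating on a frequency dictionary, not the library function itself.

```python
def count_alleles(allele_freqs, nalleles_thresh=0.01):
    """Count alleles whose frequency strictly exceeds nalleles_thresh."""
    return sum(1 for p in allele_freqs.values() if p > nalleles_thresh)

freqs = {10: 0.6, 11: 0.395, 12: 0.005}
print(count_alleles(freqs))                       # 2, the rare allele 12 is excluded
print(count_alleles(freqs, nalleles_thresh=0.0))  # 3
print(count_alleles({}))                          # 0, no alleles called
```

Raising the threshold filters out rare alleles, which are more likely to reflect genotyping errors than true variation.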
- trtools.statSTR.GetNumSamples(trrecord, sample_indexes=[None])
Compute the number of samples
- Parameters
trrecord (trtools.utils.tr_harmonizer.TRRecord) – The record that we are computing the statistic for
sample_indexes (List[Any]) – A list of indexes into the numpy rows array to extract subsets of genotypes to stratify over. (e.g. [[True, False, False], [False, True, True]] or [[0], [1,2]] to split three samples into two strata - the first sample and the last two) Can contain None for all samples.
- Returns
numSamples – The number of samples. One value for each sample list. If the allele frequencies dictionary is invalid, return np.nan.
- Return type
list of int
- trtools.statSTR.GetThresh(trrecord, sample_indexes=[None])
Return the maximum TR allele length observed
- Parameters
trrecord (trtools.utils.tr_harmonizer.TRRecord) – The record that we are computing the statistic for
sample_indexes (List[Any]) – A list of indexes into the numpy rows array to extract subsets of genotypes to stratify over. (e.g. [[True, False, False], [False, True, True]] or [[0], [1,2]] to split three samples into two strata - the first sample and the last two) Can contain None for all samples.
- Returns
thresh – List of the maximum allele length observed in each sample group, or nan if no alleles were called
- Return type
List[float]
- trtools.statSTR.GetVariance(trrecord, sample_indexes=[None], uselength=True)
Compute the variance of the allele lengths
- Parameters
trrecord (trtools.utils.tr_harmonizer.TRRecord) – The record that we are computing the statistic for
sample_indexes (List[Any]) – A list of indexes into the numpy rows array to extract subsets of genotypes to stratify over. (e.g. [[True, False, False], [False, True, True]] or [[0], [1,2]] to split three samples into two strata - the first sample and the last two) Can contain None for all samples.
uselength (bool) – Whether we should collapse alleles by length
- Returns
variance – For each sample list, the variance of the allele lengths, or np.nan if no calls for that sample
- Return type
List[float]
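GetMean, GetMode, and GetVariance all summarize the distribution of allele lengths. A rough standalone sketch of the three quantities follows, assuming a dictionary of allele length to frequency, a population (not sample) variance, and an arbitrary tie-break for the mode; length_stats is a hypothetical name and these choices are assumptions rather than documented library behavior.

```python
import math

def length_stats(allele_freqs):
    """Return (mean, mode, variance) of allele lengths weighted by frequency."""
    if not allele_freqs:
        return math.nan, math.nan, math.nan
    mean = sum(length * p for length, p in allele_freqs.items())
    # Population variance: expected squared deviation from the mean.
    variance = sum(p * (length - mean) ** 2 for length, p in allele_freqs.items())
    # On ties, max() keeps the first allele encountered.
    mode = max(allele_freqs, key=allele_freqs.get)
    return mean, mode, variance

print(length_stats({10: 0.25, 12: 0.75}))  # (11.5, 12, 0.75)
```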
- trtools.statSTR.PlotAlleleFreqs(trrecord, outprefix, sample_indexes=[None], sampleprefixes=None)
Plot allele frequencies for a locus
- Parameters
trrecord (trtools.utils.tr_harmonizer.TRRecord) – The record that we are computing the statistic for
outprefix (str) – Prefix for output file
sample_indexes (List[Any]) – A list of indexes into the numpy rows array to extract subsets of genotypes to stratify over. (e.g. [[True, False, False], [False, True, True]] or [[0], [1,2]] to split three samples into two strata - the first sample and the last two) Can contain None for all samples.
sampleprefixes (list of str, optional) – Prefixes for each sample list to use in legend
- trtools.statSTR.format_nan_precision(precision_format, val)
- trtools.statSTR.getargs()
- trtools.statSTR.main(args)
- trtools.statSTR.run()