trtools.statSTR module

trtools.statSTR.GetAFreq(trrecord, sample_indexes=[None], count=False, uselength=True)

Return allele frequency for a TR

Parameters
  • trrecord (trtools.utils.tr_harmonizer.TRRecord) – The record that we are computing the statistic for

  • sample_indexes (List[Any]) – A list of indexes into the numpy rows array to extract subsets of genotypes to stratify over. (e.g. [[True, False, False], [False, True, True]] or [[0], [1,2]] to split three samples into two strata - the first sample and the last two) Can contain None for all samples.

  • count (bool) – If True, return allele counts rather than allele frequencies

  • uselength (bool) – Whether we should collapse alleles by length

Returns

allele_freqs_strs – Format: allele1:freq1,allele2:freq2,etc. for each sample group. Only alleles with more than one call in a group are reported for that group. Groups with no called alleles are reported as ‘.’

Return type

list of str
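
The stratification and output format described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper names `select` and `allele_freq_strings`, the three-decimal formatting, and the genotype values are all assumptions, and genotypes are represented as simple tuples of allele lengths rather than a TRRecord. The sketch reports every observed allele and does not try to reproduce the library's rule about which alleles are reported.

```python
from collections import Counter

def select(rows, index):
    """Pick rows by a boolean mask, a list of integer positions, or None (all rows)."""
    if index is None:
        return list(rows)
    if all(isinstance(i, bool) for i in index):
        return [row for row, keep in zip(rows, index) if keep]
    return [rows[i] for i in index]

def allele_freq_strings(genotypes, sample_indexes=(None,), count=False):
    """Return one 'allele:value,...' string per stratum, or '.' if nothing is called."""
    results = []
    for index in sample_indexes:
        # Flatten the per-sample genotype tuples into one list of alleles
        alleles = [a for gt in select(genotypes, index) for a in gt]
        if not alleles:
            results.append(".")
            continue
        counts = Counter(alleles)
        total = len(alleles)
        items = []
        for allele in sorted(counts):
            if count:
                items.append(f"{allele}:{counts[allele]}")
            else:
                # Three decimals is an arbitrary choice for this sketch
                items.append(f"{allele}:{counts[allele] / total:.3f}")
        results.append(",".join(items))
    return results

# Three diploid samples, alleles given as repeat lengths (made-up values)
genotypes = [(10, 12), (12, 12), (10, 14)]
by_mask = allele_freq_strings(genotypes, [[True, False, False], [False, True, True]])
by_position = allele_freq_strings(genotypes, [[0], [1, 2]])
all_counts = allele_freq_strings(genotypes, count=True)
```

Both index styles select the same strata here, so `by_mask` and `by_position` hold identical strings.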

trtools.statSTR.GetEntropy(trrecord, sample_indexes=[None], uselength=True)

Compute the entropy of a locus

This is the Shannon entropy, in bits, of the distribution of alleles called at that locus. See Wikipedia for the definition of entropy.

Parameters
  • trrecord (trtools.utils.tr_harmonizer.TRRecord) – The record that we are computing the statistic for

  • sample_indexes (List[Any]) – A list of indexes into the numpy rows array to extract subsets of genotypes to stratify over. (e.g. [[True, False, False], [False, True, True]] or [[0], [1,2]] to split three samples into two strata - the first sample and the last two) Can contain None for all samples.

  • uselength (bool) – Whether we should collapse alleles by length

Returns

entropy – For each sample list, the entropy of the calls for those samples, or np.nan if no such calls

Return type

List[float]
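
As a rough illustration of the statistic, bit entropy is the sum of -p * log2(p) over the allele frequencies p. The function name `bit_entropy` and the frequency dictionaries below are hypothetical; this is a sketch of the formula, not the library's code.

```python
import math

def bit_entropy(allele_freqs):
    """Shannon entropy in bits of an {allele: frequency} mapping; nan if empty."""
    if not allele_freqs:
        return math.nan
    # Alleles with zero frequency contribute nothing and are skipped
    return -sum(p * math.log2(p) for p in allele_freqs.values() if p > 0)

# Two equally common alleles carry one bit; four carry two bits
two_equal = bit_entropy({10: 0.5, 12: 0.5})
four_equal = bit_entropy({10: 0.25, 11: 0.25, 12: 0.25, 13: 0.25})
```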

trtools.statSTR.GetHWEP(trrecord, sample_indexes=[None], uselength=True)

Compute Hardy Weinberg p-value

Tests whether the observed numbers of heterozygous and homozygous individuals differ from those expected under Hardy-Weinberg equilibrium given the observed allele frequencies, using a binomial test.

Parameters
  • trrecord (trtools.utils.tr_harmonizer.TRRecord) – The record that we are computing the statistic for

  • sample_indexes (List[Any]) – A list of indexes into the numpy rows array to extract subsets of genotypes to stratify over. (e.g. [[True, False, False], [False, True, True]] or [[0], [1,2]] to split three samples into two strata - the first sample and the last two) Can contain None for all samples.

  • uselength (bool) – Whether we should collapse alleles by length

Returns

p-value – The two-sided p-value returned by a binomial test (scipy.stats.binom_test). If there are no calls, return np.nan. If the genotype alleles are not included in the frequencies dictionary, return np.nan. One value is returned for each sample_index.

Return type

list of float
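
The idea behind the test can be sketched without SciPy. Under Hardy-Weinberg equilibrium the expected heterozygote fraction is one minus the sum of squared allele frequencies, and the observed heterozygote count is compared against a binomial distribution with that success probability. The function `hwe_binomial_p` below is a hypothetical stand-in written for illustration; it computes an exact two-sided p-value by summing the probabilities of all outcomes no more likely than the observed one, which is an assumption about the two-sided convention rather than a statement of what the library does internally.

```python
from collections import Counter
from math import comb, nan

def hwe_binomial_p(genotypes):
    """Two-sided exact binomial p-value for the observed number of heterozygotes."""
    n = len(genotypes)
    if n == 0:
        return nan
    alleles = [a for gt in genotypes for a in gt]
    counts = Counter(alleles)
    freqs = {a: counts[a] / len(alleles) for a in counts}
    # Expected heterozygote fraction under HWE: 1 - sum of squared frequencies
    exp_het = 1.0 - sum(p * p for p in freqs.values())
    obs_het = sum(1 for a, b in genotypes if a != b)
    # Probability of each possible heterozygote count under Binomial(n, exp_het)
    pmf = [comb(n, k) * exp_het**k * (1.0 - exp_het) ** (n - k) for k in range(n + 1)]
    # Two-sided: add up all outcomes no more likely than the observed one
    # (the small relative tolerance guards against floating-point ties)
    cutoff = pmf[obs_het] * (1 + 1e-7)
    return min(1.0, sum(p for p in pmf if p <= cutoff))

# Made-up diploid genotypes given as pairs of allele lengths
balanced = hwe_binomial_p([(10, 10), (10, 12), (12, 12), (10, 12)])
no_hets = hwe_binomial_p([(10, 10), (10, 10), (12, 12), (12, 12)])
```

With equal allele frequencies, the sample with no heterozygotes gets a smaller p-value than the sample with two.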

trtools.statSTR.GetHeader(header, sample_prefixes)

Return header items for a column

Parameters
  • header (str) – Header item

  • sample_prefixes (list of str) – List of sample prefixes. Empty if no sample groups are used

Returns

header_items – List of header items

Return type

list of str

trtools.statSTR.GetHet(trrecord, sample_indexes=[None], uselength=True)

Compute heterozygosity of a locus

Heterozygosity is defined as the probability that two randomly drawn alleles are different.

Parameters
  • trrecord (trtools.utils.tr_harmonizer.TRRecord) – The record that we are computing the statistic for

  • sample_indexes (List[Any]) – A list of indexes into the numpy rows array to extract subsets of genotypes to stratify over. (e.g. [[True, False, False], [False, True, True]] or [[0], [1,2]] to split three samples into two strata - the first sample and the last two) Can contain None for all samples.

  • uselength (bool) – Whether we should collapse alleles by length

Returns

heterozygosity – For each sample list, the heterozygosity of the calls for those samples, or np.nan if no such calls

Return type

List[float]
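
The definition above corresponds to one minus the sum of squared allele frequencies (the chance that two independent draws match, subtracted from one). A minimal sketch, with the hypothetical helper `heterozygosity` and made-up frequencies:

```python
import math

def heterozygosity(allele_freqs):
    """Probability that two alleles drawn at random (with replacement) differ."""
    if not allele_freqs:
        return math.nan
    return 1.0 - sum(p * p for p in allele_freqs.values())

two_alleles = heterozygosity({10: 0.5, 12: 0.5})
monomorphic = heterozygosity({10: 1.0})
```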

trtools.statSTR.GetMean(trrecord, sample_indexes=[None], uselength=True)

Compute the mean allele length

Parameters
  • trrecord (trtools.utils.tr_harmonizer.TRRecord) – The record that we are computing the statistic for

  • sample_indexes (List[Any]) – A list of indexes into the numpy rows array to extract subsets of genotypes to stratify over. (e.g. [[True, False, False], [False, True, True]] or [[0], [1,2]] to split three samples into two strata - the first sample and the last two) Can contain None for all samples.

  • uselength (bool) – Whether we should collapse alleles by length

Returns

mean – For each sample list, the mean allele length, or np.nan if no calls for that sample

Return type

List[float]

trtools.statSTR.GetMode(trrecord, sample_indexes=[None], uselength=True)

Compute the mode of the allele lengths

Parameters
  • trrecord (trtools.utils.tr_harmonizer.TRRecord) – The record that we are computing the statistic for

  • sample_indexes (List[Any]) – A list of indexes into the numpy rows array to extract subsets of genotypes to stratify over. (e.g. [[True, False, False], [False, True, True]] or [[0], [1,2]] to split three samples into two strata - the first sample and the last two) Can contain None for all samples.

  • uselength (bool) – Whether we should collapse alleles by length

Returns

mode – For each sample list, the mode allele length, or np.nan if no calls for that sample

Return type

List[float]

trtools.statSTR.GetNAlleles(trrecord, sample_indexes=[None], nalleles_thresh=0.01, uselength=True)

Return the number of alleles at a TR whose frequency exceeds a threshold

Parameters
  • trrecord (trtools.utils.tr_harmonizer.TRRecord) – The record that we are computing the statistic for

  • sample_indexes (List[Any]) – A list of indexes into the numpy rows array to extract subsets of genotypes to stratify over. (e.g. [[True, False, False], [False, True, True]] or [[0], [1,2]] to split three samples into two strata - the first sample and the last two) Can contain None for all samples.

  • nalleles_thresh (float) – The threshold which an allele’s frequency must exceed to be counted

  • uselength (bool) – Whether we should collapse alleles by length

Returns

Number of called alleles at this locus per sample index. Zero if no alleles were called.

Return type

List[int]
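
The role of nalleles_thresh can be illustrated with a short sketch. The helper `count_common_alleles` is hypothetical, and the strict "greater than" comparison follows the wording "must exceed" above; how the library treats an allele sitting exactly at the threshold is not confirmed here.

```python
def count_common_alleles(allele_freqs, nalleles_thresh=0.01):
    """Count alleles whose frequency is strictly above the threshold."""
    return sum(1 for p in allele_freqs.values() if p > nalleles_thresh)

# Made-up frequencies: one allele is too rare to pass the default threshold
freqs = {10: 0.60, 11: 0.395, 12: 0.005}
n_default = count_common_alleles(freqs)
n_strict = count_common_alleles(freqs, nalleles_thresh=0.5)
n_empty = count_common_alleles({})
```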

trtools.statSTR.GetNumSamples(trrecord, sample_indexes=[None])

Compute the number of samples

Parameters
  • trrecord (trtools.utils.tr_harmonizer.TRRecord) – The record that we are computing the statistic for

  • sample_indexes (List[Any]) – A list of indexes into the numpy rows array to extract subsets of genotypes to stratify over. (e.g. [[True, False, False], [False, True, True]] or [[0], [1,2]] to split three samples into two strata - the first sample and the last two) Can contain None for all samples.

Returns

numSamples – The number of samples. One value for each sample list. If the allele frequencies dictionary is invalid, return np.nan.

Return type

list of int

trtools.statSTR.GetThresh(trrecord, sample_indexes=[None])

Return the maximum TR allele length observed

Parameters
  • trrecord (trtools.utils.tr_harmonizer.TRRecord) – The record that we are computing the statistic for

  • sample_indexes (List[Any]) – A list of indexes into the numpy rows array to extract subsets of genotypes to stratify over. (e.g. [[True, False, False], [False, True, True]] or [[0], [1,2]] to split three samples into two strata - the first sample and the last two) Can contain None for all samples.

Returns

thresh – The maximum allele length observed in each sample group, or np.nan if no alleles were called

Return type

List[float]

trtools.statSTR.GetVariance(trrecord, sample_indexes=[None], uselength=True)

Compute the variance of the allele lengths

Parameters
  • trrecord (trtools.utils.tr_harmonizer.TRRecord) – The record that we are computing the statistic for

  • sample_indexes (List[Any]) – A list of indexes into the numpy rows array to extract subsets of genotypes to stratify over. (e.g. [[True, False, False], [False, True, True]] or [[0], [1,2]] to split three samples into two strata - the first sample and the last two) Can contain None for all samples.

  • uselength (bool) – Whether we should collapse alleles by length

Returns

variance – For each sample list, the variance of the allele lengths, or np.nan if no calls for that sample

Return type

List[float]
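
The three length summaries (GetMean, GetMode, GetVariance) can each be viewed as a statistic of the allele length distribution. The sketch below assumes frequency-weighted definitions, with the variance taken as the population variance; the helper `length_summary` and the frequencies are hypothetical, and the library's exact conventions (for example, how ties for the mode are broken) are not confirmed here.

```python
import math

def length_summary(allele_freqs):
    """Return (mean, mode, variance) of allele lengths from {length: frequency}."""
    if not allele_freqs:
        return (math.nan, math.nan, math.nan)
    # Frequency-weighted mean of the allele lengths
    mean = sum(length * p for length, p in allele_freqs.items())
    # Frequency-weighted squared deviation from the mean (population variance)
    variance = sum(p * (length - mean) ** 2 for length, p in allele_freqs.items())
    # Most frequent length; ties go to whichever key max() encounters first
    mode = max(allele_freqs, key=allele_freqs.get)
    return (mean, mode, variance)

# Made-up symmetric distribution centred on length 12
mean, mode, variance = length_summary({10: 0.25, 12: 0.5, 14: 0.25})
```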

trtools.statSTR.PlotAlleleFreqs(trrecord, outprefix, sample_indexes=[None], sampleprefixes=None)

Plot allele frequencies for a locus

Parameters
  • trrecord (trtools.utils.tr_harmonizer.TRRecord) – The record that we are computing the statistic for

  • outprefix (str) – Prefix for output file

  • sample_indexes (List[Any]) – A list of indexes into the numpy rows array to extract subsets of genotypes to stratify over. (e.g. [[True, False, False], [False, True, True]] or [[0], [1,2]] to split three samples into two strata - the first sample and the last two) Can contain None for all samples.

  • sampleprefixes (list of str, optional) – Prefixes for each sample list to use in legend

trtools.statSTR.format_nan_precision(precision_format, val)
trtools.statSTR.getargs()
trtools.statSTR.main(args)
trtools.statSTR.run()