trtools.mergeSTR module

trtools.mergeSTR.GetAltAlleles(ref_allele, current_records, mergelist, vcftype)

Get list of alt alleles

Parameters
  • ref_allele (str) – The (flank trimmed) reference allele, in upper case

  • current_records (list of vcf.Record) – List of records being merged

  • mergelist (list of bool) – Indicates whether each record is included in merge

  • vcftype (Union[str, trtools.utils.tr_harmonizer.VcfTypes]) – The type of the VCFs these records came from

Returns

(alts, mappings) – alts is a list of alternate allele strings, all in upper case. mappings is a list with one entry per record being merged. For record n, mappings[n] is a numpy array in which the allele at index i in the original record has index mappings[n][i] in the merged output record. The indices are stored in the arrays as number strings (e.g. '1' or '2') for fast output.

For example, if the output record has ref allele 'A' and alternate alleles 'AA,AAA,AAAA', and input record n has ref allele 'A' and alternate alleles 'AAAA,AAA', then mappings[n] would be np.array(['0', '3', '2']):

original index      new index
rec_n.alleles[0] == out_rec.alleles[0] == 'A'
rec_n.alleles[1] == out_rec.alleles[3] == 'AAAA'
rec_n.alleles[2] == out_rec.alleles[2] == 'AAA'

Return type

(list of str, list of np.ndarray)
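
The index mapping described above can be illustrated with a minimal sketch. This is not the trtools implementation: merge_alt_alleles is a hypothetical helper, it takes plain allele lists rather than VCF records, it returns Python lists rather than numpy arrays, and it assumes the merged alternate alleles are ordered by length and then alphabetically (the ordering is not stated in this documentation).

```python
def merge_alt_alleles(ref_allele, records_alleles):
    """Hypothetical illustration of the alts/mappings return value.

    records_alleles is a list of allele lists, one per record,
    each of the form [ref, alt1, alt2, ...].
    """
    ref = ref_allele.upper()

    # Collect the distinct alternate alleles seen in any record.
    alts = set()
    for alleles in records_alleles:
        for allele in alleles[1:]:
            allele = allele.upper()
            if allele != ref:
                alts.add(allele)

    # Assumed ordering: shortest first, ties broken alphabetically.
    merged_alts = sorted(alts, key=lambda a: (len(a), a))

    # Index of every allele in the merged record, stored as a string.
    merged = [ref] + merged_alts
    index_of = {allele: str(i) for i, allele in enumerate(merged)}

    # mappings[n][i] is the merged index of allele i of record n.
    mappings = [
        [index_of[allele.upper()] for allele in alleles]
        for alleles in records_alleles
    ]
    return merged_alts, mappings


alts, mappings = merge_alt_alleles("A", [["A", "AA"], ["A", "AAAA", "AAA"]])
```

Here the second input record reproduces the example above: its alleles 'A', 'AAAA', 'AAA' map to merged indices '0', '3', '2'.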

trtools.mergeSTR.GetAltsByKey(current_records, mergelist, key)
Parameters

trtools.mergeSTR.GetID(idval)

Get the ID for a record

If not set, output “.”

Parameters

idval (str) – ID of the record

Returns

idval – The record ID, or “.” if the ID is None

Return type

str
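
The documented behavior amounts to a missing-value substitution. A minimal sketch, assuming nothing beyond the description above (get_id is a hypothetical stand-in, not the trtools function):

```python
def get_id(idval):
    """Return the record ID, or "." (the VCF missing value) if it is None."""
    if idval is None:
        return "."
    return idval


known_id = get_id("STR_1234")
missing_id = get_id(None)
```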

trtools.mergeSTR.GetInfoItem(current_records, mergelist, info_field, fail=True)

Get INFO item for a group of records

Checks that the item has the same value across the merged records. If fail=True, fail with an error when the values differ. If fail=False, only act when there is a rule for how to handle that field.

Parameters
  • current_records (list of vcf.Record) – List of records being merged

  • mergelist (list of bool) – List of indicators of whether to merge each record

  • info_field (str) – INFO field being merged

  • fail (bool) – If True, throw error if fields don’t have same value

Returns

infostring – INFO string to add (key=value)

Return type

str
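
The consistency check can be sketched as follows. This is an illustration under stated assumptions, not the trtools implementation: get_info_item is a hypothetical helper, each record is represented by a plain dict of its INFO fields, a mismatch raises ValueError, and the fail=False case simply returns None because the field-specific handling rules are not described here.

```python
def get_info_item(info_dicts, mergelist, info_field, fail=True):
    """Return "key=value" if all merged records agree on info_field."""
    # Distinct values of the field among the records being merged.
    values = {
        info[info_field]
        for info, include in zip(info_dicts, mergelist)
        if include and info_field in info
    }
    if len(values) == 1:
        return "%s=%s" % (info_field, values.pop())
    if not values:
        return None  # no merged record carries this field
    if fail:
        raise ValueError("Conflicting values for INFO field %s" % info_field)
    return None  # assumption: no handling rule for this field


records = [{"END": 100, "PERIOD": 2}, {"END": 100, "PERIOD": 3}, {"END": 999}]
mergelist = [True, True, False]

end_item = get_info_item(records, mergelist, "END")
period_item = get_info_item(records, mergelist, "PERIOD", fail=False)
```

The third record is excluded by mergelist, so its differing END value does not cause a conflict.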

trtools.mergeSTR.GetRefAllele(current_records, mergelist, vcfType)

Get reference allele for a set of records

Parameters
  • current_records (list of vcf.Record) – List of records being merged

  • mergelist (list of bool) – Indicates whether each record is included in merge

  • vcfType (trtools.utils.tr_harmonizer.VcfTypes) – The type of the VCFs these records came from

Returns

ref – Reference allele string. Set to None if conflicting references are found.

Return type

str
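
A minimal sketch of the conflict rule described above, assuming the reference alleles have already been extracted as strings and are compared in upper case (get_ref_allele is a hypothetical helper, not the trtools function, and it ignores the VCF type):

```python
def get_ref_allele(ref_alleles, mergelist):
    """Return the shared reference allele, or None if the records disagree."""
    chosen = None
    for ref, include in zip(ref_alleles, mergelist):
        if not include:
            continue  # record is not part of this merge
        ref = ref.upper()
        if chosen is None:
            chosen = ref
        elif chosen != ref:
            return None  # conflicting references
    return chosen


agreeing = get_ref_allele(["ACAC", "acac", "GT"], [True, True, False])
conflicting = get_ref_allele(["ACAC", "ACACAC"], [True, True])
```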

trtools.mergeSTR.HarmonizeIfNotNone(records, vcf_type)
Parameters

trtools.mergeSTR.MergeRecords(readers, vcftype, num_samples, current_records, mergelist, vcfw, useinfo, useformat, format_type)

Merge records from different files

Merge all records with indicator set to True in mergelist. Output the merged record to vcfw. If the merged records were created by HipSTR and contain flanking BPs, these BPs are removed.

Parameters
  • readers (list of vcf.Reader) – List of readers being merged

  • vcftype (Union[str, trtools.utils.tr_harmonizer.VcfTypes]) – Type of the readers

  • num_samples (list of int) – Number of samples per vcf

  • current_records (list of vcf.Record) – List of current records for each reader

  • mergelist (list of bool) – Indicates whether to include each reader in merge

  • vcfw (file) – File to write output to

  • useinfo (list of (str, bool)) – List of (info field, required) to use downstream

  • useformat (list of str) – List of format field strings to use downstream

  • format_type (list of str) – The type of each format field

Return type

None

class trtools.mergeSTR.TextIO(*args, **kwds)

Bases: IO[str]

Typed version of the return of open() in text mode.

abstract property buffer: BinaryIO
abstract property encoding: str
abstract property errors: Optional[str]
abstract property line_buffering: bool
abstract property newlines: Any

trtools.mergeSTR.WriteMergedHeader(vcfw, args, readers, cmd, vcftype)

Write merged header for VCFs in args.vcfs

Also checks the VCFs to make sure merging is appropriate. Returns the info and format fields to use.

Parameters
  • vcfw (file object) – Writer to write the merged VCF

  • args (argparse namespace) – Contains user options

  • readers (list of vcf.Reader) – List of readers to merge

  • cmd (str) – Command used to call this program

  • vcftype (str) – Type of VCF files being merged

Returns

  • useinfo (list of (str, bool)) – List of (info field, required) to use downstream

  • useformat (list of str) – List of format field strings to use downstream

Return type

Union[Tuple[List[Tuple[str, bool]], List[str]], Tuple[None, None]]

trtools.mergeSTR.WriteSampleData(vcfw, record, alleles, formats, format_type, mapping)

Output sample FORMAT data

Writes a string representation of the GT and other format fields for each sample in the record, with tabs in between records

Parameters
  • vcfw (file) – File to write output to

  • record (cyvcf2.Variant) – VCF record being summarized

  • alleles (list of str) – List of REF + ALT alleles

  • formats (list of str) – List of VCF FORMAT items

  • format_type (list of str) – The type of each format field

  • mapping (np.ndarray) – See GetAltAlleles

Return type

None
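
How the mapping from GetAltAlleles is applied to genotypes can be sketched as below. This is a simplified illustration, not the trtools implementation: write_sample_gts is a hypothetical helper that handles only the GT field, takes genotypes as tuples of allele indices (None for a missing call), always uses the unphased "/" separator, and assumes the tab-separated entries correspond to samples.

```python
import io


def write_sample_gts(out, sample_gts, mapping):
    """Write tab-separated GT strings, re-indexed for the merged record."""
    fields = []
    for gt in sample_gts:
        if gt is None:
            fields.append(".")  # missing genotype
        else:
            # mapping[i] is already a string, so no conversion is needed.
            fields.append("/".join(mapping[i] for i in gt))
    out.write("\t".join(fields))


buf = io.StringIO()
write_sample_gts(buf, [(0, 1), (1, 2), None], ["0", "3", "2"])
written = buf.getvalue()
```

Storing the merged indices as strings, as the GetAltAlleles description notes, means the output line can be assembled with plain string joins.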

trtools.mergeSTR.getargs()
Return type

Any

trtools.mergeSTR.main(args)
Parameters

args (Any) –

Return type

int

trtools.mergeSTR.run()
Return type

None