trtools.mergeSTR module
- trtools.mergeSTR.GetAltAlleles(ref_allele, current_records, mergelist, vcftype)
Get list of alt alleles
- Parameters
ref_allele (str) – The (flank trimmed) reference allele, in upper case
current_records (list of vcf.Record) – List of records being merged
mergelist (list of bool) – Indicates whether each record is included in merge
vcftype (Union[str, trtools.utils.tr_harmonizer.VcfTypes]) – The type of the VCFs these records came from
- Returns
(alts, mappings) – alts is a list of alternate allele strings in all uppercase. mappings is a list whose length equals the number of records being merged. For record n, mappings[n] is a numpy array in which an allele with index i in the original record has index mappings[n][i] in the output merged record. The indices stored in the arrays are strings (e.g. ‘1’ or ‘2’) for fast output. For example, if the output record has ref allele ‘A’ and alternate alleles ‘AA,AAA,AAAA’, and input record n has ref allele ‘A’ and alternate alleles ‘AAAA,AAA’, then mappings[n] would be np.array([‘0’, ‘3’, ‘2’]).
original index        new index
rec_n.alleles[0]  ==  out_rec.alleles[0]  ==  'A'
rec_n.alleles[1]  ==  out_rec.alleles[3]  ==  'AAAA'
rec_n.alleles[2]  ==  out_rec.alleles[2]  ==  'AAA'
- Return type
(list of str, list of np.ndarray)
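The sketch below reproduces the documented example using plain allele lists ([REF, ALT1, ALT2, ...]) in place of vcf.Record objects. The function name, the input shape, the shortest-first then alphabetical ordering of the merged alts, and emitting a mapping only for records flagged in mergelist are all assumptions for illustration, not confirmed mergeSTR behavior.

```python
from typing import List, Tuple

import numpy as np


def merge_alt_alleles_sketch(
    ref_allele: str,
    record_alleles: List[List[str]],  # hypothetical input: [REF, ALT1, ALT2, ...] per record
    mergelist: List[bool],
) -> Tuple[List[str], List[np.ndarray]]:
    # Collect upper-cased ALT alleles from every record included in the merge
    alts = set()
    for alleles, include in zip(record_alleles, mergelist):
        if not include:
            continue
        if alleles[0].upper() != ref_allele:
            raise ValueError("reference alleles differ across merged records")
        alts.update(a.upper() for a in alleles[1:])
    # Assumed ordering: shortest allele first, ties broken alphabetically
    merged_alts = sorted(alts, key=lambda a: (len(a), a))
    index = {a: i + 1 for i, a in enumerate(merged_alts)}  # REF keeps index 0
    mappings = []
    for alleles, include in zip(record_alleles, mergelist):
        if include:
            # Entry i is the merged index of this record's allele i, stored as a string
            new_idx = ["0"] + [str(index[a.upper()]) for a in alleles[1:]]
            mappings.append(np.array(new_idx))
    return merged_alts, mappings


alts, maps = merge_alt_alleles_sketch("A", [["A", "AAAA", "AAA"], ["A", "AA"]], [True, True])
```

With these inputs, alts is ['AA', 'AAA', 'AAAA'] and the first record's mapping is ['0', '3', '2'], matching the example above.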
- trtools.mergeSTR.GetAltsByKey(current_records, mergelist, key)
- Parameters
current_records (List[trtools.utils.tr_harmonizer.TRRecord]) –
mergelist (List[bool]) –
key (Any) –
- trtools.mergeSTR.GetID(idval)
Get the ID for a record
If not set, output “.”
- Parameters
idval (str) – ID of the record
- Returns
idval – The ID of the record, or “.” if idval is None
- Return type
str
- trtools.mergeSTR.GetInfoItem(current_records, mergelist, info_field, fail=True)
Get INFO item for a group of records
Make sure the value is the same across all merged records. If fail=True, fail with an error if the values differ. If fail=False, only act on the field if there is a rule for how to handle it.
- Parameters
current_records (list of vcf.Record) – List of records being merged
mergelist (list of bool) – List of indicators of whether to merge each record
info_field (str) – INFO field being merged
fail (bool) – If True, fail with an error if the field values differ across merged records
- Returns
infostring – INFO string to add (key=value)
- Return type
str
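A minimal sketch of the consistency check described above, assuming each record's INFO is available as a plain dict. The function name and input shape are hypothetical; it raises ValueError where mergeSTR is documented to stop with an error, and it models no field-specific merge rules, so with fail=False a conflicting field simply yields None.

```python
from typing import Any, Dict, List, Optional


def get_info_item_sketch(
    info_dicts: List[Dict[str, Any]],  # hypothetical: one INFO dict per record
    mergelist: List[bool],
    info_field: str,
    fail: bool = True,
) -> Optional[str]:
    # Gather the field's value from each record that takes part in the merge
    values = [
        info[info_field]
        for info, include in zip(info_dicts, mergelist)
        if include and info_field in info
    ]
    if not values:
        return None
    if all(v == values[0] for v in values):
        return f"{info_field}={values[0]}"
    if fail:
        # Values disagree and the caller asked for a hard failure
        raise ValueError(f"INFO field {info_field} differs across merged records")
    # fail=False: no per-field merge rule is modeled in this sketch
    return None
```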
- trtools.mergeSTR.GetRefAllele(current_records, mergelist, vcfType)
Get reference allele for a set of records
- Parameters
current_records (list of vcf.Record) – List of records being merged
mergelist (list of bool) – Indicates whether each record is included in merge
vcfType (trtools.utils.tr_harmonizer.VcfTypes) –
- Returns
ref – Reference allele string. Set to None if conflicting references are found.
- Return type
str
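The conflict rule for the reference allele can be sketched as follows, assuming the REF strings have already been extracted (and, for HipSTR, flank-trimmed). The function name is hypothetical and the vcfType-specific handling is deliberately left out.

```python
from typing import List, Optional


def get_ref_allele_sketch(refs: List[str], mergelist: List[bool]) -> Optional[str]:
    # Upper-case the REF of every record included in the merge
    merged_refs = {ref.upper() for ref, include in zip(refs, mergelist) if include}
    # Exactly one distinct value is the shared reference; anything else is a conflict
    if len(merged_refs) == 1:
        return merged_refs.pop()
    return None
```

Note that the function can return None, so callers should check for it even though the documented return type is str.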
- trtools.mergeSTR.HarmonizeIfNotNone(records, vcf_type)
- Parameters
records (List[Optional[trtools.utils.tr_harmonizer.TRRecord]]) –
vcf_type (trtools.utils.tr_harmonizer.VcfTypes) –
- trtools.mergeSTR.MergeRecords(readers, vcftype, num_samples, current_records, mergelist, vcfw, useinfo, useformat, format_type)
Merge records from different files
Merge all records whose indicator is set to True in mergelist and write the merged record to vcfw. If the merged records were created by HipSTR and contain flanking base pairs, those base pairs are removed.
- Parameters
readers (list of vcf.Reader) – List of readers being merged
vcftype (Union[str, trtools.utils.tr_harmonizer.VcfTypes]) – Type of the readers
num_samples (list of int) – Number of samples per vcf
current_records (list of vcf.Record) – List of current records for each reader
mergelist (list of bool) – Indicates whether to include each reader in merge
vcfw (file) – File to write output to
useinfo (list of (str, bool)) – List of (info field, required) to use downstream
useformat (list of str) – List of format field strings to use downstream
format_type (list of String) – The type of each format field
- Return type
None
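The num_samples parameter suggests that every reader contributes one output column per sample, even when it has no record at the current locus. The sketch below assumes that samples from readers excluded by mergelist are written as the VCF missing value "."; the function name and the pre-formatted sample strings are hypothetical and not part of mergeSTR.

```python
from typing import List, Optional


def sample_columns_sketch(
    mergelist: List[bool],
    num_samples: List[int],
    per_reader_samples: List[Optional[List[str]]],  # hypothetical pre-formatted sample fields
) -> str:
    columns: List[str] = []
    for include, n, samples in zip(mergelist, num_samples, per_reader_samples):
        if include:
            assert samples is not None
            columns.extend(samples)  # this reader contributes real calls
        else:
            columns.extend(["."] * n)  # assumed missing placeholder per absent sample
    return "\t".join(columns)
```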
- class trtools.mergeSTR.TextIO(*args, **kwds)
Bases: IO[str]
Typed version of the return of open() in text mode.
- abstract property buffer: BinaryIO
- abstract property encoding: str
- abstract property errors: Optional[str]
- abstract property line_buffering: bool
- abstract property newlines: Any
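This class appears to be typing.TextIO, listed here because mergeSTR imports it for type annotations. Any text-mode file object satisfies it, including io.StringIO, which is convenient for testing writers. The helper name below is hypothetical.

```python
import io
from typing import List, TextIO


def write_fields(out: TextIO, fields: List[str]) -> None:
    # Write one tab-separated line, as a VCF writer would
    out.write("\t".join(fields) + "\n")


buf = io.StringIO()
write_fields(buf, ["chr1", "100", "."])
```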
- trtools.mergeSTR.WriteMergedHeader(vcfw, args, readers, cmd, vcftype)
Write the merged header for the VCFs in args.vcfs
Also run checks on the VCFs to make sure merging is appropriate, and return the INFO and FORMAT fields to use.
- Parameters
vcfw (file object) – Writer to write the merged VCF
args (argparse namespace) – Contains user options
readers (list of vcf.Reader) – List of readers to merge
cmd (str) – Command used to call this program
vcftype (str) – Type of VCF files being merged
- Returns
useinfo (list of (str, bool)) – List of (info field, required) to use downstream
useformat (list of str) – List of format field strings to use downstream
- Return type
Union[Tuple[List[Tuple[str, bool]], List[str]], Tuple[None, None]]
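Because of the Union return type, a caller has to unpack the result and check for (None, None) before using the field lists. The sketch assumes that (None, None) signals a failed compatibility check and that it should become a nonzero exit status from main; neither point is stated on this page, and the helper name is hypothetical.

```python
from typing import List, Tuple, Union

UseInfo = List[Tuple[str, bool]]
HeaderResult = Union[Tuple[UseInfo, List[str]], Tuple[None, None]]


def header_exit_code(result: HeaderResult) -> int:
    # Unpack the (useinfo, useformat) pair returned by the header step
    useinfo, useformat = result
    if useinfo is None or useformat is None:
        return 1  # assumed: header checks failed, so stop before merging records
    return 0
```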
- trtools.mergeSTR.WriteSampleData(vcfw, record, alleles, formats, format_type, mapping)
Output sample FORMAT data
Writes a string representation of the GT and other FORMAT fields for each sample in the record, with tabs between samples
- Parameters
vcfw (file) – File to write output to
record (cyvcf2.Variant) – VCF record being summarized
alleles (list of str) – List of REF + ALT alleles
formats (list of str) – List of VCF FORMAT items
format_type (list of String) – The type of each format field
mapping (np.ndarray) – See GetAltAlleles
- Return type
None
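The mapping parameter (see GetAltAlleles) is what allows each sample's GT to be rewritten in terms of the merged allele indices. A minimal sketch of that step for a single GT string, assuming VCF-style "/" or "|" separators and "." for missing alleles; the function name is hypothetical and this is not WriteSampleData itself.

```python
import re
from typing import Sequence


def remap_gt(gt: str, mapping: Sequence[str]) -> str:
    # Split on separators but keep them, e.g. "1|2" -> ["1", "|", "2"]
    parts = re.split(r"([/|])", gt)
    out = []
    for part in parts:
        if part in ("/", "|", "."):
            out.append(part)  # separators and missing alleles pass through unchanged
        else:
            out.append(mapping[int(part)])  # original allele index -> merged index
    return "".join(out)
```

With the mapping ['0', '3', '2'] from the GetAltAlleles example, "1|2" becomes "3|2".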
- trtools.mergeSTR.getargs()
- Return type
Any
- trtools.mergeSTR.main(args)
- Parameters
args (Any) –
- Return type
int
- trtools.mergeSTR.run()
- Return type
None