trtools.utils.mergeutils module
Utilities for reading multiple VCFs simultaneously and keeping them in sync.
- trtools.utils.mergeutils.CheckMin(is_min)
Check if we’re progressing through VCFs
- Parameters
is_min (list of bool) – List indicating if each record is first in sort order
- Returns
check – Set to True if something went wrong
- Return type
bool
- trtools.utils.mergeutils.CheckPos(record, chrom, pos)
Check a record is at the specified position
- Parameters
record (cyvcf2.cyvcf2.Variant) – VCF record being checked
chrom (str) – Chromosome name
pos (int) – Chromosome position
- Returns
check – Return True if the current record is at this position
- Return type
bool
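The position check described above can be sketched as follows. This is an illustrative re-implementation, not the library's code: `check_pos` and the `SimpleNamespace` stand-in record are hypothetical, and the only assumption about real records is that they expose `CHROM` and `POS` attributes, as cyvcf2 variants do. Treating a missing record (`None`) as "not at the position" is also an assumption.

```python
from types import SimpleNamespace


def check_pos(record, chrom, pos):
    """Hypothetical helper: True if the record sits exactly at chrom:pos."""
    # Assumption: a missing record (None) is never at the requested position.
    if record is None:
        return False
    return record.CHROM == chrom and record.POS == pos


# Stand-in for a VCF record; real code would pass a cyvcf2 Variant instead.
rec = SimpleNamespace(CHROM="chr1", POS=1000)
at_target = check_pos(rec, "chr1", 1000)
off_target = check_pos(rec, "chr2", 1000)
```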
- trtools.utils.mergeutils.DebugPrintRecordLocations(current_records, is_min)
Debug function to print current records for each file
- Parameters
current_records (list of vcf.Record) – List of current records from merged files
is_min (list of bool) – List indicating if each record is first in sort order
- Return type
None
- trtools.utils.mergeutils.DoneReading(records)
Check if all records are at the end of the file
- Parameters
records (list of vcf.Record) – List of records from files to merge
- Returns
check – Set to True if all records are None, indicating we’re done reading the files
- Return type
bool
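A minimal sketch of the end-of-input test, assuming (as the description suggests) that an exhausted file is represented by `None` in the list of current records. `done_reading` is a hypothetical name used only for illustration.

```python
def done_reading(records):
    """Hypothetical helper: True once every file's current record is None."""
    return all(record is None for record in records)


# One file still has a record pending, so reading is not finished.
still_reading = done_reading([None, "pending-record"])
# Every file is exhausted.
finished = done_reading([None, None])
```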
- trtools.utils.mergeutils.GetAndCheckVCFType(vcfs, vcftype)
Infer type of multiple VCFs
If they are all the same, return that type. If not, raise an error.
- Parameters
vcfs (list of cyvcf2.VCF) – Multiple VCFs
vcftype (str) – If it is unclear which of a few VCF callers produced the underlying VCFs (because the output markings of those VCF callers are similar) this string can be supplied by the user to choose from among those callers.
- Returns
vcftype – Inferred VCF type
- Return type
str
- Raises
TypeError – If one of the VCFs does not look like it was produced by any supported TR caller; or if one of the VCFs looks like it could have been produced by more than one supported TR caller and vcftype == ‘auto’; or if, for one of the VCFs, vcftype doesn’t match any of the callers that could have produced that VCF; or if the types of the VCFs don’t match.
- trtools.utils.mergeutils.GetChromOrder(r, chroms)
Get the chromosome order of a record
- Parameters
r (vcf.Record) –
chroms (list of str) – Ordered list of chromosomes
- Returns
order – Index of r.CHROM in chroms (an int). Returns np.inf (a float) if r.CHROM cannot be found in chroms.
- Return type
int or float
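The lookup described above can be sketched like this. The sketch is illustrative only: `get_chrom_order` is a hypothetical name, the record is a `SimpleNamespace` stand-in with a `CHROM` attribute, and `math.inf` is used in place of `np.inf` (the two compare equal) so the example needs no third-party imports. Mapping a `None` record to infinity is an assumption.

```python
import math
from types import SimpleNamespace


def get_chrom_order(record, chroms):
    """Hypothetical helper: index of the record's chromosome, or infinity."""
    # Assumption: a missing record sorts after every known chromosome.
    if record is None or record.CHROM not in chroms:
        return math.inf
    return chroms.index(record.CHROM)


chroms = ["chr1", "chr2", "chrX"]
known = get_chrom_order(SimpleNamespace(CHROM="chr2"), chroms)
unknown = get_chrom_order(SimpleNamespace(CHROM="chrM"), chroms)
```

Returning infinity rather than raising lets unknown chromosomes take part in `min()` comparisons without ever being selected as the minimum.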
- trtools.utils.mergeutils.GetChromOrderEqual(chrom_order, min_chrom)
Check chrom order
- Parameters
chrom_order (int) – Chromosome order
min_chrom (int) – Current chromosome order
- Returns
equal – Return True if chrom_order==min_chrom and chrom_order != np.inf
- Return type
bool
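The stated return condition translates almost directly into code. The sketch below is a hypothetical re-statement (`chrom_order_equal` is not a library name) and uses `math.inf`, which compares equal to `np.inf`.

```python
import math


def chrom_order_equal(chrom_order, min_chrom):
    """Hypothetical helper: equal to the minimum order and not infinite."""
    return chrom_order == min_chrom and chrom_order != math.inf


same = chrom_order_equal(0, 0)
different = chrom_order_equal(1, 0)
# Two infinite orders (for example, two exhausted files) do not count as equal.
both_missing = chrom_order_equal(math.inf, math.inf)
```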
- trtools.utils.mergeutils.GetIncrementAndComparability(record_list, chroms, overlap_callback=<function default_callback>)
Get a list indicating which records should be skipped in the next iteration (increment), and whether they are all comparable / mergeable. The value of each increment element is determined by the (harmonized) position of the corresponding record.
- Parameters
record_list (list of trh.TRRecord) – List of current records from each file being merged
chroms (list of str) – Ordered list of all chromosomes
overlap_callback (Callable[[List[Optional[trh.TRRecord]], List[int], int], Union[bool, List[bool]]]) – Function that calculates whether the records are comparable
- Returns
increment (list of bool) – List of bools, where items are set to True when the record at the index of the item should be skipped during VCF file comparison.
comparable (bool or list of bool) – Value that determines whether the current records are comparable / mergeable, depending on the callback
- Return type
Tuple[List[bool], Union[bool, List[bool]]]
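To illustrate the shape of an `overlap_callback`, here is a hypothetical callback matching the documented signature. The semantics are inferred rather than confirmed: the sketch assumes `chrom_order` holds one chromosome index per record and `min_chrom_index` is the smallest of them. The name `same_chrom_callback` and the plain `object()` stand-in records are illustrative only.

```python
from typing import List, Optional


def same_chrom_callback(records: List[Optional[object]],
                        chrom_order: List[int],
                        min_chrom_index: int) -> bool:
    """Hypothetical callback: comparable only if all present records
    sit on the minimum chromosome."""
    # Ignore files that have no current record (None).
    present = [order for rec, order in zip(records, chrom_order)
               if rec is not None]
    return bool(present) and all(order == min_chrom_index for order in present)


a, b = object(), object()  # stand-ins for harmonized TR records
all_on_min = same_chrom_callback([a, b], [0, 0], 0)
one_ahead = same_chrom_callback([a, b], [0, 1], 0)
one_exhausted = same_chrom_callback([a, None], [0, float("inf")], 0)
```

Under these assumptions, such a function could be supplied as `overlap_callback=same_chrom_callback`.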
- trtools.utils.mergeutils.GetMinRecords(record_list, chroms)
Check if each record is next up in sort order
Return a vector of booleans set to True if the record is lowest in sort order among all the records. Use the order in chroms to determine the sort order of chromosomes.
- Parameters
record_list (list of CYVCF_RECORD) – list of current records from each file being merged
chroms (list of str) – Ordered list of all chromosomes
- Returns
checks – Set to True for records that are first in sort order
- Return type
list of bool
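The "first in sort order" test can be sketched by giving each record a (chromosome index, position) key and flagging the records whose key equals the minimum. This is a simplified illustration, not the library's implementation: `min_records` is a hypothetical name, records are `SimpleNamespace` stand-ins with `CHROM` and `POS`, and missing records or unknown chromosomes are assumed to sort last and never be flagged.

```python
import math
from types import SimpleNamespace


def min_records(record_list, chroms):
    """Hypothetical helper: flag the records that are lowest in sort order."""
    def sort_key(rec):
        # Assumption: missing records and unknown chromosomes sort last.
        if rec is None or rec.CHROM not in chroms:
            return (math.inf, math.inf)
        return (chroms.index(rec.CHROM), rec.POS)

    keys = [sort_key(rec) for rec in record_list]
    lowest = min(keys)
    # A record is flagged only if it ties the minimum and actually exists.
    return [key == lowest and key[0] != math.inf for key in keys]


chroms = ["chr1", "chr2"]
records = [
    SimpleNamespace(CHROM="chr1", POS=100),
    SimpleNamespace(CHROM="chr1", POS=50),
    None,  # this file is exhausted
    SimpleNamespace(CHROM="chr2", POS=10),
    SimpleNamespace(CHROM="chr1", POS=50),
]
flags = min_records(records, chroms)
```

Several files can share the minimum at once, which is what allows records at the same locus to be merged together.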
- trtools.utils.mergeutils.GetNextRecords(readers, current_records, increment)
Increment readers of each file
Increment readers[i] if increment[i] is set to True; otherwise keep current_records[i].
- Parameters
readers (list of vcf.Reader) – List of readers for all files being merged
current_records (list of vcf.Record) – List of current records for all readers
increment (list of bool) – List indicating if each file should be incremented
- Returns
new_records – List of next records for each file
- Return type
list of vcf.Record
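The selective advance described above can be sketched with plain Python iterators standing in for VCF readers. `next_records` is a hypothetical name, and representing an exhausted reader by `None` is an assumption based on the end-of-file convention described elsewhere in this module.

```python
def next_records(readers, current_records, increment):
    """Hypothetical helper: advance only the readers flagged in increment."""
    new_records = []
    for reader, current, advance in zip(readers, current_records, increment):
        if advance:
            # Assumption: an exhausted reader is represented by None.
            new_records.append(next(reader, None))
        else:
            new_records.append(current)
    return new_records


# Plain iterators stand in for VCF readers in this sketch.
readers = [iter(["a1", "a2"]), iter(["b1"])]
current = [next(reader, None) for reader in readers]
step1 = next_records(readers, current, [True, False])
step2 = next_records(readers, step1, [True, True])
```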
- trtools.utils.mergeutils.GetPos(r)
Get the position of a record
- Parameters
r (vcf.Record) –
- Returns
pos – Position of the record. If r is None, returns np.inf, which is a float
- Return type
int or float
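A short sketch of why an infinite position is convenient for missing records: it lets `min()` run over all files without special-casing exhausted ones. `get_pos` is a hypothetical name, the record is a stand-in exposing `POS`, and `math.inf` substitutes for `np.inf` (they compare equal).

```python
import math
from types import SimpleNamespace


def get_pos(record):
    """Hypothetical helper: record position, or infinity for a missing record."""
    return math.inf if record is None else record.POS


records = [SimpleNamespace(POS=250), None, SimpleNamespace(POS=120)]
positions = [get_pos(record) for record in records]
# The missing record can never be chosen as the minimum position.
lowest = min(positions)
```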
- trtools.utils.mergeutils.GetSamples(readers, filenames=None)
Get list of samples used in all files being merged
- Parameters
readers (list of cyvcf2.VCF) –
filenames (list of str, optional) – If present, add filename to sample names. Useful if sample names overlap across files. Must be the same length as readers.
- Returns
samples – List of samples in merged list
- Return type
list of str
- trtools.utils.mergeutils.InitReaders(readers)
Increment readers of each file
Returns list of first records from list of readers.
- Parameters
readers (list of cyvcf2.VCF) – List of readers for all files being merged
- Returns
List of next records for each file
- Return type
list of vcf.Record
- trtools.utils.mergeutils.LoadReaders(vcffiles, region=None)
Return list of VCF readers
- Parameters
vcffiles (list of str) – List of VCF files to merge
region (str, optional) – Chrom:start-end to restrict to
- Returns
readers – VCF readers list for all files to merge
- Return type
list of vcf.Reader
- trtools.utils.mergeutils.default_callback(records, chrom_order, min_chrom_index)
- Parameters
records (List[trtools.utils.tr_harmonizer.TRRecord]) –
chrom_order (List[int]) –
min_chrom_index (int) –
- Return type
bool