trtools.utils.mergeutils module

Utilities for reading multiple VCFs simultaneously and keeping them in sync.

trtools.utils.mergeutils.CheckMin(is_min)

Check if we’re progressing through VCFs

Parameters

is_min (list of bool) – List indicating if each record is first in sort order

Returns

check – Set to True if something went wrong

Return type

bool

trtools.utils.mergeutils.CheckPos(record, chrom, pos)

Check a record is at the specified position

Parameters
  • record (cyvcf2.cyvcf2.Variant) – VCF record being checked

  • chrom (str) – Chromosome name

  • pos (int) – Chromosome position

Returns

check – Return True if the current record is at this position

Return type

bool
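
As a rough illustration of the check described above, the sketch below reimplements the comparison on a stand-in record type. FakeRecord and check_pos are hypothetical names introduced here, not part of trtools, and treating a missing record (None) as a non-match is an assumption.

```python
from collections import namedtuple

# Hypothetical stand-in for a cyvcf2 Variant; only CHROM and POS are modeled.
FakeRecord = namedtuple("FakeRecord", ["CHROM", "POS"])

def check_pos(record, chrom, pos):
    """Return True if record sits at chrom:pos (a sketch, not the trtools source)."""
    if record is None:  # assumption: an exhausted reader never matches
        return False
    return record.CHROM == chrom and record.POS == pos

at_target = check_pos(FakeRecord("chr1", 100), "chr1", 100)
elsewhere = check_pos(FakeRecord("chr2", 100), "chr1", 100)
```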

trtools.utils.mergeutils.DebugPrintRecordLocations(current_records, is_min)

Debug function to print current records for each file

Parameters
  • current_records (list of vcf.Record) – List of current records from merged files

  • is_min (list of bool) – List indicating if each record is first in sort order

Return type

None

trtools.utils.mergeutils.DoneReading(records)

Check if all records are at the end of the file

Parameters

records (list of vcf.Record) – List of records from files to merge

Returns

check – Set to True if all records are None, indicating we’re done reading the files

Return type

bool
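
The end-of-input test described above can be sketched in a couple of lines. The name done_reading is hypothetical, and the sketch assumes an exhausted reader is represented by None in the records list, as the description suggests.

```python
def done_reading(records):
    """True only when every reader has run out of records (sketch)."""
    return all(r is None for r in records)

# One file still has a record pending, so merging should continue.
still_reading = done_reading([None, "record-from-file-2"])
# Every slot is None, so the merge loop can stop.
finished = done_reading([None, None])
```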

trtools.utils.mergeutils.GetAndCheckVCFType(vcfs, vcftype)

Infer type of multiple VCFs

If they are all the same, return that type. If not, raise an error.

Parameters
  • vcfs (list of cyvcf2.VCF) – Multiple VCFs

  • vcftype (str) – If it is unclear which of a few VCF callers produced the underlying VCFs (because the output markings of those VCF callers are similar) this string can be supplied by the user to choose from among those callers.

Returns

vcftype – Inferred VCF type

Return type

str

Raises

TypeError – If one of the VCFs does not look like it was produced by any supported TR caller, or if one of the VCFs looks like it could have been produced by more than one supported TR caller and vcftype == ‘auto’, or if, for one of the VCFs, vcftype doesn’t match any of the callers that could have produced that VCF, or if the types of the VCFs don’t match
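
The final condition above (the types of the VCFs must match) can be illustrated on its own. This sketch assumes the per-file type has already been inferred and takes a plain list of type names as input; common_vcf_type is a hypothetical helper, and the inference and vcftype handling done by the real function are not shown.

```python
def common_vcf_type(inferred_types):
    """Return the single shared type, or raise TypeError if the files disagree."""
    unique = set(inferred_types)
    if len(unique) != 1:
        raise TypeError(
            "VCF files are of mixed types: " + ", ".join(sorted(unique))
        )
    return unique.pop()

# Example input: both files were inferred to come from the same caller.
shared_type = common_vcf_type(["hipstr", "hipstr"])
```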

trtools.utils.mergeutils.GetChromOrder(r, chroms)

Get the chromosome order of a record

Parameters
  • r (vcf.Record) –

  • chroms (list of str) – Ordered list of chromosomes

Returns

order – Index of r.CHROM in chroms (an int). Returns np.inf (a float) if r.CHROM can’t be found.

Return type

int or float
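
A minimal sketch of this lookup, assuming a stand-in record type. FakeRecord and chrom_order are hypothetical names, math.inf is used in place of np.inf (the two compare equal) to keep the example dependency-free, and mapping a None record to infinity is an assumption.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for a VCF record; only CHROM and POS are modeled.
FakeRecord = namedtuple("FakeRecord", ["CHROM", "POS"])

def chrom_order(r, chroms):
    """Index of r.CHROM in chroms, or infinity when there is nothing to look up."""
    if r is None or r.CHROM not in chroms:
        return math.inf  # sorts after every real chromosome when taking a minimum
    return chroms.index(r.CHROM)

chroms = ["chr1", "chr2", "chrX"]
order_chr2 = chrom_order(FakeRecord("chr2", 10), chroms)
order_missing = chrom_order(None, chroms)
```

Returning infinity rather than raising lets the caller take min() over all files without special-casing readers that have nothing left.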

trtools.utils.mergeutils.GetChromOrderEqual(chrom_order, min_chrom)

Check chrom order

Parameters
  • chrom_order (int) – Chromosome order

  • min_chrom (int) – Current chromosome order

Returns

equal – Return True if chrom_order==min_chrom and chrom_order != np.inf

Return type

bool
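
The extra np.inf condition presumably guards the case where every reader is exhausted: the minimum order is then itself infinite, and infinity compares equal to infinity. A small sketch, with the hypothetical name chrom_order_equal and math.inf standing in for np.inf:

```python
import math

def chrom_order_equal(chrom_order, min_chrom):
    """True only for a real chromosome index that equals the current minimum."""
    return chrom_order == min_chrom and chrom_order != math.inf

on_min_chrom = chrom_order_equal(0, 0)
later_chrom = chrom_order_equal(2, 0)
both_exhausted = chrom_order_equal(math.inf, math.inf)
```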

trtools.utils.mergeutils.GetIncrementAndComparability(record_list, chroms, overlap_callback=<function default_callback>)

Get list that says which records should be skipped in the next iteration (increment), and whether they are all comparable / mergeable

The value of each increment element is determined by the (harmonized) position of the corresponding record.

Parameters
  • record_list (list of trh.TRRecord) – List of current records from each file being merged

  • chroms (list of str) – Ordered list of all chromosomes

  • overlap_callback (Callable[[List[Optional[trh.TRRecord]], List[int], int], Union[bool, List[bool]]]) – Function that calculates whether the records are comparable

Returns

  • increment (list of bool) – List of bools, where an item is set to True when the record at that index should be skipped during VCF file comparison.

  • comparable (bool or list of bool) – Value that determines whether the current records are comparable / mergeable, depending on the callback

Return type

Tuple[List[bool], Union[bool, List[bool]]]
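
To make the overlap_callback signature concrete, here is a sketch of a custom callback with the same three arguments. FakeTRRecord and same_start_callback are hypothetical, the pos attribute on the stand-in is an assumption, and the rule itself (every file must have a record on the minimum chromosome at one shared position) is only an example policy, not the behavior of default_callback.

```python
from collections import namedtuple
from typing import List, Optional

# Hypothetical stand-in for trh.TRRecord; only a position is modeled.
FakeTRRecord = namedtuple("FakeTRRecord", ["pos"])

def same_start_callback(records: List[Optional[FakeTRRecord]],
                        chrom_order: List[int],
                        min_chrom_index: int) -> bool:
    """Comparable only if all files have a record at one shared position."""
    positions = set()
    for rec, order in zip(records, chrom_order):
        if rec is None or order != min_chrom_index:
            return False  # at least one file has nothing to merge here
        positions.add(rec.pos)
    return len(positions) == 1

aligned = same_start_callback([FakeTRRecord(10), FakeTRRecord(10)], [0, 0], 0)
offset = same_start_callback([FakeTRRecord(10), FakeTRRecord(12)], [0, 0], 0)
```

Following the signature above, such a function would be supplied as overlap_callback=same_start_callback.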

trtools.utils.mergeutils.GetMinRecords(record_list, chroms)

Check if each record is next up in sort order

Return a vector of booleans, each set to True if the corresponding record is lowest in sort order among all the records. Use the order in chroms to determine the sort order of chromosomes.

Parameters
  • record_list (list of CYVCF_RECORD) – list of current records from each file being merged

  • chroms (list of str) – Ordered list of all chromosomes

Returns

checks – Set to True for records that are first in sort order

Return type

list of bool
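
The selection described above can be sketched as a sort on (chromosome index, position) pairs. FakeRecord and min_records are hypothetical names, and None is assumed to mark a file with no records left; this illustrates the idea rather than the library's code.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for a VCF record; only CHROM and POS are modeled.
FakeRecord = namedtuple("FakeRecord", ["CHROM", "POS"])

def min_records(record_list, chroms):
    """Flag the records that come first in (chromosome, position) order."""
    def sort_key(r):
        if r is None:
            return (math.inf, math.inf)  # exhausted files sort last
        return (chroms.index(r.CHROM), r.POS)

    keys = [sort_key(r) for r in record_list]
    if not keys:
        return []
    lowest = min(keys)
    if lowest[0] == math.inf:  # every file is exhausted: nothing is "next"
        return [False] * len(record_list)
    return [k == lowest for k in keys]

chroms = ["chr1", "chr2"]
current = [FakeRecord("chr1", 500), FakeRecord("chr1", 200), None, FakeRecord("chr2", 50)]
is_min = min_records(current, chroms)  # [False, True, False, False]
```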

trtools.utils.mergeutils.GetNextRecords(readers, current_records, increment)

Increment readers of each file

Increment readers[i] if increment[i] is set to True; otherwise keep current_records[i].

Parameters
  • readers (list of vcf.Reader) – List of readers for all files being merged

  • current_records (list of vcf.Record) – List of current records for all readers

  • increment (list of bool) – List indicating if each file should be incremented

Returns

new_records – List of next records for each file

Return type

list of vcf.Record
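
A sketch of the advance-or-keep step, using plain Python iterators in place of VCF readers. The name next_records is hypothetical, and storing None once a reader is exhausted is an assumption made for the example.

```python
def next_records(readers, current_records, increment):
    """Advance the readers flagged in increment; keep the other records as they are."""
    new_records = []
    for reader, current, advance in zip(readers, current_records, increment):
        if advance:
            # next() with a default avoids StopIteration at end of file
            new_records.append(next(reader, None))
        else:
            new_records.append(current)
    return new_records

readers = [iter(["a1", "a2"]), iter(["b1"])]
current = [next(r) for r in readers]                      # ["a1", "b1"]
step1 = next_records(readers, current, [True, False])     # ["a2", "b1"]
step2 = next_records(readers, step1, [True, True])        # [None, None]
```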

trtools.utils.mergeutils.GetPos(r)

Get the position of a record

Parameters

r (vcf.Record) –

Returns

pos – Position of the record. If r is None, returns np.inf, which is a float

Return type

int or float

trtools.utils.mergeutils.GetSamples(readers, filenames=None)

Get list of samples used in all files being merged

Parameters
  • readers (list of cyvcf2.VCF) –

  • filenames (optional list of str) – If present, add filename to sample names. Useful if sample names overlap across files. Must be the same length as readers

Returns

samples – List of samples in merged list

Return type

list of str
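
The effect of passing filenames can be sketched on plain lists of sample names, such as those exposed by the samples attribute of a cyvcf2.VCF reader. merged_samples is a hypothetical helper, and the "filename:sample" format with a colon separator is an assumption; the text above only says the filename is added to the sample name.

```python
def merged_samples(sample_lists, filenames=None):
    """Concatenate per-file sample names, optionally prefixed by their filename."""
    if filenames is not None and len(filenames) != len(sample_lists):
        raise ValueError("filenames must be the same length as the list of readers")
    merged = []
    for i, samples in enumerate(sample_lists):
        for name in samples:
            if filenames is not None:
                merged.append(f"{filenames[i]}:{name}")  # separator is an assumption
            else:
                merged.append(name)
    return merged

per_file = [["s1", "s2"], ["s1"]]
plain = merged_samples(per_file)
labeled = merged_samples(per_file, ["a.vcf.gz", "b.vcf.gz"])
```

Without filenames the name s1 appears twice in the merged list; with them, every merged name is distinct.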

trtools.utils.mergeutils.GetSharedSamples(readers)

Get list of samples present in all files being merged

Parameters

readers (list of cyvcf2.VCF objects) –

Returns

samples – Samples present in all readers

Return type

list of str
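
Finding the samples common to every reader amounts to a set intersection. The sketch below works on plain lists of sample names; shared_samples is a hypothetical name, and returning the result sorted is a choice made here for reproducibility, not something the text above specifies.

```python
def shared_samples(sample_lists):
    """Return the sample names that appear in every list."""
    if not sample_lists:
        return []
    shared = set(sample_lists[0])
    for samples in sample_lists[1:]:
        shared &= set(samples)  # keep only names seen in every file so far
    return sorted(shared)

common = shared_samples([["s1", "s2", "s3"], ["s2", "s3"], ["s3", "s2", "s9"]])
```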

trtools.utils.mergeutils.InitReaders(readers)

Increment readers of each file

Returns list of first records from list of readers.

Parameters

readers (list of cyvcf2.VCF) – List of readers for all files being merged

Returns

List of first records for each file

Return type

list of vcf.Record

trtools.utils.mergeutils.LoadReaders(vcffiles, region=None)

Return list of VCF readers

Parameters
  • vcffiles (list of str) – List of VCF files to merge

  • region (str, optional) – Chrom:start-end to restrict to

Returns

readers – VCF readers list for all files to merge

Return type

list of vcf.Reader

trtools.utils.mergeutils.default_callback(records, chrom_order, min_chrom_index)

Parameters
  • records (List[Optional[trh.TRRecord]]) –

  • chrom_order (List[int]) –

  • min_chrom_index (int) –

Return type

bool