TRTools: a toolkit for genome-wide tandem repeat analysis


TRTools includes a variety of utilities for filtering, quality control and analysis of tandem repeats downstream of genotyping them from next-generation sequencing. It supports multiple recent genotyping tools (see below).

See full documentation and examples at https://trtools.readthedocs.io/en/latest/.

If you use TRTools in your work, please cite: Nima Mousavi, Jonathan Margoliash, Neha Pusarla, Shubham Saini, Richard Yanicky, Melissa Gymrek. (2020) TRTools: a toolkit for genome-wide analysis of tandem repeats. Bioinformatics. (https://doi.org/10.1093/bioinformatics/btaa736)

Install

You can install TRTools with conda:

conda install -c bioconda trtools

Note: Bioconda currently supports only Python 3.6 and 3.7, so those are the only versions for which TRTools is available through conda. If you are using another Python version that we support (3.5 or >= 3.8), install TRTools using pip.
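As a rough illustration of the version rule in the note above, the following sketch picks an install route from the running interpreter's version. The helper name suggest_install_command is hypothetical and not part of TRTools, and the version ranges are simply those stated in the note, which may change over time.

```python
import sys


def suggest_install_command(version=None):
    """Return the install command implied by the note above (hypothetical helper)."""
    major, minor = (version or sys.version_info)[:2]
    if major != 3 or minor < 5:
        raise ValueError("This README only describes Python 3.5 and later")
    if minor in (6, 7):
        # The conda route is described only for Python 3.6 and 3.7.
        return "conda install -c bioconda trtools"
    # Python 3.5 and >= 3.8 use pip instead.
    return "pip install trtools"


print(suggest_install_command())
```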

You can alternatively obtain TRTools from pip:

pip install --upgrade pip
pip install trtools

(Note: TRTools installation may fail with pip version 10.0.1, hence the need to upgrade pip first.)

To install from source (only recommended for development), download the TRTools repository from GitHub, check out the branch you’re interested in, and install from the base directory of the repo, e.g.:

git clone https://github.com/gymreklab/TRTools
cd TRTools/
pip install --upgrade pip
pip install .

(Note: the required package pybedtools depends on zlib. If you receive an error about a missing file zlib.h, you can install zlib on Ubuntu using sudo apt-get install zlib1g-dev or on CentOS using sudo yum install zlib-devel.)

Tools

TRTools includes the following tools.

  • mergeSTR: a tool to merge VCF files across multiple samples genotyped using the same tool

  • dumpSTR: a tool for filtering VCF files with TR genotypes

  • qcSTR: a tool for generating various quality control plots for a TR callset

  • statSTR: a tool for computing various statistics on VCF files

  • compareSTR: a tool for comparing TR callsets

Type <command> --help to see a full set of options.

TRTools additionally includes a Python library, trtools, which can be imported from Python scripts, e.g.:

import trtools.utils.utils as stls
allele_freqs = {5: 0.5, 6: 0.5} # 50% of alleles have 5 repeat copies, 50% have 6
stls.GetHeterozygosity(allele_freqs) # should return 0.5
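To make the expected value above concrete: heterozygosity is commonly defined as one minus the sum of squared allele frequencies. The sketch below is a standalone re-implementation under that assumption, for illustration only and not the TRTools source; the function name heterozygosity is hypothetical.

```python
def heterozygosity(allele_freqs):
    """Compute 1 - sum(p_i ** 2) over the allele frequencies p_i."""
    return 1.0 - sum(freq ** 2 for freq in allele_freqs.values())


# Same input as the library example: two alleles at 50% each.
allele_freqs = {5: 0.5, 6: 0.5}
print(heterozygosity(allele_freqs))  # 1 - (0.25 + 0.25) = 0.5
```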

Usage

We recommend new users start with the example commands described in the command-line interface for each tool. We also suggest going through our vignettes that walk through some example workflows using TRTools.

Supported TR Callers

TRTools supports VCFs from multiple TR genotyping tools.

See our description of the features and example use-cases of each of these tools.

Development Notes

  • TRTools currently supports only diploid genotypes. Haploid calls, such as those on male chrX or chrY, are not yet supported but should be coming soon.

  • TRTools currently works on top of the PyVCF library, which is prohibitively slow for biobank-scale VCFs. In a future release we plan to move to cyvcf2, which is over ten times faster.

  • compareSTR currently compares STR loci between two callsets only if they have the same start and end coordinates. In the future we will add the capacity to take in a user-specified mapping of loci between the two callsets and use that to compare loci even if they don’t completely overlap one another.
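The exact-coordinate restriction in the last note can be pictured with a small sketch. This is not the compareSTR implementation; the function match_loci_exact and the toy callsets are hypothetical, and loci are assumed to be keyed by (chrom, start, end).

```python
def match_loci_exact(callset_a, callset_b):
    """Pair up loci whose (chrom, start, end) keys appear in both callsets."""
    shared = callset_a.keys() & callset_b.keys()
    return {key: (callset_a[key], callset_b[key]) for key in sorted(shared)}


# Toy callsets: (chrom, start, end) -> diploid genotype in repeat copies.
calls_a = {("chr1", 100, 120): (5, 6), ("chr1", 300, 330): (10, 10)}
calls_b = {("chr1", 100, 120): (5, 5), ("chr1", 300, 335): (10, 11)}

matched = match_loci_exact(calls_a, calls_b)
print(matched)
```

Only the locus at chr1:100-120 is paired here. The second locus overlaps between the two callsets but has different end coordinates, so under exact matching it is skipped.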

Contributing

We appreciate contributions to TRTools. If you would like to contribute a fix or new feature, follow these guidelines:

  1. Consider discussing your solution with us first so we can provide help or feedback if necessary.

  2. Create a clean environment with the dependencies in requirements.txt installed.

  3. Additionally, install pytest, pytest-cov and sphinx>=3 in your environment.

  4. Fork the TRTools repository.

  5. The develop branch contains the latest pre-release codebase. Create a branch off of develop titled with the name of your feature.

  6. Make your changes.

  7. Document your changes.

  • Ensure all functions, modules, classes etc. conform to numpy docstring standards.

    If applicable, update the READMEs in the directories of the files you changed with new usage information.

  • If you have added significant amounts of new documentation, build the documentation locally to ensure it looks good.

    cd to the doc directory and run make clean && make html, then open doc/_build/html/index.html and navigate from there.

  8. Add tests for any new functionality. Add them to the tests/ folder in the directory of the code you modified.

  • cd to the root of the project and run python -m pytest --cov=. --cov-report term-missing to make sure that (1) all tests pass and (2) any code you have added is covered by tests. (Code coverage must not decrease.)

  9. Submit a pull request to the develop branch of the central repository with a description of what changes you have made. A member of the TRTools team will reply and continue the contribution process from there, possibly asking for additional information/effort on your part.
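As an illustration of the documentation and testing steps above, here is a minimal sketch of a function with a numpy-style docstring and a matching pytest test. The function count_motif_copies is hypothetical and not part of TRTools; it only shows the expected docstring layout and test shape.

```python
def count_motif_copies(sequence, motif):
    """Count consecutive copies of a motif at the start of a sequence.

    Parameters
    ----------
    sequence : str
        Nucleotide sequence to scan.
    motif : str
        Repeat unit to look for. Must be non-empty.

    Returns
    -------
    int
        Number of back-to-back copies of `motif` starting at position 0.

    Raises
    ------
    ValueError
        If `motif` is empty.
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    copies = 0
    # Advance one motif length at a time while the next chunk still matches.
    while sequence.startswith(motif, copies * len(motif)):
        copies += 1
    return copies


def test_count_motif_copies():
    # pytest collects functions whose names start with "test_".
    assert count_motif_copies("CAGCAGCAGT", "CAG") == 3
    assert count_motif_copies("TTTT", "CAG") == 0
```

In a real contribution the test would live in the tests/ folder next to the modified code, so that the pytest command above picks it up and includes it in the coverage report.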

Publishing

If you are a TRTools maintainer and wish to publish changes from the develop branch into master and distribute them to PyPI and bioconda, please see PUBLISHING.rst in the root of the git repo. If you are a community member and would like that to happen, contact us (see below).

Contact Us

Please submit an issue on the TRTools GitHub repository.