TRTools: a toolkit for genome-wide tandem repeat analysis
TRTools
TRTools includes a variety of utilities for filtering, quality control and analysis of tandem repeats downstream of genotyping them from next-generation sequencing. It supports multiple recent genotyping tools (see below).
If you use TRTools in your work, please cite: Nima Mousavi, Jonathan Margoliash, Neha Pusarla, Shubham Saini, Richard Yanicky, Melissa Gymrek. (2020) TRTools: a toolkit for genome-wide analysis of tandem repeats. Bioinformatics. (https://doi.org/10.1093/bioinformatics/btaa736)
Install
Note: TRTools supports Python versions 3.8 and up. We do not officially support Python 3.7 because it has reached end of life, but based on previous testing we believe TRTools likely works with it.
With conda
conda install -c conda-forge -c bioconda trtools
Optionally, install bcftools, which is used to prepare input files for TRTools, and ART, which is used by simTR, by running:
conda install -c conda-forge -c bioconda bcftools art
With pip
First install htslib (which contains tabix and bgzip). Optionally, install bcftools.
These are used to prepare input files for TRTools and aren't installed by pip.
Then run:
pip install --upgrade pip
pip install trtools
Note: TRTools installation may fail with pip version 10.0.1, hence the need to upgrade pip first.
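Once htslib is installed, preparing a VCF for tools that expect compressed, indexed input usually means compressing it with bgzip and indexing it with tabix. The Python sketch below wraps those two standard htslib commands with subprocess. The function names and the calls.vcf file are hypothetical, and whether a particular TRTools command needs this step is described in that tool's own documentation.

```python
import shutil
import subprocess
from pathlib import Path


def bgzip_tabix_commands(vcf_path):
    """Return argument lists that compress a VCF with bgzip and index it with tabix."""
    vcf = Path(vcf_path)
    gz = vcf.with_name(vcf.name + ".gz")
    return [
        ["bgzip", "-f", str(vcf)],              # writes <name>.vcf.gz (bgzip removes the input by default)
        ["tabix", "-f", "-p", "vcf", str(gz)],  # writes the <name>.vcf.gz.tbi index
    ]


def prepare_vcf(vcf_path):
    """Compress and index a VCF (hypothetical helper); returns the .vcf.gz path."""
    missing = [tool for tool in ("bgzip", "tabix") if shutil.which(tool) is None]
    if missing:
        raise RuntimeError(f"not found on PATH: {', '.join(missing)} (install htslib)")
    for cmd in bgzip_tabix_commands(vcf_path):
        subprocess.run(cmd, check=True)  # raise if either command fails
    vcf = Path(vcf_path)
    return str(vcf.with_name(vcf.name + ".gz"))


# Show the commands that would be run; call prepare_vcf("calls.vcf") on a real file.
print(bgzip_tabix_commands("calls.vcf"))
```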
Note: If you will run or test simTR, you will also need to install ART. The simTR tests will only run if the executable art_illumina is found on your PATH. If it has been installed, which art_illumina should return a path.
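The which art_illumina check can also be done from Python with the standard library's shutil.which, which returns the executable's full path or None. This is a minimal sketch with a hypothetical helper name; it is not necessarily how the TRTools test suite decides whether to run the simTR tests.

```python
import shutil


def find_art():
    """Return the path to art_illumina if it is on PATH, else None (like `which art_illumina`)."""
    return shutil.which("art_illumina")


art_path = find_art()
if art_path is None:
    print("art_illumina not found on PATH: simTR tests would be skipped")
else:
    print(f"art_illumina found at {art_path}")
```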
From source
If you would like to develop or edit the TRTools source code, you will need to perform a “dev install” directly from the source.
Note: Instead of performing the following steps, you can also just open a GitHub codespace. Simply type a comma "," when viewing a branch on our GitHub to open an editor with our development setup pre-installed. You can then run conda activate trtools and poetry shell in the terminal there.
You can clone the TRTools repository from GitHub and check out the branch you're interested in:
git clone -b master https://github.com/gymrek-lab/TRTools
cd TRTools/
Now, create 1) a conda environment with our development tools and 2) a virtual environment with our dependencies and an editable install of TRTools:
conda env create -n trtools -f dev-env.yml
conda run -n trtools poetry install
Now, whenever you’d like to run/import pytest or TRTools, you will first need to activate both environments:
conda activate trtools
poetry shell
Note
There’s no need to install TRTools this way if you aren’t planning to develop or edit the source code! If you want the latest version from our master branch and just can’t wait for us to release it, you only need to run:
pip install --upgrade --force-reinstall git+https://github.com/gymrek-lab/trtools.git@master
With Docker
Please refer to the biocontainers registry for TRTools for all of our images. To use the most recent release, run the following command:
docker pull quay.io/biocontainers/trtools:latest
Tools
TRTools includes the following tools.
mergeSTR: a tool to merge VCF files across multiple samples genotyped using the same tool
dumpSTR: a tool for filtering VCF files with TR genotypes
qcSTR: a tool for generating various quality control plots for a TR callset
statSTR: a tool for computing various statistics on VCF files
compareSTR: a tool for comparing TR callsets
associaTR: a tool for testing TR length-phenotype associations (e.g., running a TR GWAS)
prancSTR: a tool for identifying somatic mosaicism at TRs. Currently only compatible with HipSTR VCF files. (beta mode)
simTR: a tool for simulating next-generation sequencing reads from TR regions. (beta mode)
annotaTR: a tool for annotating TR VCF files with dosage or other metadata and optionally converting to PGEN output.
Type <command> --help to see a full set of options.
It additionally includes a Python library, trtools, which can be accessed from within Python scripts, e.g.:
import trtools.utils.utils as stls
allele_freqs = {5: 0.5, 6: 0.5} # 50% of alleles have 5 repeat copies, 50% have 6
stls.GetHeterozygosity(allele_freqs) # should return 0.5
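For intuition about what the call above computes: heterozygosity is commonly defined as the probability that two alleles drawn at random from the population differ, H = 1 - sum(p_i^2), where p_i are the allele frequencies. The plain-Python sketch below implements that textbook formula under a hypothetical function name; we assume, rather than guarantee, that it agrees with GetHeterozygosity when the frequencies sum to 1. It reproduces the 0.5 expected above.

```python
def heterozygosity(allele_freqs):
    """Probability that two randomly drawn alleles differ: 1 - sum of squared frequencies.

    Assumes allele_freqs maps allele (e.g. repeat copy number) -> frequency, summing to 1.
    """
    return 1.0 - sum(p * p for p in allele_freqs.values())


print(heterozygosity({5: 0.5, 6: 0.5}))            # 0.5
print(heterozygosity({5: 0.25, 6: 0.25, 7: 0.5}))  # 0.625
```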
Usage
We recommend new users start with the example commands described in the command-line interface for each tool. We also suggest going through our vignettes that walk through some example workflows using TRTools.
Supported TR Callers
TRTools supports VCFs from the following TR genotyping tools:
GangSTR version 2.4 or higher
HipSTR [main repo] [Gymrek Lab repo]
PopSTR version 2 or higher
See our description of the features and example use-cases of each of these tools.
Testing
After you’ve installed TRTools, we recommend running our tests to confirm that TRTools works properly on your system. Just execute the following:
test_trtools.sh
Development Notes
TRTools currently supports only diploid genotypes. Haploid calls, such as those on male chrX or chrY, are not yet supported but should be coming soon.
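If your callset might contain haploid calls, one option is to flag them before running TRTools. The sketch below, using hypothetical helper names, infers ploidy from a VCF GT value, where alleles are separated by / (unphased) or | (phased). It only inspects GT strings and does not parse a VCF file itself.

```python
import re


def gt_ploidy(gt):
    """Return the number of alleles in a VCF GT string such as '0/1', '1|2' or '1'."""
    return len(re.split(r"[/|]", gt))


def is_diploid(gt):
    """True if the GT string has exactly two alleles (missing calls like './.' count)."""
    return gt_ploidy(gt) == 2


for gt in ["0/1", "1|1", "./.", "1"]:
    print(gt, is_diploid(gt))  # the haploid call "1" prints False
```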
Contact Us
Please submit an issue on the TRTools GitHub repository.
Contributing
We appreciate contributions to TRTools. If you would like to contribute a fix or new feature, follow these guidelines:
Consider discussing your solution with us first so we can provide help or feedback if necessary.
Install TRTools from source as above.
Fork the TRTools repository.
Create a branch off of master titled with the name of your feature.
Make your changes.
If you need to add a dependency or update the version of a dependency, you can use the poetry add command.
You should specify a version constraint when adding a dependency. Use the oldest version compatible with your code. For example, to specify a version of numpy>=1.23.0, you can run poetry add 'numpy>=1.23.0'.
Note: In most cases, a version constraint operator of >= is better than poetry's default of ^.
Afterward, double-check that the poetry.lock file contains 1.23.0. All of our dependencies should be locked to their minimum versions at all times. To downgrade to a specific version of numpy in our lock file, you can explicitly add the version via poetry add 'numpy==1.23.0', manually edit the pyproject.toml file to use a >= sign in front of the version number, and then run poetry lock --no-update. The --no-update flag is important because otherwise poetry will try to update other dependencies in the lock file.
Only PyPI packages can be added to our pyproject.toml file. If a dependency is only available on conda, you can add it to our dev-env.yml file instead. Please note that anyone who installs TRTools from PyPI is not guaranteed to have your dependency installed, so design your code accordingly.
Any changes to our dependencies must also be added to our bioconda recipe at the time of publication. See PUBLISHING.rst for more details.
Document your changes.
Ensure all functions, modules, classes etc. conform to numpy docstring standards.
If applicable, update the READMEs in the directories of the files you changed with new usage information.
New doc pages for the website can be created under <project-root>/doc and linked to as appropriate.
If you have added significant amounts of documentation in any of these ways, build the documentation locally to ensure it looks good. cd to the doc directory and run make clean && make html, then view doc/_build/html/index.html and navigate from there.
Add tests for any new functionality. Add them to the tests/ folder in the directory of the code you modified.
cd to the root of the project and run poetry run pytest --cov=. --cov-report term-missing to make sure that (1) all tests pass and (2) any code you have added is covered by tests. (Code coverage must not go down.)
cd to the root of the project and run nox to make sure that the tests pass on all versions of Python that we support.
Submit a pull request (PR) to the master branch of the central repository with a description of what changes you have made. Prefix the title of the PR according to the conventional commits spec. A member of the TRTools team will reply and continue the contribution process from there, possibly asking for additional information/effort on your part.
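To illustrate the dependency note about preferring >= over poetry's default ^: under poetry's caret semantics, ^1.23.0 means >=1.23.0,<2.0.0, so it would exclude a later major release such as numpy 2.x, whereas >=1.23.0 would not. The sketch below, written with numpy-style docstrings as the documentation step asks, models both constraints for plain numeric versions of at most three components. It is a hypothetical illustration, not poetry's resolver, and it ignores pre-release and local version tags.

```python
def parse_version(text):
    """Parse a numeric dotted version string into a tuple of ints.

    Parameters
    ----------
    text : str
        Version such as ``"1.23.0"``; pre-release tags are not handled.

    Returns
    -------
    tuple of int
        The version components, e.g. ``(1, 23, 0)``.
    """
    return tuple(int(part) for part in text.split("."))


def _padded(parts, length=3):
    # Pad with zeros so that (1, 23) and (1, 23, 0) compare as equal.
    return parts + (0,) * (length - len(parts))


def caret_upper_bound(base):
    """Exclusive upper bound of poetry's caret constraint ``^base``.

    Parameters
    ----------
    base : str
        Version written after the caret, e.g. ``"1.23.0"``.

    Returns
    -------
    tuple of int
        Smallest excluded version, e.g. ``(2, 0, 0)`` for ``"1.23.0"``.
    """
    parts = parse_version(base)
    # The leftmost non-zero component is the one the caret keeps fixed.
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[:idx] + (parts[idx] + 1,) + (0,) * (len(parts) - idx - 1)
    return _padded(upper)


def allowed(version, minimum, caret=False):
    """Check whether ``version`` satisfies ``>=minimum`` or ``^minimum``.

    Parameters
    ----------
    version : str
        Candidate version, e.g. ``"2.0.0"``.
    minimum : str
        Minimum version named in the constraint.
    caret : bool, optional
        If True, apply the caret upper bound as well. Default is False.

    Returns
    -------
    bool
        True if ``version`` satisfies the constraint.
    """
    v = _padded(parse_version(version))
    if v < _padded(parse_version(minimum)):
        return False
    return not caret or v < caret_upper_bound(minimum)


print(allowed("2.0.0", "1.23.0"))              # True: >=1.23.0 accepts 2.x
print(allowed("2.0.0", "1.23.0", caret=True))  # False: ^1.23.0 means <2.0.0
print(allowed("1.22.4", "1.23.0"))             # False: below the minimum
```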
If you are reviewing a pull request, please double-check that the PR addresses each item in our PR checklist.
Publishing
If you are a TRTools maintainer and wish to publish changes and distribute them to PyPI and bioconda, please see PUBLISHING.rst in the root of the git repo. If you are a community member and would like that to happen, contact us (see above).