I have developed two software tools for handling NGS bisulphite-DNA data. The first, “stop_gap”, realigns bisulphite reads to correct misalignments, particularly those arising from Ion Torrent sequencing. The second, “measure”, is a generalised alignment walker for any NGS platform that counts occurrences of each nucleotide, and of insertions and deletions, at genome positions. It has features that make it suitable for estimating methylation in bisulphite-DNA reads.
“Stop_gap” allows Ion Torrent (and 454) instruments to be viable alternatives for bisulphite-DNA sequencing. Ion Torrent and 454 sequencing share an error mode in which the number of nucleotides in longer homopolymers is often incorrectly estimated. Because bisulphite treatment converts unmethylated cytosines so that they are read as thymines, bisulphite-treated DNA has long runs of thymines, and the resulting Ion Torrent read errors introduce misalignments that make the estimation of cytosine methylation particularly difficult. “Stop_gap” reads BAM files, applies approaches to correct such misalignments and writes corrected BAM files. Using an Ion Torrent instrument for NGS has some appeal, as the machines are commonplace and offer competitive throughput, long read lengths and relatively low costs.
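To illustrate why bisulphite-treated DNA is so exposed to homopolymer errors, the following minimal sketch converts a toy sequence in silico and compares the longest thymine run before and after conversion. It is not part of “stop_gap” and does not reproduce its realignment approach. The function names, the toy sequence and the simplifying assumption that every CpG cytosine is methylated while every other cytosine is unmethylated are all hypothetical choices made for this example.

```python
import re
from itertools import groupby


def bisulphite_convert(seq: str) -> str:
    """Convert the forward strand in silico.

    Assumption (for illustration only): cytosines in a CpG context are
    methylated and stay 'C'; every other cytosine is unmethylated and is
    read as 'T' after bisulphite treatment.
    """
    # The lookahead keeps any 'C' that is immediately followed by 'G'.
    return re.sub(r"C(?!G)", "T", seq.upper())


def longest_run(seq: str, base: str) -> int:
    """Return the length of the longest homopolymer run of `base` in `seq`."""
    return max((len(list(group)) for b, group in groupby(seq) if b == base),
               default=0)


reference = "ATTCTCCTTCGATCTTTCCA"   # toy sequence, not real data
converted = bisulphite_convert(reference)

print(converted)                      # ATTTTTTTTCGATTTTTTTA
print(longest_run(reference, "T"))    # 3
print(longest_run(converted, "T"))    # 8
```

In this toy case the longest thymine run grows from 3 to 8 bases, which is the kind of homopolymer length at which flow-based instruments tend to miscount.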
“Measure” walks through a BAM file produced by any deep sequencing platform and counts instances of each nucleotide, and of insertions and deletions, at CpG sites. Alternatively, it can count at every genome position or at positions matching a user-specified regular expression. The software also calculates methylation rates at CpG or CpN sites. Output can be written as either a CSV or an Excel file.
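The counting step described above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the “measure” implementation: reads are represented as hypothetical (start, sequence) tuples that are ungapped and on the forward strand, insertions and deletions are ignored, no BAM file is read, and the methylation rate is taken as C / (C + T) at the cytosine of each matched site. The function names and toy data are invented for the example.

```python
import re
from collections import Counter


def count_bases_at_sites(reference, reads, pattern="CG"):
    """Count the read bases covering each reference position that matches `pattern`.

    `reads` is an iterable of (start, sequence) tuples: 0-based start,
    ungapped, forward strand (an assumption made for this sketch).
    """
    sites = [m.start() for m in re.finditer(pattern, reference)]
    counts = {pos: Counter() for pos in sites}
    for start, seq in reads:
        for pos in sites:
            offset = pos - start
            if 0 <= offset < len(seq):      # the read covers this site
                counts[pos][seq[offset]] += 1
    return counts


def methylation_rate(counter):
    """Fraction of C among C+T calls, or None if neither was observed."""
    c, t = counter["C"], counter["T"]
    return c / (c + t) if (c + t) else None


reference = "TTCGATTACGTT"                  # toy reference with CpGs at 2 and 8
reads = [
    (0, "TTCGATTATG"),
    (0, "TTTGATTACG"),
    (2, "CGATTACGTT"),
    (2, "CGATTATGTT"),
]

counts = count_bases_at_sites(reference, reads)
for pos in sorted(counts):
    print(pos, dict(counts[pos]), methylation_rate(counts[pos]))
```

Passing a different `pattern` (for example `"C[ACGT]"` for CpN sites) changes which positions are reported, mirroring the regular-expression option mentioned above.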
The software is written in Python and uses the Pysam wrapper library for samtools. Both tools can be run from the command line as part of a pipeline, or imported and called as Python classes.