CRISPR-based systems are widely used for genome editing in molecular biology. The CRISPR endonuclease Cas9 introduces a double-strand break (DSB) into the genomic DNA with high precision. Because the repair mechanisms of the cell are error-prone, this often results in insertions or deletions (indels) at the targeted site [1]. This is exploited to make functional knock-outs of specific genes and regulatory elements [2, 3, 4]. Alternatively, to gain more control over the nature of the mutations, strategies have been developed that introduce small nucleotide changes around a precisely targeted site by using a donor template [5, 6]. In the latter approach the genomic DNA around the DSB is replaced by the DNA of the donor template through homology-directed repair (HDR), resulting in the introduction of a designed mutation with high accuracy [7, 8]. This precise editing makes it possible to generate and study specific disease-causing nucleotide variants [6, 9]. Typically, one starts with a homogeneous cell line and ends up with a pool of cells with a complex mix of indels and/or designer mutations [10, 11, 12]. To study a mutation of interest, clonal mutant lines need to be isolated from the cell pool. Because this is a very labor-intensive process, it is important to know beforehand the efficiency with which the desired mutation(s) have been introduced. However, a complicating factor is that the efficacy of the programmable nucleases can vary dramatically depending on the sequence that is targeted. In addition, cell types differ in how efficiently they can be transfected. These factors make the efficacy of a CRISPR experiment difficult to predict. For this reason it is usually necessary to test several guide RNAs (gRNAs) that lead the endonuclease to the site of interest.
This is even more critical when a template-directed strategy is applied, which often has a low efficiency because HDR pathways are generally less active than error-prone non-templated repair [10, 12]. Hence, a quick and easy assay to estimate the frequencies of the diverse mutations introduced in the cell pool is of key importance.
Primer design recommendations for control and experimental sample. Primers a and b need to span the CRISPR target site. The length of the PCR product can vary, but there should be at least 50 bp upstream and downstream of the break site for the alignment window (see Notes 6 and 7) and the decomposition window (see Note 9), respectively.
Primer design recommendations for reference sample. Primers c and d should carry the designed mutation(s) as present in the donor template (see Section 3.2, Note 3). It is advised to include at least 10 complementary nucleotides on the 3′ side of the mutation(s).
Donor plasmid contamination in isolated genomic DNA. A donor template that was transfected into the cells could co-purify with the genomic DNA and be co-amplified in the PCR if it contains the primer sequences. This could result in an overestimation of the HDR events. This is generally not a problem with short ssODN donors, but for plasmid templates with long homology arms the primers a and b should be chosen outside of these homology arms. Alternatively, the donor plasmid may be cleared from the cells by a few passages of culturing.
Nuclease type. TIDE(R) is currently designed for regular Cas9, but it can be used to analyze data from another nuclease by entering the DNA sequence around the expected cut site in the web tool. The TIDE(R) web tool assumes that the DSB is induced between nucleotides 17 and 18 of the guide RNA sequence string (Fig. 3f). Note that if the exact breakpoint is unknown, TIDE will still estimate the amount of indels correctly, but the nucleotide composition of the +1 insertion will not be reliable. TIDER will only work when the exact cutting position is known and when the nuclease is a blunt cutter.
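The break-site convention described in this note can be illustrated with a short Python sketch. It is not the TIDE(R) implementation: the function names, the exact-match search, and the handling of a guide on the reverse strand are assumptions made for illustration. Only the rule that the break lies between guide nucleotides 17 and 18 comes from the text.

```python
# Illustrative sketch only; names and search strategy are hypothetical,
# not taken from the TIDE(R) source code.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_break_site(control_seq: str, guide: str, cut_after: int = 17) -> int:
    """Return the number of nucleotides in control_seq upstream of the break.

    The break is assumed to lie between guide nucleotides `cut_after` and
    `cut_after + 1` (17 and 18 for regular Cas9, as stated in the note).
    Only the first exact match is used; mismatches are not tolerated.
    """
    control_seq = control_seq.upper()
    guide = guide.upper()

    pos = control_seq.find(guide)
    if pos != -1:
        # Guide lies on the sequenced strand: count from its 5' end.
        return pos + cut_after

    pos = control_seq.find(reverse_complement(guide))
    if pos != -1:
        # Guide lies on the opposite strand: count from the other end.
        return pos + len(guide) - cut_after

    raise ValueError("guide sequence not found in control sequence")


if __name__ == "__main__":
    guide = "GACGTCAGCTAGCTAGGCAT"  # made-up 20-nt guide, without PAM
    control = "AAAAA" + guide + "TTTTT"
    print(find_break_site(control, guide))
```

For a nuclease with a different cut position, the hypothetical `cut_after` argument would be changed accordingly. The error raised when no exact match is found corresponds to the situation in which the guide cannot be located in the control sequence.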
No guide RNA match. Sometimes a mismatch occurs in the control sequence at the location of the sgRNA. This will stop the TIDE(R) analysis. In this case, edit the base annotation in the chromatogram file so that it matches the IUPAC nucleotides of the expected control sequence (Fig. 3g). The peak signals in the chromatogram should not be altered. Chromatogram files can be viewed and edited with SnapGene or Chromas software.
Alignment cannot be performed. By default, the alignment window begins at nucleotide number 100, because the first part of the sequence read tends to be of low quality. The end of the alignment window is set automatically at 15 bp upstream of the break site. When this window is too small or when the break site is located upstream of nucleotide 100, the alignment cannot be performed correctly. Then the start of the alignment window should be set manually closer to nucleotide number 1 (Fig. 3c).
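The defaults in this note can be written down as a small Python check. This is a sketch under assumptions: the function name is hypothetical, positions are 1-based nucleotide numbers, and `min_length` is an arbitrary illustrative threshold, since the note does not state how small a window is "too small".

```python
# Illustrative sketch only; not the TIDE(R) implementation.
def alignment_window(break_site: int, start: int = 100,
                     end_offset: int = 15, min_length: int = 30):
    """Return (start, end) of the alignment window in nucleotide numbers.

    start      : default 100, as stated in the note.
    end_offset : the window ends 15 bp upstream of the break site.
    min_length : ARBITRARY illustrative threshold, not a TIDE(R) setting.
    """
    end = break_site - end_offset
    if end - start < min_length:
        raise ValueError(
            f"alignment window {start}-{end} is too small; "
            "set the start closer to nucleotide 1"
        )
    return start, end


if __name__ == "__main__":
    print(alignment_window(300))           # break site well past nucleotide 100
    print(alignment_window(90, start=10))  # start moved manually toward nucleotide 1
```

With a break site at nucleotide 90 and the default start of 100 the sketch raises an error, which mirrors the case in which the start has to be lowered manually.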
Incorrect alignment. When the beginning of the sequence trace is of poor quality, the alignment function can make a mistake. This results in a quality plot with a high aberrant sequence signal along the whole length of the sequence trace (Fig. 3d). The aberrant sequence signal should only increase around the expected cut site (blue dashed line). In case of poor alignment, the start of the alignment window (default of 100) needs to be adjusted until a proper alignment is achieved.
Quality plot recommendations. In the experimental sample, a consistently elevated signal is expected around the break site, which is due to indels introduced at the break site. The starting position of this elevated signal may be used to verify that breaks were induced at the expected location. The control trace should have a low and evenly distributed aberrant sequence signal along the whole trace. In the case of TIDER, the reference trace should only have high scores at the positions of the altered nucleotides. Fluctuations in the control and reference signal reflect local variation in the quality of the sequence trace. Near the end of the sequencing traces the aberrant signal is often high, typically due to the lower quality of the trace toward the end (Fig. 3a). When a sequence stretch of poor local quality is present in the decomposition window, the calculations of TIDE(R) are compromised. The boundaries of the decomposition window can be manually adjusted to exclude the low-quality region; this will improve the estimations. Another area to avoid in the decomposition window is a stretch of repetitive sequence. These regions can be recognized in the quality plot as a sudden stretch without aberrant nucleotides (Fig. 3b). Such a region might confound the decomposition of the sequence trace.
Decomposition window recommendations. For TIDE, the default decomposition window spans the sequence trace from the break site until the end of the sequence minus the size of the maximum indel. When the boundaries of the decomposition window cannot fulfill this constraint, the software will report that the boundaries are not acceptable. For example, this can occur when the break site is too close to the end of the sequence trace. To address this, the decomposition window boundaries should be set further apart or a smaller indel size should be chosen. Alternatively, new primers have to be designed according to Note 1. For TIDER, the default decomposition window runs from 20 bp upstream of the break to 80 bp downstream of the break. Compared to the TIDE window, this smaller window has more discriminatory power for subtle designed base pair changes.
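The two default windows can be sketched in Python as follows. The function names and the validity checks are simplified assumptions for illustration; the actual boundary constraints applied by the software may be stricter. The default maximum indel size of 10 is the value mentioned in Note 10.

```python
# Illustrative sketch only; checks are simplified and hypothetical.
def tide_window(break_site: int, trace_length: int, max_indel: int = 10):
    """Default TIDE window: break site to (end of trace - maximum indel size)."""
    left, right = break_site, trace_length - max_indel
    if right <= left:
        raise ValueError("break site too close to the end of the trace")
    return left, right


def tider_window(break_site: int, trace_length: int,
                 upstream: int = 20, downstream: int = 80):
    """Default TIDER window: 20 bp upstream to 80 bp downstream of the break."""
    left, right = break_site - upstream, break_site + downstream
    if left < 1 or right > trace_length:
        raise ValueError("window extends beyond the sequence trace")
    return left, right


if __name__ == "__main__":
    print(tide_window(250, 700))
    print(tider_window(250, 700))
```

In this sketch, choosing a smaller `max_indel` widens the acceptable range for the TIDE window, which corresponds to one of the two remedies described in the note.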
Goodness of fit. R² is a measure of the reliability of the estimated values. For example, an R² value of 0.95 means that 95% of the variance can be explained by the model; the remaining 5% consists of random noise, very large indels, non-templated point mutations, and possibly more complex mutations. Decomposition results with a low R² must be interpreted with caution. A low R² can be caused by suboptimal settings or by poor sequence quality (see Note 15). A low R² value can also arise when a sequence stretch with poor local quality is present in the decomposition window (see Note 8). Furthermore, the presence of indels larger than the maximum indel size that is considered (default of 10) can affect the R². By default these are not modeled, which may result in a low R² score. The size range of indels that are modeled can be manually changed to a larger number to test if this improves the fit (Fig. 3e).
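As a reminder of what "fraction of variance explained" means, the following Python sketch computes the textbook coefficient of determination from an observed signal and the signal predicted by a fitted model. It is a generic illustration; the function name is hypothetical, and the exact way in which TIDE(R) computes its R² may differ from this definition.

```python
# Illustrative sketch only: textbook R^2, not necessarily TIDE(R)'s formula.
def r_squared(observed, fitted):
    """Return 1 - SS_res / SS_tot for two equally long numeric sequences."""
    if len(observed) != len(fitted) or not observed:
        raise ValueError("observed and fitted must be non-empty and equally long")
    mean = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))  # unexplained
    ss_tot = sum((o - mean) ** 2 for o in observed)               # total
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]   # made-up numbers
    fitted = [1.1, 1.9, 3.2, 3.8]
    print(round(r_squared(observed, fitted), 3))
```

A model that leaves more of the signal unexplained, for instance because large indels are present but not modeled, yields a larger residual term and therefore a lower R².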
Allele-specific indels. The different bars in the plot represent the insertions, deletions, and/or template-directed mutations in the cell population. These mutations are not assigned to a specific allele. To obtain allele-specific information, a cell clone needs to be isolated and analyzed again by TIDE(R). In a clone derived from a diploid cell, each mutation present on one allele is expected at a frequency of ~50%.
Overall efficiency. The overall efficiency refers to the estimated total fraction of DNA with mutations around the break site. It is calculated as R² × 100% − % wild type.
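A minimal Python sketch of this calculation, assuming that the efficiency is obtained by subtracting the estimated wild-type percentage from R² × 100. The function name and the example numbers are hypothetical.

```python
# Illustrative sketch only; assumes efficiency = R^2 * 100 - % wild type.
def overall_efficiency(r_squared: float, wild_type_pct: float) -> float:
    """Return the estimated percentage of DNA carrying mutations.

    r_squared     : goodness of fit of the decomposition (0 to 1).
    wild_type_pct : estimated percentage of unmodified (wild-type) sequence.
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r_squared must lie between 0 and 1")
    if not 0.0 <= wild_type_pct <= 100.0:
        raise ValueError("wild_type_pct must lie between 0 and 100")
    return r_squared * 100.0 - wild_type_pct


if __name__ == "__main__":
    # Made-up example: R^2 of 0.95 and 40% wild-type sequence.
    print(overall_efficiency(0.95, 40.0))
```

Under this reading, the part of the signal that the model cannot explain (1 − R²) is not counted as mutated DNA, so the reported efficiency is a conservative estimate.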
Distal designed mutations. It has been reported that the incorporation of donor template sequence is less efficient when the designed point mutations are further away from the break site [19]. This often leads to a difference in incorporation frequency between the distal and proximal designed mutations, as can be observed in the quality plots. Such a situation may confound TIDER estimates. The decomposition window can be restricted to either the proximal or the distal mutations to resolve the individual efficiencies.
Natural versus designed mutations. In general, TIDER is able to discriminate “naturally” occurring deletions and insertions from templated “designed” indels. Only in the presence of a small designed deletion (−1, −2) near the expected break site may the designed mutation be underestimated [14]. When the designed mutation consists of an insertion larger than +1, TIDER does not consider natural insertions of the same size, because the decomposition would become less robust. This is generally acceptable, because natural insertions larger than +1 are rarely observed [13, 17].
Poor sequence quality. When the sequence has poor quality overall, TIDE(R) will yield poor results with a low R² value (see Note 10), since too much noise is present in the data. The quality plot will show an overall high aberrant sequence signal in the control, the reference (if applicable), and the experimental sample, both before and after the break site (see Note 8). It is recommended to check the chromatograms of the samples (Fig. 3h) for poor sequencing quality. Samples with poor sequencing quality cannot be analyzed reliably by TIDE(R). Note that sometimes the peak signals in the chromatogram appear normal, but the file contains peaks that were erroneously left unannotated or nucleotides that were erroneously added to the annotation (Fig. 3i). TIDE(R) gives a warning when the spacing between the nucleotides in the chromatogram of the sequence trace is not consistent, which is often an indication of such missing or additional annotated nucleotides. In case of this warning, the chromatograms should be carefully inspected (using SnapGene or Chromas software).
We thank Marcel de Haas, Stefano Manzo, and Ruben Schep for critical reading of the manuscript. This work was supported by the Netherlands Organization for Scientific Research ZonMW-TOP grant 91211061, and European Research Council Advanced Grant 694466.