Apr 14, 2025

Quantifying Transcript Abundance from RNA-seq FASTQ Files Using Kallisto

This protocol is a draft, published without a DOI.
  • University Health Network
Protocol Citation: Sisira Kadambat Nair 2025. Quantifying Transcript Abundance from RNA-seq FASTQ Files Using Kallisto. protocols.io https://protocols.io/view/quantifying-transcript-abundance-from-rna-seq-fast-d7x99pr6
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: April 14, 2025
Last Modified: April 14, 2025
Protocol Integer ID: 126689
Keywords: RNA-seq, Kallisto, transcriptomics, pseudoalignment, gene expression, fastq, quantification, tximport, DESeq2, bioinformatics
Abstract
This protocol describes a fast and efficient computational workflow to quantify transcript-level expression from RNA-seq data using the Kallisto tool. Kallisto utilizes pseudoalignment for rapid transcript quantification and is ideal for large datasets. This guide walks through transcriptome index creation, quantification using single- or paired-end FASTQ files, and optional downstream summarization with tximport in R.
Guidelines
Notes
  • Ensure transcriptome and tx2gene annotations are from the same source (e.g., both Ensembl release 109).
  • For paired-end data, read pairing must be preserved.
  • Store all output in well-organized directories (one per sample).
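The pairing note above can be spot-checked before quantification. The following is a minimal sketch, assuming gzipped FASTQ files with four lines per record; `count_fastq_reads` and `check_pairing` are hypothetical helper names, not part of Kallisto. Equal read counts are a necessary condition for intact pairing, not a sufficient one.

```shell
# Hypothetical helper: count reads in a gzipped FASTQ file.
# Assumes the standard four-lines-per-record layout.
count_fastq_reads() {
    gzip -cd "$1" | awk 'END { print NR / 4 }'
}

# Hypothetical helper: succeed only if both mate files hold the same
# number of reads. This does not verify that read names match.
check_pairing() {
    n1=$(count_fastq_reads "$1")
    n2=$(count_fastq_reads "$2")
    if [ "$n1" != "$n2" ]; then
        echo "Read count mismatch: $1 has $n1, $2 has $n2" >&2
        return 1
    fi
    echo "OK: $n1 read pairs"
}
```

Usage, with the file names used elsewhere in this protocol: `check_pairing sample1_R1.fastq.gz sample1_R2.fastq.gz`.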
Materials
Item | Description
FASTQ files | Raw RNA-seq reads
Reference transcriptome | FASTA format from Ensembl or GENCODE
Kallisto | https://pachterlab.github.io/kallisto/
R + tximport | R environment with Bioconductor packages
Optional: FastQC, MultiQC | For quality control
Computational tools required
Software & Tools
  • Unix shell or terminal
  • Kallisto (v0.46 or later)
  • R (≥ 4.0) with tximport, readr, tidyverse
  • (Optional) FastQC, MultiQC

Procedure

Step 1: Install Kallisto
conda install -c bioconda kallisto

Step 2: Build Transcriptome Index
kallisto index -i transcriptome.idx transcripts.fa

transcripts.fa : Reference transcriptome (e.g., Ensembl cDNA)
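Before building the index, it can help to confirm that the FASTA file contains the expected number of transcripts. This is a small sketch only; `count_transcripts` is a hypothetical helper name, and it simply counts header lines beginning with `>`.

```shell
# Hypothetical helper: count sequences in a FASTA file (plain or gzipped)
# by counting header lines that start with ">".
count_transcripts() {
    case "$1" in
        *.gz) gzip -cd "$1" | grep -c '^>' ;;
        *)    grep -c '^>' "$1" ;;
    esac
}
```

Usage: `count_transcripts transcripts.fa`. The expected number depends on the annotation release and is not specified by this protocol.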

Step 3: Quantify Expression
Paired-end:
kallisto quant -i transcriptome.idx -o output_sample1 -b 100 sample1_R1.fastq.gz sample1_R2.fastq.gz

Single-end:
kallisto quant -i transcriptome.idx -o output_sample1 -b 100 --single -l 200 -s 20 sample1.fastq.gz
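When there are several samples, the paired-end command can be wrapped in a loop that writes one output directory per sample. The sketch below assumes FASTQ files named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz` in a single directory; `quant_paired` and the `DRY_RUN` variable are hypothetical names introduced here for illustration. Only flags already used in this protocol appear in the command.

```shell
# Hypothetical helper: run (or, with DRY_RUN=1, only print) the paired-end
# quantification command for one sample.
#   $1 = sample name, $2 = FASTQ directory, $3 = index (default: transcriptome.idx)
# Assumes files are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
quant_paired() {
    sample=$1
    fastq_dir=$2
    index=${3:-transcriptome.idx}
    # Rebuild the positional parameters as the full command line.
    set -- kallisto quant -i "$index" -o "output_${sample}" -b 100 \
        "${fastq_dir}/${sample}_R1.fastq.gz" "${fastq_dir}/${sample}_R2.fastq.gz"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Usage: set `DRY_RUN=1` and run `for s in sample1 sample2; do quant_paired "$s" fastq; done` to preview the commands, then unset `DRY_RUN` to execute them.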

Step 4: Summarize with tximport (Optional)
library(tximport)
library(readr)

samples <- c("sample1", "sample2")
# Output directories follow the "-o output_<sample>" naming used in Step 3
files <- file.path(paste0("output_", samples), "abundance.tsv")
names(files) <- samples

tx2gene <- read_csv("tx2gene.csv")
txi <- tximport(files, type = "kallisto", tx2gene = tx2gene)


Prerequisites
  • Linux/macOS with command-line tools
  • Kallisto installed
  • FASTQ files (single-end or paired-end)
  • Transcriptome reference (FASTA format)
  • Transcript-to-gene mapping file (for downstream summarization using tximport in R)
From FASTQ to Quant Matrix
FASTQ → Kallisto → abundance.tsv → tximport → count matrix → DESeq2
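The abundance.tsv file in this workflow is a tab-separated table with the columns target_id, length, eff_length, est_counts, and tpm. A quick look at it can catch failed runs before moving on to tximport. The following is a rough sketch; `sum_tpm` and `top_transcripts` are hypothetical helper names, and the general-numeric sort flag `-g` is assumed to be available (it is in GNU and BSD sort).

```shell
# Hypothetical helper: sum the tpm column (5th field), skipping the header.
# For a successful run the total is expected to be close to one million.
sum_tpm() {
    awk -F '\t' 'NR > 1 { total += $5 } END { printf "%.1f\n", total }' "$1"
}

# Hypothetical helper: print target_id and tpm for the N most abundant
# transcripts (default 10). -g handles values in scientific notation.
top_transcripts() {
    tail -n +2 "$1" | sort -t "$(printf '\t')" -k5,5gr | head -n "${2:-10}" | cut -f 1,5
}
```

Usage: `sum_tpm output_sample1/abundance.tsv` and `top_transcripts output_sample1/abundance.tsv 5`.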
Workflow Overview
Step 1: Prepare Transcriptome Index


kallisto index -i transcriptome.idx transcripts.fa

Step 2: Quantify Expression from FASTQ
For paired-end reads:

kallisto quant -i transcriptome.idx -o output_sample1 -b 100 sample1_R1.fastq.gz sample1_R2.fastq.gz

For single-end reads (with estimated fragment length and standard deviation):
kallisto quant -i transcriptome.idx -o output_sample1 -b 100 --single -l 200 -s 20 sample1.fastq.gz

Step 3: Summarize at Gene Level with tximport (optional, in R)
library(tximport)
library(readr)

files <- c("output_sample1/abundance.tsv", "output_sample2/abundance.tsv")
names(files) <- c("sample1", "sample2")

tx2gene <- read_csv("tx2gene.csv") # A CSV with two columns: transcript_id, gene_id
txi <- tximport(files, type = "kallisto", tx2gene = tx2gene)
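
If no tx2gene.csv is available, one way to derive it is from the headers of the same FASTA file used to build the index, which keeps the mapping and the transcriptome from the same source. The sketch below assumes Ensembl-style cDNA headers, where the first token is the transcript ID and a later field has the form `gene:<gene_id>`; GENCODE headers use a different, pipe-separated layout and would need a different parser. `make_tx2gene` is a hypothetical helper name.

```shell
# Hypothetical helper: build a two-column transcript-to-gene CSV from an
# uncompressed FASTA file. Assumes Ensembl-style headers such as
#   >TRANSCRIPT_ID cdna ... gene:GENE_ID ...
# Headers without a "gene:" field are skipped.
make_tx2gene() {
    echo "transcript_id,gene_id"
    awk '/^>/ {
        tx = substr($1, 2)                  # first token without the ">"
        gene = ""
        for (i = 2; i <= NF; i++)
            if ($i ~ /^gene:/) gene = substr($i, 6)
        if (gene != "") print tx "," gene
    }' "$1"
}
```

Usage: `make_tx2gene transcripts.fa > tx2gene.csv`. tximport reads the transcript ID from the first column and the gene ID from the second, so the column order matters. The transcript IDs must match the target_id values in abundance.tsv, including any version suffix.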