Quantifying Transcript Abundance from RNA-seq FASTQ Files Using Kallisto

Sisira Kadambat Nair

Apr 14, 2025

This protocol is a draft, published without a DOI.

Sisira Kadambat Nair¹

¹University Health Network

Protocol Citation: Sisira Kadambat Nair 2025. Quantifying Transcript Abundance from RNA-seq FASTQ Files Using Kallisto. protocols.io https://protocols.io/view/quantifying-transcript-abundance-from-rna-seq-fast-d7x99pr6

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Protocol status: Working

We use this protocol and it's working.

Created: April 14, 2025

Last Modified: April 14, 2025

Protocol Integer ID: 126689

Keywords: RNA-seq, Kallisto, transcriptomics, pseudoalignment, gene expression, fastq, quantification, tximport, DESeq2, bioinformatics

Abstract

This protocol describes a fast computational workflow for quantifying transcript-level expression from RNA-seq data with Kallisto. Kallisto uses pseudoalignment, which identifies the transcripts each read is compatible with instead of computing full base-level alignments, so quantification is fast and well suited to large datasets. This guide covers transcriptome index creation, quantification from single- or paired-end FASTQ files, and optional gene-level summarization with tximport in R.

Guidelines

Notes
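Ensure the transcriptome FASTA and the tx2gene annotation come from the same source and release (e.g., both Ensembl release 109), so that transcript IDs, including any version suffixes, match exactly.
For paired-end data, read pairing must be preserved: supply the R1 and R2 files for each sample in that order, with mates in the same order in both files.
Store each sample's output in its own directory (e.g., output_sample1).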
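Mismatched transcript IDs between the FASTA and the tx2gene table (for example, version suffixes such as ".1" present in one file but not the other) can cause tximport to fail or to summarize only part of the data. The POSIX shell sketch below lists transcript IDs from the FASTA headers that have no row in the tx2gene table. It assumes an unquoted, comma-separated tx2gene.csv with a header row and transcript IDs in the first column; the function names are hypothetical helpers, not part of Kallisto or tximport.

```shell
# Stream a file, decompressing it first if the name ends in .gz.
cat_maybe_gz() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;
    *)    cat "$1" ;;
  esac
}

# List transcript IDs from FASTA headers (first token after '>'), sorted.
fasta_ids() {
  cat_maybe_gz "$1" | awk '/^>/ { print substr($1, 2) }' | sort -u
}

# List transcript IDs from column 1 of tx2gene.csv (header row skipped), sorted.
# Assumes an unquoted CSV (hypothetical layout).
csv_ids() {
  awk -F',' 'NR > 1 { print $1 }' "$1" | sort -u
}

# Print FASTA transcript IDs that have no row in the tx2gene CSV.
# Returns 0 when every FASTA ID is mapped, 1 otherwise.
check_tx_ids() {
  tmp_fa=$(mktemp)
  tmp_csv=$(mktemp)
  fasta_ids "$1" > "$tmp_fa"
  csv_ids "$2" > "$tmp_csv"
  missing=$(comm -23 "$tmp_fa" "$tmp_csv")   # lines only in the FASTA list
  rm -f "$tmp_fa" "$tmp_csv"
  if [ -n "$missing" ]; then
    printf '%s\n' "$missing"
    return 1
  fi
  return 0
}
```

For example, check_tx_ids transcripts.fa tx2gene.csv prints nothing and returns 0 when every indexed transcript is mapped. If the only difference is the version suffix, tximport's ignoreTxVersion = TRUE argument is one option.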

Materials

 
Item	Description
FASTQ files	Raw RNA-seq reads
Reference transcriptome	FASTA format from Ensembl or GENCODE
Kallisto	https://pachterlab.github.io/kallisto/
R + tximport	R environment with Bioconductor packages
Optional: FastQC, MultiQC	For quality control
Computational tools required
Unix shell or terminal
Kallisto (v0.46 or later)
R (≥ 4.0) with tximport, readr, tidyverse
(Optional) FastQC, MultiQC

Procedure

Step 1: Install Kallisto

conda install -c conda-forge -c bioconda kallisto
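After installation, it can help to confirm that the required tools are on PATH before running the pipeline. The POSIX shell sketch below reports each command as found or missing; check_tools is a hypothetical helper name, and the tool list in the usage line simply mirrors the Materials section.

```shell
# Report which of the given commands are on PATH.
# Returns 0 if all are found, 1 if any are missing.
check_tools() {
  status=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'found:   %s\n' "$tool"
    else
      printf 'MISSING: %s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}
```

Usage: check_tools kallisto Rscript fastqc multiqc. Running kallisto version then prints the installed version, which should be v0.46 or later for this protocol.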

Step 2: Build Transcriptome Index

kallisto index -i transcriptome.idx transcripts.fa

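transcripts.fa: reference transcriptome in FASTA format (e.g., Ensembl cDNA; gzip-compressed FASTA also works). transcriptome.idx: the index file written by kallisto index and read by kallisto quant.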
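The tximport step needs a transcript-to-gene table built from the same annotation as the index. If transcripts.fa is an Ensembl cDNA FASTA, each header carries the gene ID in a gene: field (e.g., ">ENST... cdna chromosome:... gene:ENSG... gene_biotype:..."), so the table can be derived directly from the FASTA. This sketch assumes that Ensembl header layout; GENCODE FASTA headers are pipe-delimited and would need a different parser. make_tx2gene is a hypothetical helper.

```shell
# Build a transcript_id,gene_id CSV from Ensembl cDNA FASTA headers.
# Assumes Ensembl-style headers with a "gene:<ID>" field; GENCODE is NOT handled.
make_tx2gene() {
  fasta="$1"
  out="$2"
  {
    printf 'transcript_id,gene_id\n'
    case "$fasta" in
      *.gz) gzip -dc "$fasta" ;;
      *)    cat "$fasta" ;;
    esac | awk '/^>/ {
      tx = substr($1, 2)              # transcript ID without the leading ">"
      gene = ""
      for (i = 2; i <= NF; i++)
        if ($i ~ /^gene:/) { gene = substr($i, 6); break }   # text after "gene:"
      if (gene != "") print tx "," gene
    }'
  } > "$out"
}
```

Usage: make_tx2gene transcripts.fa tx2gene.csv writes a CSV with a transcript_id,gene_id header, the two-column layout that tximport expects for tx2gene.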

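Step 3: Quantify Expression

Run kallisto quant once per sample: -i is the index from Step 2, -o is the sample's output directory, and -b 100 requests 100 bootstrap samples.

Paired-end (R1 file first, then R2):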

kallisto quant -i transcriptome.idx -o output_sample1 -b 100 sample1_R1.fastq.gz sample1_R2.fastq.gz

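Single-end (Kallisto cannot estimate fragment length from single-end reads, so -l and -s must give the estimated mean and standard deviation of the fragment length):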
kallisto quant -i transcriptome.idx -o output_sample1 -b 100 --single -l 200 -s 20 sample1.fastq.gz
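To process several samples, the two commands above can be wrapped in a loop. The sketch below picks paired-end mode when <sample>_R1.fastq.gz and <sample>_R2.fastq.gz both exist and falls back to single-end mode for <sample>.fastq.gz. That naming convention, the DRY_RUN switch, and the INDEX, FRAG_LEN, and FRAG_SD variables are assumptions for illustration; the single-end defaults of 200 and 20 are the example values from the command above, not measured values for any particular library.

```shell
# Hypothetical settings; override them in the environment if needed.
INDEX=${INDEX:-transcriptome.idx}
FRAG_LEN=${FRAG_LEN:-200}   # single-end only: estimated mean fragment length
FRAG_SD=${FRAG_SD:-20}      # single-end only: estimated fragment length SD

# Build and run (or, with DRY_RUN=1, print) the kallisto quant command for one sample.
quant_sample() {
  s="$1"
  if [ -f "${s}_R1.fastq.gz" ] && [ -f "${s}_R2.fastq.gz" ]; then
    set -- kallisto quant -i "$INDEX" -o "output_${s}" -b 100 \
      "${s}_R1.fastq.gz" "${s}_R2.fastq.gz"
  elif [ -f "${s}.fastq.gz" ]; then
    set -- kallisto quant -i "$INDEX" -o "output_${s}" -b 100 \
      --single -l "$FRAG_LEN" -s "$FRAG_SD" "${s}.fastq.gz"
  else
    echo "no FASTQ found for ${s}" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"    # print the command instead of running it
  else
    "$@"
  fi
}
```

Set DRY_RUN=1 first to review the commands, then run for real, e.g. for s in sample1 sample2; do quant_sample "$s" || exit 1; done.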

Step 4: Summarize with tximport (Optional)

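library(tximport)
library(readr)

# Kallisto writes each sample to output_<sample>/ (see Step 3)
samples <- c("sample1", "sample2")
files <- file.path(paste0("output_", samples), "abundance.tsv")
names(files) <- samples
stopifnot(all(file.exists(files)))

# tx2gene.csv: column 1 = transcript ID, column 2 = gene ID
tx2gene <- read_csv("tx2gene.csv")
txi <- tximport(files, type = "kallisto", tx2gene = tx2gene)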
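Before starting R, a quick shell check can confirm that every sample has a non-empty abundance.tsv with the column header Kallisto writes (target_id, length, eff_length, est_counts, tpm). This sketch assumes the output_<sample> directory names used in Step 3; check_abundance is a hypothetical helper.

```shell
# Expected tab-separated header of kallisto's abundance.tsv.
EXPECTED_HEADER=$(printf 'target_id\tlength\teff_length\test_counts\ttpm')

# Check output_<sample>/abundance.tsv for each sample name given.
# Prints one line per problem; returns 1 if any file is missing or malformed.
check_abundance() {
  status=0
  for s in "$@"; do
    f="output_${s}/abundance.tsv"
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f"
      status=1
    elif [ "$(head -n 1 "$f")" != "$EXPECTED_HEADER" ]; then
      echo "unexpected header: $f"
      status=1
    fi
  done
  return "$status"
}
```

Usage: check_abundance sample1 sample2 && Rscript your_tximport_script.R, where the R script name is a placeholder.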

Prerequisites

Linux/macOS with command-line tools
Kallisto installed
FASTQ files (single-end or paired-end)
Transcriptome reference (FASTA format)
Transcript-to-gene mapping file (for downstream summarization using tximport in R)
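Because paired-end quantification relies on intact pairing, a cheap pre-flight check is to confirm that the R1 and R2 files contain the same number of reads. Equal counts are necessary but not sufficient for correct pairing. This sketch assumes standard four-line FASTQ records, optionally gzip-compressed; the helper names are hypothetical.

```shell
# Count reads in a FASTQ file (4 lines per record); handles .gz input.
count_reads() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;
    *)    cat "$1" ;;
  esac | awk 'END { print NR / 4 }'
}

# Compare read counts of an R1/R2 pair; returns 1 on mismatch.
check_pair() {
  n1=$(count_reads "$1")
  n2=$(count_reads "$2")
  if [ "$n1" = "$n2" ]; then
    echo "OK: $n1 reads in each file"
  else
    echo "MISMATCH: $1 has $n1 reads, $2 has $n2"
    return 1
  fi
}
```

Usage: check_pair sample1_R1.fastq.gz sample1_R2.fastq.gz. A non-integer count suggests a truncated file.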

From FASTQ to Quant Matrix

FASTQ → Kallisto → abundance.tsv → tximport → count matrix → DESeq2
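For a quick look at transcript-level results before moving to R, the est_counts column (fourth column) of each abundance.tsv can be merged into one table. This awk-based sketch is not a substitute for tximport, which also performs gene-level summarization and passes transcript-length information on to DESeq2. It assumes the output_<sample>/abundance.tsv layout from the quantification step; merge_counts is a hypothetical helper.

```shell
# Merge est_counts (column 4) from several abundance.tsv files into one
# tab-separated transcript x sample table written to stdout.
# Sample names are taken from the output_<sample> directory of each file.
merge_counts() {
  awk -F'\t' -v OFS='\t' '
    FNR == 1 {                              # header line of a new file
      n++
      name = FILENAME
      sub("/abundance[.]tsv$", "", name)    # drop the file name
      sub(".*/", "", name)                  # drop any leading path
      sub("^output_", "", name)             # output_sample1 -> sample1
      samples[n] = name
      next
    }
    {
      if (!($1 in seen)) { seen[$1] = 1; order[++m] = $1 }
      counts[$1, n] = $4
    }
    END {
      line = "target_id"
      for (j = 1; j <= n; j++) line = line OFS samples[j]
      print line
      for (i = 1; i <= m; i++) {
        line = order[i]
        for (j = 1; j <= n; j++)
          line = line OFS (((order[i], j) in counts) ? counts[order[i], j] : 0)
        print line
      }
    }' "$@"
}
```

Usage: merge_counts output_sample1/abundance.tsv output_sample2/abundance.tsv > transcript_counts.tsv. Rows are keyed by target_id rather than by position, and a transcript absent from one file is filled with 0.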

Workflow Overview

Step 1: Prepare Transcriptome Index

kallisto index -i transcriptome.idx transcripts.fa

Step 2: Quantify Expression from FASTQ
For paired-end reads:

kallisto quant -i transcriptome.idx -o output_sample1 -b 100 sample1_R1.fastq.gz sample1_R2.fastq.gz

For single-end reads (with estimated fragment length and standard deviation):

kallisto quant -i transcriptome.idx -o output_sample1 -b 100 --single -l 200 -s 20 sample1.fastq.gz
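Each kallisto quant output directory also contains run_info.json, which records the percentage of reads that pseudoaligned in the field p_pseudoaligned. A low value can point to a reference mismatch or contamination. The sketch below extracts that field with sed (assuming one key per line, as Kallisto writes it) and flags samples under a threshold; the 50% default is an arbitrary illustrative cutoff, not a value from this protocol, and the helper names are hypothetical.

```shell
# Print p_pseudoaligned (a percentage) from <output_dir>/run_info.json.
pseudo_rate() {
  sed -n 's/.*"p_pseudoaligned": *\([0-9.]*\).*/\1/p' "$1/run_info.json"
}

# Flag an output directory whose pseudoalignment rate is below a threshold.
# Usage: check_rate <output_dir> [min_percent]   (default 50, illustrative only)
check_rate() {
  min="${2:-50}"
  rate=$(pseudo_rate "$1")
  if [ -z "$rate" ]; then
    echo "$1: no p_pseudoaligned found"
    return 1
  fi
  # awk does the floating-point comparison; exit 0 means rate >= min
  if awk -v r="$rate" -v m="$min" 'BEGIN { exit !(r + 0 >= m + 0) }'; then
    echo "$1: ${rate}% pseudoaligned"
  else
    echo "$1: LOW ${rate}% pseudoaligned (below ${min}%)"
    return 1
  fi
}
```

Usage: for d in output_*; do check_rate "$d"; done.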

Step 3: Summarize at Gene Level with tximport (optional, in R)

library(tximport)
library(readr)

files <- c("output_sample1/abundance.tsv", "output_sample2/abundance.tsv")
names(files) <- c("sample1", "sample2")

tx2gene <- read_csv("tx2gene.csv")  # A CSV with two columns: transcript_id, gene_id
txi <- tximport(files, type = "kallisto", tx2gene = tx2gene)
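Because tximport reads the first column of tx2gene as transcript IDs and the second as gene IDs, a quick structural check of the CSV can catch malformed rows before the R step. This sketch assumes an unquoted CSV with a header row; check_tx2gene is a hypothetical helper.

```shell
# Check tx2gene.csv: every data row must have exactly two comma-separated
# fields, and each transcript ID must appear only once.
# Prints one line per problem; returns 1 if any are found.
check_tx2gene() {
  awk -F',' '
    NR == 1 { next }                                   # skip header row
    NF != 2 { printf "line %d: %d fields\n", NR, NF; bad = 1 }
    ($1 in seen) { printf "line %d: duplicate %s\n", NR, $1; bad = 1 }
    { seen[$1] = 1 }
    END { exit bad }
  ' "$1"
}
```

Usage: check_tx2gene tx2gene.csv.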
