Background

RNA-Seq (RNA sequencing), also called whole transcriptome shotgun sequencing (WTSS), uses next-generation sequencing (NGS) to reveal the presence and quantity of RNA (gene expression) and isoform variants in a biological sample at a given moment in time.

Figure: RNA-Seq workflow


1. Raw data statistics

Once you provide the raw data, we will report summary statistics, including the number of reads, genome coverage (x), and base distribution.
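
As an illustration only, the statistics named above can be computed directly from a FASTQ file. The sketch below assumes a plain four-line-per-record FASTQ layout and a known genome size; the function name fastq_stats and the toy records are hypothetical and are not part of any tool used in this workflow.

```python
from collections import Counter
from io import StringIO


def fastq_stats(handle, genome_size):
    """Return read count, total bases, fold coverage, and base fractions.

    Assumes each record spans exactly four lines (no wrapped sequences).
    """
    n_reads = 0
    total_bases = 0
    base_counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the sequence is the second line of each record
            seq = line.strip().upper()
            n_reads += 1
            total_bases += len(seq)
            base_counts.update(seq)
    # Fold coverage (x) = total sequenced bases / genome size
    coverage = total_bases / genome_size if genome_size else 0.0
    base_fraction = {}
    if total_bases:
        base_fraction = {b: c / total_bases for b, c in sorted(base_counts.items())}
    return {
        "reads": n_reads,
        "bases": total_bases,
        "coverage_x": coverage,
        "base_fraction": base_fraction,
    }


# Toy input: two 4-base reads against a hypothetical 4-base "genome"
example = StringIO("@r1\nACGT\n+\nIIII\n@r2\nAACC\n+\nIIII\n")
stats = fastq_stats(example, genome_size=4)
print(stats)
```

For real data, the same function could be given an open file handle instead of the in-memory example.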


2. Quality check

Quality control checks on raw sequence data from high-throughput sequencing provide a quick impression of whether your data has any problems you should be aware of before further analysis. FastQC will be used to assess the quality of the data provided. We usually test for adapter contamination, read quality, and other sequencing biases.
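
To make "read quality" concrete, here is a minimal sketch of one such check: the mean Phred score at each read position. It is not FastQC's implementation; it assumes four-line FASTQ records and Phred+33 quality encoding, and the function name mean_quality_per_position is hypothetical.

```python
from io import StringIO


def mean_quality_per_position(handle, offset=33):
    """Average Phred score at each read position across all reads.

    Assumes four-line FASTQ records and Phred+33 encoding (the offset argument).
    """
    sums, counts = [], []
    for i, line in enumerate(handle):
        if i % 4 == 3:  # the quality string is the fourth line of each record
            qual = line.strip()
            for pos, ch in enumerate(qual):
                if pos == len(sums):  # first read that reaches this position
                    sums.append(0)
                    counts.append(0)
                sums[pos] += ord(ch) - offset
                counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


# Toy input: 'I' encodes Q40 and '5' encodes Q20 under Phred+33
example = StringIO("@r1\nACG\n+\nII5\n@r2\nAC\n+\nI5\n")
per_position = mean_quality_per_position(example)
print(per_position)
```

A steady drop in these averages toward the 3' end is the typical pattern that motivates end trimming.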


3. Data pre-processing

Data pre-processing removes over-represented sequences and low-quality reads, which may otherwise interfere with alignment and, ultimately, with gene expression estimates.

Based on the quality of data:

  1. Remove adapters and over-represented sequences from the RNA-Seq data using Cutadapt, supplying the adapter sequences used during sequencing.
  2. Quality/end trimming improves the overall quality of each read; Trimmomatic can be used for this step.
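
The two steps above can be sketched in a few lines. This is a deliberately simplified illustration, not the algorithm used by either tool: the adapter step only finds exact matches, and the quality step cuts at the first window whose mean score is too low. The function names, the window size, the threshold, and the example adapter sequence are all assumptions made for the example.

```python
def trim_adapter(seq, qual, adapter):
    """Remove the first exact occurrence of the adapter and everything after it."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]


def sliding_window_trim(seq, qual, window=4, min_mean_q=20, offset=33):
    """Cut the read at the start of the first window whose mean quality is too low.

    Reads shorter than the window are returned unchanged.
    """
    scores = [ord(c) - offset for c in qual]
    for start in range(max(len(scores) - window + 1, 0)):
        win = scores[start:start + window]
        if sum(win) / window < min_mean_q:
            return seq[:start], qual[:start]
    return seq, qual


# Toy read: 8 bases of insert followed by an example adapter sequence
adapter = "AGATCGGAAGAGC"
raw_seq = "ACGTACGT" + adapter
raw_qual = "IIIII###" + "I" * len(adapter)  # 'I' = Q40, '#' = Q2 (Phred+33)

seq, qual = trim_adapter(raw_seq, raw_qual, adapter)
clean_seq, clean_qual = sliding_window_trim(seq, qual)
print(seq, clean_seq)
```

Adapter removal runs first here so that adapter bases do not influence the window averages.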


4. Alignment

A sequence alignment is a way of arranging sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. In reference-based RNA-Seq, reads are aligned to a reference genome using TopHat or STAR (Spliced Transcripts Alignment to a Reference).
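
To illustrate the basic idea of placing a read on a reference, the toy sketch below slides a read along a reference string and reports the placement with the fewest mismatches. It is only a conceptual example: it ignores introns, insertions, and deletions, and it scans the whole reference, whereas production aligners rely on indexed, splice-aware algorithms. The function name best_ungapped_hit and the sequences are hypothetical.

```python
def best_ungapped_hit(read, reference):
    """Return (position, mismatches) of the best ungapped placement of the read.

    Ties are resolved in favour of the leftmost position.
    Returns (-1, len(read) + 1) if the read is longer than the reference.
    """
    best_pos, best_mm = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mm = sum(1 for a, b in zip(read, window) if a != b)
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos, best_mm


reference = "TTGACCGTAAGGCT"
read = "CCGTTAGG"  # differs from the reference at one base
hit = best_ungapped_hit(read, reference)
print(hit)
```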


5. Gene expression analysis

Gene expression analysis provides a snapshot of actively expressed genes under various conditions. DESeq2 and edgeR are two packages commonly used to estimate gene expression within a sample and differential gene expression between samples.
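
As a rough, descriptive illustration of what "expression" and "differential expression" mean numerically, the sketch below scales raw counts to counts per million (CPM) and computes a log2 fold change between two samples. It is not what DESeq2 or edgeR do internally: both fit negative binomial models with their own normalization and test for significance, which this example omits. The gene names, counts, and pseudocount are made up for the example.

```python
import math


def cpm(counts):
    """Counts per million: scale each gene's count by the sample's library size."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_change(cpm_a, cpm_b, pseudocount=1.0):
    """log2(B / A) per gene; the pseudocount avoids division by zero."""
    return {
        gene: math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in cpm_a
    }


# Hypothetical raw read counts for two samples
control = {"geneA": 100, "geneB": 300, "geneC": 600}
treated = {"geneA": 400, "geneB": 600, "geneC": 1000}

lfc = log2_fold_change(cpm(control), cpm(treated))
print(lfc)
```

Note that geneB's raw count doubles but its share of the library is unchanged, so its fold change is zero after library-size scaling.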


6. Alternative splicing analysis

Alternative splicing analysis quantifies the expression level of alternatively spliced genes from RNA-Seq data and identifies differentially regulated isoforms, exons, or introns across samples. MISO, rMATS, and DEXSeq are commonly used tools for studying RNA splicing.
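
One common way to express exon usage is "percent spliced in" (PSI): the fraction of reads that support inclusion of an exon. The sketch below computes PSI and its difference between two samples from raw read counts. It is a simplified illustration under stated assumptions: it skips the length normalization and statistical modelling that dedicated splicing tools apply, and the function names and read counts are hypothetical.

```python
def psi(inclusion_reads, exclusion_reads):
    """Fraction of reads supporting exon inclusion; None when there is no coverage."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total


def delta_psi(sample_a, sample_b):
    """PSI(sample_b) - PSI(sample_a); each sample is (inclusion, exclusion) read counts."""
    psi_a, psi_b = psi(*sample_a), psi(*sample_b)
    if psi_a is None or psi_b is None:
        return None
    return psi_b - psi_a


# Hypothetical junction read counts for one exon in two samples
control = (30, 10)  # mostly included
treated = (10, 30)  # mostly skipped
change = delta_psi(control, treated)
print(psi(*control), psi(*treated), change)
```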