Methylation sequencing (also known as bisulfite sequencing) is the use of bisulfite treatment of DNA to determine its pattern of methylation. DNA methylation was the first epigenetic mark to be discovered and remains the most studied. In animals it predominantly involves the addition of a methyl group to the carbon-5 position of cytosine residues in the dinucleotide CpG, and it is implicated in the repression of transcriptional activity.
Genome-wide DNA methylation is mapped with one of the three most commonly used assays, which yield either methylation-specific DNA sequencing data or microarray data (CpG methylation arrays).
Figure: Methylation seq analysis workflow
Once you provide the raw data, we report summary statistics for it, including the number of reads, genome coverage (x), and base distribution.
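The summary statistics listed here can be illustrated with a minimal Python sketch. The function name `read_stats`, the in-memory list of reads, and the `genome_size` argument are assumptions made for illustration; a real pipeline would stream reads from FASTQ files.

```python
from collections import Counter

def read_stats(reads, genome_size):
    """Summarise reads: read count, mean coverage (x), and base composition.

    `reads` is a hypothetical list of sequence strings; `genome_size` is the
    reference length in bases.
    """
    total_bases = sum(len(r) for r in reads)
    counts = Counter(base for r in reads for base in r.upper())
    distribution = {b: counts[b] / total_bases for b in sorted(counts)}
    return {
        "num_reads": len(reads),
        # Mean coverage = total sequenced bases / genome length
        "coverage_x": total_bases / genome_size,
        "base_distribution": distribution,
    }

stats = read_stats(["ACGT", "TTGA", "TTTT"], genome_size=6)
print(stats)
```

In bisulfite-treated libraries the base distribution is expected to be depleted in C, so a skewed composition at this stage is not by itself a sign of a problem.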
Quality control checks on raw sequence data from high-throughput sequencing give a quick impression of whether your data has any problems you should be aware of before further analysis. FastQC is used to assess the quality of the data provided. We usually test for adapter contamination, read quality, and other sequencing biases.
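To make "read quality" concrete, the sketch below decodes a FASTQ quality string into Phred scores and averages them. It assumes the common Phred+33 encoding, and `mean_phred` is a hypothetical helper; it is not part of FastQC.

```python
def mean_phred(quality_string, offset=33):
    """Return the mean Phred score of one read (assumes Phred+33 encoding)."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)

# 'I' encodes Q40 and '!' encodes Q0 under Phred+33
print(mean_phred("IIII"))
print(mean_phred("I!"))
```

A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q20 means roughly one expected error in 100 base calls.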
Data pre-processing is essential for removing over-represented sequences (such as adapters) and low-quality reads, as these may interfere with alignment and, ultimately, with methylation calling.
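A minimal sketch of this kind of pre-processing is shown below: it cuts a read at the first exact adapter match and then trims low-quality bases from the 3' end. The function `trim_read`, the Q20 threshold, and the exact-match adapter search are illustrative assumptions; dedicated trimming tools also handle partial and mismatched adapters.

```python
def trim_read(seq, qual, adapter, min_q=20, offset=33):
    """Remove an adapter and low-quality 3' bases from a single read.

    Assumes Phred+33 qualities and an exact adapter match (a simplification).
    """
    idx = seq.find(adapter)
    if idx != -1:
        # Drop the adapter and everything after it
        seq, qual = seq[:idx], qual[:idx]
    end = len(seq)
    # Walk back from the 3' end while base quality is below the threshold
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]

adapter = "AGATCGGAAGAGC"  # example adapter sequence, used here as an assumption
read = "ACGTT" + adapter
quals = "III#!" + "I" * len(adapter)
print(trim_read(read, quals, adapter))
```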
Based on the quality of data:
A sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.
In reference-based methylation-seq, reads are aligned to the reference genome using Bismark.
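Bisulfite-treated reads cannot be matched to the reference directly, because unmethylated cytosines are read as thymines. Bisulfite aligners such as Bismark address this by converting reads and genome in silico before alignment. The sketch below shows only that idea for the forward strand, with a naive exact-match search standing in for a real aligner; `c_to_t` and `locate_read` are hypothetical names.

```python
def c_to_t(seq):
    """In-silico bisulfite conversion of the forward strand: every C becomes T."""
    return seq.upper().replace("C", "T")

def locate_read(read, reference):
    """Find a read in the reference after converting both (exact match only).

    Returns the 0-based start position, or -1 if no match is found.
    """
    return c_to_t(reference).find(c_to_t(read))

reference = "GATCCGTA"
read = "TTCGT"  # from positions 2-6; the first C was unmethylated and reads as T

print(reference.find(read))          # direct search fails: -1
print(locate_read(read, reference))  # converted search succeeds: 2
```

After the position is found, the original (unconverted) read is compared with the original reference to decide which cytosines were methylated.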
Contaminating reads are filtered out to reduce the false positive rate. In this step, reads showing non-CG methylation and duplicate reads are removed prior to downstream analysis.
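Duplicate removal can be sketched as keeping one alignment per mapping position and strand. The dictionary record layout and the `deduplicate` function below are assumptions for illustration only; in practice this step operates on the aligner's BAM output.

```python
def deduplicate(alignments):
    """Keep the first alignment seen for each (chrom, start, strand) key."""
    seen = set()
    unique = []
    for aln in alignments:
        key = (aln["chrom"], aln["start"], aln["strand"])
        if key not in seen:
            seen.add(key)
            unique.append(aln)
    return unique

alignments = [
    {"name": "r1", "chrom": "chr1", "start": 100, "strand": "+"},
    {"name": "r2", "chrom": "chr1", "start": 100, "strand": "+"},  # duplicate of r1
    {"name": "r3", "chrom": "chr1", "start": 100, "strand": "-"},
]
print([a["name"] for a in deduplicate(alignments)])
```

Position-based deduplication assumes that reads sharing a start position are PCR copies, which holds for randomly fragmented libraries but not for enrichment protocols in which many reads legitimately start at the same site.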
DNA methylation is an epigenetic modification known to play a prime role in gene silencing and is an important topic in epigenetic research. Bismark is a program that identifies methylated sites from bisulfite (methylation) sequencing data.
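The principle behind methylation calling can be sketched as follows: at each reference CpG, a read base of C indicates a methylated cytosine (protected from conversion) and a T indicates an unmethylated one. The code assumes ungapped, forward-strand alignments given as hypothetical `(start, read)` pairs and is not Bismark's implementation.

```python
from collections import defaultdict

def call_cpg_methylation(reference, alignments):
    """Return {CpG position: methylation level} from forward-strand reads.

    `alignments` is a list of (start, read) pairs with 0-based, ungapped starts.
    """
    counts = defaultdict(lambda: [0, 0])  # position -> [methylated, unmethylated]
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            if reference[pos:pos + 2] != "CG":
                continue  # only cytosines in CpG context are scored
            if base == "C":
                counts[pos][0] += 1  # C retained: methylated
            elif base == "T":
                counts[pos][1] += 1  # C converted to T: unmethylated
    return {pos: m / (m + u) for pos, (m, u) in sorted(counts.items())}

reference = "ACGTTCGA"  # CpG cytosines at positions 1 and 5
alignments = [(0, "ACGTTTGA"), (0, "ACGTTCGA"), (0, "ATGTTTGA")]
print(call_cpg_methylation(reference, alignments))
```

The methylation level at a site is the fraction of covering reads that support methylation, so sites with few reads give noisy estimates.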
SeqMonk is a tool used to visualize methylated regions in a gene of interest or the global methylation level.
Genome Bisulfite Sequencing Analyser (GBSA) or DMAP will be used to annotate differentially methylated regions.