Introductions

In this package, we provide e-value for four DMR (differentially methylated region) detection tools (MethylKit, Metilene, BiSeq and DMRfinder) and general purpose. For DMR (methylKit, biseq, DMRfinder or metilene), the met-evalue calculation is conducted by the metevalue.[DMR] function.

DMR Method methyrate Example Method.output Example
MethylKit metevalue.methylKit data(demo_methylkit_methyrate) data(demo_methylkit_met_all)
BiSeq metevalue.biseq data(demo_biseq_methyrate) data(demo_biseq_DMR)
DMRfinder metevalue.DMRfinder data(demo_DMRfinder_rate_combine) data(demo_DMRfinder_DMRs)
Metilene metevalue.metilene data(demo_metilene_input) data(demo_metilene_out)

Two routines are supported to calculate the combined e-value:

Details are as follows.

Call by files

We design the metevalue.[DMR] function to accept similar parameter patterns:

metevalue.[DMR](
  methyrate,                # Output file name of [DMR]
  [DMR].output,             # Output file name of [DMR] with e-value of each region
  adjust.methods = "BH",    # Adjust methods of e-value
  sep = "\t",               # seperator, default is the TAB key.
  bheader = FALSE           # A logical value indicating whether the [DMR].output file
                            # contains the names of the variables as its first line
)

Here [DMR] could be one of methylKit, biseq, DMRfinder or metilene.

Call by R data frames

We provide the evalue_buildin_var_fmt_nm and varevalue.metilene function to handle the general DMR e-value calculation in DNA methylation studies:

# Here  `[DMR]` coudle be one of `methylKit`, `biseq`, `DMRfinder` or `metilene`.
method_in_use = "[DMR]"
result = evalue_buildin_var_fmt_nm(
          methyrate,              # Data frame of the methylation rate
          DMR_evalue_output,      # Data frame of output data corresponding to the
                                  # "method" option
          method = method_in_use) # DMR: "metilene", "biseq", "DMRfinder" or "methylKit"
result = list(a = result$a,
              b = result$b,
              a_b = evalue_buildin_sql(result$a, result$b, method = method_in_use))
result = varevalue.metilene(result$a, result$b, result$a_b)

Replace [DMR] to one of methylKit, biseq, DMRfinder or metilene accordingly.

Notice: for different [DMR], the data.frame schemas are different!!! Check the R help document for details. Check the Demo data section for details. # Examples

MethylKit Example

methylKit is a R package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing. The package is designed to deal with sequencing data from RRBS and its variants, but also target-capture methods and whole genome bisulfite sequencing.

Currently, metevalue package supports the e-value analytics of the methylKit output file.

library(metevalue)

####Simulation Data ####
set.seed(1234)

# methylKit is a Bioconductor package
library(methylKit)
file.list=list( system.file("extdata", 
                            "test1.myCpG.txt", package = "methylKit"),
                system.file("extdata",
                            "test2.myCpG.txt", package = "methylKit"),
                system.file("extdata", 
                            "control1.myCpG.txt", package = "methylKit"),
                system.file("extdata", 
                            "control2.myCpG.txt", package = "methylKit") )


# read the files to a methylRawList object: myobj
myobj=methRead(file.list,
               sample.id=list("test1","test2","ctrl1","ctrl2"),
               assembly="hg18",
               treatment=c(1,1,0,0),
               context="CpG"
)

meth=unite(myobj, destrand=FALSE)
meth.C <- getData(meth)[,seq(6,ncol(meth),3)]
meth.T <- getData(meth)[,seq(7,ncol(meth),3)]
mr <- meth.C/(meth.C + meth.T)
chr_pos = getData(meth)[,1:2]
methyrate = data.frame(chr_pos,mr)
names(methyrate) = c('chr', 'pos', rep('g1',2), rep('g2',2))
region<-tileMethylCounts(myobj)
meth<-unite(region,destrand=F)
myDiff<-calculateDiffMeth(meth)
met_all<-getMethylDiff(myDiff,type="all")

example_tempfiles = tempfile(c("rate_combine", "methylKit_DMR_raw"))
tempdir()
write.table(methyrate, file=example_tempfiles[1], row.names=F, col.names=T, quote=F, sep='\t')
write.table (met_all, file=example_tempfiles[2], sep ="\t", row.names =F, col.names =T, quote =F)

evalue.methylKit function could be used to tackle the problem.

result = metevalue.methylKit(example_tempfiles[1], example_tempfiles[2], bheader = T)
str(result)

Alternatively, one could use the build-in functions to derive functions which avoid the file operation:

result = evalue_buildin_var_fmt_nm(methyrate, met_all, method="methylKit")
result = list(a = result$a, 
              b = result$b, 
              a_b = evalue_buildin_sql(result$a, result$b, method="methylKit"))
result = varevalue.metilene(result$a, result$b, result$a_b)
str(result)

BiSeq Example

First, we load the methylation data at CpG site levels from ‘BiSeq’ package. Then we clustered CpG sites into DMRs using ‘BiSeq’.

library(BiSeq)
library(dplyr)
data(rrbs)
rrbs.rel <- rawToRel(rrbs)
methyrate <- methLevel(rrbs.rel)
methyrate <- data.frame(methyrate)
methyrateq = cbind(rows = as.numeric(row.names(methyrate)), methyrate)
methypos = data.frame(rows = as.numeric(row.names(methyrate)), rowRanges(rrbs))
methyrate = left_join(methypos, methyrateq)
methyrate = methyrate[,c(2,3,7:16)]
names(methyrate) <- c('chr','pos',rep('g1',5),rep('g2',5))

rrbs.clust.unlim <- clusterSites(object = rrbs,perc.samples = 3/4,min.sites = 20,max.dist = 100)

clusterSitesToGR(rrbs.clust.unlim)
ind.cov <- totalReads(rrbs.clust.unlim) > 0

quant <- quantile(totalReads(rrbs.clust.unlim)[ind.cov])
rrbs.clust.lim <- limitCov(rrbs.clust.unlim, maxCov = quant)
predictedMeth <- predictMeth(object = rrbs.clust.lim)

test<- predictedMeth[, colData(predictedMeth)$group == "test"]
control <- predictedMeth[, colData(predictedMeth)$group == "control"]
mean.test <- rowMeans(methLevel(test))
mean.control <- rowMeans(methLevel(control))

betaResults <- betaRegression(formula = ~group,link = "probit",object = predictedMeth,type = "BR")
vario <- makeVariogram(betaResults)
vario.sm <- smoothVariogram(vario, sill = 0.9)

locCor <- estLocCor(vario.sm)
clusters.rej <- testClusters(locCor)
clusters.trimmed <- trimClusters(clusters.rej)
DMRs <- findDMRs(clusters.trimmed,max.dist = 100,diff.dir = TRUE)


example_tempfiles = tempfile(c('rate_combine', 'BiSeq_DMR'))
write.table(methyrate, example_tempfiles[1], row.names=F, col.names=T, quote=F, sep='\t')
write.table(DMRs, example_tempfiles[2], quote=F, row.names = F,col.names = F, sep = '\t')

Finally, we added E-values and adjusted E-values as additional columns to the output file of ‘BiSeq’.BiSeq_evalue function could be used to tackle the problem.

hh <- data.frame(DMRs)
result = evalue_buildin_var_fmt_nm(rate_combine, hh, method="biseq")
result = list(a = result$a,  b = result$b, 
              a_b = evalue_buildin_sql(result$a, result$b,method="biseq"))
result = varevalue.metilene(result$a, result$b, result$a_b)
str(result)

DMRfinder Example

Given the input file

  • rate_combine_DMRfinder: a file containing methylation rates at each CpG site

  • DMRfinder_DMR: the output file from ‘DMRfinder’

rate_combine <- read.table("rate_combine_DMRfinder", header = T)
head(rate_combine)

DMRs <- read.table("DMRfinder_DMR", header = T)
head(DMRs)

Adding E-values and adjusted E-values as additional columns to file ‘DMRfinder_DMR’

result <- evalue.DMRfinder('rate_combine_DMRfinder', 'DMRfinder_DMR')
head(result)

Alternatively, function varevalue.metilene can also provides e-value and adjusted e-value.

result = evalue_buildin_var_fmt_nm(rate_combine, DMRs, method="DMRfinder")
result = list(a = result$a, 
              b = result$b, 
              a_b = evalue_buildin_sql(result$a, result$b, method="DMRfinder"))
result = varevalue.metilene(result$a, result$b, result$a_b)
head(result)

Metilene Example

Given

  • metilene.input: the input file of Metilene containing methylation rates at each CpG site
  • metilene.out: the output file of Metilene
input <- read.table("metilene.input", header = T)
head(input)

out <- read.table("metilene.out", header = F)
head(out)

Adding E-values and adjusted E-values as additional columns to metilene.out

result <- evalue.metilene('metilene.input', 'metilene.out')
head(result)

Alternatively, function varevalue.metilene can also provides e-value and adjusted e-value.

result = evalue_buildin_var_fmt_nm(input, out, method="metilene")
result = list(a = result$a, 
              b = result$b, 
              a_b = evalue_buildin_sql(result$a, result$b, method="metilene"))
result = varevalue.metilene(result$a, result$b, result$a_b)
head(result)

Misc

Demo data

Demo data for different metevalue.[DMR] functions are listed in the section.

Input Data Examples: MethylKit

methyrate Example

chr pos g1 g1 g2 g2
chr21 9853296 0.5882353 0.8048048 0.8888889 0.8632911
chr21 9853326 0.7058824 0.7591463 0.8750000 0.7493404
  • chr: string Chromosome
  • pos: int Position
  • g1~g2: methylation rate data in groups, repeat 8 times. Notice that there are “NaN” within the feature columns.

methylKit.output Example

chr start end strand pvalue qvalue meth.diff
chr21 9927001 9928000 * 0 0 -34.07557
chr21 9944001 9945000 * 0 0 -40.19089
  • chr: Chromosome
  • start: The positions of the start sites of the corresponding region
  • end: The positions of the end sites of the corresponding region
  • strand: Strand
  • pvalue: The adjusted p-value based on BH method in MWU-test
  • qvalue: cutoff for qvalue of differential methylation statistic
  • methyl.diff: The difference between the group means of methylation level

Input Data Examples: BiSeq

methyrate Example

chr pos g1 g1 g1 g1 g1 g2 g2 g2 g2 g2
chr1 870425 0.8205128 1 0.7 NaN NaN 0.3125 0.7419355 0.2461538 0.1794872 0.2413793
chr1 870443 0.8461538 1 0.7 NaN NaN 0.3750 0.3225806 0.2923077 0.0512821 0.2413793
  • chr: Chromosome
  • pos: int Position
  • g1~g2: methylation rate data in groups

biseq.output Example

seqnames start end width strand median.p median.meth.group1 median.meth.group2 median.meth.diff
chr1 872369 872616 248 * 0.0753559 0.9385462 0.8666990 0.0710524
chr1 875227 875470 244 * 0.0000026 0.5136315 0.1991452 0.2942668
  • seqnames: Chromosome
  • start: The positions of the start sites of the corresponding region
  • end: The positions of the end sites of the corresponding region
  • width: The range of the corresponding region
  • strand: Strand
  • median.p: The median of p-values in the corresponding region
  • median.meth.group1 : The median of methylation level for the corresponding segment of group 1
  • median.meth.group2 : The median of methylation level for the corresponding segment of group 2
  • median.meth.diff: The median of the difference between the methylation level

Input Data Examples: DMRfinder

methyrate Example

chr pos g1 g1.1 g2 g2.1
chr1 202833315 0 0.0000000 0 0
chr1 202833323 1 0.8095238 1 1
  • chr: Chromosome
  • pos: int Position
  • g1~g2: methylation rate data in groups

DMRfinder.output Example

chr start end CpG Control.mu Exptl.mu Control.Exptl.diff Control.Exptl.pval
chr8 25164078 25164102 3 0.9241646 0.7803819 -0.1437827 0.0333849
chr21 9437432 9437538 14 0.7216685 0.1215506 -0.6001179 0.0000000
  • chr: Chromosome
  • start: The positions of the start sites of the corresponding region
  • end: The positions of the end sites of the corresponding region
  • CpG: The number of CpG sites within the corresponding region
  • Control.mu: The average methylation rate in control group
  • Expt1.mu: The average methylation rate in experiment group
  • Control.Expt1.diff: The methylation difference between control and experiment groups
  • Control.Expt1.pval: P-value based on Wald-test

Input Data Examples: DMRfinder

methyrate Example

chrom pos g1 g1.1 g1.2 g1.3 g1.4 g1.5 g1.6 g1.7 g2 g2.1 g2.2 g2.3 g2.4 g2.5 g2.6 g2.7
chr21 9437433 0.9285714 NA 0.7222222 0.75 1 0.6666667 1 0.8695652 0.0000000 0 0 0 0.0000000 0.0 NA 0.00
chr21 9437445 1.0000000 NA 0.9444444 0.75 1 0.6666667 0 0.8695652 0.6111111 0 0 0 0.7333333 0.6 NA 0.75
  • chr: Chromosome
  • pos: int Position
  • g1~g2: methylation rate data in groups

metilene.output Example

chr start end q-value methyl.diff CpGs p p2 m1 m2
chr21 9437432 9437540 0 0.610989 26 0 0 0.73705 0.12606
chr21 9708982 9709189 0 0.475630 28 0 0 0.58862 0.11299
  • chr: Chromosome
  • start: The positions of the start sites of the corresponding region
  • end: The positions of the end sites of the corresponding region
  • q-value: The adjusted p-value based on BH method in MWU-test
  • methyl.diff: The difference between the group means of methylation level
  • CpGs: The number of CpG sites within the corresponding region
  • p : p-value based on MWU-test
  • p2: p-value based on 2D KS-test
  • m1: The absolute mean methylation level for the corresponding segment of group 1
  • m2: The absolute mean methylation level for the corresponding segment of group 2