ChIP-Seq is identical to ChIP-chip up to the last step. In ChIP-Seq, immunoprecipitated DNA fragments are prepared for sequencing and fed into a massively parallel sequencer that produces short reads. Even though the sonication step is the same as in ChIP-chip, ChIP-Seq generates multiple short reads within any given 500 bp region, thereby pinning down the location of TFBS to within 50-100 bp. A similar result can be obtained with ChIP-chip using high-density tiling arrays. The downside of ChIP-Seq is that sensitivity comes at a price: it increases with the number of (expensive) parallel sequencing runs. To control for biases, ChIP-Seq experiments often use the "input" as a control. This is the DNA sequence resulting from the same pipeline as the ChIP-Seq experiment, but omitting the immunoprecipitation step. It should therefore have the same accessibility and sequencing biases as the experimental data.
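As a rough illustration of how an input control might be used, the sketch below compares ChIP read counts against input read counts in fixed genomic bins. The function name, the pseudocount and the read counts are all hypothetical, and scaling the input by total read count is only one simple normalization choice; dedicated peak callers use more elaborate statistical models.

```python
def fold_enrichment(chip_counts, input_counts, pseudocount=1.0):
    """Per-bin ChIP/input ratio after scaling the input to the ChIP library size.

    Both arguments are lists of read counts over the same genomic bins.
    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    input_total = sum(input_counts)
    if input_total == 0:
        raise ValueError("input control contains no reads")
    # Scale the input so that both libraries have the same total read count.
    scale = sum(chip_counts) / input_total
    return [
        (chip + pseudocount) / (inp * scale + pseudocount)
        for chip, inp in zip(chip_counts, input_counts)
    ]


# Hypothetical read counts in five consecutive bins.
chip = [5, 6, 60, 7, 4]
inp = [10, 12, 11, 14, 9]

enrichment = fold_enrichment(chip, inp)
# Index of the bin with the strongest enrichment over input.
peak_bin = max(range(len(enrichment)), key=lambda i: enrichment[i])
```

In this made-up example, only the third bin shows more reads than the scaled input would predict, so it is the candidate binding region.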
DNA arrays (also called DNA chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing an mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked so that the emerging cDNA contains nucleotides marked with different fluorophores for control and experiment. These labeled cDNAs (the targets) hybridize by base-pairing with the attached sequences (the probes) to which they are most complementary. The array can then be excited with a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
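The log-ratio mentioned above can be sketched in a few lines. The gene names, intensity values and pseudocount below are hypothetical, and real analyses would also apply background correction and normalization between the two channels, which this sketch omits.

```python
import math


def log2_ratio(induced, control, pseudocount=1.0):
    """Return log2(induced / control) for one spot.

    The pseudocount (an assumption of this sketch) keeps the ratio
    defined when one of the two intensities is zero.
    """
    return math.log2((induced + pseudocount) / (control + pseudocount))


# Hypothetical (induced, control) fluorescence intensities per spot.
spots = {
    "geneA": (4000.0, 1000.0),  # brighter in the induced channel
    "geneB": (500.0, 500.0),    # same intensity in both channels
    "geneC": (250.0, 1000.0),   # brighter in the control channel
}

log_ratios = {gene: log2_ratio(ind, ctl) for gene, (ind, ctl) in spots.items()}
```

A positive log-ratio indicates induction, a negative one repression, and a value near zero no change. Using the logarithm makes a fourfold increase and a fourfold decrease symmetric around zero.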
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter regions of a set of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approaches use expectation maximization (EM), Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler and CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic footprinting (the idea that TF-binding sites will be conserved in the promoter sequences of the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
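To make the overrepresentation assumption concrete, the sketch below counts in how many sequences each word of length k occurs and reports the most widely shared words. This is a plain word-enumeration approach chosen for brevity, not the EM, Gibbs sampling or greedy algorithms used by the tools named above. The function name and the sequences are hypothetical, and the sketch ignores mismatches, reverse complements and background composition.

```python
from collections import Counter


def most_shared_kmers(sequences, k, top=3):
    """Return the `top` k-mers found in the largest number of sequences."""
    counts = Counter()
    for seq in sequences:
        # Count each k-mer at most once per sequence.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(seen)
    # Sort by number of sequences (descending), then alphabetically,
    # so that ties are resolved deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]


# Hypothetical promoter fragments sharing the planted site TGACGTCA.
promoters = [
    "ACGTTGACGTCAAT",
    "GGCTGACGTCATTC",
    "TTTGACGTCAGGAC",
    "CATGCTGACGTCAG",
]

shared = most_shared_kmers(promoters, k=6)
```

In this made-up data set, the words returned are all fragments of the planted site, since no other 6-mer occurs in every sequence. Real binding sites are degenerate, which is why probabilistic models such as those used by MEME or Gibbs samplers are preferred in practice.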
Regulated genes for each binding site are displayed below. Gene regulation diagrams show binding sites, positively and negatively regulated genes, and genes with an unspecified type of regulation. For each individual site, the experimental techniques used to determine the site are also given.