The bacterial response regulator ArcA uses a diverse binding site architecture to regulate carbon oxidation globally.;Park DM, Akhtar MS, Ansari AZ, Landick R, Kiley PJ;PLoS genetics 2013;
9(10):e1003839
[24146625]
A total of 176 ArcA binding regions were mapped across the E. coli K-12 MG1655 genome by ChIP-chip and ChIP-seq experiments (109 peaks in common). ArcA binding sites were predicted by bioinformatics approaches. MEME identified an 18-bp sequence motif consisting of two direct repeat (DR) elements. The authors established that many ArcA binding sites contained additional DR elements beyond the two DRs of the ArcA box. These additional DR elements were identified by searching the sequences surrounding each ArcA box with a 10-bp weight matrix. DNase I footprinting performed for a set of representative promoters validated the bioinformatics predictions of ArcA binding sites. Genome-wide expression profiles for wild-type and arcA mutant strains identified 229 differentially expressed operons. A paaA-lacZ reporter fusion assay in the wild-type and arcA mutant strains supplemented with phenylacetate showed that repression of paaA was relieved in the arcA mutant.
ChIP assay conditions
Enriched ChIP DNA from two additional biological replicates of anaerobic ArcA samples was submitted to the University of Wisconsin-Madison DNA Sequencing Facility for library construction and Illumina sequencing, performed as previously described. A total of 1,364,908 and 12,074,358 reads were obtained for the two ChIP replicates. Greater than 90% and 80% of these reads, respectively, mapped uniquely to the K12 MG1655 genome (version U00096.2) using the software package SOAP release 2.20, allowing no more than two mismatches.
ChIP notes
The CSDeconv algorithm was then used to determine significantly enriched regions in high resolution using both ChIP-seq replicates and two anaerobic input samples from the same sequencing run as the ArcA ChIP samples. Reads that mapped uniquely within the seven rRNA operon regions were eliminated to allow the algorithm to run more efficiently. CSDeconv was run with Matlab v7.11.0 (R2010b) using the following parameters: LLR = 21.75 and alpha = 800 for replicate one and LLR = 22 and alpha = 550 for replicate two. The find_enriched function was modified to account for differences in sequencing depth between the IP and Input samples. Correction factors of 2.98 (replicate 1) and 0.6579 (replicate 2), calculated by dividing the number of unique reads in the Input sample by the number of reads in the ChIP sample for replicates one and two, respectively, were multiplied by nip and the forward and reverse kernel density calculations for both the forward and reverse strands of the ChIP sample.
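The depth correction described above is a simple ratio, and a minimal sketch may make the arithmetic easier to follow. The function names and the read counts below are hypothetical and chosen only for illustration; they are not the Input read counts from the study, and this is not the modified find_enriched code itself.

```python
def depth_correction(input_unique_reads: int, chip_reads: int) -> float:
    """Ratio of Input reads to ChIP reads, as described in the notes above."""
    if chip_reads <= 0:
        raise ValueError("chip_reads must be positive")
    return input_unique_reads / chip_reads


def scale_chip_signal(chip_values, factor):
    """Multiply ChIP-derived values (e.g. counts or kernel densities) by the factor."""
    return [value * factor for value in chip_values]


# Hypothetical read counts, for illustration only (not taken from the paper).
factor = depth_correction(input_unique_reads=3_000_000, chip_reads=1_000_000)

# Hypothetical ChIP signal values rescaled to the Input sequencing depth.
scaled = scale_chip_signal([10.0, 25.0, 4.0], factor)
```

A factor above 1 means the Input sample was sequenced more deeply than the ChIP sample, so the ChIP signal is scaled up; a factor below 1 scales it down.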
Regulated genes for each binding site are displayed below. Gene regulation diagrams show binding sites, positively regulated genes, negatively regulated genes, genes that are both positively and negatively regulated, and genes with an unspecified type of regulation. For each individual site, the experimental techniques used to determine the site are also given.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex using a fixing agent such as formaldehyde; the cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA is sheared by sonication, and the chromatin (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is amplified, labeled with a fluorophore and hybridized to a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is about 500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-seq) results are often validated with ChIP-PCR, in which PCR with specific primers is performed on the pulled-down DNA. As with RNA-seq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of short single-stranded (ss) DNA sequences (probes, corresponding to small regions of a genome) have been attached. After mRNA is extracted from induced and non-induced cells, it is reverse transcribed in a modified reaction, so that the resulting cDNA contains nucleotides marked with different fluorophores for the control and the experiment. These labeled targets hybridize by base-pairing with the probes that resemble them the most. The array can then be excited by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
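The log-ratio mentioned above can be sketched in a few lines. The probe names and intensities are hypothetical, and the pseudocount (added to avoid dividing by zero on empty spots) is an assumption of this sketch rather than part of any particular array protocol.

```python
import math


def log2_ratio(induced: float, control: float, pseudocount: float = 1.0) -> float:
    """Log2 of induced over control intensity; the pseudocount is an assumption."""
    return math.log2((induced + pseudocount) / (control + pseudocount))


# Hypothetical (induced, control) fluorescence intensities per probe.
probes = {
    "probe_a": (799.0, 99.0),   # induced channel brighter
    "probe_b": (99.0, 99.0),    # no change between channels
    "probe_c": (49.0, 199.0),   # control channel brighter
}

ratios = {name: log2_ratio(ind, ctl) for name, (ind, ctl) in probes.items()}
```

Positive values indicate induction, negative values indicate repression, and values near zero indicate no change. The log scale makes a two-fold increase and a two-fold decrease symmetric around zero.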
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter regions of a set of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this; the standard approaches rely on expectation maximization (EM), Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler and CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic footprinting (the idea that TF-binding sites will be conserved in the promoter sequences of the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
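To illustrate the overrepresentation idea, here is a deliberately simplified greedy search. It is a toy sketch, not the algorithm used by MEME, Gibbs Motif Sampler or CONSENSUS: every k-mer of the first sequence is tried as a seed, the closest k-mer in each remaining sequence is picked, and the seed with the fewest total mismatches wins. The sequences and the planted 6-mer are hypothetical, and every sequence is assumed to be at least k bases long.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_match(seed: str, seq: str):
    """Return (mismatches, start) of the window in seq closest to seed."""
    k = len(seed)
    return min((hamming(seed, seq[i:i + k]), i) for i in range(len(seq) - k + 1))


def greedy_motif(seqs, k):
    """Return (total_mismatches, start_positions), one start per sequence."""
    best = None
    first = seqs[0]
    for start in range(len(first) - k + 1):
        seed = first[start:start + k]
        hits = [best_match(seed, seq) for seq in seqs[1:]]
        total = sum(mismatches for mismatches, _ in hits)
        # Strict comparison keeps the earliest seed when totals are tied.
        if best is None or total < best[0]:
            best = (total, [start] + [pos for _, pos in hits])
    return best


# Hypothetical sequences, each carrying one copy of the 6-mer TGTTAA.
seqs = ["ACGGTGTTAACC", "TGTTAAGGCACG", "CCATGTTAAGCA"]
total, positions = greedy_motif(seqs, 6)
sites = [seq[p:p + 6] for seq, p in zip(seqs, positions)]
```

The recovered sites can then be aligned and counted column by column to build a motif model. Real tools additionally handle sequences without a site, sites on the reverse strand, and statistical significance, none of which this sketch attempts.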
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding sites. This is useful, for instance, when trying to identify TF-binding sites in ChIP-chip data. Searching for TF-binding sites can be done in numerous ways. The most basic method is consensus search, in which sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow searching for more loosely defined motifs [e.g. C(C/G)AT]. Common tools for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
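The three search strategies described above can be compared side by side in a short sketch. The target sequence, the example sites, the pseudocount and the uniform background frequency of 0.25 are all assumptions made for illustration; the code does not reproduce any of the named tools, and it expects sequences containing only A, C, G and T.

```python
import math
import re


def consensus_hits(seq, consensus, max_mismatches):
    """Return (start, mismatches) for windows within max_mismatches of the consensus."""
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        mismatches = sum(a != b for a, b in zip(seq[i:i + k], consensus))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


def regex_hits(seq, pattern):
    """Return match starts; the lookahead also reports overlapping matches."""
    return [m.start() for m in re.finditer(f"(?=(?:{pattern}))", seq)]


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds matrix from aligned sites (pseudocount and background are assumptions)."""
    total = len(sites) + 4 * pseudocount
    pssm = []
    for j in range(len(sites[0])):
        column = [site[j] for site in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pssm


def pssm_scan(seq, pssm):
    """Return (start, score) for every window of the sequence."""
    k = len(pssm)
    return [
        (i, sum(col[seq[i + j]] for j, col in enumerate(pssm)))
        for i in range(len(seq) - k + 1)
    ]


seq = "TTCCATGGCGATAA"                       # hypothetical target sequence
by_consensus = consensus_hits(seq, "CCAT", max_mismatches=1)
by_regex = regex_hits(seq, "C[CG]AT")        # the C(C/G)AT pattern from the text
pssm = build_pssm(["CCAT", "CGAT", "CCAT"])  # hypothetical aligned sites
ranked = sorted(pssm_scan(seq, pssm), key=lambda hit: hit[1], reverse=True)
```

All three methods locate the same two candidate sites here, but only the matrix-based scan ranks them: the window matching the more frequent base at the second position scores higher than the one matching the rarer base.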
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixative, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA is sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and hybridized to a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is approximately 500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNA-seq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of short single-stranded (ss) DNA sequences (probes, corresponding to small regions of a genome) have been attached. After performing an mRNA extraction in induced and non-induced cells, the mRNA is reverse transcribed, with the reaction modified so that the resulting cDNA (the target) contains nucleotides marked with different fluorophores for control and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be excited with a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
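As a small worked example of the log-ratio described above, the sketch below computes log2(induced / control) per probe. The probe names, the intensity values and the floor used to avoid taking the logarithm of zero are assumptions for illustration; real analyses also apply background correction and normalization, which are omitted here.

```python
from math import log2

def log_ratios(induced, control, floor=1.0):
    """log2(induced / control) per probe; intensities below `floor` are raised to it."""
    return {probe: log2(max(induced[probe], floor) / max(control[probe], floor))
            for probe in induced}

# Hypothetical fluorescence intensities for three probes
control = {"geneA": 500.0, "geneB": 800.0, "geneC": 1200.0}
induced = {"geneA": 2000.0, "geneB": 800.0, "geneC": 300.0}

ratios = log_ratios(induced, control)
for probe, value in ratios.items():
    print(probe, value)
```

A log-ratio of 2 means a four-fold induction, 0 means no change and -2 means a four-fold repression, so the log scale treats up- and down-regulation symmetrically.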
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding sites. This is useful, for instance, when trying to identify TF-binding sites in ChIP-chip data. Searching for TF-binding sites can be done in numerous ways. The most basic method is consensus search, in which sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow searching for more loosely defined motifs [e.g. C(C/G)AT]. Common tools for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices (PSSMs), which count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
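The three search strategies described above can be compared in a short sketch. The function names, the toy sequence, the three example sites, the pseudocount and the uniform background frequency of 0.25 are all assumptions made for illustration; none of the tools named above is reproduced here.

```python
import math
import re
from collections import Counter

def consensus_hits(seq, consensus, max_mismatches=1):
    """Start positions where seq differs from the consensus by at most max_mismatches."""
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        mismatches = sum(a != b for a, b in zip(seq[i:i + k], consensus))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits

def regex_hits(seq, pattern):
    """Start positions of all (possibly overlapping) matches of a regular expression."""
    return [m.start() for m in re.finditer(f"(?={pattern})", seq)]

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds matrix built from base counts at each position of aligned sites."""
    n = len(sites)
    pssm = []
    for column in zip(*sites):
        counts = Counter(column)
        pssm.append({
            b: math.log2(((counts[b] + pseudocount) / (n + 4 * pseudocount)) / background)
            for b in "ACGT"
        })
    return pssm

def pssm_scan(seq, pssm):
    """(position, score) for every window of seq, scored by summing column values."""
    k = len(pssm)
    return [
        (i, sum(col[base] for base, col in zip(seq[i:i + k], pssm)))
        for i in range(len(seq) - k + 1)
    ]

seq = "GGCCATTTCGATAA"                         # made-up sequence
known_sites = ["CCAT", "CGAT", "CCAT"]         # hypothetical aligned sites

exact = consensus_hits(seq, "CCAT", max_mismatches=0)
fuzzy = consensus_hits(seq, "CCAT", max_mismatches=1)
by_regex = regex_hits(seq, "C[CG]AT")          # the C(C/G)AT pattern from the text
scores = pssm_scan(seq, build_pssm(known_sites))
top_two = sorted(scores, key=lambda t: t[1], reverse=True)[:2]
print(exact, fuzzy, by_regex, [pos for pos, _ in top_two])
```

Unlike the consensus and regular-expression searches, which only report hit or no hit, the PSSM ranks every window, so the CCAT site scores higher than the CGAT site because C is more frequent than G at the second position of the example sites.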
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixing agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is amplified, labeled with a fluorophore and hybridized to a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is limited to roughly 500 bp by the fragment sizes produced in the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter regions of a set of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this; the standard approaches use expectation maximization (EM), Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler and CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic footprinting (the idea that TF-binding sites will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
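To make the idea of searching for an overrepresented pattern concrete, here is a minimal greedy motif search in Python. It is a textbook-style simplification written for this note, not the algorithm implemented by MEME, Gibbs Motif Sampler or CONSENSUS: it assumes exactly one site of known length k per sequence, and the function names, the pseudocount of 1 and the toy "promoter" sequences are all hypothetical.

```python
from collections import Counter

BASES = "ACGT"

def profile(motifs, pseudocount=1):
    """Per-position base frequencies (with pseudocounts) of aligned k-mers."""
    n = len(motifs) + 4 * pseudocount
    return [
        {b: (col.count(b) + pseudocount) / n for b in BASES}
        for col in zip(*motifs)
    ]

def most_probable_kmer(seq, k, prof):
    """Return the k-mer of seq with the highest probability under the profile."""
    best, best_p = seq[:k], -1.0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        p = 1.0
        for j, base in enumerate(kmer):
            p *= prof[j][base]
        if p > best_p:
            best, best_p = kmer, p
    return best

def score(motifs):
    """Total mismatches to the column-wise consensus (lower is better)."""
    return sum(
        len(col) - Counter(col).most_common(1)[0][1]
        for col in zip(*motifs)
    )

def greedy_motif_search(seqs, k):
    """Seed with each k-mer of the first sequence and extend greedily."""
    best = [s[:k] for s in seqs]
    for i in range(len(seqs[0]) - k + 1):
        motifs = [seqs[0][i:i + k]]
        for s in seqs[1:]:
            motifs.append(most_probable_kmer(s, k, profile(motifs)))
        if score(motifs) < score(best):
            best = motifs
    return best

# Toy sequences with the 6-mer TGACGT planted in different backgrounds.
promoters = [
    "AAAAATGACGTAAAA",
    "CCCTGACGTCCCCCC",
    "GGGGGGGTGACGTGG",
]
print(greedy_motif_search(promoters, 6))  # ['TGACGT', 'TGACGT', 'TGACGT']
```

The greedy strategy is fast but depends on the order of the sequences and can miss weak motifs; EM and Gibbs sampling address this by iteratively re-estimating the profile and the site positions instead of fixing each choice once.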
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNA-Seq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of short single-stranded (ss) DNA sequences (probes corresponding to small regions of a genome) have been attached. After performing an mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is modified so that the resulting cDNA contains nucleotides marked with different fluorophores for control and experiment. These labeled cDNA targets will hybridize by base-pairing with the probes that resemble them the most. The array can then be excited by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
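The log-ratio mentioned above can be computed as follows. This is a minimal sketch: the function names, the probe names and the intensity floor of 1.0 (used to avoid dividing by zero on empty spots) are assumptions for illustration, and real pipelines also apply background subtraction and normalization, which are omitted here.

```python
import math


def log_ratio(induced_intensity, control_intensity, floor=1.0):
    """Return log2(induced / control), flooring intensities to avoid zeros."""
    induced = max(induced_intensity, floor)
    control = max(control_intensity, floor)
    return math.log2(induced / control)


def induction_levels(spots):
    """Map each probe name to its log2 ratio, given {name: (induced, control)}."""
    return {name: log_ratio(ind, ctl) for name, (ind, ctl) in spots.items()}


# Hypothetical fluorescence readings for three probes: (induced, control).
spots = {"probe_a": (800.0, 200.0), "probe_b": (100.0, 400.0), "probe_c": (300.0, 300.0)}
levels = induction_levels(spots)
```

With a base-2 logarithm, a value of 2 means a four-fold induction, -2 a four-fold repression and 0 no change, so up- and down-regulation are placed on a symmetric scale.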
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter regions of a set of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approaches use expectation maximization (EM), Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler and CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic footprinting (the idea that TF-binding sites will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
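The overrepresentation assumption can be illustrated with a deliberately simplified sketch that counts in how many input sequences each k-mer occurs. This is not what MEME, Gibbs Motif Sampler or CONSENSUS implement (they fit probabilistic models that tolerate mismatches); the function name, the choice of exact k-mer matching and the toy promoter sequences are all assumptions made for the example.

```python
from collections import Counter


def widespread_kmers(sequences, k, top=3):
    """Rank k-mers by the number of sequences in which they occur at least once."""
    counts = Counter()
    for seq in sequences:
        # A set ensures each k-mer is counted once per sequence.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmers)
    # Sort by count (descending), then alphabetically for a deterministic order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]


# Hypothetical promoter fragments sharing the planted word TTGACA.
promoters = ["ACGTTGACAGT", "TTGACACCGTA", "GGATTGACAT", "CATTGACAGG"]
candidates = widespread_kmers(promoters, k=6)
```

Here the planted word is the only 6-mer present in all four sequences, so it ranks first. Real binding sites are degenerate, which is why exact word counting is only a starting point and the model-based methods named above are preferred in practice.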
The DNase footprinting method starts by focusing on a given region of interest (e.g. a promoter region) and amplifying it by PCR to obtain enough sample. The TF is then added, followed by the DNase. The mix is incubated for a short time and then gel electrophoresis is run to compare the pattern of fragments in a control (no TF) and in the sample. If the TF has bound the sample, it will have protected a stretch of DNA (encompassing some fragments of the control) and thus those fragments will not appear in the sample gel. The fragments can then be cut out from the gel, purified and sequenced to obtain the sequence of the protected region. This is often used to identify the binding motif of a TF for the first time. The footprinting will typically resolve the protected region down to 50-100 bp, and the sequence can then be examined for possible TF-binding sites either by eye or using a computer search.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding sites. This is useful, for instance, when trying to identify TF-binding sites in ChIP-chip data. Searching for TF-binding sites can be done in numerous ways. The most basic method is consensus search, in which sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which make it possible to search for more loosely defined motifs [e.g. C(C/G)AT]. Common tools for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
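The three search strategies described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the scoring scheme of any of the tools named: the aligned sites, the target sequence, the pseudocount of 1, the uniform background of 0.25 and the score threshold are all hypothetical, and the matrix uses a log-odds score, which is one common choice among several.

```python
import math
import re


def mismatches(consensus, candidate):
    """Consensus search: number of positions that differ from the consensus."""
    return sum(a != b for a, b in zip(consensus, candidate))


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds scoring matrix from aligned sites of equal length."""
    pssm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2(counts[b] / total / background) for b in "ACGT"})
    return pssm


def scan(pssm, sequence, threshold):
    """Score every window of the sequence and keep those above the threshold."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pssm, window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical aligned sites matching the C(C/G)AT example and a toy target.
toy_sites = ["CCAT", "CGAT", "CCAT", "CGAT"]
target = "TTCCATGGCGATAA"

regex_hits = [m.start() for m in re.finditer("C[CG]AT", target)]
pssm_hits = scan(build_pssm(toy_sites), target, threshold=4.0)
print(mismatches("CCAT", "CGAT"), regex_hits, [h[0] for h in pssm_hits])
```

On this toy input the regular expression and the matrix report the same two positions; they diverge on real data, where the matrix can rank partially matching sites instead of accepting or rejecting them outright.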
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixative agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and hybridized to a DNA array. The scanned array reveals the genomic regions bound by the TF. The resolution is around 500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNA-Seq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing an mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked so that the emerging cDNA contains nucleotides marked with different fluorophores for control and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be excited by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
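The log-ratio calculation can be sketched as follows. This is a minimal example that assumes background-corrected intensities for each spot; the gene names, the intensity values and the floor of 1.0 used to avoid taking the logarithm of zero are hypothetical. Base 2 is a common convention, chosen here so that a value of 1 means a two-fold induction.

```python
import math


def log2_ratio(induced, control, floor=1.0):
    """Log2 ratio of induced over control fluorescence for one spot."""
    # Clamp very low signals so that the ratio and the logarithm stay defined.
    induced = max(induced, floor)
    control = max(control, floor)
    return math.log2(induced / control)


# Hypothetical (induced, control) intensities for three spots.
spots = {
    "geneA": (1200.0, 300.0),
    "geneB": (150.0, 600.0),
    "geneC": (500.0, 500.0),
}

ratios = {gene: log2_ratio(ind, ctl) for gene, (ind, ctl) in spots.items()}
for gene, value in ratios.items():
    print(gene, value)
```

Positive values indicate induction, negative values repression and values near zero no change. The log scale makes a four-fold increase and a four-fold decrease symmetric around zero, which the raw ratio does not.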
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter regions of a set of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approaches use expectation maximization (EM), Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler and CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic footprinting (the idea that TF-binding sites will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
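To make the idea of overrepresentation concrete, here is a toy greedy motif search in Python. It is only a sketch of the greedy strategy, not the algorithm used by CONSENSUS, MEME or any other tool named above; the function names, the pseudocount, the uniform background and the assumption of exactly one site per sequence are all illustrative choices.

```python
import math

BASES = "ACGT"


def profile_from(sites, pseudocount=1.0):
    """Per-position base frequencies (with pseudocounts) for aligned sites."""
    total = len(sites) + 4 * pseudocount
    return [
        {b: (sum(s[j] == b for s in sites) + pseudocount) / total for b in BASES}
        for j in range(len(sites[0]))
    ]


def log_odds(window, profile, background=0.25):
    """Log-odds score of a window against a profile (uniform background assumed)."""
    return sum(math.log2(profile[j][base] / background)
               for j, base in enumerate(window))


def best_window(sequence, profile):
    """Highest-scoring window of the profile's width in a sequence."""
    k = len(profile)
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    return max(windows, key=lambda w: log_odds(w, profile))


def information_content(sites):
    """Total information content (bits) of the profile built from the sites."""
    return sum(p[b] * math.log2(p[b] / 0.25)
               for p in profile_from(sites) for b in BASES)


def greedy_motif_search(sequences, k):
    """Seed with every k-mer of the first sequence, greedily add the best
    matching k-mer from each remaining sequence, and keep the most
    informative alignment. Assumes one site per sequence."""
    best_sites, best_ic = None, float("-inf")
    first = sequences[0]
    for i in range(len(first) - k + 1):
        sites = [first[i:i + k]]
        for seq in sequences[1:]:
            sites.append(best_window(seq, profile_from(sites)))
        ic = information_content(sites)
        if ic > best_ic:
            best_sites, best_ic = sites, ic
    return best_sites


if __name__ == "__main__":
    # Hypothetical promoter fragments sharing the planted 6-mer TGTTAA.
    promoters = ["ACGCTGTTAAGCCA", "GGATGTTAACTCGG", "CAGTGTTAATCACC"]
    print(greedy_motif_search(promoters, 6))
```

The greedy result depends on the order of the input sequences and can get stuck in a poor alignment, which is one reason EM and Gibbs sampling approaches refine or resample their alignments iteratively.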
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNA-Seq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of short single-stranded (ss) DNA sequences (probes corresponding to small regions of a genome) have been attached. After performing an mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked so that the emerging cDNA contains nucleotides marked with different fluorophores for control and experiment. These cDNA targets will hybridize by base-pairing with the probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
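As a small worked illustration of the log-ratio mentioned above, the sketch below computes log2(induced / control) for a single probe. The function name and the intensity floor (used only to avoid dividing by zero or taking the logarithm of zero) are assumptions, and real array pipelines normally apply background correction and normalization first.

```python
import math

def log2_ratio(induced, control, floor=1.0):
    """Return log2(induced / control) for one probe.
    Both intensities are raised to at least `floor` (an assumed value)
    so that empty spots do not cause a division by zero."""
    return math.log2(max(induced, floor) / max(control, floor))

# Hypothetical intensities: a four-fold induction gives +2, a four-fold repression gives -2.
up = log2_ratio(800.0, 200.0)
down = log2_ratio(100.0, 400.0)
```

The log scale makes induction and repression symmetric around zero, which is why it is usually preferred over the raw ratio.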
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter regions of a set of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approaches use expectation maximization (EM), Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler and CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic footprinting (the idea that TF-binding sites will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
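The idea of finding an overrepresented pattern can be sketched with a simple greedy search. This is a toy illustration only: it is far simpler than MEME, Gibbs Motif Sampler or CONSENSUS and does not reproduce how they work. The scoring function (column-wise majority count), the function names and the toy sequences are assumptions.

```python
from collections import Counter

def agreement(sites):
    """Score a set of aligned sites: for each column, count the most common base."""
    return sum(Counter(column).most_common(1)[0][1] for column in zip(*sites))

def greedy_motif(seqs, k):
    """Greedy search for one k-bp site per sequence.
    Each window of the first sequence seeds an alignment; for every further
    sequence, the window that agrees best with the sites chosen so far is added.
    The best-scoring alignment over all seeds is returned."""
    best_sites, best_score = None, -1
    first = seqs[0]
    for i in range(len(first) - k + 1):
        sites = [first[i:i + k]]
        for seq in seqs[1:]:
            windows = [seq[j:j + k] for j in range(len(seq) - k + 1)]
            sites.append(max(windows, key=lambda w: agreement(sites + [w])))
        score = agreement(sites)
        if score > best_score:
            best_sites, best_score = sites, score
    return best_sites, best_score

# Toy sequences (hypothetical) with the word GTTAAT planted in each one.
toy_seqs = ["AAGTTAATTG", "CCGTTAATCC", "TGTTAATGCA"]
found_sites, found_score = greedy_motif(toy_seqs, 6)
```

The sites returned by such a search can then be turned into a count or scoring matrix and used to scan other sequences.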
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixating agent, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and dumped onto a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is around ~500 bp as a result of the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNASeq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of multiple short single-stranded (ss) DNA sequences (corresponding to small regions of a genome) have been attached. After performing a mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for controls and experiment. Targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be stimulated by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter region of a bunch of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approach uses expectation maximization (EM) and in particular Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler or CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding site will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences in order to search for putative TF-binding site. This is useful, for instance, when trying to identify TF-binding site in ChIP-chip data. Searching for TF-binding site can be done in numerous ways. The most basic method is consensus search, in sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate way of searching involves using regular expressions, which allow to search for more loosely defined motifs [e.g. C(C/G)AT]. Common algorithms for this type of search include Pattern Locator and the DNA Pattern Find method of the SMS2 suite, but also some word processors. Finally, the mainstream way of conducting TF-binding site search is through the use of position-specific scoring matrices, which basically count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
The principle of ChIP-chip is simple. The first step is to cross-link the protein-DNA complex. This is done using a fixative, such as formaldehyde. The cross-linking can later be reversed with heat. Cross-linking kills the cell, giving a snapshot of the bound TF at a given time. The cell is then lysed, the DNA sheared by sonication and the chromatin[2] (TF-DNA complexes) is pulled down using an antibody (i.e. immunoprecipitated). If an antibody for the TF is available, then it is used; otherwise, the TF is tagged with an epitope targeted by commercially available antibodies (the latter option is cheaper, but runs the risk of altering the TF's functionality). Cross-linking is then reversed to free the bound DNA, which is then amplified, labeled with a fluorophore and hybridized to a DNA-array. The scanned array reveals the genomic regions bound by the TF. The resolution is approximately 500 bp, a limit set by the fragment size obtained in the sonication step.
ChIP-chip (and to a lesser degree ChIP-Seq) results are often validated with ChIP-PCR, in which a PCR with specific primers is performed on the pulled-down DNA. As in the case of RNA-Seq, there are many variations of these main techniques.
DNA-arrays (or DNA-chips or microarrays) are flat slabs of glass, silicon or plastic onto which thousands of short single-stranded (ss) DNA sequences (probes corresponding to small regions of a genome) have been attached. After performing an mRNA extraction in induced and non-induced cells, the mRNA is again reverse transcribed, but here the reaction is tweaked, so that the emerging cDNA contains nucleotides marked with different fluorophores for control and experiment. These labeled cDNA targets will hybridize by base-pairing with those probes that resemble them the most. The array can then be excited by a laser and scanned for fluorescence at two different wavelengths (control and induced). The ratio or log-ratio between the two fluorescence intensities corresponds to the induction level.
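As a small worked example of the log-ratio described above, the following Python sketch computes a base-2 log-ratio for one probe. The function name, the intensity floor used to avoid division by zero and the example intensities are assumptions for illustration; real analyses also apply background subtraction and normalization, which are omitted here.

```python
import math

def log2_ratio(induced, control, floor=1.0):
    """Base-2 log-ratio of two fluorescence intensities for one probe.

    Intensities below `floor` are raised to it so that the ratio is
    always defined (an illustrative choice, not a standard)."""
    return math.log2(max(induced, floor) / max(control, floor))

# Hypothetical intensities: a 4-fold induction and a 4-fold repression.
print(log2_ratio(800.0, 200.0))   # 2.0
print(log2_ratio(100.0, 400.0))   # -2.0
```

The log scale is convenient because induction and repression of the same magnitude are symmetric around zero.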