As with motif discovery, the search for TF-binding sites (TFBS) benefits from a comparative genomics approach. Searching a single genome for TFBS yields very noisy results. If several related genomes are searched, the results can be compared and strengthened by requiring, for instance, that a site be located in the promoter region of the same gene in at least two or three species. As in the case of motif discovery, these methods are not often applied to verify experimental results, but they can be used to guide experimental research. For instance, comparative genomics searches can be used to detect good candidate sites, which are then verified using an experimental technique.
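The cross-species filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-genome search has already been run, its hits are reduced to hypothetical `(species, gene)` pairs in which `gene` identifies the same (orthologous) gene across species, and the function name `conserved_hits` and the toy data are invented for this example.

```python
from collections import defaultdict

def conserved_hits(hits, min_species=2):
    """Keep genes whose promoter holds a putative site in enough species.

    hits: iterable of (species, gene) pairs, one per putative site found
          in the promoter of `gene` in `species` (hypothetical input format).
    min_species: minimum number of distinct species required.
    """
    species_by_gene = defaultdict(set)
    for species, gene in hits:
        # A set ignores repeated hits for the same gene in the same species.
        species_by_gene[gene].add(species)
    return sorted(
        gene for gene, species in species_by_gene.items()
        if len(species) >= min_species
    )

# Toy data, not real search results.
toy_hits = [
    ("species_1", "geneA"), ("species_2", "geneA"),
    ("species_1", "geneB"), ("species_1", "geneB"),
    ("species_1", "geneC"), ("species_2", "geneC"), ("species_3", "geneC"),
]
print(conserved_hits(toy_hits, min_species=2))
```

Here `geneB` is dropped because both of its hits come from the same species, which is the kind of single-genome noise the comparative requirement is meant to remove.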
The DNase foot-printing method starts by focusing on a given region of interest (e.g. a promoter region) and amplifying it by PCR to obtain plenty of sample. The TF is then added, followed by the DNase. The mix is left to react for a short time, and gel electrophoresis is then run to compare the pattern of fragments in a control (no TF) and in the sample. If the TF has bound the sample, it will have protected a stretch of DNA (encompassing some fragments of the control), and thus those fragments will not appear in the sample gel. The fragments can then be cut out from the gel, purified and sequenced to obtain the sequence of the protected region. This is often used to identify the binding motif of a TF for the first time. Foot-printing will typically resolve the protected region down to 50-100 bp, and the sequence can then be examined for possible TF-binding sites, either by eye or using a computer search.
Electrophoretic mobility shift assays (EMSA, also known as gel retardation assays) are a standard way of assessing TF-binding. A fragment of DNA of interest is amplified and labeled with a fluorophore. The fragment is left to incubate in a solution containing abundant TF and non-specific DNA (e.g. randomly cleaved DNA from salmon sperm, of all things), and then a gel is run with the incubated sample and a control (sample that has not been in contact with the TF). If the TF has bound the sample, the complex will migrate more slowly than unbound DNA through the gel, and this retarded band can be used as evidence of binding. The non-specific DNA ensures that the binding is specific to the fragment of interest and that any non-specific DNA-binding proteins left over from the TF purification will bind there, instead of on the fragment of interest. EMSAs are typically carried out on several fragments, shown as multiple double (control + experiment) lanes in a wide picture. Certain additional controls are run on at least one of the fragments to ascertain specificity. In the most basic of these, a specific competitor (the fragment of interest or a known positive control, unlabeled) is added to the reaction. This should sequester the TF and hence make the retardation band disappear, showing that the binding is indeed specific.
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter regions of a set of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approaches use expectation maximization (EM), Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler and CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding sites will be conserved in the promoter sequences for the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
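The overrepresentation assumption can be illustrated with a deliberately naive Python sketch. It is not the method used by MEME, Gibbs Motif Sampler or CONSENSUS: it only counts exact k-mers, whereas real binding motifs are degenerate and those tools work with probabilistic models. The function name `most_shared_kmers` and the toy sequences are made up for this example.

```python
from collections import Counter

def most_shared_kmers(sequences, k, top=3):
    """Rank exact k-mers by the number of sequences that contain them.

    Each sequence contributes at most one count per k-mer, so a k-mer
    repeated many times in a single promoter does not dominate.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        kmers_in_seq = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmers_in_seq)
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]

# Toy promoter fragments, not real data.
toy_promoters = ["AATGCATGCC", "GGTGCATGTT", "CTGCATGAAC"]
print(most_shared_kmers(toy_promoters, k=6, top=1))
```

In this toy input only one 6-mer occurs in all three sequences, so it ranks first; with real promoters the candidate would still need to be assessed against a background model.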
Target-specific mutation, as opposed to non-specific mutation.
In the context of TF-binding sites, site-directed mutagenesis is typically used to establish/confirm the specific sequence and location of a site, often in tandem with EMSA.
Different positions of a putative binding site are mutated to non-consensus (or random) bases, and binding to the mutated site is evaluated through EMSA or other means. This is often done only at conserved motif positions, or serially through all positions of a site.
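As a rough illustration of the serial design, the following Python sketch lists the single-position mutants of a site. The substitution table `SWAP` (a fixed transversion per base) is an assumption made for this example so that the output is deterministic; in practice the replacement base would be chosen to be non-consensus for the motif in question. The function name `serial_mutants` and the example site are hypothetical.

```python
# Assumed substitution rule for illustration only: one fixed transversion
# per base. A real design would pick a base that is non-consensus at
# each position of the motif being tested.
SWAP = {"A": "C", "C": "A", "G": "T", "T": "G"}

def serial_mutants(site, positions=None):
    """Return (position, mutated_site) pairs, one mutation per variant.

    site: the putative binding site sequence (A/C/G/T only).
    positions: 0-based positions to mutate; defaults to every position,
               i.e. a serial scan through the whole site.
    """
    site = site.upper()
    if positions is None:
        positions = range(len(site))
    return [
        (i, site[:i] + SWAP[site[i]] + site[i + 1:])
        for i in positions
    ]

# Scan every position of a toy 4-bp site.
for position, mutant in serial_mutants("TGCA"):
    print(position, mutant)
```

Passing `positions=[1, 2]` restricts the scan to selected positions, which corresponds to mutating only the conserved positions of a motif.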
Regulated genes for each binding site are displayed below. Gene regulation diagrams show binding sites, both positively and negatively regulated genes, and genes with an unspecified type of regulation.
For each individual site, experimental techniques used to determine the site are also given.