PlantRegMap/PlantTFDB v5.0
Plant Transcription Factor Database
- The flowchart for construction of PlantTFDB
- Data source
- Data source (TFext)
- Pipeline to construct comprehensive protein dataset
- Family assignment rules
- List of studies which use the Family assignment rules
- Thresholds for domain identification
- Summary of TFs in different taxonomic lineages of green plants and the origination stage of TF families
- Pipeline for parsing BLAST reciprocal best hits (RBHs) and inferring orthologous groups
- Pipeline for GO annotation
- Curation and projection of TF binding motifs
- Transcription factor information
- Multiple sequence alignment
- Phylogenetic trees
- Quick search
- TF prediction server
- Help for PlantRegMap
Transcription Factor Information
TF ID
The ID of the transcription factor in PlantTFDB. For species with genome annotation, the IDs from the genome annotation were adopted directly as PlantTFDB IDs. For species without genome annotation, each TF was assigned a unique ID consisting of three characters that represent the species (e.g. Aan represents Artemisia annua) followed by six digits.
Taxonomy
The taxonomic ID and lineage of each organism were collected from NCBI Taxonomy.
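The TF ID scheme described under "TF ID" for species without genome annotation (a three-character species code followed by six digits) can be illustrated with a minimal sketch. It assumes the three characters are ASCII letters and that the six digits are a zero-padded serial number; the function names and the example ID `Aan000001` are hypothetical and are not taken from PlantTFDB.

```python
import re

# Assumption: species code = three ASCII letters, serial = exactly six digits.
TF_ID_PATTERN = re.compile(r"^([A-Za-z]{3})(\d{6})$")


def make_tf_id(species_code: str, serial: int) -> str:
    """Build an ID such as 'Aan000001' from a species code and a serial number."""
    if not re.fullmatch(r"[A-Za-z]{3}", species_code):
        raise ValueError("species code must be exactly three letters")
    if not 0 <= serial <= 999_999:
        raise ValueError("serial must fit in six digits")
    return f"{species_code}{serial:06d}"


def parse_tf_id(tf_id: str):
    """Return (species_code, serial) for IDs in the assumed format, else None."""
    match = TF_ID_PATTERN.match(tf_id)
    if match is None:
        # IDs adopted directly from a genome annotation will usually land here.
        return None
    return match.group(1), int(match.group(2))
```

IDs adopted directly from a genome annotation do not follow this pattern, so `parse_tf_id` returns `None` for them rather than raising an error.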
Common name
Gene Model
The gene (data source) coding for this transcription factor.
Gene Model ID
The ID of the gene model, extracted from the original data source. Gene model IDs can be searched on the advanced search page.
Gene Model Type
The type of gene model. There are three types of gene model in PlantTFDB:
'genome' -- gene models that came from genome annotation;
'PU_ref' -- gene models that came from PlantGDB and UniGene and were selected as the representative of a cluster of PUTs and UniGene;
'PU_unref' -- gene models that came from PlantGDB and UniGene but were not selected as the representative of a cluster of PUTs and UniGene.
Source
The source from which the gene model was obtained.
Signature Domain
The domain used to identify and classify transcription factors.
Protein Features
Domains and other features identified by InterProScan v5.
Plant Ontology
Plant Ontology (PO) was downloaded from TAIR10 for A. thaliana and from the Plant Ontology Consortium for other species.
Nuclear Localization Signal
Nuclear localization signal (NLS) predicted by predictnls.
3D Structure
The best BLAST hit from PDB.
Expression
The expression description (tissue specificity and developmental stage) was collected from UniProt. The best BLAST hits from UniGene, GEO and Genevisible, and direct links to Expression Atlas, AtGenExpress and ATTED-II, were added.
Function description
Regulation
Manually curated regulations are collected from ATRM.
Interaction
Protein-promoter and protein-protein interaction data were collected from BioGRID, IntAct, and BIND.
Phenotype
Annotation
Link Out
Publications
Publications related to the corresponding TF were collected from Entrez Gene, GeneRIF, UniProt and ATRM.
Multiple Sequence Alignment
Protein alignment
Multiple sequence alignments of full-length transcription factors were inferred using T-Coffee (v9.03).
Domain alignment
Multiple sequence alignments of domains were constructed using a Hidden Markov Model-guided method.
Phylogenetic Trees
Phylogenetic trees for TFs of a family within a single species, and for TFs within the same orthologous group, are inferred using MrBayes (v3.2.6) under the WAG model for 50,000 generations; the resulting tree is unrooted.
Phylogenetic trees for TFs of a family from all species are inferred using FastTree (v2.1.9) under the WAG model with 100 bootstrap replicates; the resulting tree is unrooted.
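To make the MrBayes settings above concrete, here is a minimal sketch that writes a MrBayes command block with a fixed WAG amino-acid model and 50,000 generations. Only the model and the generation count come from the text; the helper name `mrbayes_block`, the file name, and the decision to leave every other option at its default are assumptions, and the actual PlantTFDB run settings are not stated.

```python
def mrbayes_block(nexus_path: str, ngen: int = 50_000, aa_model: str = "wag") -> str:
    """Return a MrBayes command block as text (hypothetical helper).

    Only the amino-acid model and the number of generations are set
    explicitly; sampling frequency, chains and burn-in stay at defaults.
    """
    lines = [
        "begin mrbayes;",
        "  set autoclose=yes nowarn=yes;",          # run without interactive prompts
        f"  execute {nexus_path};",                  # load the protein alignment
        f"  prset aamodelpr=fixed({aa_model});",     # fix the amino-acid model (WAG)
        f"  mcmc ngen={ngen};",                      # number of MCMC generations
        "  sumt;",                                   # summarise the sampled trees
        "end;",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 'family_alignment.nex' is a placeholder file name.
    print(mrbayes_block("family_alignment.nex"))
```

Generating the block from a function keeps the two documented parameters in one place, so the same settings can be reused for each family or orthologous group.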
Quick Search
In the quick search box, you can search for a TF by its TF ID or common name.
TF Prediction Server
The TF prediction server has been upgraded in this version. The family assignment rules and thresholds determined by established methods (see details in the supplemental materials) are used to identify transcription factors in the input sequences. When users input nucleic acid sequences, ESTScan 3.0 is employed to identify the CDS regions and translate them into protein sequences. When the GC content of the input sequences is less than 48%, the ESTScan model trained on Arabidopsis thaliana mRNA is used; otherwise, the model trained on Oryza sativa is used. When "Best hit in Arabidopsis thaliana" is checked, links to the best hits in Arabidopsis thaliana are added to the results for predicted transcription factors. Users can access the server here to identify TFs in multiple sequences.
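The GC-content rule for choosing the ESTScan model can be sketched as follows. This is an illustration only: the function names are hypothetical, and it assumes GC content is computed per sequence as (G + C) over the unambiguous bases A, C, G, T and U, which the text does not specify.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T, U)."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = sum(seq.count(base) for base in "ACGTU")
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / total


def choose_estscan_model(seq: str, threshold: float = 0.48) -> str:
    """Pick the training species for the ESTScan model from GC content.

    Below the threshold the Arabidopsis thaliana model is chosen,
    otherwise the Oryza sativa model, as described in the text.
    """
    if gc_content(seq) < threshold:
        return "Arabidopsis thaliana"
    return "Oryza sativa"
```

For example, `choose_estscan_model("ATATATGC")` selects the Arabidopsis thaliana model because only 2 of the 8 bases are G or C.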