PlantTFDB - Plant Transcription Factor Database @ CBI, PKU

Plant Transcription Factor Database


Pipeline to construct comprehensive protein dataset

Species with genome annotation

Since version 3.0, we have no longer constructed a separate protein dataset for species whose genome annotation is available. For these species, protein sequences from the genome annotation were used after filtering out putative pseudogenes (those with * within the protein sequence).
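The pseudogene filter described above can be sketched in a few lines of Python. This is only an illustration, not the PlantTFDB code: the function names are hypothetical, the input is assumed to be FASTA text, and a single trailing * is assumed to be a terminal stop codon rather than an internal one.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def is_putative_pseudogene(seq):
    """Return True if the protein has '*' inside the sequence.

    Assumption: one trailing '*' marks the terminal stop codon
    and is not counted as an internal stop.
    """
    core = seq[:-1] if seq.endswith("*") else seq
    return "*" in core


def filter_pseudogenes(records):
    """Keep only records that are not putative pseudogenes."""
    return [(h, s) for h, s in records if not is_putative_pseudogene(s)]


example = ">a\nMKT*\n>b\nMK*TL\n>c\nMKTL\n"
kept = filter_pseudogenes(parse_fasta(example))
print([h for h, _ in kept])
```

In this example, record b is dropped because its * falls inside the sequence, while a and c are kept.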

Species without genome annotation

For species whose genome sequences were not available, EST-based data from PlantGDB and UniGene were used as the main sources to construct the protein dataset (see datasource). The following steps were used to obtain a non-redundant protein dataset:

Identifying coding sequences (CDS) and the corresponding peptide sequences with ESTScan, keeping those with CDS length >= 150 and score >= 200.
Filtering out proteins whose 'x' content is greater than 0.05.
Clustering proteins with blastclust (identity >= 0.95 and coverage >= 0.9); the resulting protein set is called PUset.
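The first two filtering steps above can be sketched as a simple predicate. This is a minimal sketch under stated assumptions: the function names are hypothetical, 'x' content is taken to mean the fraction of X residues in the peptide (counted case-insensitively), and the CDS length and ESTScan score are assumed to have been parsed from the ESTScan output already. The blastclust clustering step is an external tool and is not reproduced here.

```python
def x_content(protein):
    """Fraction of 'X' residues in a protein sequence.

    Assumption: 'x' content means count of X (any case) divided
    by the sequence length.
    """
    if not protein:
        return 0.0
    return protein.upper().count("X") / len(protein)


def passes_filters(cds_length, score, protein,
                   min_cds_length=150, min_score=200, max_x=0.05):
    """Apply the thresholds quoted in the text.

    A prediction is kept when CDS length >= 150, score >= 200,
    and 'x' content is not greater than 0.05.
    """
    return (cds_length >= min_cds_length
            and score >= min_score
            and x_content(protein) <= max_x)


# Hypothetical predictions: (CDS length, ESTScan score, peptide)
predictions = [
    (300, 450, "M" * 99 + "X"),   # x content 0.01, passes
    (300, 450, "M" * 9 + "X"),    # x content 0.10, too many X
    (120, 450, "M" * 40),         # CDS too short
    (300, 150, "M" * 100),        # score too low
]
kept = [p for p in predictions if passes_filters(*p)]
print(len(kept))
```

Keeping the thresholds as keyword arguments makes it easy to check how sensitive the resulting protein set is to each cutoff.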

Pipeline for species with genome sequence