PlantRegMap/PlantTFDB v5.0
Plant Transcription Factor Database
Pipeline to construct comprehensive protein dataset
Species with genome annotation
Since version 3.0, we no longer construct a separate protein dataset for species whose genome annotation is available. For these species, protein sequences from the genome annotation are used after filtering out putative pseudogenes (those with a * within the protein sequence).
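The pseudogene filter described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline's actual code: the function names and the example sequences are hypothetical, and the sketch assumes that a single trailing * is an ordinary stop-codon marker rather than a sign of a pseudogene, which the text does not state.

```python
from typing import Dict


def has_internal_stop(protein: str) -> bool:
    """Return True if '*' occurs anywhere except as the last character."""
    # Assumption: one trailing '*' only marks the stop codon and is
    # not treated as evidence of a pseudogene.
    core = protein[:-1] if protein.endswith("*") else protein
    return "*" in core


def filter_pseudogenes(records: Dict[str, str]) -> Dict[str, str]:
    """Keep only the sequences that have no internal '*'."""
    return {
        name: seq
        for name, seq in records.items()
        if not has_internal_stop(seq)
    }


# Hypothetical example records (identifier -> protein sequence).
example = {
    "gene1": "MKTAYIAKQR*",  # trailing stop only: kept
    "gene2": "MKT*AYIAKQR",  # internal stop: dropped
    "gene3": "MSTNPKPQRK",   # no stop marker: kept
}
kept = filter_pseudogenes(example)
print(sorted(kept))  # ['gene1', 'gene3']
```

If a trailing * should also count as a pseudogene signal, the check reduces to `"*" in protein`.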
Species without genome annotation
For species whose genome sequences were not
available, EST-based data from PlantGDB and UniGene were used as the
main sources to construct the protein dataset (see datasource). The following steps were used to
obtain a non-redundant protein dataset:
- Identifying coding sequences (CDS) and the corresponding peptide sequences with ESTScan, keeping predictions with CDS length >= 150 and score >= 200.
- Filtering out proteins whose 'x' content is greater than 0.05.
- Clustering the proteins with blastclust (identity >= 0.95 and coverage >= 0.9); the resulting protein set is called PUset.
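The thresholds in the first two steps can be expressed as a simple predicate. The Python sketch below is a hedged illustration and not the pipeline's own code: the function names and inputs are hypothetical, it assumes the CDS length and score have already been parsed from the ESTScan output, and it counts 'x' case-insensitively as a fraction of peptide length. The clustering step is left to blastclust and is not reproduced here.

```python
MIN_CDS_LENGTH = 150    # from the text: CDS length >= 150
MIN_SCORE = 200         # from the text: score >= 200
MAX_X_FRACTION = 0.05   # from the text: drop if 'x' content > 0.05


def x_fraction(peptide: str) -> float:
    """Fraction of residues that are 'x' or 'X' (assumed case-insensitive)."""
    if not peptide:
        return 0.0  # an empty peptide has no 'x'; the length filter handles it
    return peptide.upper().count("X") / len(peptide)


def keep_prediction(cds_length: int, score: float, peptide: str) -> bool:
    """Apply the length, score, and 'x'-content thresholds together."""
    return (
        cds_length >= MIN_CDS_LENGTH
        and score >= MIN_SCORE
        and x_fraction(peptide) <= MAX_X_FRACTION
    )


# Hypothetical 50-residue peptides for illustration.
clean = "M" + "A" * 49            # no 'x' at all
noisy = "M" + "A" * 46 + "XXX"    # 3 of 50 residues are 'X' (0.06)

print(keep_prediction(150, 200, clean))   # True
print(keep_prediction(150, 200, noisy))   # False: 'x' content above 0.05
print(keep_prediction(149, 500, clean))   # False: CDS too short
print(keep_prediction(300, 199, clean))   # False: score too low
```

The text does not say whether the CDS length is measured in nucleotides or residues, so the sketch takes it as a plain number supplied by the caller.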