PlantTFDB
PlantRegMap/PlantTFDB v5.0
Plant Transcription Factor Database
Pipeline for parsing BLAST reciprocal best hits and inferring orthologous groups



The protein sequences from the genome annotations of 156 species were used to parse BLAST reciprocal best hits (RBHs) and infer orthologous groups (OGs) as follows:
  1. Selected the longest protein for each locus, then filtered out proteins shorter than 50 aa and putative pseudogenes (those with '*' within the sequence).
  2. Ran an all-against-all BLAST using the representative protein of each locus from the 156 species.
  3. Parsed the RBHs for every species pair among the 156 species based on the BLAST results.
  4. Inferred OGs using OrthoFinder (I=2) for all 156 species, for the 17 representative species, and for the clades Chlorophytae, monocots, asterids, fabids and malvids, respectively, based on the BLAST results.
  5. Chose TF OGs containing at least two taxa from the OGs of the 17 representative species and of the clades Chlorophytae, monocots, asterids, fabids and malvids to construct multiple sequence alignments and phylogenetic trees.
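
Steps 1 and 3 above can be illustrated with a minimal Python sketch. It is not the pipeline's actual code. The function names, the tuple-based input layout, the use of bit score to rank hits, and the "first hit wins" tie rule are all assumptions made for illustration. The sketch also treats any '*' in a sequence, including a trailing one, as marking a putative pseudogene, which may be stricter than the original filter.

```python
MIN_LENGTH = 50  # minimum protein length in aa (from step 1)


def select_representatives(proteins):
    """Pick one representative protein per locus (step 1).

    proteins: iterable of (locus_id, protein_id, sequence) tuples
              (hypothetical input layout).
    Returns {locus_id: (protein_id, sequence)}.
    """
    longest = {}
    # First keep the longest protein of every locus.
    for locus, prot_id, seq in proteins:
        if locus not in longest or len(seq) > len(longest[locus][1]):
            longest[locus] = (prot_id, seq)
    # Then drop short proteins and putative pseudogenes.
    # Assumption: any '*' in the sequence disqualifies it.
    return {
        locus: (prot_id, seq)
        for locus, (prot_id, seq) in longest.items()
        if len(seq) >= MIN_LENGTH and "*" not in seq
    }


def best_hits(hits):
    """Return {query: subject} keeping the top-scoring subject per query.

    hits: iterable of (query, subject, bitscore) tuples for one direction.
    Assumption: hits are ranked by bit score; on ties the first hit wins.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit (step 3)."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted(
        (a, b) for a, b in forward.items() if backward.get(b) == a
    )


# Toy data: locus g2 is too short, locus g3 contains an internal '*'.
toy_proteins = [
    ("g1", "g1.1", "M" * 60),
    ("g1", "g1.2", "M" * 80),
    ("g2", "g2.1", "M" * 40),
    ("g3", "g3.1", "M" * 30 + "*" + "M" * 30),
]
toy_reps = select_representatives(toy_proteins)

# Toy BLAST hits between species A and species B.
toy_a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 150.0), ("a2", "b2", 90.0)]
toy_b_vs_a = [("b1", "a1", 198.0), ("b2", "a1", 140.0)]
toy_rbh = reciprocal_best_hits(toy_a_vs_b, toy_b_vs_a)
```

In the toy data only g1 survives the filter, represented by its longer isoform g1.2. Only a1 and b1 form an RBH pair, because the best hit of b2 is a1 rather than a2. In a real run the hit tuples would be read from the tabular BLAST output for each species pair.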

The TF OGs of the 17 representative species and of the clades Chlorophytae, monocots, asterids, fabids and malvids can be viewed in the TF pages, and the RBHs and constructed OGs for all genes can be downloaded here.