p3-genome-distance
Compute the Distance Between a Specified Genome and Genomes in a File
p3-genome-distance.pl [options] baseGenome
This script uses protein families to compute genome distance. A single genome is chosen as the base genome. The script runs through all the protein families and, for each family member in the other genomes, counts the kmers it has in common with the base genome. The similarity score for a genome is the total number of kmers in common. The higher the similarity score, the closer the genome is to the base. The score is not scaled, so it can only be used for relative comparison.
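The scoring idea described above can be illustrated with a minimal Python sketch. This is not the script's actual implementation: the function names, the dictionary layout, and the toy sequences are all hypothetical, and counting each distinct kmer once per family (set intersection) is an assumption, since the description does not say how repeated kmers are handled.

```python
def kmers(seq, k):
    """Return the set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity_scores(base_families, other_genomes, k=8):
    """Sum the shared-kmer counts per genome over all protein families.

    base_families: {family_id: sequence} for the base genome (hypothetical layout)
    other_genomes: {genome_id: {family_id: sequence}} (hypothetical layout)
    """
    scores = {genome_id: 0 for genome_id in other_genomes}
    for family_id, base_seq in base_families.items():
        base_kmers = kmers(base_seq, k)
        for genome_id, families in other_genomes.items():
            seq = families.get(family_id)
            if seq is not None:
                # Assumption: each distinct shared kmer counts once per family.
                scores[genome_id] += len(base_kmers & kmers(seq, k))
    return scores


# Toy data with k=3 so the short sequences still produce kmers.
base = {"fam1": "MKTAYIAK", "fam2": "GGSLVPR"}
others = {
    "g1": {"fam1": "MKTAYIAK", "fam2": "GGSLVPR"},  # identical to the base
    "g2": {"fam1": "MKTGYIAK", "fam2": "GGSLV"},    # partially similar
    "g3": {},                                       # no shared families
}
scores = similarity_scores(base, others, k=3)
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranked)
```

In this sketch a higher score means a closer genome, and the raw totals depend on how many families and kmers the base genome has, which is why such a score is only useful for ranking genomes relative to one another.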
Parameters
The positional parameter is the ID of the base genome.
The standard input should contain the IDs of the genomes to compare. The standard input can be overridden using the options in Input Options, and the options in Column Options can be used to specify the column containing the genome IDs. If the base genome is found in the input, it will be ignored.
The standard output will contain the genome IDs and their distances. The following additional options are supported.
dna
Use DNA kmers instead of protein kmers.
kmer
The kmer size to use. The default is 8 for proteins and 16 for DNA.
verbose
If specified, progress messages will be displayed on the standard error output.