p3-mass-cluster-run
Mass Cluster Run
p3-mass-cluster-run.pl [options] workDir outDir
This is a special-purpose script that takes the genus-species file produced by p3-genus-species as input and creates signature cluster information for each species listed.
Parameters
The positional parameters are the name of the working directory and the name of the output directory. The working directory will contain temporary files built by the clustering script. For each genus and species processed, the output directory will contain a file called genus.species.clusters.html (with the actual genus and species names substituted) containing the cluster data for that genus and species.
The standard input can be overridden using the options in Input Options. The first column must contain genus names and the second species names.
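The input layout and output naming described above can be sketched in Python as follows. This is only an illustration, not the script's own code: the function name planned_outputs, the tab delimiter, and the presence of a header row are assumptions not stated on this page.

```python
import csv
import io
from pathlib import Path


def planned_outputs(handle, out_dir, has_header=True):
    """Yield (genus, species, output_path) for each input row.

    Assumes tab-delimited input with genus in column 1 and species in column 2.
    """
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row (assumed to be present)
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        genus, species = row[0].strip(), row[1].strip()
        # Output file name pattern from the text: genus.species.clusters.html
        yield genus, species, Path(out_dir) / f"{genus}.{species}.clusters.html"


# Hypothetical sample input standing in for the genus-species file.
sample = io.StringIO("genus\tspecies\nEscherichia\tcoli\nBacillus\tsubtilis\n")
for genus, species, path in planned_outputs(sample, "outDir"):
    print(genus, species, path)
```

Each input row maps to exactly one expected output file, which is what makes a per-species existence check possible.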
The additional command-line options are as follows.
size
The size of each sample set when computing clusters. The default is 20.
iterations
The number of sampling iterations to run. The default is 4.
min
The minimum portion of occurrences for a protein family to be considered significant. The default is 0.8.
max
The maximum portion of occurrences for a protein family to be considered insignificant. The default is 0.1.
resume
If specified, genus-species combinations whose output files already exist will be skipped.
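The skipping behavior of resume can be pictured with a short Python sketch. It assumes that "output files already exist" means a plain existence check on genus.species.clusters.html in the output directory; the actual script may apply a different test, and the helper name pairs_to_run is hypothetical.

```python
import tempfile
from pathlib import Path


def pairs_to_run(pairs, out_dir, resume=False):
    """Return the (genus, species) pairs that still need processing."""
    out_dir = Path(out_dir)
    todo = []
    for genus, species in pairs:
        target = out_dir / f"{genus}.{species}.clusters.html"
        if resume and target.exists():
            continue  # output already present, so skip this combination
        todo.append((genus, species))
    return todo


# Demonstration in a throwaway directory with one pre-existing output file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Escherichia.coli.clusters.html").touch()
    pairs = [("Escherichia", "coli"), ("Bacillus", "subtilis")]
    print(pairs_to_run(pairs, tmp, resume=True))
    print(pairs_to_run(pairs, tmp, resume=False))
```

With resume enabled, only combinations lacking an output file remain in the list, so an interrupted run can be restarted without redoing completed species.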