p3-find-in-clusters

Identify Clusters Containing Input Features

p3-find-in-clusters.pl [options] realClusterFile

This script takes as input a list of features and compares it to a list of physical clusters (the output of p3-identify-clusters). If a feature is within the bounds of a cluster, or within the gap distance of either end, it is output with the cluster ID appended.
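The membership rule described above can be sketched in Python. This is an illustration only, not the script's actual code: the `Cluster` tuple, the `find_cluster` function, and the sample IDs are hypothetical. It also assumes that a feature qualifies when any part of it overlaps the cluster bounds extended by the gap distance on each side; the script may apply a stricter test.

```python
from typing import NamedTuple, Optional, Sequence


class Cluster(NamedTuple):
    """One cluster record (hypothetical structure for this sketch)."""
    cluster_id: str
    genome_id: str
    sequence_id: str
    start: int
    end: int


def find_cluster(sequence_id: str, start: int, end: int,
                 clusters: Sequence[Cluster],
                 max_gap: int = 2000) -> Optional[str]:
    """Return the ID of the first cluster the feature falls in, or None.

    Assumption: the feature matches when it lies on the same sequence and
    overlaps the cluster interval widened by max_gap at both ends.
    """
    for cluster in clusters:
        if cluster.sequence_id != sequence_id:
            continue
        if end >= cluster.start - max_gap and start <= cluster.end + max_gap:
            return cluster.cluster_id
    return None


# Made-up sample data: one cluster spanning 10000..20000.
clusters = [Cluster("CL1", "83332.12", "NC_000962", 10000, 20000)]

inside = find_cluster("NC_000962", 12000, 13000, clusters)  # within the bounds
near = find_cluster("NC_000962", 21500, 22500, clusters)    # within the gap
far = find_cluster("NC_000962", 30000, 31000, clusters)     # too far away
```

With the default gap of 2000, `inside` and `near` both resolve to the sample cluster, while `far` has no match.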

Parameters

The positional parameter is the name of the file containing the cluster information. This must be a tab-delimited file with the following columns: (1) cluster ID, (2) genome ID, (3) sequence ID, (4) start location, and (5) end location. This is the output format from p3-identify-clusters.
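As a rough illustration of the five-column layout, the following Python sketch reads such a file into a list of records. The function name `read_clusters`, the header names, and the sample row are hypothetical. Whether the file begins with a header line is also an assumption here, so it is exposed as a parameter.

```python
import csv
import io

# Column order as described above; the key names themselves are illustrative.
COLUMNS = ("cluster_id", "genome_id", "sequence_id", "start", "end")


def read_clusters(handle, has_header=True):
    """Read tab-delimited cluster rows into a list of dictionaries.

    Assumption: the first line is a header unless has_header is False.
    """
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header line
    rows = []
    for fields in reader:
        if len(fields) < len(COLUMNS):
            continue  # ignore blank or short lines
        row = dict(zip(COLUMNS, fields[:len(COLUMNS)]))
        row["start"] = int(row["start"])
        row["end"] = int(row["end"])
        rows.append(row)
    return rows


# Made-up sample content standing in for a real cluster file.
sample = io.StringIO(
    "cluster_id\tgenome_id\tsequence_id\tstart\tend\n"
    "CL1\t83332.12\tNC_000962\t10000\t20000\n"
)
parsed = read_clusters(sample)
```

Converting the start and end columns to integers up front keeps later coordinate comparisons numeric rather than string-based.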

The standard input can be overridden using the options in Input Options. The standard input should contain feature IDs in the key column (specified using the options in Column Options) plus the feature location and the ID of the containing sequence. The following additional options are supported.

  • maxGap

The maximum number of base pairs allowed between two features in the same cluster. The default is 2000.

  • location

The index (1-based) or name of the column containing the feature location. The default is location.

  • sequence

The index (1-based) or name of the column containing the sequence ID. The default is sequence_id.