p3-find-couples

Find Physically Coupled Categories (e.g. Roles, Families)

p3-find-couples.pl [options] catCol

This script performs the general function of finding categories of features that tend to be physically coupled, that is, commonly occurring in close proximity on the contigs. It can be used to find coupled protein families, coupled roles, coupled functional assignments, or any other feature classification. The input (key) column should contain feature IDs. The category column specified as a parameter should identify the column that contains the feature classification of interest. This could be the feature’s role, its global protein family (or other type of protein family), or any other value that groups similar features. The output will display pairs of these categories that tend to occur physically close together on the chromosome. So, for example, if the category column contained roles, this program would output role couples. If the category column contained global protein families, this program would output protein family couples.

A blank value in the category column will cause the input line to be ignored.

The output will have three columns: the two category IDs and the number of times the couple occurred.
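The counting described above can be illustrated with a minimal Python sketch. It is not the script's actual implementation: the function name count_couples, the tuple layout of the input, the definition of the gap as the number of bases strictly between two features, and the decision to skip pairs with identical categories are all assumptions made for illustration. The defaults mirror the minCount and maxGap options documented under Parameters.

```python
from collections import Counter


def count_couples(features, max_gap=2000, min_count=5):
    """Count category pairs that occur close together on the same sequence.

    features: iterable of (sequence_id, start, end, category) tuples.
    This layout is hypothetical; the real script reads tab-delimited input.
    """
    by_seq = {}
    for seq_id, start, end, category in features:
        if not category:
            continue  # a blank category causes the input line to be ignored
        left, right = min(start, end), max(start, end)
        by_seq.setdefault(seq_id, []).append((left, right, category))

    counts = Counter()
    for feats in by_seq.values():
        feats.sort()  # order features by leftmost position
        for i, (_, right_i, cat_i) in enumerate(feats):
            for left_j, _, cat_j in feats[i + 1:]:
                # Assumed gap definition: bases strictly between the features.
                if left_j - right_i - 1 > max_gap:
                    break  # later features start even further away
                if cat_i != cat_j:  # assumption: same-category pairs skipped
                    counts[tuple(sorted((cat_i, cat_j)))] += 1

    # Three output columns: category 1, category 2, occurrence count.
    return [(a, b, n) for (a, b), n in sorted(counts.items()) if n >= min_count]


demo = [
    ("c1", 100, 500, "RoleA"),
    ("c1", 600, 900, "RoleB"),
    ("c1", 950, 1200, ""),        # blank category: ignored
    ("c1", 5000, 5500, "RoleC"),  # too far from RoleB to be coupled
    ("c2", 10, 300, "RoleB"),
    ("c2", 350, 700, "RoleA"),
]
for a, b, n in count_couples(demo, min_count=2):
    print(f"{a}\t{b}\t{n}")  # prints "RoleA\tRoleB\t2"
```

With the default minimum count of 5, the same demo data would produce no output, because the only couple found occurs just twice.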

Parameters

The positional parameter is the index (1-based) or name of the column containing the category information.

The standard input can be overridden using the options in Input Options.

Additional command-line options are those given in Column Options (to specify the column containing feature IDs) plus the following.

  • minCount

The minimum number of times a couple must occur to be considered significant. The default is 5.

  • maxGap

The maximum number of base pairs allowed between two features in the same cluster. The default is 2000.

  • location

If the feature location is already present in the input file, the name of the column containing the feature location. The location should be in the form of a start and end with two dots in between (for example, 1000..2500), the format used in GenBank and BV-BRC.

  • sequence

If the sequence ID is already present in the input file, the name of the column containing the sequence ID.

  • verbose

If specified, status messages will be written to STDERR.
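As an illustration of how the location format and the maxGap option relate, here is a short Python sketch that parses a start..end string and measures the gap between two features. The helper names parse_location, gap_between, and within_max_gap are hypothetical and not part of the script. The sketch assumes the gap is the number of bases strictly between the two features and treats overlapping features as having a gap of zero. It reads only the first start..end pair it finds, so strand or compound-location information is ignored.

```python
import re

# Matches a start and end separated by two dots, e.g. "1000..2500".
_LOCATION = re.compile(r"(\d+)\.\.(\d+)")


def parse_location(text):
    """Return (left, right) for the first start..end pair in a location string."""
    match = _LOCATION.search(text)
    if match is None:
        raise ValueError(f"unrecognized location: {text!r}")
    start, end = int(match.group(1)), int(match.group(2))
    return min(start, end), max(start, end)


def gap_between(loc_a, loc_b):
    """Bases strictly between two features; 0 if they touch or overlap."""
    (_, right_first), (left_second, _) = sorted(
        [parse_location(loc_a), parse_location(loc_b)]
    )
    return max(0, left_second - right_first - 1)


def within_max_gap(loc_a, loc_b, max_gap=2000):
    """True if the features are close enough under the assumed gap rule."""
    return gap_between(loc_a, loc_b) <= max_gap


print(parse_location("1000..2500"))            # (1000, 2500)
print(gap_between("100..500", "600..900"))     # 99
print(within_max_gap("100..500", "5000..5500"))  # False
```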