Sim¶
Similarity Object¶
Introduction¶
The similarity object provides access by name to the fields of a similarity list. Unlike a standard object, the similarity object is stored as a list reference, not a hash reference. The similarity fields are pulled from the appropriate places in the list.
A blast takes a sequence called the query and matches it against a database. When describing the data in a similarity, we will refer repeatedly to the query sequence and the database sequence. Often, the query and database sequences will be given by peg IDs. In some cases, however, they will be contig IDs. In both cases, the match is represented by an alignment between portions of the sequences. Gap characters may be required to get the alignments to match, and the number of gaps is part of the data in the similarity.
new¶
my $sim = Sim->new( @data );
my $sim = Sim->new( \@data );
Create a similarity object from an array of fields.
data
An array of data in fields:
0 id1 query sequence id 1 id2 subject sequence id 2 iden percentage sequence identity 3 ali_ln alignment length 4 mismatches number of mismatch 5 gaps number of gaps 6 b1 query seq match start 7 e1 query seq match end 8 b2 subject seq match start 9 e2 subject seq match end 10 psc match e-value 11 bsc bit score 12 ln1 query sequence length 13 ln2 subject sequence length 14 tool tool used to produce similaritiesThe following fields may vary by tool:
15 loc1 query seq locations string (b1-e1,b2-e2,b3-e3) 16 loc2 subject seq locations string (b1-e1,b2-e2,b3-e3) 17 dist tree distance
RETURN
Returns a similarity object that allows the values to be accessed by name.
new_from_hsp¶
my $sim = Sim->new_from_hsp( @hsp );
my $sim = Sim->new_from_hsp( \@hsp );
my $sim = Sim->new_from_hsp( \@hsp, $tool );
Create a similarity object from a gjoparseblast hsp.
hsp
An array of data on a blast hsp as returned by gjoparseblast::blast_hsp_list() or gjoparseblast::next_blast_hsp().
RETURN
Returns a similarity object that allows the values to be accessed by name.
as_string¶
my $simString = "$sim";
or
my $simString = $sim->as_string;
Return the similarity as a descriptive string, consisting of the query peg, the similar peg, and the match score.
new_from_line¶
my $sim = Sim->new_from_line($line);
Create a similarity object from a blast output line. The line is presumed to have the complete list of similarity values in it, tab-separated.
line
Input line, containing the similarity values in it delimited by tabs. A line terminator may be present at the end.
RETURN
Returns a similarity object that allows the values to be accessed by name.
validate¶
my $okFlag = $sim->validate();
Return TRUE if the similarity values are valid, else FALSE.
as_line¶
my $line = $sim->as_line;
Return the similarity as an output line. This is exactly the reverse of /new_from_line.
id1¶
my $id = $sim->id1;
Return the ID of the query sequence that was blasted against the database.
id2¶
my $id = $sim->id2;
Return the ID of the sequence in the database that matched the query sequence.
iden¶
my $percent = $sim->iden;
Return the percentage identity between the query and database sequences.
ali_ln¶
my $chars = $sim->ali_ln;
Return the length (in characters) of the alignment between the two similar sequences.
mismatches¶
my $count = $sim->mismatches;
Return the number of alignment positions that do not match.
gaps¶
my $count = $sim->gaps;
Return the number of gaps required to align the sequences.
b1¶
my $beginOffset = $sim->b1;
Return the position in the query sequence at which the alignment begins.
e1¶
my $endOffset = $sim->e1;
Return the position in the query sequence at which the alignment ends.
b2¶
my $beginOffset = $sim->b2;
Position in the database sequence at which the alignment begins.
e2¶
my $endOffset = $sim->e2;
Return the position in the database sequence at which the alignment ends.
psc¶
my $score = $sim->psc;
Return the similarity score as a floating-point number. The score is the computed
probability that the similarity is a result of random chance. A score of 0 indicates a
perfect match. A higher score indicates a less-perfect match. Values of 1e-10
or
less are considered good matches.
bsc¶
my $score = $sim->bsc;
Return the bit score for this similarity. The bit score is an estimate of the search space required to find the similarity by chance. A higher bit score indicates a better match.
bsc¶
my $score = $sim->bit_score;
Return the bit score for this similarity. The bit score is an estimate of the search space required to find the similarity by chance. A higher bit score indicates a better match.
nbsc¶
my $score = $sim->nbsc;
Return the normalized bit score for this similarity. This is the bit score divided by the length of the matching sequence regions. It is a better summary of the overall sequence similarity than is the percentage identity. Typically identical sequences have a value close to 2, and it goes down to 0 as the similarity decreases (values less than 0 are possible, but are never significant and hence are never reported in a local similarity search).
ln1¶
my $length = $sim->ln1;
Return the number of characters in the query sequence.
ln2¶
my $length = $sim->ln2;
Return the length of the database sequence.
tool¶
my $name = $sim->tool;
Return the name of the tool used to find this similarity.
usage¶
my $pod_as_text = Module::usage;
my $pod_as_text = Module->usage;
my $pod_as_text = Package->usage;
my $pod_as_text = $object->usage;
Returns the module’s pod documentation as text.