evalign.pl.html


Sun Oct 12 06:59:33 BST 1997 Created by /gn0/jong/Perl/headbox2html.pl

evalign

Download evalign.pl
Argument   : Two files of sequence alignment. The first one should be COMPUTER aligned
            and the second one is the CORRECT (i.e., structural) alignment.
Author     : A Biomatic
Example    : evalign.pl aa.msf aa.jp -ss -H -E -p

Function   : When you align sequences with a computer algorithm, you want
            to know whether they are aligned correctly in terms of structure.
            If the sequences come from known structures, you can compare the
            computer alignment with the structural alignment, which can be
            regarded as 'biologically correct'.
            This program, 'evalign.pl', compares the two sets of aligned
            sequences by calculating the absolute position differences between
            the correct and the computer aligned one. It is aware of gap
            insertions, and a correctly aligned region that follows a wrongly
            aligned segment is counted as correct.
            It accepts two sequence files at the prompt and calculates the
            position differences of the sequences in the input files. The
            input sequences should be identical in both files.
            As an option, it also displays Percentage IDentity.
Options    : seg is for showing the accuracy of alignment on secondary str. blocks.
            ss  is for showing DSSP secondary structure assignment in output.
            H   is for showing HELIX DSSP secondary structure assignment in output.
            E   is for showing Beta-strand DSSP secondary structure assignment in output.
            s   is for sorted final output.
            p   is for displaying conventional percent ID.
            h   is for displaying help.
            ns  is for $no_simplify (given as -ns, ns, Ns, NS or -Ns).
            t=  is for the threshold used when converting numbers to 1 or 0.
            c   is for converting numbers to 1 or 0; the default threshold '1' is used.
            N   is for NOT normalizing the error rate, which can then be more than 1 digit.

 $NO_normalize      = 1  by  N -N
 $segment_rate      = 1  by  -seg, seg, Seg # Shows secondary str. block PSR
 $show_percent_id   = 1  by  -p, -P, p, P,  # Shows conventional percent ID.
 $show_sec_str      = 1  by  -ss, ss or SS  # Show Secondary Structure -ss option
 $HELIX_only        = H  by  -H, H          # Shows HELIX DSSP secondary structure
 $BETA_only         = E  by  -E, E          # Shows Beta-strand DSSP secondary structure
 $print_sort        = s  by  -s, s or S     # seq names are sorted in final output
 $interlaced        = i  by  -i, i or I     # interlaced final output
 $no_simplify       = 1  by  -ns, ns, Ns, NS, -Ns # no simplification
 $threshold         =    by  t=    # threshold for converting to 1 or 0
 $convert_to_0_or_1 = 1  by  -c, C, c, Con  # converts numbers to 1 or 0
 $HELP              = 1  by  -h, h          # for showing help

Package    : Part of Bioperl project.
Returns    : simple shifted positions.
Usage      : "evalign.pl any_seq_file.msf any_struc_file.jp [", where any_seq_file.msf
            is a computer aligned output and any_struc_file.jp is a sequence file
            from known structures. (e.g., evalign.pl aa.msf aa.jp)
Version    : 1.4

get_segment_shift_rate

Download get_segment_shift_rate .pl
Author     : A Biomatic
Example    :  First block is for the first hash input
                            and Second is for the second hash input.

           1cdg_6taa      00000442222222222242222222222777700000007000000000
           1cdg_2aaa      00000442222222222242222222222777700000007000000000
           2aaa_6taa      00000000000000000000000000000000000000000000000000

           1cdg_6taa      -------EEE-----------EE--EEEE------EE---------EEE-
           1cdg_2aaa      -------EEE-----------EE--EEEE------EE---------EEE-
           2aaa_6taa      -------EEEEE------EE-EEEEEEEE----EEEE-------EEEEE-

            
           2aaa_6taa      -------00000---------00000000----0000-------00000-
           1cdg_6taa      -------442---------------2222-----------------000-
           1cdg_2aaa      -------222---------------2222-----------------000-

            
           2aaa_6taa      0%
           1cdg_6taa      67%
           1cdg_2aaa      67%

Function   : calculates the secondary structure segment shift rate.
Keywords   : later sub of get_position_shift_rate for secondary structure regions,
             get position shift rate for secondary structure regions.
Options    : 'p' or 'P' for percentage term (default).
           : 'r' or 'R' for ratio term (0.0 - 1.0), where 1 means all the
             segments were wrongly aligned.
           : 's' or 'S' for Shift rate (it actually calculates the position shift
             rate for the secondary structure segments).
           : 'h' or 'H' for position Shift rate (it actually calculates the position
             shift rate for helical segments). If this is the only option, it
             will show the default percentage term rate for helical segments.
             If used with 'r', it will give you the ratio (0.0 - 1.0) for helical
             segments. If used with the 's' option, it will give you the position
             shift rate for only helical segments.
           : 'e' or 'E' for position Shift rate (it actually calculates the position
             shift rate for beta segments). If this is the only option, it will
             show the default percentage term rate for beta segments. If used
             with 'r', it will give you the ratio (0.0 - 1.0) for beta segments.
             If used with the 's' option, it will give you the position shift
             rate for only beta segments.
Usage      : &get_segment_shift_rate(\%hash_for_errors, \%hash_for_sec_str);
Version    : 1.0

overlay_seq_by_certain_chars

Download overlay_seq_by_certain_chars .pl
Argument   : 2 refs. for hashes with identical keys and value lengths.
Author     : A Biomatic
Example    : %out =%{&overlay_seq_by_certain_chars(\%hash1, \%hash2, 'E')};
             output> with 'E' option >>> "name1     --HHH--1232-"
Function   : (name1 000000112324)+(name1  ABC..AD..EFDK ) => (name1 000..00..12324)
             (name2 000000112324)+(name2  --HHH--EEEE-- ) => (name2 ---000--1123--)
             uses the second hash as a template for the first sequences. gap_char is
             '-' or '.' or any given char or symbol.
             To insert gaps rather than overlap, use insert_gaps_in_seq_hash
Keywords   : Overlap, superpose hash, overlay, superpose_seq_hash
Options    : E for replacing all 'E' occurrences in ---EEEE--HHHH----, etc.
             H for replacing all 'H' occurrences in the same way.
Package    : Array_Util
Returns    : one hash ref.
Usage      : %out =%{&overlay_seq_by_certain_chars(\%hash1, \%hash2, 'HE')};
Version    : 1.0
Warning    : If gap_chr ('H',,,) is not given, it replaces all the
             non-gap chars (normal alphabet), ie,
             it becomes 'superpose_seq_hash'
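Sketch     : A minimal sketch of the overlay described above, not the original
             code. The name overlay_by_chars_sketch is hypothetical, and treating
             '-' and '.' as the gap characters when no option is given is an
             assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not the original sub.
# $values and $template are refs. of hashes with identical keys;
# $chars lists the template characters to replace (e.g. 'E' or 'HE').
sub overlay_by_chars_sketch {
    my ($values, $template, $chars) = @_;
    my %out;
    for my $name (keys %$template) {
        next unless defined $values->{$name};
        my @v = split //, $values->{$name};
        my @t = split //, $template->{$name};
        my $str = '';
        for my $i (0 .. $#t) {
            my $replace = (defined $chars && length $chars)
                ? index($chars, $t[$i]) >= 0   # only the listed characters
                : $t[$i] !~ /[-.]/;            # assumed default: all non-gap chars
            $str .= ($replace && defined $v[$i]) ? $v[$i] : $t[$i];
        }
        $out{$name} = $str;
    }
    return \%out;
}

my %num = (name1 => '000000112324');
my %sec = (name1 => '--HHH--EEEE-');
my %out = %{ overlay_by_chars_sketch(\%num, \%sec, 'E') };
print "name1     $out{name1}\n";   # prints: name1     --HHH--1232-
```

Note       : With 'E' only the strand positions take the numbers, which reproduces
             the output line given in the Example entry.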

read_dir_names_only

Download read_dir_names_only .pl
Argument   : takes one or more scalar references. ('.', \$path, $path, ... )
Author     : A Biomatic
Function   : reads any dir names and then puts them in an array.
Returns    : one ref. of array.
Usage      : @all_dirs_list = @{&read_dir_names_only(\$absolute_path_dir_name, ....)};
Version    : 3.1
Warning    : This does not report '.', '..'
             Only dir names are reported. Compare with &read_any_dir

overlay_seq_for_identical_chars

Download overlay_seq_for_identical_chars .pl
Argument   : 2 refs. for hashes with identical keys and value lengths. One optional arg
             for replacing the space char with the given one.
Author     : A Biomatic
Example    : %out =%{&overlay_seq_for_identical_chars(\%hash1, \%hash2, '-')};
             output> with '-' option >>> "name1_name2   ---HH----EE--"
Function   : (name1         --EHH--HHEE-- )
             (name2         --HHH--EEEE-- ) ==> result is;

             (name1_name2   -- HH--  EE-- )
             to get the identical chars in hash strings of sequences.

Keywords   : Overlap, superpose hash, overlay identical chars, superpose_seq_hash
Package    : Array_Util
Returns    : one hash ref. of the combined key name (i.e., name1_name2). Combined by '_'
Usage      : %out =%{&overlay_seq_for_identical_chars(\%hash1, \%hash2, '-')};
Version    : 1.0
Warning    : Works only for 2 sequence hashes.
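Sketch     : A minimal sketch of keeping only the identical characters, not the
             original code. The name overlay_identical_sketch is hypothetical, and
             passing each sequence in its own one-key hash is an assumption about
             how the two inputs are supplied.

```perl
use strict;
use warnings;

# Hypothetical helper, not the original sub.
# Each input hash is assumed to hold exactly one name => string pair.
sub overlay_identical_sketch {
    my ($hash1, $hash2, $fill) = @_;
    $fill = ' ' unless defined $fill;          # mismatches become spaces by default
    my ($name1) = keys %$hash1;
    my ($name2) = keys %$hash2;
    my @a = split //, $hash1->{$name1};
    my @b = split //, $hash2->{$name2};
    my $n = @a < @b ? @a : @b;                 # compare over the shorter length
    my $str = '';
    for my $i (0 .. $n - 1) {
        $str .= $a[$i] eq $b[$i] ? $a[$i] : $fill;
    }
    return { "${name1}_$name2" => $str };      # combined key, joined by '_'
}

my %out = %{ overlay_identical_sketch({ name1 => '--EHH--HHEE--' },
                                      { name2 => '--HHH--EEEE--' }) };
print "name1_name2   $out{name1_name2}\n";
```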

normalize_numbers

Download normalize_numbers .pl
Argument   : (\%hash1, \%hash2, \%hash3, ....)
Author     : A Biomatic
Example    : Inputhash>                    Outputhash>
             ( '1-2', '12,.,1,2,3,4',     ( '1-2',   '9,.,0,1,2,3',
              '2-3', '12,.,1,5,3,4',       '2-3',   '9,.,0,4,2,3',
              '4-3', '12,3,1,2,3,4',       '3-1',   '9,3,.,.,2,3',
              '3-1', '12,4,.,.,3,4' );     '4-3',   '9,2,0,1,2,3' );
Function   : with the given numbers in hashes, it makes a scale of 0-9 and puts
             all the elements on the scale. Also returns the average of the numbers.
Returns    : (\%norm_hash1, \%norm_hash2, \%norm_hash3,.... )

Usage      : %output=%{&normalize_numbers(\%hash1)};
             originally made to normalize the result of get_posi_rates_hash_out
             in   'scan_compos_and_seqid.pl'
Used in    : evalign.pl
Version    : 1.0

hash_stat_for_all

Download hash_stat_for_all .pl
Author     : A Biomatic
Example    : %in =(1, "13242442", 2, "92479270", 3, "2472937439");
             %in2=(1, "28472", 2, "23423240", 3, "123412342423439");

             %in =(name1, "1,3,2,4,2,4,4,2", name2, "9,2,4,7,9,2,7,0");

Function   : gets the min, max, av, sum for the whole values of ALL the
             hashes put in. (grand statistics)
Returns    : normal array of ($min, $max, $sum, $av)
             Example  out:>                 |  min max sum  av
                            -----------------------------------
                            of the whole    |   0   9  110   6
Usage      : ($min, $max, $sum, $av) = &hash_stat_for_all(\%in, \%in2,..);
Used in    : normalize_numbers
Version    : 1.0
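Sketch     : A minimal sketch of grand statistics over several hashes, not the
             original code. The name hash_stat_sketch is hypothetical; it assumes
             comma separated values (as in the third Example line) and skips
             non-numeric entries such as '.'.

```perl
use strict;
use warnings;
use List::Util qw(min max sum);

# Hypothetical helper, not the original sub.
# Takes refs. of hashes whose values are comma separated numbers.
sub hash_stat_sketch {
    my @nums;
    for my $hash (@_) {
        for my $value (values %$hash) {
            # Keep plain numbers only; gap marks like '.' are ignored.
            push @nums, grep { /^\d+(?:\.\d+)?$/ } split /,/, $value;
        }
    }
    return unless @nums;
    my $sum = sum(@nums);
    return (min(@nums), max(@nums), $sum, $sum / @nums);
}

my %in = (name1 => '1,3,2,4,2,4,4,2', name2 => '9,2,4,7,9,2,7,0');
my ($min, $max, $sum, $av) = hash_stat_sketch(\%in);
print "min $min  max $max  sum $sum  av $av\n";
```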

tidy_secondary_structure_segments

Download tidy_secondary_structure_segments .pl
Argument   : hashes and [options]. No options result in default of 'H3', 'E3'
Author     : A Biomatic
Example    : print_seq_in_block(&tidy_secondary_structure_segments(\%hash, 'e4', 'h4'), 's');
             

             1cdg_2aaa      -------EEE-----------EE--EEEE------EE---------EEE-
             1cdg_6taa      -------EEE-----------EE--EEEE------EE---------EEE-
             2aaa_6taa      -------EEEEE------EE-EEEEEEEE----EEEE-------EEEEE-

             

             1cdg_6taa      -------------------------EEEE---------------------
             1cdg_2aaa      -------------------------EEEE---------------------
             2aaa_6taa      -------EEEEE---------EEEEEEEE----EEEE-------EEEEE-

Function   : receives any secondary structure assignment hashes and
             tidies them up. That is, it removes very short secondary structure
             regions like ( --HH--, -E-, -EE- ) according to the minimum
             segment lengths (thresholds) given by you.
Options    : something like 'H3' or 'E3' for minimum segment length set to 3 positions.
Package    : Bio::Seq
Returns    : array of references of hashes.
Usage      : print_seq_in_block(&tidy_secondary_structure_segments(\%hash, 'e4', 'h4'), 's');

Version    : 1.0.0
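Sketch     : A minimal sketch of the tidying step, not the original code. The
             name tidy_segments_sketch is hypothetical; it assumes options of the
             form letter plus number ('e4', 'H3') and '-' as the gap character.

```perl
use strict;
use warnings;

# Hypothetical helper, not the original sub.
sub tidy_segments_sketch {
    my ($sec_str, @opts) = @_;
    my %min_len = (H => 3, E => 3);            # default of 'H3', 'E3'
    for my $opt (@opts) {
        $min_len{ uc $1 } = $2 if $opt =~ /^([A-Za-z])(\d+)$/;
    }
    my %out;
    for my $name (keys %$sec_str) {
        my $s = $sec_str->{$name};
        for my $type (keys %min_len) {
            my $min = $min_len{$type};
            # Replace every run of $type shorter than $min with gaps.
            $s =~ s/((?:$type)+)/length($1) < $min ? '-' x length($1) : $1/ge;
        }
        $out{$name} = $s;
    }
    return \%out;
}

my %sec = ('1cdg_6taa' => '-------EEE-----------EE--EEEE------EE---------EEE-');
my %tidy = %{ tidy_segments_sketch(\%sec, 'e4', 'h4') };
print "1cdg_6taa      $tidy{'1cdg_6taa'}\n";
```

Note       : With 'e4' only the four-residue strand survives, as in the second
             block of the Example entry.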

superpose_seq_hash

Download superpose_seq_hash .pl
Argument   : 2 refs. for hashes with identical keys and value lengths, and gap_chr.
Author     : A Biomatic
Function   : (name1 000000112324)+(name1  ABC..AD..EFD ) => (name1 000..01..324)
             uses the second hash as a template for the first sequences. gap_char is
             '-' or '.'
             To insert gaps rather than overlap, use insert_gaps_in_seq_hash
Keywords   : overlay sequence, overlay alphabet, superpose sequence,
Returns    : one hash ref.
Usage      : %out =%{&superpose_seq_hash(\%hash1, \%hash2)};
Version    : 1.0
Warning    : Accepts only two HASHes and many possible gap_chr. Default gap is '-'
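Sketch     : A minimal sketch of superposing template gaps onto a string of the
             same length, not the original code. The name superpose_sketch is
             hypothetical, and treating both '-' and '.' as gap characters is an
             assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not the original sub.
# Gap characters of the template overwrite the same positions in the values.
sub superpose_sketch {
    my ($values, $template) = @_;
    my %out;
    for my $name (keys %$template) {
        next unless defined $values->{$name};
        my @v = split //, $values->{$name};
        my @t = split //, $template->{$name};
        my $str = '';
        for my $i (0 .. $#v) {
            my $is_gap = defined $t[$i] && $t[$i] =~ /[-.]/;
            $str .= $is_gap ? $t[$i] : $v[$i];
        }
        $out{$name} = $str;
    }
    return \%out;
}

my %out = %{ superpose_sketch({ name1 => '000000112324' },
                              { name1 => 'ABC..AD..EFD' }) };
print "name1 $out{name1}\n";   # prints: name1 000..01..324
```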

insert_gaps_in_seq_hash

Download insert_gaps_in_seq_hash .pl
Argument   : 2 refs. for hashes with identical keys and value lengths.
Author     : A Biomatic
Function   : superposes two hashes of the same sequence or of sequences with the same
             length, but unlike 'superpose_seq_hash', this inserts gaps and extends
             the sequences.
             (name1_sec  hHHHHHH EEEEEEE) +
             (name1_seq  .CDEABC..AD..EFD..EKST) => (name1_ext  .hHHHHH..H...EEE..EEEE)
             In the example, the undefined sec. str. position is replaced by a gap ('.').
             Uses the second hash as a template for the first sequences. gap_char is
             '-' or '.'
Keywords   : superposing sequences with gaps
Returns    : one hash ref.
Usage      : %out_extended_seq =%{&insert_gaps_in_seq_hash(\%hash1, \%hash2)};
Version    : 1.1
Warning    : coded by A Biomatic
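Sketch     : A minimal sketch of the gap insertion shown in the Function entry,
             not the original code. The name insert_gaps_sketch is hypothetical;
             it assumes both hashes use the same key and that undefined positions
             (spaces) become '.'.

```perl
use strict;
use warnings;

# Hypothetical helper, not the original sub.
# $ungapped holds strings without gaps, $template holds the gapped sequences.
sub insert_gaps_sketch {
    my ($ungapped, $template) = @_;
    my %out;
    for my $name (keys %$template) {
        next unless defined $ungapped->{$name};
        my @src = split //, $ungapped->{$name};
        my $str = '';
        for my $c (split //, $template->{$name}) {
            if ($c =~ /[-.]/) {          # template gap: copy it, consume nothing
                $str .= $c;
                next;
            }
            my $next = shift @src;       # template residue: take next source char
            $next = '.' if !defined $next || $next eq ' ';
            $str .= $next;
        }
        $out{$name} = $str;
    }
    return \%out;
}

my %out = %{ insert_gaps_sketch({ name1 => 'hHHHHHH EEEEEEE' },
                                { name1 => '.CDEABC..AD..EFD..EKST' }) };
print "name1_ext  $out{name1}\n";   # prints: name1_ext  .hHHHHH..H...EEE..EEEE
```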

get_position_shift_rate

Download get_position_shift_rate .pl
Argument   : %{&get_position_shift_rate(\%msfo_file, \%jpo_file)};
             Whatever the names, it takes one TRUE structural and one ALIGNED hash.
Author     : A Biomatic
Example    : my(%error_rate)=%{&get_position_shift_rate(\%input, \%input2)};
Function   : This is to get position specific error rate for line display rather than
             actual final error rate for the alignment. Takes two file names of seq.
             Output >>
             seq1_seq2  1110...222...2222
             seq2_seq3  1111....10...1111
             seq1_seq3  1111....0000.0000

Options    : 'ss' for secondary structure regions (error rate calculation for
                 Helix and Beta regions only). There is a specialized sub called
              get_segment_shift_rate for sec. str. only handling.

    $ss_opt            becomes    ss by  ss, SS, -ss, -SS     #  for secondary structure only
    $H                 =         'H' by   -H or -h or H       # to retrieve only H segment
    $S                 becomes   'S' by   -S or  S            # to retrieve only S segment
    $E                 becomes   'E' by   -E or  E            # to retrieve only E segment
    $T                 becomes   'T' by   -T or -t or T or t  # to retrieve only T segment
    $I                 becomes   'I' by   -I or  I            # to retrieve only I segment
    $G                 becomes   'G' by   -G or -g or G or g  # to retrieve only G segment
    $B                 becomes   'B' by   -B or -b or B or b  # to retrieve only B segment
    $HELP              becomes    1  by   -help   # for showing help
    $simplify          becomes    1  by   -p or P or -P, p
    $simplify          becomes    1  by   -simplify or simplify, Simplify SIMPLIFY
    $comm_col          becomes   'C' by   -C or C or common
    $LIMIT             becomes    L  by   -L, L               # to limit the error rate to 9 .

Returns    : \%final_posi_diffs;
Usage      : %rate_hash = %{&get_position_shift_rate(\%hash_msf, \%hash_jp)};
Version    : 1.5
Warning    : split and join char is ','; (space)
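Sketch     : A minimal sketch of one way to get a per-residue position shift for
             a pair of sequences, not the original code. The names
             partner_index_sketch and position_shift_sketch are hypothetical. It
             assumes the shift of a residue is the absolute difference between
             the partner residue it is aligned with in the computer alignment and
             in the structural alignment, and it prints '.' where either partner
             is a gap. The original gap handling may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: for each residue of $x, the residue index of $y that
# sits in the same alignment column, or undef when $y has a gap there.
sub partner_index_sketch {
    my ($x, $y) = @_;
    my $iy = 0;
    my @partner;
    for my $col (0 .. length($x) - 1) {
        my $x_res = substr($x, $col, 1) !~ /[-.]/;
        my $y_res = substr($y, $col, 1) !~ /[-.]/;
        push @partner, ($y_res ? $iy : undef) if $x_res;
        $iy++ if $y_res;
    }
    return \@partner;
}

# Hypothetical helper: shift string for the pair ($name1, $name2).
sub position_shift_sketch {
    my ($aligned, $correct, $name1, $name2) = @_;
    my $got  = partner_index_sketch($aligned->{$name1}, $aligned->{$name2});
    my $want = partner_index_sketch($correct->{$name1}, $correct->{$name2});
    my @shift;
    for my $i (0 .. $#$want) {
        if (defined $got->[$i] && defined $want->[$i]) {
            push @shift, abs($got->[$i] - $want->[$i]);
        }
        else {
            push @shift, '.';             # no partner in one of the alignments
        }
    }
    return join ',', @shift;              # ',' is the join char, as in the Warning
}

my %msf = (seq1 => 'ABCD-', seq2 => '-ABCD');   # computer alignment (made-up data)
my %jp  = (seq1 => 'ABCD',  seq2 => 'ABCD');    # structural alignment (made-up data)
print "seq1_seq2  ", position_shift_sketch(\%msf, \%jp, 'seq1', 'seq2'), "\n";
```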

convert_num_0_or_1_hash_opposite

Download convert_num_0_or_1_hash_opposite .pl
Argument   : two references, one for a hash and one for a scalar threshold

Author     : A Biomatic
Example    : A hash =>  name1  10012924729874924792742749748374297
                        name2  10012924729874924792710012924729874
             A threshold => 4
             !! if numbers are smaller than 4, they become 1 (or true).
             Outputhash  =>  name1  11111011011111011111011011110101111
                        name2  11111011010001011001011010010101100

             ($ref1, $ref2)=&convert_num_to_0_or_1_hash(\%hash, \%hash, \$threshold);
             above is the example when with more than 2 input hashes.
Function   : changes all the numbers into 0 or 1 according to threshold given.
             convert_num_0_or_1_hash converts threshold and bigger nums. to
             '0' while convert_num_0_or_1_hash_opposite converts to '1'.
Usage      : with a variable for threshold ->

               %out = %{&convert_num_0_or_1_hash_opposite(\%input_hash, \$threshold)};

Version    : 1.0
Warning    : Threshold value is set to 0 as well as all values smaller than that.

parse_arguments

Download parse_arguments .pl
Argument   : uses @ARGV
Author     : A Biomatic
Example    : &parse_arguments(1);
             @files=@{&parse_arguments(1)};
Function   : Parses any type of argument given at the UNIX prompt and assigns
             them to the various variables inside the running program.
             This is more visual than getopt and easier.
             Just change the option table example below for your own variable
             settings. This program reads itself and parses the arguments
             according to the settings you made in this subroutine or in an
             option table anywhere in the program.
Options    : '0'  to specify that there is no argument to the sub, use
              &parse_arguments(0);
             parse_arguments itself does not have any specific option.
             '#' at the prompt will set the variable $debug to 1. This is to
              print out all the print lines to make debugging easier.
Returns    : Filenames in a reference of an array,
             i.e., input files in an array: (file1, file2)=@{&parse_arguments};
Usage      : &parse_arguments; or  (file1, file2)=@{&parse_arguments};
Version    : 1.6
Warning    : HASH and ARRAY mustn't be like = (1, 2,3) or (1,2 ,3)
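Sketch     : A minimal sketch of table-driven option matching, not the original
             code. The original reads its own source to find the option table;
             this sketch instead takes the table as a hash, which is a
             simplification. The name parse_arguments_sketch and the hash layout
             are hypothetical; the spellings are taken from the evalign option
             table.

```perl
use strict;
use warnings;

# Hypothetical helper, not the original sub.
# $table maps a variable name to the spellings that switch it on.
sub parse_arguments_sketch {
    my ($table, @args) = @_;
    my (%var, @files);
    ARG: for my $arg (@args) {
        if ($arg =~ /^t=(.+)$/) {            # t= carries a value
            $var{threshold} = $1;
            next ARG;
        }
        for my $name (sort keys %$table) {
            if (grep { $_ eq $arg } @{ $table->{$name} }) {
                $var{$name} = 1;
                next ARG;
            }
        }
        push @files, $arg;                   # anything else is a file name
    }
    return (\%var, \@files);
}

my %table = (
    show_sec_str    => [qw(-ss ss SS)],
    show_percent_id => [qw(-p -P p P)],
    HELP            => [qw(-h h)],
);
my ($var, $files) = parse_arguments_sketch(\%table, qw(aa.msf aa.jp -ss -p t=2));
print "files: @$files\n";
print "$_ = $var->{$_}\n" for sort keys %$var;
```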

print_seq_in_block

Download print_seq_in_block .pl
Argument   : many refs. for hashes (one for bottom, one for top, etc.; the top hash is
               usually used to denote certain calculations or results of the bottom one)
Author     : A Biomatic
Enclosed   : -- Following are examples.
             Example of ( no option, DEFAULT )  # Example of ('i' or 'I' option,
                                                                INTERLACE )
             6taa           ----ATPADWRSQSIY    #   6taa       ------ATPADWRSQSIY
             2aaa           ------LSAASWRTQS    #   6taa       ------CCHHHHCCCCEE
             1cdg           APDTSVSNKQNFSTDV    #   6taa       ------563640130000

             6taa           ------CCHHHHCCCC    #   2aaa       ------LSAASWRTQSIY
             2aaa           ------CCHHHHCCCC    #   2aaa       ------CCHHHHCCCCEE
             1cdg           CCCCCCCCCCCCCCCC    #   2aaa       ------271760131000

             6taa           ------5636401300    #   1cdg       APDTSVSNKQNFSTDVIY
             2aaa           ------2717601310    #   1cdg       CCCCCCCCCCCCCCCCEE
             1cdg           6752327236000000    #   1cdg       675232723600000000

             Example of('s' or 'S' option,SORT) # Example of ('o' or 'O' option,
                                                        ORDERED by input hashes )

             1cdg           APDTSVSNKQNFSTDV    #   6taa       ------ATPADWRSQSIY
             2aaa           ------LSAASWRTQS    #   2aaa       ------LSAASWRTQSIY
             6taa           ------ATPADWRSQS    #   1cdg       APDTSVSNKQNFSTDVIY

             1cdg           CCCCCCCCCCCCCCCC    #   6taa       ------CCHHHHCCCCEE
             2aaa           ------CCHHHHCCCC    #   2aaa       ------CCHHHHCCCCEE
             6taa           ------CCHHHHCCCC    #   1cdg       CCCCCCCCCCCCCCCCEE

             1cdg           6752327236000000    #   6taa       ------563640130000
             2aaa           ------2717601310    #   2aaa       ------271760131000
             6taa           ------5636401300    #   1cdg       675232723600000000
Example    : If there are 3 hashes output will be; (in the order of \%hash3, \%hash2, \%hash1)
             >> 1st Hash        >> 2nd Hash         >> 3rd Hash
             Name1  THIS-IS-    Name123  eHHHHHHH   Name123  12222223

             You will get;
                            Name1    THIS-IS-
                            Name123  eHHHHHHH
                            Name123  12222223

Function   : gets many refs. for one scalar or hashes and prints
               the contents in lines of \$block_leng (the only scalar ref. given) chars.
Options    : 'o' or 'O' => ordered hash print,
             'n' or 'N' => no space between blocks.
             's' or 'S' => printout sorted by seq names.
             'i' or 'I' => interlaced print. (this requires identical names in hashes)
             'v' or 'V' => show sequence start number at each line
             (all options can be given like \$sort,
             where $sort has 's' as its value. A naked number like 100 will be
             the block_length.)
Usage      : &print_seq_in_block (\$block_leng, 'i',\%h1, 'sort', \%h2, \%hash3,,,);
Version    : 1.1

open_dssp_files

Download open_dssp_files .pl
Argument   : file names like (6taa, 6taa.dssp). If you put just '6taa' without extension, it
             searches for '6taa.dssp' in both PWD and the directory set by the $DSSP env. variable.
             ---------- Example of dssp ---
             **** SECONDARY STRUCTURE DEFINITION BY THE PROGRAM DSSP, VERSION JUL
             REFERENCE W
             HEADER    RIBOSOME-INACTIVATING PROTEIN           01-JUL-94   1MRG
             COMPND    ALPHA-MOMORCHARIN COMPLEXED WITH ADENINE
             SOURCE    BITTER GOURD (CUCURBITACEAE MOMORDICA CHARANTIA) SEEDS
             AUTHOR    Q
             246  1  0  0  0 TOTAL NUMBER OF RESIDUES, NUMBER OF CHAINS, NUMBER OF SS-BRIDGES(TOTAL,INTRACHAIN,INTERCHAIN)                .
             112 95.0   ACCESSIBLE SURFACE OF PROTEIN (ANGSTROM**2)                                                                         .
             171 69.5   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(J)  , SAME NUMBER PER 100 RESIDUES                              .
             12   4.9   TOTAL NUMBER OF HYDROGEN BONDS IN     PARALLEL BRIDGES, SAME NUMBER PER 100 RESIDUES                              .
             36  14.6   TOTAL NUMBER OF HYDROGEN BONDS IN ANTIPARALLEL BRIDGES, SAME NUMBER PER 100 RESIDUES                              .
             1    0.4   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-5), SAME NUMBER PER 100 RESIDUES                              .
             1    0.4   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-4), SAME NUMBER PER 100 RESIDUES                              .
             74  30.1   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+4), SAME NUMBER PER 100 RESIDUES                              .
             5    2.0   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+5), SAME NUMBER PER 100 RESIDUES                              .
             1    2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30     *** HISTOGRAMS OF ***           .
             0    0  0  0  1  1  0  2  0  0  1  0  0  1  0  0  0  0  0  2  0  0  0  0  0  0  0  0  0  0    RESIDUES PER ALPHA HELIX         .
             1    0  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    PARALLEL BRIDGES PER LADDER      .
             2    0  1  2  0  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    ANTIPARALLEL BRIDGES PER LADDER  .
             2    0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    LADDERS PER SHEET                .
             #   RESIDUE AA STRUCTURE BP1 BP2  ACC   N-H-->O  O-->H-N  N-H-->O  O-->H-N    TCO  KAPPA ALPHA  PHI   PSI    X-CA   Y-CA   Z-CA
             1    1   D              0   0  132    0, 0.0   2,-0.3   0, 0.0  49,-0.2   0.000 360.0 360.0 360.0 153.4   44.0   96.9  -23.8
             2    2   V  E     -a   50   0A  10   47,-1.5  49,-2.8   2, 0.0   2,-0.3  -0.889 360.0-163.3-115.9 151.4   43.1  100.4  -22.5
             3    3   S  E     -a   51   0A  63   -2,-0.3   2,-0.3  47,-0.2  49,-0.2  -0.961  10.3-172.8-131.0 152.3   44.8  103.7  -23.4
             4    4   F  E     -a   52   0A   8   47,-2.2  49,-2.3  -2,-0.3   2,-0.4  -0.985   6.9-161.2-143.2 139.5   45.0  107.2  -22.0
             5    5   R  E     -a   53   0A 144   -2,-0.3   4,-0.2  47,-0.2  49,-0.2  -0.993   9.7-156.0-121.0 125.9   46.6  110.2  -23.6
             6    6   L  S    S+     0   0    1   47,-2.3   2,-0.5  -2,-0.4   3,-0.4   0.644  73.2  90.9 -73.3 -22.4   47.5  113.2  -21.4
             7    7   S  S    S+     0   0   81   47,-0.3   3,-0.1   1,-0.2  -2,-0.1  -0.695 106.0   5.2 -75.5 121.0   47.4  115.6  -24.4
             8    8   G  S    S+     0   0   72   -2,-0.5  -1,-0.2   1,-0.3   5,-0.1   0.269  97.6 147.8  90.2 -10.7   43.9  117.0  -24.7
             9    9   A        +     0   0   10   -3,-0.4  -1,-0.3  -4,-0.2  -3,-0.1  -0.256  16.8 166.8 -58.8 142.4   42.9  115.2  -21.5
             (\$inputfile1, \$inputfile2, .... )
Author     : A Biomatic
Function   : opens dssp files and puts sequences in hash(es).
              It can take options for specific secondary structure types. For example,
              if you put an option $H in the args of the sub with the value of 'H',
              open_dssp_files will only read secondary structure whenever it sees 'H'
              in the xxx.dssp file, ignoring any other sec. str. types.
              If you combine the options 'H' and 'E', you can get only Helix and long
              beta strand sections defined as segments. This is handy to get sec. str. segments
              from any dssp files to compare with pdb files etc.
             With the 'simplify' option, you can convert all the 'T', 'G' and 'I' sec. str. to
              'H' and 'E'.
Options    : H, S, E, T, I, G, B, P, C, -help
 $H        =        'H' by   -H or -h or H or h  # to retrieve 4-helix (alpha helical)
 $S        becomes  'S' by   -S or -s or S or s  # to retrieve bend
 $E        becomes  'E' by   -E or -e or E or e  # to retrieve Extended strand, participates in B-ladder
 $T        becomes  'T' by   -T or -t or T or t  # to retrieve H-bonded turn
 $I        becomes  'I' by   -I or -i or I or i  # to retrieve 5-helix (Pi helical) segment output
 $G        becomes  'G' by   -G or -g or G or g  # to retrieve 3-helix (3-10 helical)
 $B        becomes  'B' by   -B or -b or B or b  # to retrieve residue in isolated Beta-bridge
 $simplify becomes   1  by   -p or P or -P, p
 $comm_col becomes  'c' by   -c or c or C or -C or common
 $HELP     becomes   1  by   -help   # for showing help

Returns    : (*out, *out2)  or (@out_array_of_refs)
Usage      : (*out, *out2) = @{&open_dssp_files(\$inputfile1, \$inputfile2, \$H, \$S,,,,)};
             (@out)        = @{&open_dssp_files(\$inputfile1, \$inputfile2, \$H, \$S,,,,)};
Version    : 2.9
$debug feature has been added to make it produce error messages with '#' option.
Warning    : 6taa.dssp  and 6taa are regarded as the same.

get_wrong_segment_rate

Download get_wrong_segment_rate .pl
Author     : A Biomatic
Example    :  hash of 3 keys and values.
             2aaa_6taa      -------00000---------00000000----0000-------00000-
             1cdg_6taa      -------442---------------2222-----------------000-
             1cdg_2aaa      -------222---------------2222-----------------000-

             In the above there are two segments wrong in 3 segment blocks = 2/3
              hash of 3 percentage rates.

             2aaa_6taa      0 %
             1cdg_6taa      66.6666666666667 %
             1cdg_2aaa      66.6666666666667 %

Function   : Treats the segment as one single big error.
             calculates the wrong segment number compared to the correct ones.
Usage      : print_seq_in_block( &get_wrong_segment_rate(\%superposed_hash) );
Used in    : get_segment_shift_rate
Version    : 1.0
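Sketch     : A minimal sketch of the wrong segment count, not the original code.
             The name wrong_segment_rate_sketch is hypothetical. It assumes every
             run of digits is one segment and that a segment is wrong when any of
             its positions is non-zero.

```perl
use strict;
use warnings;

# Hypothetical helper, not the original sub.
# Takes a ref. of a hash of superposed strings (digits inside segments,
# '-' elsewhere) and returns a ref. of a hash of percentages.
sub wrong_segment_rate_sketch {
    my ($superposed) = @_;
    my %rate;
    for my $name (keys %$superposed) {
        # Every run of digits is one secondary structure segment.
        my @segments = $superposed->{$name} =~ /(\d+)/g;
        # A segment is wrong if any of its positions is shifted (non-zero).
        my $wrong = grep { /[1-9]/ } @segments;
        $rate{$name} = @segments ? 100 * $wrong / @segments : 0;
    }
    return \%rate;
}

my %superposed = (
    '2aaa_6taa' => '-------00000---------00000000----0000-------00000-',
    '1cdg_6taa' => '-------442---------------2222-----------------000-',
);
my $rate = wrong_segment_rate_sketch(\%superposed);
printf "%-14s %s %%\n", $_, $rate->{$_} for sort keys %$rate;
```

Note       : For 1cdg_6taa two of the three digit runs contain non-zero digits,
             so the sketch gives 2/3, in line with the Example entry.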

get_correct_percent_alignment_rate

Download get_correct_percent_alignment_rate .pl
Argument   : two sequence files which have identical sequence names.
Author     : A Biomatic
Function   : accepts two files and prints out the sequence identities of the alignment.
Options    : h  # for help
             v  # for verbose printouts(prints actual sequences)
Returns    : reference of Scalar for percentage correct alignment(for already
             aligned sequences)
Usage      : &get_correct_percent_alignment_rate(\$file1, \$file2);
Warning    : Alpha version,  A Biomatic , made for Bissan

read_any_seq_files

Download read_any_seq_files .pl
Argument   : one or more refs. for scalars.
Author     : A Biomatic
Example    : (*out1,  *out2) =&read_any_seq_files(\$input1, \$input2);
             : (@out_ref_array)=@{&read_any_seq_files(\$input1, \$input2)};
             : (%one_hash_out) =%{&read_any_seq_files(\$input1)};
Function   : Tries to find the given input regardless of whether it is a full pathname,
             with or without extension. If not in pwd, it searches the dirs exhaustively.
Keywords   : open_any_seq_files
Options    : v for $verbose setting showing some information in runtime

Returns    : 1 ref. for a HASH of sequence ONLY if there was one hash input
             1 array (not REF.) of references for multiple hashes.
Usage      : %out_seq=%{&read_any_seq_files(\$input_file_name)};
Version    : 1.1

open_jp_files

Download open_jp_files .pl
Author     : A Biomatic
Function   : reads jp files and stores results in a hash.
Returns    : a reference of a hash for names and  their sequences.
Usage      : %out_hash=%{&open_jp_files(\$file_name)};
Version    : 1.1
Warning    : All the spaces  '-' !!!

open_msf_files

Download open_msf_files .pl
Argument   : (\$inputfile1, \$inputfile2, .... )
Author     : A Biomatic
Function   : opens msf files and puts sequences in hash(es)
Returns    : (*out, *out2)  or (@out_array_of_refs)
Usage      : (*out, *out2) = @{&open_msf_files(\$inputfile1, \$inputfile2)};
             : %hash_seq = %{&open_msf_files(\$inputfile1)};
             : (@out)        = @{&open_msf_files(\$inputfile1, \$inputfile2)};
             ---------- Example of MSF ---
             PileUp

             MSF:   85  Type: P    Check:  5063   ..

Version    : 1.1

default_help

Download default_help .pl
Example    : &default_help2; &default_help2(\$arg_num_limit);   &default_help2( '3' );
             1 scalar digit for the minimum number of args (optional),
             or its ref. If this is defined, it will exit the program,
             telling the minimum number of arguments.
Function   : Prints usage information and others when invoked. You need to have
             sections like this explanation box in your perl code. When invoked,
             default_help routine reads the running perl code (SELF READING) and
             displays what you have typed in this box.
             After one entry names like # Function :, the following lines without
             entry name (like this very line) are attached to the previous entry.
             In this example, to # Function : entry.
Package    : File_Util
Returns    : formatted information
Tips       : This usually goes with  parse_arguments.pl (= easy_opt.pl)
Usage      : &default_help2;  usually with 'parse_arguments' sub.
Used in    : parse_arguments,
Version    : 3.2
Warning    : this uses format and references

set_debug_option

Download set_debug_option .pl
Author     : A Biomatic
Class      : Utility
Example    : set_debug_option #    <-- at prompt.
Function   : If you put '#' or  '##' at the prompt of any program which uses
             this sub you will get verbose printouts for the program if the program
             has a lot of comments.
Options    : #   for 1st level of verbose printouts
             ##  for even more verbose printouts
$debug  becomes 1 by '#'  or '_'
$debug2 becomes 1 by '##'  or '__'

Reference  : http://sonja.acad.cai.cam.ac.uk/perl_for_bio.html
Returns    : $debug
Usage      : &set_debug_option;
Version    : 1.8
             generalized debug var is added for more verbose printouts.

remov_com_column

Download remov_com_column .pl
Argument   : accepts reference for hash(es) and array(s).
Author     : A Biomatic
Function   : removes common gap column in seq.
Keywords   : remove_com_column, remove_common_column,
             remove_common_gap_column, remov_common_gap_column,
             remove com column
Returns    : a ref. of  hash(es) and array(s).

             name1   ABCDE....DDD       name1  ABCDE..DDD
             name2   ABCDEE..DD..  -->  name2  ABCDEEDD..
             name3   ...DEE..DDE.       name3  ...DEEDDE.

             (ABC....CD, ABCD...EE) --> (ABC.CD, ABCDEE)
             In the first example above, the two columns that contain only dots are removed.
             Removes absurd gaps in multiple sequence alignments; for nt6-hmm.pl.
Usage      : %new_string = %{&remov_com_column(\%hashinput)};
             @out=@{&remov_com_column(\@array3)};
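The column removal shown in the example can be sketched as follows. This is an illustration only: the name remove_common_gap_columns_sketch is hypothetical, only a hash input is handled, '.' is assumed to be the sole gap character, and all sequences are assumed to have the same length.

```perl
use strict;
use warnings;

# Hypothetical re-implementation: drop every column that is '.' in all
# sequences of the alignment (hash of name => aligned string).
sub remove_common_gap_columns_sketch {
    my ($aln) = @_;
    my @names = sort keys %$aln;
    return {} unless @names;
    my $len = length $aln->{ $names[0] };
    my @keep;
    for my $col (0 .. $len - 1) {
        # keep the column if at least one sequence has a non-gap character there
        push @keep, $col
            if grep { substr($aln->{$_}, $col, 1) ne '.' } @names;
    }
    my %out;
    for my $name (@names) {
        $out{$name} = join '', map { substr($aln->{$name}, $_, 1) } @keep;
    }
    return \%out;
}

my %demo_aln = (
    name1 => 'ABCDE....DDD',
    name2 => 'ABCDEE..DD..',
    name3 => '...DEE..DDE.',
);
my %demo_new = %{ remove_common_gap_columns_sketch(\%demo_aln) };
print "$_  $demo_new{$_}\n" for sort keys %demo_new;
```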

tidy_secondary_structure_segments

Download tidy_secondary_structure_segments.pl
Argument   : hashes and [options]. No options result in default of 'H3', 'E3'
Author     : A Biomatic
Example    : print_seq_in_block(&tidy_secondary_structure_segments(\%hash, 'e4', 'h4'), 's');
             

             1cdg_2aaa      -------EEE-----------EE--EEEE------EE---------EEE-
             1cdg_6taa      -------EEE-----------EE--EEEE------EE---------EEE-
             2aaa_6taa      -------EEEEE------EE-EEEEEEEE----EEEE-------EEEEE-

             

             1cdg_6taa      -------------------------EEEE---------------------
             1cdg_2aaa      -------------------------EEEE---------------------
             2aaa_6taa      -------EEEEE---------EEEEEEEE----EEEE-------EEEEE-

Function   : Receives any secondary structure assignment hashes and
             tidies them up. That is, it removes very short secondary structure
             regions such as ( --HH--, -E-, -EE- ) according to the minimum
             segment lengths (thresholds) you give.
Options    : something like 'H3' or 'E3' for minimum segment length set to 3 positions.
Package    : Bio::Seq
Returns    : array of references of hashes.
Usage      : print_seq_in_block(&tidy_secondary_structure_segments(\%hash, 'e4', 'h4'), 's');

Version    : 1.0.0
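A minimal sketch of the tidying step on a single assignment string, assuming that a segment shorter than the threshold is replaced by '-'. The name tidy_segments_sketch and its (H => 3, E => 3) calling style are hypothetical; the original takes hash references and options such as 'e4'.

```perl
use strict;
use warnings;

# Hypothetical helper: blank out runs of a segment type that are shorter
# than the minimum length given for that type (default H => 3, E => 3).
sub tidy_segments_sketch {
    my ($ss, %min) = @_;
    %min = (H => 3, E => 3) unless %min;
    for my $type (keys %min) {
        my $n = $min{$type};
        # replace each run of $type shorter than $n by the same number of '-'
        $ss =~ s/((?:\Q$type\E)+)/length($1) < $n ? '-' x length($1) : $1/ge;
    }
    return $ss;
}

print tidy_segments_sketch('--HH--EEE-E-'), "\n";   # ------EEE---
```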

get_posi_diff

Download get_posi_diff.pl
Argument   : Takes two ref. for arrays which have positions of residues.
Author     : A Biomatic
Example    : @compacted_posi_dif =(1 ,2, 1, 1, '.' ,2,  1,  1, '.');
             @compacted_posi_dif2=(4 ,2, 1, 1, ,2,  1, '.' ,3,  1);
             output ==> ( 3 0 0 0 . 1 . 2 .)   (it ignores positions which have non-digits).
             output ==> (-3 0 0 0 . 1 .-2 .) when abs is not used.
Returns    : one ref. for an @array of differences of input arrays. array context.
Usage      : @position_diffs =&get_posi_diff(\@seq_position1,\@seq_position2);
Used in    : evalign.pl, get_position_shift_rate
Version    : 1.4
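The Example above can be reproduced with a short sketch. The name posi_diff_sketch is hypothetical; it returns an array reference and always takes the absolute value.

```perl
use strict;
use warnings;

# Hypothetical helper: element-wise absolute difference of two position
# arrays; a position that is not a digit string in either input gives '.'.
sub posi_diff_sketch {
    my ($p1, $p2) = @_;
    my $n = @$p1 < @$p2 ? @$p1 : @$p2;      # compare up to the shorter length
    my @diff;
    for my $i (0 .. $n - 1) {
        my ($x, $y) = ($p1->[$i], $p2->[$i]);
        push @diff, ($x =~ /^\d+$/ && $y =~ /^\d+$/) ? abs($x - $y) : '.';
    }
    return \@diff;
}

my @demo_pos1 = (1, 2, 1, 1, '.', 2, 1, 1, '.');
my @demo_pos2 = (4, 2, 1, 1, 2, 1, '.', 3, 1);
print "@{ posi_diff_sketch(\@demo_pos1, \@demo_pos2) }\n";   # 3 0 0 0 . 1 . 2 .
```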

get_posi_sans_gaps

Download get_posi_sans_gaps.pl
Argument   : one scalar variable input of sequence string.
Author     : A Biomatic
Returns    : the positions of residues after removing gaps (but keeps pos).
             Used for analysis of shifted positions in seq. comparison.
Usage      : @seq_position1 = &get_posi_sans_gaps($string1);
Version    : 1

get_common_column

Download get_common_column.pl
Argument   : 2 or more ref for hash of identical keys and value length.
             One optional arg for replacing space char to the given one.
Author     : A Biomatic
Class      : get_common_column, get_common_column_in_seq,
             get common column in sequence, superpose_secondary_structure,
             get_common_secondary_structure,
             for secondary structure only representation.
Example    : %out =%{&get_common_column(\%hash1, \%hash2, '-')};
             output> with 'E' option >>> "name1     --HHH--1232-"
   Following input will give;
   %hash1 = ('s1', '--EHH-CHHEE----EHH--HHEE----EHH--HHEE----EHH-CHHEE--');
   %hash2 = ('s2', '--EEH-CHHEE----EEH-CHHEE----EEH-CHHEE----EEH-CHHEE--');
   %hash3 = ('s3', '-KEEH-CHHEE-XX-EEH-CHHEE----EEH-CHHEE----EEH-CHHEE--');
   %hash4 = ('s4', '-TESH-CHEEE-XX-EEH-CHHEE----EEH-CHHEE----EEH-CHHEE--');

     s1_s2_s3_s4    --E-H-CH-EE----E-H--HHEE----E-H--HHEE----E-H-CHHEE--

Function   : (name1         --EHH--HHEE-- )
             (name2         --HHH--EEEE-- ) ==> result is;

             (name1_name2   -- HH--  EE-- )
             to get the identical chars in hash strings of sequences.

Keywords   : Overlap, superpose hash, overlay identical chars, superpose_seq_hash
             get_common_column, get_com_column, get_common_sequence,
             get_common_seq_region, multiply_seq_hash,
Package    : Array_Util
Returns    : one hash ref. of the combined key name (i.e., name1_name2). Combined by '_'
Usage      : %out =%{&get_common_column(\%hash1, \%hash2, '-')};
Version    : 1.5
Warning    : Accepts 2 or more hashes (see Argument).
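A minimal sketch of the column comparison, assuming each input hash holds one sequence, all sequences have equal length, and the replacement character defaults to a space as in the Function example. The name common_column_sketch is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: keep a character only where all sequences agree;
# other columns get the replacement character (default: a space).
sub common_column_sketch {
    my @args = @_;
    # a trailing non-reference argument is taken as the replacement character
    my $fill = ref($args[-1]) ? ' ' : pop @args;
    my (@names, @seqs);
    for my $h (@args) {
        for my $name (sort keys %$h) {
            push @names, $name;
            push @seqs,  $h->{$name};
        }
    }
    return {} unless @seqs;
    my $common = '';
    for my $i (0 .. length($seqs[0]) - 1) {
        my $c = substr $seqs[0], $i, 1;
        my $differs = grep { substr($_, $i, 1) ne $c } @seqs;
        $common .= $differs ? $fill : $c;
    }
    my %result = (join('_', @names) => $common);   # key is name1_name2...
    return \%result;
}

my %demo_h1 = (name1 => '--EHH--HHEE--');
my %demo_h2 = (name2 => '--HHH--EEEE--');
my %demo_out = %{ common_column_sketch(\%demo_h1, \%demo_h2) };
print "$_  $demo_out{$_}\n" for keys %demo_out;   # name1_name2  -- HH--  EE--
```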


array_average

Download array_average.pl
Argument   : takes one array reference.
Author     : A Biomatic
Function   : (the same as average_array)
Keywords   : get_array_average, av_array, average_array, get_average_array
             average_of_array, average_array
Returns    : single scalar digit.
Usage      : $output = &array_average(\@any_array);
Version    : 1.2
Warning    : If divided by 0, it will automatically replace it with 1
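A minimal sketch of the averaging and the zero-divisor rule from the Warning, assuming "replace it with 1" refers to the divisor. The name array_average_sketch is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: arithmetic mean of the array behind the reference.
# For an empty array the divisor 0 is replaced by 1, so the result is 0.
sub array_average_sketch {
    my ($arr) = @_;
    my $sum = 0;
    $sum += $_ for @$arr;
    my $n = @$arr || 1;        # avoid division by zero
    return $sum / $n;
}

my @demo_values = (2, 4, 6);
print array_average_sketch(\@demo_values), "\n";   # 4
```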

find_seq_files

Download find_seq_files.pl
Argument   : (\$input_file_name), where $input_file_name can be 'xxx.xxx', '/xxx/xxx/xxx/xxy.yyy',
             or just a directory name like 'aat' for  /nfs/ind4/ccpe1/people/A Biomatic /jpo/align/aat.
             In that case it appends the stored seq file extensions (msf, jp, pir, etc.)
             to make aat.msf, aat.jp, aat.pir ... and searches for these files.
Author     : A Biomatic
Example    : $found_file=${&find_seq_files(\$input_file_name)};
Function   : (similar to find.pl) used in 'read_any_seq_file.pl'.
             Seeks the given file in pwd, the specified dir, the default path, etc.
             If it is still not found, it looks in all the subdirectories of path and pwd
             and in the PATH environment dirs, then returns the full path file name.
Keywords   : find_anyj_seq_files, find any seq files, find seq files
Returns    : return( \$final );
Usage      : $found_file = ${&find_seq_files(\$input_file_name)};
Version    : 1.0
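The extension lookup can be sketched as below. This covers only the "try each stored extension in each directory" step; the recursive subdirectory and PATH searches are left out. The name find_seq_file_sketch and its three-argument form are hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: look for $name, then $name.$ext for each extension,
# in each directory; return the first existing file or nothing.
sub find_seq_file_sketch {
    my ($name, $dirs, $exts) = @_;
    my @candidates = ($name);
    push @candidates, "$name.$_" for @$exts;
    for my $dir (@$dirs) {
        for my $cand (@candidates) {
            my $path = File::Spec->catfile($dir, $cand);
            return $path if -f $path;
        }
    }
    return;
}

my $demo_found = find_seq_file_sketch('aat', ['.'], [qw(msf jp pir)]);
print defined $demo_found ? "found $demo_found\n" : "not found\n";
```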

put_position_back_to_str_seq

Download put_position_back_to_str_seq.pl
Argument   : takes two refs for arrays (one for chars, the other for digits).
Author     : A Biomatic
Example    : @string_from_struct=('X', 'T', 'A' ,'B' , '.' ,'F',  'G', '.' , 'O' ,'P', '.');
             @compacted_posi_dif=(1 ,2, 1, 1, ,2, 1, 1, 1);
Returns    : a ref. for an array
Usage      : @result =@{&put_position_back_to_str_seq(\@string_from_struct, \@compacted_posi_dif)};
Version    : 1.0
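The entry gives no output for its Example, so the following is only a guess at the behaviour: each non-gap character of the structural string is assumed to be replaced by the next value of the compacted array, while '.' positions stay in place. The name put_position_back_sketch is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (assumed behaviour): walk the character array and put
# the next compacted value at each non-gap position; gaps ('.') are kept.
sub put_position_back_sketch {
    my ($chars, $values) = @_;
    my @queue = @$values;
    my @out;
    for my $c (@$chars) {
        push @out, $c eq '.' ? '.' : shift @queue;
    }
    return \@out;
}

my @demo_struct = ('X', 'T', 'A', 'B', '.', 'F', 'G', '.', 'O', 'P', '.');
my @demo_compact = (1, 2, 1, 1, 2, 1, 1, 1);
print "@{ put_position_back_sketch(\@demo_struct, \@demo_compact) }\n";
```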

hash_common_by_keys

Download hash_common_by_keys.pl
Author     : A Biomatic
Returns    : the VALUES OF THE FIRST HASH which occur in later hashes
             are returned
Usage      : %hash1_value = %{&hash_common_by_keys(\%hash1, \%hash2,...)};
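A minimal sketch, assuming "occur in later hashes" means that the key must be present in every later hash. The name hash_common_by_keys_sketch is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the entries of the first hash whose keys
# also exist in all of the later hashes.
sub hash_common_by_keys_sketch {
    my ($first, @others) = @_;
    my %common;
    for my $key (keys %$first) {
        next if grep { !exists $_->{$key} } @others;
        $common{$key} = $first->{$key};
    }
    return \%common;
}

my %demo_first  = (a => 1, b => 2, c => 3);
my %demo_second = (a => 9, c => 8);
my %demo_common = %{ hash_common_by_keys_sketch(\%demo_first, \%demo_second) };
print "$_ => $demo_common{$_}\n" for sort keys %demo_common;
```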

convert_arr_and_str_2_hash

Download convert_arr_and_str_2_hash.pl
Argument   : one or more ref. of arrays
Author     : A Biomatic
Example    : &print_seq_in_block(&convert_arr_and_str_2_hash(\@input,\@input2,\@input3 ));
             &convert_arr_and_str_2_hash(\$input1,\$input2, '2' );
             results in; (ordering starts from the given '2')
                          array_2       input1 arraystring
                          array_3       input2 arraystring

             one more example
                          string_6       This is st                  and 3 strings)
                          string_10      This is st
                          array_2        111233434242
                          array_6        111233434242
                          array_10       111243424224
Function   : makes hash(es) out of array(s)
             if ordering digit(s) is put, it orders the keys according to it.
             if ordering digit is not increased by one, the difference is used
             as the increasing factor. No option results in
             array_1, array_2, array_3...

Returns    : one or more ref. of hashes.
Usage      : ($hash1, $hash2)=&convert_arr_and_str_2_hash(\$input, \$input2, '1', '2'.. );
             * This is the combination of convert_string_to_hash & convert_array_to_hash

get_residue_error_rate

Download get_residue_error_rate.pl
Argument   : Takes a ref. for hash which have positions of residues of sequences.
Author     : A Biomatic
Function   : This is the final step in error rate getting.
             gets a ref. of a hash and calculates the absolute position diffs.
Options    : 'L' for limiting the error rate to 9 to make one digit output
$LIMIT becomes 'L' by L, l, -l, -L
Returns    : one ref. for an array of differences of input arrays. array context.
             ---Example input (a hash with sequences); The values are differences after
                                comparison with structural and sequential alignments.
             %diffs =('seq1', '117742433441...000',   <-- input (can be separated by '' or ',').
                      'seq2', '12222...99999.8888',
                      'seq3', '66222...44444.8822',
                      'seq4', '12262...00666.772.');
             example output;
             seq3_seq4       '0,1,0,0,0,.,.,.,,.,0,,0,0,,0,0,,.,0,,0,0,.'
             seq1_seq2       '0,1,0,1,1,.,.,.,,.,2,,2,2,,2,2,,.,.,,2,2,1'
             seq1_seq3       '0,1,0,1,1,.,.,.,,.,1,,1,1,,0,.,,.,.,,1,1,1'
             seq1_seq4       '0,1,0,,1,1,.,.,.,,.,1,,1,1,0,.,.,,.,1,,2,2'
             seq2_seq3       '0,1,0,,0,0,,.,.,,.,0,,1,0,,0,0,,.,0,,0,0,0'
             seq2_seq4       '0,0,0,,1,0,,.,.,,.,0,,1,0,,0,0,,.,0,,0,0,.'
Usage      : %position_diffs =%{&get_residue_error_rate(\@seq_position1, \@seq_position2)};
Used in    : get_position_shift_rate, previously get_each_posi_diff_hash
Version    : 1.1
Warning    : split and join char is ',';

handle_arguments

Download handle_arguments.pl
Argument   : any type, any amount
Author     : A Biomatic
Class      : Perl::Utility::Arg_handling
Example    : handle_arguments(\@array, $string, \%hash, 8, 'any_string')
Function   : Sorts input arguments going into subroutines and returns default
             arrays of references for various types (file, dir, hash, array, ...).
             If you give (\@out, @file), it will put @out into @array as a ref,
             and the contents of @out will also be dereferenced and put into
             @raw_string (regardless of what is in it).

Keywords   : handling arguments, parsing arguments,
Returns    : Following GLOBAL variables

             $num_opt,    @num_opt     @file          @dir
             $char_opt,   @char_opt    %vars          @array,
             @hash        @string,     @raw_string    @range,

             $num_opt has 10,20
             @num_opt has (10, 20)
             @file has  xxxx.ext
             @dir has  dir  or /my/dir
             $char_opt has 'A,B'
             @char_opt has (A, B)
             @array has  (\@ar1, \@ar2)
             @hash has (\%hash1, \%hash2)
             @string  ('sdfasf', 'dfsf')
             @raw_string (file.ext, dir_name, 'strings',,)
             @range has values like  10-20
             %vars deals with x=2, y=3 stuff.

Tips       : takes 0.02 u time with INDY
Usage      : Just put the whole box delimited by the two '###..' lines below
             inside your subroutines. It will call the 'handle_arguments'
             subroutine and parse all the given input arguments.
             To claim the arguments, just use the variables in the box.
             For example, if you had passed 2 file names for files existing
             in your PWD (or if the string looks like this: xxxx.ext),
             you can claim them by $file[0], $file[1] in
             your subroutine.
Used in    : everywhere
Version    : 4.6
             set_debug_option  is added.
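The sorting of arguments by type can be sketched as follows. The sketch returns one hash of bins instead of setting the global variables listed under Returns, and it recognises a file only by the xxxx.ext pattern. The name sort_arguments_sketch and the matching rules are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: put each argument into a bin according to its type.
sub sort_arguments_sketch {
    my %bin = (array => [], hash => [], num_opt => [], range => [],
               file  => [], string => [], vars => {});
    for my $arg (@_) {
        if    (ref($arg) eq 'ARRAY')       { push @{ $bin{array} },   $arg }
        elsif (ref($arg) eq 'HASH')        { push @{ $bin{hash} },    $arg }
        elsif ($arg =~ /^\d+$/)            { push @{ $bin{num_opt} }, $arg }   # 8
        elsif ($arg =~ /^\d+-\d+$/)        { push @{ $bin{range} },   $arg }   # 10-20
        elsif ($arg =~ /^(\w+)=(.+)$/)     { $bin{vars}{$1} = $2 }             # x=2
        elsif ($arg =~ /^[\w\/.\-]+\.\w+$/) { push @{ $bin{file} },   $arg }   # xxxx.ext
        else                               { push @{ $bin{string} },  $arg }
    }
    return \%bin;
}

my @demo_array = (1, 2, 3);
my %demo_hash  = (k => 'v');
my $demo_bins  = sort_arguments_sketch(\@demo_array, 'sdfasf', \%demo_hash, 8, 'any_string');
print "strings: @{ $demo_bins->{string} }\n";   # strings: sdfasf any_string
```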

show_hash

Download show_hash.pl
Author     : A Biomatic
Example    : Output:      item1
             Output:      item2
             Output:      item3
Function   : for debugging purpose. Shows any array elem line by line.
             the line is 60 elements long (uses recursion)
Options    : -s or -S or s or S for spaced output. Eg)
             seq1       1 1 1 1 1 1 1 1 1 1 1 1

             instead of
             seq1       111111111111

             -h or -H or h or H for horizontal line of '---------...'

Package    : Array_Util
Usage      : &show_hash(\@input_array);
Version    : 1.7
Warning    : There is a global variable:  $show_hash_option
             It tries to detect any given string which is joined by ','

remove_dup_in_array

Download remove_dup_in_array.pl
Argument   : one or more refs for arrays or one array.
Example    : (1,1,1,1,3,3,3,3,4,4,4,3,3,4,4);  --> (1,3,4);
Function   : removes duplicate entries in an array.
Keywords   : merge array elements, remove_repeting_elements,
             remove_same_array_elements
Returns    : one or more references.
Usage      : @out2 = @{&remove_dup_in_array(\@input1, \@input2,,,,)};
             @out1 = @{&remove_dup_in_array(\@input1 )};
Version    : 1.3
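The Example can be reproduced with the usual %seen idiom. The name remove_dup_sketch is hypothetical, it handles one array reference only, and it keeps the first occurrence of each element in its original order.

```perl
use strict;
use warnings;

# Hypothetical helper: return a reference to the array without duplicates,
# keeping the first occurrence of each element.
sub remove_dup_sketch {
    my ($arr) = @_;
    my %seen;
    my @unique = grep { !$seen{$_}++ } @$arr;
    return \@unique;
}

my @demo_input = (1, 1, 1, 1, 3, 3, 3, 3, 4, 4, 4, 3, 3, 4, 4);
print "@{ remove_dup_sketch(\@demo_input) }\n";   # 1 3 4
```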

assign_options_to_variables

Download assign_options_to_variables.pl
Argument   : None.
Author     : A Scientist
Example    : When you want to set 'a' char to a variable called '$dummy' in
             the program, you put a head box commented line
             '#  $dummy    becomes  a  by  -a '
             Then, parse_arguments and this subroutine will read the head
             box and assign 'a' to $dummy IF you put an argument of '-a' at
             the prompt.
Function   : Assigns the values set in head box to the variables used in
             the programs according to the values given at prompt.
             This produces global values.
             When numbers are given at prompt, they go to @num_opt
              global variable. %vars global option will be made

Options    : '#' at prompt will make a var  $debug set to 1. This is to
              print out all the print lines to make debugging easier.
Package    : Bio::Utils
Returns    : Some globally used variables according to prompt options.
             @num_opt,

Tips       : Used with 'parse_arguments'
Usage      : &assign_options_to_variables(\$input_line);
Version    : 2.4
Warning    : This is a global vars generator!!!
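A minimal sketch of the "becomes ... by ..." rule, assuming the head box is passed in as text and the prompt arguments as a list. It returns a hash of variable names and values instead of creating global variables. The name options_from_headbox_sketch is hypothetical, and the trigger list is assumed to be separated by ',' or 'or'.

```perl
use strict;
use warnings;

# Hypothetical helper: for each head-box line "$var becomes VALUE by OPT, OPT..."
# set var => VALUE when one of the listed options was given at the prompt.
sub options_from_headbox_sketch {
    my ($headbox, @args) = @_;
    my %given = map { $_ => 1 } @args;
    my %var;
    for my $line (split /\n/, $headbox) {
        next unless $line =~ /\$(\w+)\s+becomes\s+(\S+)\s+by\s+(.+)/;
        my ($name, $value, $triggers) = ($1, $2, $3);
        $value =~ s/^'|'$//g;                        # 'L' -> L
        for my $t (split /\s*,\s*|\s+or\s+/, $triggers) {
            $t =~ s/^\s*'?//;                        # strip spaces and quotes
            $t =~ s/'?\s*$//;
            $var{$name} = $value if $given{$t};
        }
    }
    return \%var;
}

my $demo_headbox = join "\n",
    '#  $dummy    becomes  a  by  -a',
    q{#  $debug  becomes 1 by '#'  or '_'};
my $demo_vars = options_from_headbox_sketch($demo_headbox, '-a');
print "dummy = $demo_vars->{dummy}\n";   # dummy = a
```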