needle tools perl subroutines


[This is an experiment. These subroutines are not documented, except inline. So I've snarfed the inline comments out of the source code. Is this useful to anyone? -- rgr, 2-Nov-99.]

Note: I still consider these subroutines to be internal; their specifications may change without notice at my whim.

Table of contents

  1. needle tools perl subroutines
    1. Table of contents
    2. Module usage hierarchy
    3. Scripts referenced by modules
    4. Alphabetical listing
    5. Source file listing
      1. align.pm subroutines
      2. atom-coords.pm subroutines
      3. atom-names.pm subroutines
      4. chain-spec.pm subroutines
      5. contact.pm subroutines
      6. core-lib.pm subroutines
      7. correct-align.pm subroutines
      8. exp-lib.pm subroutines
      9. fssp-lib.pm subroutines
      10. hallucinate-cb.pm subroutines
      11. html-hacks.pm subroutines
      12. los-rules.pm subroutines
      13. pdb-seq.pm subroutines
      14. ppml.pm subroutines
      15. print-multiple-alignments.pm subroutines
      16. read-dssp-file.pm subroutines
      17. rule-based-make.pm subroutines
      18. rule-based-mrf.pm subroutines
      19. score-lib.pm subroutines
      20. seg-lib.pm subroutines
      21. seq-lib.pm subroutines
      22. std-pdbres.pm subroutines
      23. vvenv-lib.pm subroutines
      24. vvenv-rules.pm subroutines
      25. weighted-scores.pm subroutines
    6. Scripts with subroutines
      1. aa-thresholds-to-envs.pl subroutines
      2. blast-to-clique.pl subroutines
      3. calculate-vv.pl subroutines
      4. check-pdb-seqs.pl subroutines
      5. compare-segs.pl subroutines
      6. correct-alignment.pl subroutines
      7. dssp4.pl subroutines
      8. expand-dssp.pl subroutines
      9. extract-toc.pl subroutines
      10. fa2tbl.pl subroutines
      11. filter-pdb-atoms.pl subroutines
      12. fssp-core-corr.pl subroutines
      13. install.pl subroutines
      14. los.pl subroutines
      15. make-align-depends.pl subroutines
      16. make-core.pl subroutines
      17. make-domain-core.pl subroutines
      18. make-seq-file.pl subroutines
      19. make-toc.pl subroutines
      20. pdb-domain-seq.pl subroutines
      21. pdb-to-dsm.pl subroutines
      22. pdb-to-seq.pl subroutines
      23. util-parse-dssp.pl subroutines
      24. vvenv-pair-scores.pl subroutines

Module usage hierarchy

Scripts referenced by modules

Alphabetical listing

acos($cos_x) (defined in hallucinate-cb.pm)
Not provided by perl, need to hack. (Shouldn't need an acos?) -- rgr, 13-Aug-97.
add_file_type_to_make(??) (defined in rule-based-make.pm)
The argument list is a set of file types. This call requests that they all be made by the make_required_dependencies subroutine.
align_sequences_globally($seq1, $seq2) (defined in align.pm)
Align the two sequences given as arguments using global alignment. Requires the globalS program on the search path, which actually does the work. Returns a Perl list of ($aligned_seq1, $aligned_seq2), where both strings are the same length.
alignment_gap_p(??) (defined in align.pm)
Expects a string, and returns nonzero if it contains at least one gap. Dashes are conventional, but dots are used by FSSP.
atom_coords($atom) (defined in atom-coords.pm)
Given an ATOM record, return the (x, y, z) as a list of three elements. [originally stolen from the filter-pdb-atoms.pl script. -- rgr, 19-Nov-98.]
build_cti_to_fssp_alignment_map($core_file, $core_locus) (defined in fssp-lib.pm)
Build and return an array that maps each residue position (identified by "core total index") of the named core file (which we read so we can use it) to an index in the loaded FSSP data accessed via the given core locus. Also builds the $fssp_core_secondary_structure string by side effect.
chain_spec_chain_ids($chain_spec) (defined in rule-based-make.pm)
Return the set of chain ID's in the passed chain spec as a string in the order in which they appear. See also the core_chain_ids subroutine.
compute_exposure_bin($exposure) (defined in exp-lib.pm)
Compute the exposure portion of the environment. $env is zero-based.
contact_relevant_p($ss, $aa1, $aa2, $env_space_vector) (defined in contact.pm)
not documented
core_chain_ids(??) (defined in core-lib.pm)
Return the set of chain ID's in the core as a string in the order in which they appear. (Assumes that each segment has a unique chain ID.)
core_pdbres_to_cti_and_aa($pdbres) (defined in vvenv-lib.pm)
Given a pdbres, return its CTI and AA letter code -- shorthand for a bunch of array and hash lookups. Only warns if not found (and only once).
correct_alignment_sequence($seq1_orig_align, $seq2_orig_align, $locus1, $seq1_corr) (defined in correct-align.pm)
Correct the "original" alignment of $seq1_orig_align to $seq2_orig_align by performing global alignment between $seq1_corr and the first aligned sequence; returns both corrected alignment strings. Assumes that $seq1_corr is a superset of $seq1_orig_align, i.e. residues may have been deleted or miscopied in the process of making the original alignment, but none were added.

The original version of this algorithm attempted to correct both aligned sequences at the same time; this version gives an equivalent result if called twice, with the second call using the result of the first call in reverse order. Since we never want to give a false appearance of alignment when both sequences happen to have residues inserted in the same place, we need to treat the sequences independently, i.e. by doing:

		insert1----------
		-------insertion2
rather than trying to deal with "insertion-in-both" as a special case. This independence of insertions into each of the sequences is what made the asymmetric reimplementation possible.
decode_environment($environment) (defined in vvenv-lib.pm)
Given an $environment that is one-based, return a list of individual environment components, still encoded, with secondary structure last and most significant. -- rgr, 28-Aug-97.
define_core($core_locus, $pdb_locus, $chain_spec) (defined in rule-based-make.pm)
Implement defaults.
define_html_paragraph_tags(??) (defined in html-hacks.pm)
Define the set of tags used to delimit HTML paragraphs.
define_make_rule($name, $prefix, $suffix, $macro_name, $rule_generator_function) (defined in rule-based-make.pm)
not documented
define_xval_set_target(??) (defined in rule-based-mrf.pm)
Define a counts file target for this cross-validation set only if there is more than one locus. Otherwise it's a singleton, which is the default anyway. [internal grinder for the read_xval_sets subroutine. -- rgr, 6-Jul-98.]
dump_rule_database(??) (defined in rule-based-make.pm)
Debugging utility. Prints most rule information for all rules to the standard output.
emit_protein_sequence($locus, $sequence, $output_file, $output_format) (defined in seq-lib.pm)
Emit the given locus/sequence pair in the selected format. If the output file is empty or not specified, it defaults to the standard output. The output format can be 'ig', 'fa', 'tbl', or 'just-seq'. If the output format is empty or not specified, it defaults to the value of the global $sequence_output_format variable, which is initially 'ig'.
encode_pairwise_environment(??) (defined in vvenv-lib.pm)
Given a series of arguments, from least to most significant (which is the order named below), turn it into a counting environment encoding. All arguments must be 0 or 1 [except we now have to pass $cb_distance directly -- rgr, 4-Aug-97]. [This is now more or less internal to read_line_of_sight_file, except that its hairy knowledge of encoding details means that it belongs here. -- rgr, 5-Aug-98.]

The arguments are:

($total_vv_up_to_14A_res1 > $vv_exp_tr,
$total_vv_up_to_14A_res2 > $vv_exp_tr,
$cb_distance,
($vv_up_to_7_5A_res1_w1 > ($ss1 ? $vv_tr_e : $vv_tr_h)
  && $vv_up_to_7_5A_res2_w2 > ($ss2 ? $vv_tr_e : $vv_tr_h)),
$seg1, $seg2).
env_secondary_structure(??) (defined in vvenv-lib.pm)
Extract the encoded secondary structure value from an environment.
extract_all_pdb_sequences($file_name, $chains) (defined in pdb-seq.pm)
Given a PDB file (possibly implicitly as stdin, if $file_name is empty), extract ALL of its sequences, both ATOM and SEQRES versions, and enter them into the %pdb_sequences hash.
find_file_test($rule_name, $locus) (defined in rule-based-make.pm)
Return a file of the indicated type for $locus, or '' if it can't be found. This is a find-only version of find_or_make_file, intended for use when there are multiple possibilities. [new; not used except by the ./make-ctimap-depends.pl script. -- rgr, 11-May-98.]
find_hyperplane($ss, $aa1_name, $aa2_name) (defined in vvenv-lib.pm)
not documented
find_or_make_file($rule_name, $locus) (defined in rule-based-make.pm)
Return a file of the indicated type for $locus. This is the central recursive entry point for makefile generation -- this may call a generator function, which will in general call find_or_make_file one or more times for dependencies. The file may have already been made (or found), in which case we return it directly.
generate_dependency_files(??) (defined in rule-based-make.pm)
Top level routine. This should take care of most cases.
generate_rule($dependencies, $code, @files) (defined in rule-based-make.pm)
Output a rule constructed from $dependencies, $code, and a series of target file names.
get_html_line(??) (defined in html-hacks.pm)
Input a 'line' from stdin, maybe flushing old markup if $strip_p. Returns '' at EOF, regardless of how many times it is called (unless you reset the $html_eof_p flag). There may actually be more than one line if required to ensure that all angle brackets match, but all newlines are preserved. In any case, $html_line_number is the value of $. for the first line read.
hallucinate_cb($xn, $yn, $zn, $xca, $yca, $zca, $xc, $yc, $zc) (defined in hallucinate-cb.pm)
Compute the coordinates of the beta carbon, based on the backbone nitrogen, alpha carbon, and carbonyl carbon atom positions.

[The history of this code is lost in the mist. When I got it it had already been converted to C from an earlier FORTRAN implementation. Neither was at all documented. -- rgr, 8-Aug-96.] [now further converted to perl. -- rgr, 1-Aug-97.]

initialize_singleton_exposure_bins(??) (defined in exp-lib.pm)
Set up $n_singleton_states and $n_singleton_exposure_bins based on options already given.
make_abbrev_dssp_file($raw_locus, $dssp_file_name) (defined in rule-based-make.pm)
Make an abbreviated DSSP file in the current directory from the PDB file.
make_core_file($locus, $core_file_name) (defined in rule-based-make.pm)
not documented
make_core_locus_file_name($index, $locus) (defined in rule-based-make.pm)
Make a standard file name based on the core locus (intended for use as an @make_rule_file_namer entry). This is the normal case, and is the default installed by the define_make_rule subroutine.
make_dssp_seg_file($raw_locus, $seg_file_name) (defined in rule-based-make.pm)
Generate seg file from abbreviated DSSP file. This requires the raw PDB locus (i.e. no chain ID).
make_eisenberg_fat_alanine_exposure_file($locus, $exposure_file_name) (defined in rule-based-make.pm)
Eisenberg "fat alanine" exposure files. Exposure files are always made for the PDB file as a whole.
make_file_name($index, $locus) (defined in rule-based-make.pm)
Make a standard file name, given a rule index and a core locus. This just dispatches to the rule's namer.
make_line_of_sight_file($locus, $hyperenv_file) (defined in los-rules.pm)
not documented
make_mrf_counts_file($locus, $cnt_file) (defined in rule-based-mrf.pm)
$core_file, $seq_file, $se_file, $pe_file
make_mrf_environment_file($locus, $env_file, $class, $keyword_arg) (defined in rule-based-mrf.pm)
Make environments.
make_mrf_gmt_env_file($locus, $gmte_file) (defined in rule-based-make.pm)
Note that this is completely general; it does not depend on the WMS scoring scheme. -- rgr, 25-Jun-97.
make_mrf_gmt_singleton_score_file($locus, $score_file, $counts_name) (defined in rule-based-mrf.pm)
Special case for GMT scores, which require environments. Note that we must be sure to use the same environments that were used to produce the counts in the first place.
make_mrf_loop_score_file(??) (defined in rule-based-mrf.pm)
not documented
make_mrf_los_counts_file($locus, $cnt_file) (defined in weighted-scores.pm)
Make line-of-sight counts for use in weighted scores. [really, this is almost the same as make_mrf_counts_file, except for the $singleton_xval_(core|los)_counts_files kludges at the bottom, and the fact that this version omits singletons. -- rgr, 6-Jul-98.]
make_mrf_pairwise_env_file(??) (defined in rule-based-mrf.pm)
not documented
make_mrf_pairwise_score_file(??) (defined in rule-based-mrf.pm)
not documented
make_mrf_score_file(??) (defined in rule-based-mrf.pm)
Doesn't work for GMT scores, which require environments. See the make_mrf_gmt_singleton_score_file subroutine. -- rgr, 3-Apr-98.
make_mrf_singleton_env_file(??) (defined in rule-based-mrf.pm)
not documented
make_mrf_true_singleton_score_file(??) (defined in rule-based-mrf.pm)
not documented
make_mrf_weighted_pairwise_score_file(??) (defined in weighted-scores.pm)
not documented
make_mrf_weighted_score_file($locus, $score_file, $file_type, $maker_macro, $keyword_arg) (defined in weighted-scores.pm)
Requires two sets of counts, constructs cross-validated "weighted" pairwise score file. -- rgr, 6-Jul-98.
make_no_xval_score_files(??) (defined in rule-based-mrf.pm)
Generate all requested non-cross-validated score files from the $total_counts (the GMT scores must be per-core). The locus is always called "NO_XVAL", which is incompatible with historic naming conventions, but allows us to use the same naming logic. -- rgr, 2-Mar-99.
make_pdb_chain_file_name($index, $locus) (defined in rule-based-make.pm)
Make a file name based on the PDB locus and chain ID(s), intended for use as an @make_rule_file_namer entry. If there is only one chain ID and it is '_', then drop it.
make_pdb_locus_file_name($index, $locus) (defined in rule-based-make.pm)
Make a file name based on the PDB locus (not necessarily the name of the PDB file itself). (Intended for use as an @make_rule_file_namer entry.)
make_raw_vv_data_file($locus, $vv_data_file) (defined in los-rules.pm)
not documented
make_required_dependencies(??) (defined in rule-based-make.pm)
Top-level loop. Makes the required files for each locus in the @core_loci array. -- rgr, 10-Feb-98.
make_seq_file($locus, $sequence_file_name) (defined in rule-based-make.pm)
Make a sequence file from the PDB SEQRES records.
make_vv_singleton_file($locus, $vvs_file) (defined in los-rules.pm)
not documented
make_vvenv_counting_environment(??) (defined in vvenv-rules.pm)
not documented
make_vvenv_edge_environment(??) (defined in vvenv-rules.pm)
[***kludge***: this will make the edge scores *and* envs by side effect, without duplication if both were requested. -- rgr, 27-Mar-98.]
make_vvenv_edge_scores($locus, $es_file) (defined in vvenv-rules.pm)
not documented
make_vvenv_environment_file($locus, $env_file, $class, $keyword_arg) (defined in vvenv-rules.pm)
not documented
make_vvenv_pairwise_environment(??) (defined in vvenv-rules.pm)
not documented
map_cti_to_segment_and_offset($cti) (defined in core-lib.pm)
not documented
map_segment_and_offset_to_cti($seg, $offset) (defined in core-lib.pm)
Hacked to return 0 (an illegal CTI) if $offset is not legal for that segment.
max_string_length(??) (defined in print-multiple-alignments.pm)
Given an arbitrary number of string arguments, compute their maximum length.
maybe_find_make_file($name) (defined in rule-based-make.pm)
[replacement for the original find_file subroutine. -- rgr, 10-Feb-98.]
maybe_make_no_xval_file($rule_name, $total_counts, $token, $write_keyword) (defined in rule-based-mrf.pm)
Helper for the make_no_xval_score_files subroutine, below. Generates the NO_XVAL file for the MRF score rule $rule_name, if $rule_name files have been requested.
parse_chain_specification($chain_spec) (defined in chain-spec.pm)
Given a chain specification string (see http://bmerc-www.bu.edu/needle-doc/latest/depend-formats.html#chain-specification for details), parse it into a list of [$chain, $pdbres_start, $pdbres_end] vectors. The $pdbres_start and $pdbres_end values will be '' if unspecified. These values will never be found in an actual PDB file, so callers must treat them specially.
parse_depend_arg($arg) (defined in rule-based-make.pm)
not documented
parse_exposure_argument($arg, $continuation) (defined in exp-lib.pm)
Helper for argument parsing.
parse_mrf_depend_arg($arg) (defined in rule-based-mrf.pm)
Parse a standard make-mrf-depends arg, which includes standard make-lib args as well.
parse_requested_environments($arg) (defined in score-lib.pm)
We are given some combination of commas, dashes, and digits: parse it as a set of requested environments. This is a comma-separated list of subranges, where each subrange is either a single digit string or two digit strings separated by a dash.
parse_seq_arg($arg) (defined in seq-lib.pm)
not documented
parse_ss_codes(??) (defined in vvenv-lib.pm)
Given $ss1_name and $ss2_name, return the right secondary structure encoding.
parse_vvenv_argument($arg) (defined in vvenv-lib.pm)
Parse standard arguments. [This set could be made more complete. -- rgr, 16-Jul-97.]
pretty_print_makefile_line($lines, $primary_indent, $secondary_indent, $line_prefix) (defined in ppml.pm)
Do line wrapping and indentation on $lines (the first argument), which may contain one or more logical makefile lines separated by newline characters; each logical makefile line is treated separately. For each logical line, the first physical line is indented by $primary_indent (the second argument); if wrapping should become necessary, a " \" is appended at the end of the line and the second and subsequent physical lines are indented by $secondary_indent (the third argument). All lines are indented with tab characters where appropriate (this means that they start with tab characters if the relevant indentation parameter is >= 8). $line_prefix, if supplied, applies to all lines, and is output *before* any indentation. If a null string is given as the $lines argument, a single newline is generated. Redundant linear whitespace is turned into a single space character (except for tabs generated by indentation).
print_macro_and_target($index) (defined in rule-based-make.pm)
Standard call to pretty_print_makefile_line for defining make macros and targets. [now has new args. -- rgr, 6-Jul-98.]
print_macros_and_targets(??) (defined in rule-based-make.pm)
To be called after make_required_dependencies (the top-level maker loop). Print targets for anything we made, dispatching through the @make_rule_target_printer array to allow for rule-specific hooks. -- rgr, 6-Jul-98.
print_make_include_file_header(??) (defined in rule-based-make.pm)
not documented
print_mrf_counts_and_total_counts($index) (defined in rule-based-mrf.pm)
Recipe for printing the counts macro and target, and for building the total counts. The $total_counts file is always built in the current directory. [This doesn't match the rest of the convention, though . . . -- rgr, 26-Jun-97.]
print_multiple_alignments(??) (defined in print-multiple-alignments.pm)
Given an even number of arguments, interpret them as alternating locus/sequence pairs to be printed in parallel as a multiple alignment.
print_words(??) (defined in ppml.pm)
Keeps track of current printing column.
read_core_file($core_file_name, $record_atoms_p) (defined in core-lib.pm)
Given the core file name, extract core, segment, and sequence information from it. If the optional second arg tests true, record ATOM records in $cti_atom[$cti]{$atom} as well. As in needle and stat, the segment index and "core total index" are 1-based (but the residue offset from the start of the seg is 0-based). Returns $core_number_of_residues, the total number of residues in the core file.
read_core_list($file_name) (defined in rule-based-make.pm)
Read the core definition file, defining a set of core loci, each with its associated PDB locus and chain/range specification. These are defined in the @core_loci array and %core_pdb_locus and %core_chain_specification hashes (keyed on locus). Formerly read_core_list just read a core list, the first field (at the time the only field) in the core definition file; this still works for cores that are self-defining based on their names.
read_dssp_file($dssp_file_name) (defined in read-dssp-file.pm)
not documented
read_exposure_file(??) (defined in exp-lib.pm)
Read exposure information from the appropriate file.
read_fssp_file($fssp_file_name) (defined in fssp-lib.pm)
Read an FSSP file. Defines %fssp_indices (which maps locus to index), the 1-based @fssp_loci and @fssp_alignments arrays, and $n_fssp_sequences (which is also returned).
read_html_paragraph(??) (defined in html-hacks.pm)
Uses the global $html_next_para as lookahead: on exit, it is the initiating tag of the next paragraph, while $html_next_line has the remaining unexamined portion of the first line.
read_line_of_sight_file($los_file_name) (defined in contact.pm)
Read all line-of-sight environment definitions, filtering out entries for nonexistent core positions. Assumes we have the right core already loaded, and $chain_id defined. Returns a list of environment descriptors, where each such descriptor is [$pdb_index1, $pdb_index2, $cti1, $cti2, $ss1, $ss2, @etc] where @etc contains the fourth and subsequent fields from the hyperenv (line-of-sight) file. -- rgr, 11-Nov-97.
read_nexp_exposure_values($file_name) (defined in exp-lib.pm)
Note that the %exposure hash (and %exposure_aa) always uses space instead of "_" for the chain ID in the pdb_index key, regardless of what it looks like in the input file.
read_pairwise_score_file($filename) (defined in score-lib.pm)
Read scores for all environments from the named file, returning a reference to a 3-dimensional array. Relies heavily on line breaks in the standard file format. [***bug***: doesn't re-encode aa indices. -- rgr, 16-Oct-97.]
read_principal_component_hyperplane_definitions($real_contact_defn_file) (defined in vvenv-lib.pm)
Read AA set hyperplane definitions, loading it into the $plane[][][] array (and initializing $n_coeffs and $flush_everything_coeffs). Sole arg is the file name.
read_seg_file($seg_file) (defined in seg-lib.pm)
Read a segment file (see the http://bmerc-www.bu.edu/needle-doc/latest/ss-formats.html#seg-file-format page), initializing the @segment_foo arrays and the %pdbres_to_segment_start hash. (Presently, @segment_ss_index is only used for error messages.)
read_sequence_file($file_name) (defined in seq-lib.pm)
Read a single sequence from a file (currently must be IG format), returning (locus, sequence).
read_singleton_score_file($filename) (defined in score-lib.pm)
Read singleton scores for all environments from the named file, returning a reference to a 2-dimensional array. Relies heavily on line breaks in the standard file format. [***bug***: doesn't re-encode aa indices. -- rgr, 16-Oct-97.] [made singleton version from pairwise one. -- rgr, 12-Dec-97.]
read_vv_exposure_values($file_name) (defined in exp-lib.pm)
Note that the %exposure hash always uses space instead of "_" for the chain ID in the pdb_index key, regardless of what it looks like in the input file.
read_xval_sets($filename) (defined in rule-based-mrf.pm)
Generates rules for nontrivial cross-validation sets, and defines the %locus_xval_set hash. This maps a locus to a non-singleton cross-validation set file name, but only for loci that belong to nontrivial cross-validation sets. Must be called after reading the core list, as it uses the %core_defined_p hash.
res_and_index_to_atom_name($aa_name, $atom_index) (defined in atom-names.pm)
Do the inverse (sort of) of the %residue_atom_names mapping. Defining this as a subroutine means we don't have to build the inverse mapping until we know we're going to need it.
same_sheet_p($class1, $class2) (defined in vvenv-lib.pm)
Return 1 iff the two segments, named by their designators, are on the same sheet (or barrel). We assume that these two designators are for different segments.
set_file_type_make_state($make_state) (defined in rule-based-make.pm)
The first argument is the new state, one of (local path never make); the remaining list is a set of file types to assign to this state. Here and in define_make_rule (where they are initialized) are the only places that should bash the make_rule_make_state variable. -- rgr, 6-May-98. [mostly upward compatible replacement for the add_file_type_to_make subroutine. -- rgr, 11-May-98.]
set_make_prefix_or_suffix($rule_name, $prefix_or_suffix, $new_value) (defined in rule-based-make.pm)
Override to file naming conventions. Redefines a prefix or suffix.
standardize_pdbres($pdbres) (defined in std-pdbres.pm)
Fix the pdbres field to be '####A' format. Append a space (the blank insertion code) if the last character is numeric, then pad on the left to a total length of five.
tab_to_column($goal_col) (defined in ppml.pm)
May use spaces or tabs. Columns are zero-based.
unclosed_tag_p($string) (defined in html-hacks.pm)
Return nonzero if there is an incomplete tag at the end of the line. The "<...<...>" case is not noticed.
vector_equalp($vect2) (defined in vvenv-lib.pm)
Testing hack. Return 1 if the two vectors (passed by reference) are numerically equal, else 0.
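The subrange grammar described under parse_requested_environments in the listing above can be sketched as follows. This is only an illustration under stated assumptions, not the score-lib.pm source: the name parse_requested_environments_sketch is hypothetical, and returning a sorted, de-duplicated list of integers (and dying on malformed input) are guesses about behavior that the description does not specify.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the score-lib.pm source.
# Parses a string such as '1,3-5,10' into a sorted list of integers.
# Assumption: the result is returned as a plain list; the real
# subroutine may store the set differently.
sub parse_requested_environments_sketch {
    my ($arg) = @_;
    my %seen;
    for my $subrange (split /,/, $arg) {
        if ($subrange =~ /^(\d+)$/) {
            # A single digit string names one environment.
            $seen{$1 + 0} = 1;
        }
        elsif ($subrange =~ /^(\d+)-(\d+)$/) {
            # Two digit strings separated by a dash name an inclusive range.
            my ($low, $high) = ($1 + 0, $2 + 0);
            $seen{$_} = 1 for $low .. $high;
        }
        else {
            die "malformed subrange '$subrange'\n";
        }
    }
    return sort { $a <=> $b } keys %seen;
}

print join(' ', parse_requested_environments_sketch('1,3-5,10')), "\n";
```

Overlapping subranges collapse naturally because the environments are collected as hash keys before sorting.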

Source file listing

align.pm subroutines

Alignment hacks & tools.
alignment_gap_p(??)
Expects a string, and returns nonzero if it contains at least one gap. Dashes are conventional, but dots are used by FSSP.
align_sequences_globally($seq1, $seq2)
Align the two sequences given as arguments using global alignment. Requires the globalS program on the search path, which actually does the work. Returns a Perl list of ($aligned_seq1, $aligned_seq2), where both strings are the same length.
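As a rough illustration of the gap test described for alignment_gap_p, here is a minimal sketch. The name alignment_gap_p_sketch is hypothetical and this is not the align.pm source; it assumes only what the description states, namely that a gap is written as a dash or, in FSSP style, as a dot.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the align.pm source.
# Returns 1 if $aligned_seq contains at least one gap character
# ('-' by convention, '.' as used by FSSP), else 0.
sub alignment_gap_p_sketch {
    my ($aligned_seq) = @_;
    return $aligned_seq =~ /[-.]/ ? 1 : 0;
}

for my $seq ('MKV--LA', 'MKV..LA', 'MKVLA') {
    printf "%-8s %s\n", $seq, alignment_gap_p_sketch($seq) ? 'has gap' : 'no gap';
}
```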

atom-coords.pm subroutines

The atom_coords subroutine.
atom_coords($atom)
Given an ATOM record, return the (x, y, z) as a list of three elements. [originally stolen from the filter-pdb-atoms.pl script. -- rgr, 19-Nov-98.]
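The coordinate extraction described for atom_coords might look roughly like the sketch below. It is not the atom-coords.pm source: the name atom_coords_sketch is hypothetical, and the code assumes the standard fixed-column PDB ATOM layout, in which x, y, and z occupy columns 31-38, 39-46, and 47-54 (counting from 1).

```perl
use strict;
use warnings;

# Hypothetical sketch, not the atom-coords.pm source.
# Assumes the standard fixed-column PDB ATOM layout:
# x in columns 31-38, y in 39-46, z in 47-54 (1-based).
sub atom_coords_sketch {
    my ($atom) = @_;
    # substr offsets are 0-based; adding 0 converts each field to a number.
    return map { substr($atom, $_, 8) + 0 } (30, 38, 46);
}

# Build a well-formed ATOM record with sprintf so the columns are exact.
# The field values here are made up for illustration.
my $atom = sprintf("ATOM  %5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f",
                   1, ' N  ', ' ', 'MET', 'A', 1, ' ',
                   27.340, 24.430, 2.614, 1.00, 9.67);
my ($x, $y, $z) = atom_coords_sketch($atom);
printf "%.3f %.3f %.3f\n", $x, $y, $z;
```

Slicing by column rather than splitting on whitespace matters for PDB records, because adjacent coordinate fields can run together when values are large or negative.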

atom-names.pm subroutines

The %residue_atom_names table, and related support.
res_and_index_to_atom_name($aa_name, $atom_index)
Do the inverse (sort of) of the %residue_atom_names mapping. Defining this as a subroutine means we don't have to build the inverse mapping until we know we're going to need it.

chain-spec.pm subroutines

Parsing chain specifications.
parse_chain_specification($chain_spec)
Given a chain specification string (see http://bmerc-www.bu.edu/needle-doc/latest/depend-formats.html#chain-specification for details), parse it into a list of [$chain, $pdbres_start, $pdbres_end] vectors. The $pdbres_start and $pdbres_end values will be '' if unspecified. These values will never be found in an actual PDB file, so callers must treat them specially.

contact.pm subroutines

Library for line-of-sight contacts and contact filtering.
contact_relevant_p($ss, $aa1, $aa2, $env_space_vector)
not documented
read_line_of_sight_file($los_file_name)
Read all line-of-sight environment definitions, filtering out entries for nonexistent core positions. Assumes we have the right core already loaded, and $chain_id defined. Returns a list of environment descriptors, where each such descriptor is [$pdb_index1, $pdb_index2, $cti1, $cti2, $ss1, $ss2, @etc] where @etc contains the fourth and subsequent fields from the hyperenv (line-of-sight) file. -- rgr, 11-Nov-97.

core-lib.pm subroutines

Perl utilities for manipulating cores.
read_core_file($core_file_name, $record_atoms_p)
Given the core file name, extract core, segment, and sequence information from it. If the optional second arg tests true, record ATOM records in $cti_atom[$cti]{$atom} as well. As in needle and stat, the segment index and "core total index" are 1-based (but the residue offset from the start of the seg is 0-based). Returns $core_number_of_residues, the total number of residues in the core file.
core_chain_ids(??)
Return the set of chain ID's in the core as a string in the order in which they appear. (Assumes that each segment has a unique chain ID.)
map_cti_to_segment_and_offset($cti)
not documented
map_segment_and_offset_to_cti($seg, $offset)
Hacked to return 0 (an illegal CTI) if $offset is not legal for that segment.
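The indexing conventions above (1-based segment index and core total index, 0-based offset within a segment, and 0 as the illegal CTI) can be made concrete with a small sketch. Everything here is hypothetical: the @segment_length array and both subroutine names are invented for illustration, and the real core-lib.pm presumably derives segment data from read_core_file instead.

```perl
use strict;
use warnings;

# Hypothetical segment lengths; index 0 is unused because segment
# indices are 1-based. These values are made up for illustration.
my @segment_length = (undef, 5, 3, 4);

# Map a 1-based segment index and 0-based offset to a 1-based CTI.
# Returns 0 (an illegal CTI) if the offset is not legal for the segment.
# Rejecting an out-of-range segment index is an added assumption.
sub segment_and_offset_to_cti_sketch {
    my ($seg, $offset) = @_;
    return 0 if $seg < 1 || $seg > $#segment_length;
    return 0 if $offset < 0 || $offset >= $segment_length[$seg];
    my $cti = 1;
    $cti += $segment_length[$_] for 1 .. $seg - 1;
    return $cti + $offset;
}

# Inverse mapping: 1-based CTI to (segment index, offset).
# Returns an empty list if the CTI lies outside the core.
sub cti_to_segment_and_offset_sketch {
    my ($cti) = @_;
    my $start = 1;    # CTI of the first residue of the current segment
    for my $seg (1 .. $#segment_length) {
        my $len = $segment_length[$seg];
        return ($seg, $cti - $start) if $cti >= $start && $cti < $start + $len;
        $start += $len;
    }
    return;
}

print join(',', cti_to_segment_and_offset_sketch(6)), "\n";
print segment_and_offset_to_cti_sketch(3, 3), "\n";
```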

correct-align.pm subroutines

Alignment hacks & tools.
correct_alignment_sequence($seq1_orig_align, $seq2_orig_align, $locus1, $seq1_corr)
Correct the "original" alignment of $seq1_orig_align to $seq2_orig_align by performing global alignment between $seq1_corr and the first aligned sequence; returns both corrected alignment strings. Assumes that $seq1_corr is a superset of $seq1_orig_align, i.e. residues may have been deleted or miscopied in the process of making the original alignment, but none were added.

The original version of this algorithm attempted to correct both aligned sequences at the same time; this version gives an equivalent result if called twice, with the second call using the result of the first call in reverse order. Since we never want to give a false appearance of alignment when both sequences happen to have residues inserted in the same place, we need to treat the sequences independently, i.e. by doing:

		insert1----------
		-------insertion2
rather than trying to deal with "insertion-in-both" as a special case. This independence of insertions into each of the sequences is what made the asymmetric reimplementation possible.
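The staggered layout shown above, in which each insertion is aligned only against gap characters, can be reproduced with a few lines of Perl. The helper name stagger_insertions_sketch is hypothetical and is not part of correct-align.pm; it illustrates only the column layout, not the global alignment step.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of correct-align.pm.
# Given residues inserted into sequence 1 and into sequence 2 at the
# same alignment position, lay them out so that neither insertion is
# paired with the other: each is aligned only against gap characters.
sub stagger_insertions_sketch {
    my ($insert1, $insert2) = @_;
    my $row1 = $insert1 . ('-' x length($insert2));
    my $row2 = ('-' x length($insert1)) . $insert2;
    return ($row1, $row2);
}

print "$_\n" for stagger_insertions_sketch('insert1', 'insertion2');
```

Both rows always have the same length, the sum of the two insertion lengths, so the surrounding alignment columns stay in register.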

exp-lib.pm subroutines

Library for singleton exposure stuff.
parse_exposure_argument($arg, $continuation)
Helper for argument parsing.
initialize_singleton_exposure_bins(??)
Set up $n_singleton_states and $n_singleton_exposure_bins based on options already given.
read_vv_exposure_values($file_name)
Note that the %exposure hash always uses space instead of "_" for the chain ID in the pdb_index key, regardless of what it looks like in the input file.
read_nexp_exposure_values($file_name)
Note that the %exposure hash (and %exposure_aa) always uses space instead of "_" for the chain ID in the pdb_index key, regardless of what it looks like in the input file.
read_exposure_file(??)
Read exposure information from the appropriate file.
compute_exposure_bin($exposure)
Compute the exposure portion of the environment. $env is zero-based.

fssp-lib.pm subroutines

Find correspondences between two cores using FSSP data on stdin. The correspondences are returned (on stdout) as tab-delimited pairs of (cti1, cti2), where each such pair denotes an FSSP-approved correspondence.
build_cti_to_fssp_alignment_map($core_file, $core_locus)
Build and return an array that maps each residue position (identified by "core total index") of the named core file (which we read so we can use it) to an index in the loaded FSSP data accessed via the given core locus. Also builds the $fssp_core_secondary_structure string by side effect.
read_fssp_file($fssp_file_name)
Read an FSSP file. Defines %fssp_indices (which maps locus to index), the 1-based @fssp_loci and @fssp_alignments arrays, and $n_fssp_sequences (which is also returned).

hallucinate-cb.pm subroutines

Perl version of the hallucinate_cb subroutine.

This doesn't have to be really spiffy, since it isn't needed much.

acos($cos_x)
Not provided by perl, need to hack. (Shouldn't need an acos?) -- rgr, 13-Aug-97.
hallucinate_cb($xn, $yn, $zn, $xca, $yca, $zca, $xc, $yc, $zc)
Compute the coordinates of the beta carbon, based on the backbone nitrogen, alpha carbon, and carbonyl carbon atom positions.

[The history of this code is lost in the mist. When I got it it had already been converted to C from an earlier FORTRAN implementation. Neither was at all documented. -- rgr, 8-Aug-96.] [now further converted to perl. -- rgr, 1-Aug-97.]
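Since Perl has no built-in acos, one common way to supply it is through the built-in atan2, as sketched below. The name acos_sketch is hypothetical and this is not necessarily how hallucinate-cb.pm does it; the clamping step is an added safeguard against rounding error. (Current Perl distributions also provide acos in the core POSIX and Math::Trig modules.)

```perl
use strict;
use warnings;

# Hypothetical sketch, not the hallucinate-cb.pm source.
# Uses the identity acos(x) = atan2(sqrt(1 - x*x), x) for -1 <= x <= 1.
sub acos_sketch {
    my ($cos_x) = @_;
    # Clamp to [-1, 1] so rounding error cannot make the sqrt argument negative.
    $cos_x = 1  if $cos_x > 1;
    $cos_x = -1 if $cos_x < -1;
    return atan2(sqrt(1 - $cos_x * $cos_x), $cos_x);
}

# Angle in radians whose cosine is 0.5 (that is, pi/3).
printf "%.6f\n", acos_sketch(0.5);
```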

html-hacks.pm subroutines

Library of hacks for manipulating HTML.
unclosed_tag_p($string)
Return nonzero if there is an incomplete tag at the end of the line. The "<...<...>" case is not noticed.
get_html_line(??)
Input a 'line' from stdin, maybe flushing old markup if $strip_p. Returns '' at EOF, regardless of how many times it is called (unless you reset the $html_eof_p flag). There may actually be more than one line if required to ensure that all angle brackets match, but all newlines are preserved. In any case, $html_line_number is the value of $. for the first line read.
define_html_paragraph_tags(??)
Define the set of tags used to delimit HTML paragraphs.
read_html_paragraph(??)
Uses the global $html_next_para as lookahead: on exit, it is the initiating tag of the next paragraph, while $html_next_line has the remaining unexamined portion of the first line.
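
The test described for unclosed_tag_p can be sketched as a comparison of the last "<" and last ">" positions. This is only an illustration under that assumption (the real subroutine may use a regular expression instead), and unclosed_tag_p_sketch is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical sketch: a tag is "unclosed" if the last '<' in the string
# comes after the last '>'.  rindex returns -1 when the character is absent,
# so a string with no angle brackets at all yields 0.
sub unclosed_tag_p_sketch {
    my ($string) = @_;
    my $last_open  = rindex($string, '<');
    my $last_close = rindex($string, '>');
    return $last_open > $last_close ? 1 : 0;
}

print unclosed_tag_p_sketch('see <a href="foo.html"'), "\n";    # prints 1
print unclosed_tag_p_sketch('see <a href="foo.html">'), "\n";   # prints 0
```

Like the documented version, this does not notice the "<...<...>" case: the final ">" makes the whole string look balanced.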

los-rules.pm subroutines

Rules for constructing line-of-sight and "_bb.singl" files.

These are also known as "raw VV" files.

make_raw_vv_data_file($locus, $vv_data_file)
not documented
make_vv_singleton_file($locus, $vvs_file)
not documented
make_line_of_sight_file($locus, $hyperenv_file)
not documented

pdb-seq.pm subroutines

PDB sequence hacks.

This has a library file of its own so that extract_all_pdb_sequences can be shared between the check-pdb-seqs.pl and pdb-domain-seq.pl scripts. [check-pdb-seqs.pl is not distributed presently. -- rgr, 9-Jul-99.]

extract_all_pdb_sequences($file_name, $chains)
Given a PDB file (possibly implicitly as stdin, if $pdb_file_name is empty), extract ALL of its sequences, both ATOM and SEQRES versions, and enter them into the %pdb_sequences hash.

ppml.pm subroutines

The pretty_print_makefile_line subroutine, and its support.
tab_to_column($goal_col)
May use spaces or tabs. Columns are zero-based.
print_words(??)
Keeps track of current printing column.
pretty_print_makefile_line($lines, $primary_indent, $secondary_indent, $line_prefix)
Do line wrapping and indentation on $lines (the first argument), which may contain one or more logical makefile lines separated by newline characters; each logical makefile line is treated separately. For each logical line, the first physical line is indented by $primary_indent (the second argument); if wrapping should become necessary, a " \" is appended at the end of the line and the second and subsequent physical lines are indented by $secondary_indent (the third argument). All lines are indented with tab characters where appropriate (this means that they start with tab characters if the relevant indentation parameter is >= 8). $line_prefix, if supplied, applies to all lines, and is output *before* any indentation. If a null string is given as the $lines argument, a single newline is generated. Redundant linear whitespace is turned into a single space character (except for tabs generated by indentation).
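
The tab-or-space indentation described above can be sketched as follows. This assumes tab stops every 8 columns (suggested by the ">= 8" remark) and returns the whitespace as a string rather than printing it; the name whitespace_to_column and the explicit $current_col argument are hypothetical, since the real tab_to_column tracks the printing column itself.

```perl
use strict;
use warnings;

# Hypothetical sketch: build the whitespace needed to move from
# $current_col to $goal_col (both zero-based), using tabs while the next
# tab stop does not overshoot the goal, then spaces for the remainder.
sub whitespace_to_column {
    my ($current_col, $goal_col) = @_;
    my $out = '';
    while (1) {
        my $next_stop = (int($current_col / 8) + 1) * 8;
        last if $next_stop > $goal_col;
        $out .= "\t";
        $current_col = $next_stop;
    }
    $out .= ' ' x ($goal_col - $current_col) if $goal_col > $current_col;
    return $out;
}

# An indentation of 10 starting at column 0 is one tab plus two spaces.
my $indent = whitespace_to_column(0, 10);
print length($indent), "\n";    # prints 3
```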

print-multiple-alignments.pm subroutines

Printing multiple alignments (see the print_multiple_alignments subroutine).
max_string_length(??)
Given an arbitrary number of string arguments, compute their maximum length.
print_multiple_alignments(??)
Given an even number of arguments, interpret them as alternating locus/sequence pairs to be printed in parallel as a multiple alignment.
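
As a rough illustration of how the two subroutines above fit together, the sketch below pads each locus to the width of the longest one and emits one row per locus/sequence pair. The names and the two-space gutter are assumptions, and the real print_multiple_alignments presumably also breaks long sequences into blocks, which this sketch omits.

```perl
use strict;
use warnings;

# Hypothetical counterpart of max_string_length.
sub max_string_length_sketch {
    my $max = 0;
    for my $s (@_) {
        $max = length($s) if length($s) > $max;
    }
    return $max;
}

# Hypothetical sketch: format alternating locus/sequence arguments as
# parallel rows, with loci left-justified in a common column.
sub format_alignment_block {
    my @args = @_;
    die "need an even number of arguments\n" if @args % 2;
    my (@loci, @seqs);
    while (@args) {
        push @loci, shift @args;
        push @seqs, shift @args;
    }
    my $width = max_string_length_sketch(@loci);
    my $out = '';
    for my $i (0 .. $#loci) {
        $out .= sprintf("%-*s  %s\n", $width, $loci[$i], $seqs[$i]);
    }
    return $out;
}

print format_alignment_block('1abc', 'MKV-L', 'x', 'MKVAL');
```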

read-dssp-file.pm subroutines

Bare-bones DSSP file reader.
read_dssp_file($dssp_file_name)
not documented

rule-based-make.pm subroutines

Library of tools for MRF makefile makers (including the original MRF and the new VVENV environment definitions). This is not directly documented, but see http://bmerc-www.bu.edu/needle-doc/latest/depends.html for information about the programs that use it.

This is a rule-based revision that doesn't support -use-plus.

set_make_prefix_or_suffix($rule_name, $prefix_or_suffix, $new_value)
Override to file naming conventions. Redefines a prefix or suffix.
parse_depend_arg($arg)
not documented
define_core($core_locus, $pdb_locus, $chain_spec)
Implement defaults.
read_core_list($file_name)
Read the core definition file, defining a set of core loci, each with its associated PDB locus and chain/range specification. These are defined in the @core_loci array and %core_pdb_locus and %core_chain_specification hashes (keyed on locus). Formerly read_core_list just read a core list, the first field (at the time the only field) in the core definition file; this still works for cores that are self-defining based on their names.
print_macro_and_target($index)
Standard call to pretty_print_makefile_line for defining make macros and targets. [now has new args. -- rgr, 6-Jul-98.]
generate_rule($dependencies, $code, @files)
Output a rule constructed from $dependencies, $code, and a series of target file names.
print_make_include_file_header(??)
not documented
define_make_rule($name, $prefix, $suffix, $macro_name, $rule_generator_function)
not documented
set_file_type_make_state($make_state)
The first argument is the new state, one of (local path never make); the remaining list is a set of file types to assign to this state. Here and in define_make_rule (where they are initialized) are the only places that should bash the make_rule_make_state variable. -- rgr, 6-May-98. [mostly upward compatible replacement for the add_file_type_to_make subroutine. -- rgr, 11-May-98.]
add_file_type_to_make(??)
The argument list is a set of file types. This call requests that they all be made by the make_required_dependencies subroutine.
dump_rule_database(??)
Debugging utility. Prints most rule information for all rules to the standard output.
maybe_find_make_file($name)
[replacement for the original find_file subroutine. -- rgr, 10-Feb-98.]
chain_spec_chain_ids($chain_spec)
Return the set of chain ID's in the passed chain spec as a string in the order in which they appear. See also the core_chain_ids subroutine.
make_core_locus_file_name($index, $locus)
Make a standard file name based on the core locus (intended for use as an @make_rule_file_namer entry). This is the normal case, and is the default installed by the define_make_rule subroutine.
make_pdb_locus_file_name($index, $locus)
Make a file name based on the PDB locus (not necessarily the name of the PDB file itself). (Intended for use as an @make_rule_file_namer entry.)
make_pdb_chain_file_name($index, $locus)
Make a file name based on the PDB locus and chain ID(s), intended for use as an @make_rule_file_namer entry. If there is only one chain ID and it is '_', then drop it.
make_file_name($index, $locus)
Make a standard file name, given a rule index and a core locus. This just dispatches to the rule's namer.
find_file_test($rule_name, $locus)
Return a file of the indicated type for $locus, or '' if it can't be found. This is a find-only version of find_or_make_file, intended for use when there are multiple possibilities. [new; not used except by the ./make-ctimap-depends.pl script. -- rgr, 11-May-98.]
find_or_make_file($rule_name, $locus)
Return a file of the indicated type for $locus. This is the central recursive entry point for makefile generation -- this may call a generator function, which will in general call find_or_make_file one or more times for dependencies. The file may have already been made (or found), in which case we return it directly.
make_required_dependencies(??)
Top-level loop. Makes the required files for each locus in the @core_loci array. -- rgr, 10-Feb-98.
print_macros_and_targets(??)
To be called after make_required_dependencies (the top-level maker loop). Print targets for anything we made, dispatching through the @make_rule_target_printer array to allow for rule-specific hooks. -- rgr, 6-Jul-98.
generate_dependency_files(??)
Top level routine. This should take care of most cases.
make_abbrev_dssp_file($raw_locus, $dssp_file_name)
Make an abbreviated DSSP file in the current directory from the PDB file.
make_dssp_seg_file($raw_locus, $seg_file_name)
Generate seg file from abbreviated DSSP file. This requires the raw PDB locus (i.e. no chain ID).
make_eisenberg_fat_alanine_exposure_file($locus, $exposure_file_name)
Eisenberg "fat alanine" exposure files. Exposure files are always made for the PDB file as a whole.
make_core_file($locus, $core_file_name)
not documented
make_seq_file($locus, $sequence_file_name)
Make a sequence file from the PDB SEQRES records.
make_mrf_gmt_env_file($locus, $gmte_file)
Note that this is completely general; it does not depend on the WMS scoring scheme. -- rgr, 25-Jun-97.
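
The recursive, cached behaviour described for find_or_make_file can be sketched as below. Everything concrete here is hypothetical: the rule names, the dependency table, and the "locus.rule" file naming are invented for illustration and do not reflect the library's actual rule database or generator functions.

```perl
use strict;
use warnings;

# Hypothetical rule database: rule name -> rules it depends on.
my %rule_deps = (
    pdb  => [],
    dssp => ['pdb'],
    seg  => ['dssp'],
    core => ['pdb', 'seg'],
);
my %made;         # "rule/locus" -> file name already found or made
my @rules_out;    # makefile rules, in the order they were generated

# Return the file of the given type for $locus, generating a rule for it
# (and, recursively, for its dependencies) the first time it is requested.
sub find_or_make_file_sketch {
    my ($rule_name, $locus) = @_;
    my $key = "$rule_name/$locus";
    return $made{$key} if defined $made{$key};
    die "unknown rule '$rule_name'\n" unless $rule_deps{$rule_name};
    my $file = "$locus.$rule_name";
    my @deps = map { find_or_make_file_sketch($_, $locus) }
               @{ $rule_deps{$rule_name} };
    push @rules_out, "$file: @deps" if @deps;
    return $made{$key} = $file;
}

find_or_make_file_sketch('core', '1abc');
print "$_\n" for @rules_out;
```

Because results are cached per rule and locus, the pdb file is requested twice in this run but contributes to the output only through the rules that depend on it.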

rule-based-mrf.pm subroutines

Library of tools for MRF makefile makers (including the original MRF and the new VVENV environment definitions). This is not directly documented, but see http://bmerc-www.bu.edu/needle-doc/latest/depends.html for a general discussion of what this does.

This is a rule-based revision that doesn't support -use-plus.

parse_mrf_depend_arg($arg)
Parse a standard make-mrf-depends arg, which includes standard make-lib args as well.
read_xval_sets($filename)
Generates rules for nontrivial cross-validation sets, and defines the %locus_xval_set hash. This maps a locus to a non-singleton cross-validation set file name, but only for loci that belong to nontrivial cross-validation sets. Must be called after reading the core list, as it uses the %core_defined_p hash.
define_xval_set_target(??)
Define a counts file target for this cross-validation set only if there is more than one locus. Otherwise it's a singleton, which is the default anyway. [internal grinder for the read_xval_sets subroutine. -- rgr, 6-Jul-98.]
make_mrf_environment_file($locus, $env_file, $class, $keyword_arg)
Make environments.
make_mrf_singleton_env_file(??)
not documented
make_mrf_pairwise_env_file(??)
not documented
make_mrf_counts_file($locus, $cnt_file)
$core_file, $seq_file, $se_file, $pe_file
print_mrf_counts_and_total_counts($index)
Recipe for printing the counts macro and target, and for building the total counts. The $total_counts file is always built in the current directory. [This doesn't match the rest of the convention, though . . . -- rgr, 26-Jun-97.]
maybe_make_no_xval_file($rule_name, $total_counts, $token, $write_keyword)
Helper for the make_no_xval_score_files subroutine, below. Generates the NO_XVAL file for the MRF score rule $rule_name, if $rule_name files have been requested.
make_no_xval_score_files(??)
Generate all requested non-cross-validated score files from the $total_counts (the GMT scores must be per-core). The locus is always called "NO_XVAL", which is incompatible with historic naming conventions, but allows us to use the same naming logic. -- rgr, 2-Mar-99.
make_mrf_gmt_singleton_score_file($locus, $score_file, $counts_name)
Special case for GMT scores, which require environments. Note that we must be sure to use the same environments that were used to produce the counts in the first place.
make_mrf_score_file(??)
Doesn't work for GMT scores, which require environments. See the make_mrf_gmt_singleton_score_file subroutine. -- rgr, 3-Apr-98.
make_mrf_true_singleton_score_file(??)
not documented
make_mrf_pairwise_score_file(??)
not documented
make_mrf_loop_score_file(??)
not documented

score-lib.pm subroutines

Some score hacks (not part of needle-tools yet).
parse_requested_environments($arg)
We are given some combination of commas, dashes, and digits: parse it as a set of requested environments. This is a comma-separated list of subranges, where each subrange is either a single digit string or two digit strings separated by a dash.
read_singleton_score_file($filename)
Read singleton scores for all environments from the named file, returning a reference to a 2-dimensional array. Relies heavily on line breaks in the standard file format. [***bug***: doesn't re-encode aa indices. -- rgr, 16-Oct-97.] [made singleton version from pairwise one. -- rgr, 12-Dec-97.]
read_pairwise_score_file($filename)
Read scores for all environments from the named file, returning a reference to a 3-dimensional array. Relies heavily on line breaks in the standard file format. [***bug***: doesn't re-encode aa indices. -- rgr, 16-Oct-97.]
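
The argument syntax described for parse_requested_environments (comma-separated subranges, each a number or a dash-separated pair) can be sketched like this. The name parse_environment_ranges, the sorted-list return value, and the error handling are assumptions; the real subroutine may store its result in globals instead.

```perl
use strict;
use warnings;

# Hypothetical sketch: turn a string such as "1,3-5,8" into the sorted list
# of requested environment numbers, dying on anything malformed.
sub parse_environment_ranges {
    my ($arg) = @_;
    my %seen;
    for my $subrange (split /,/, $arg) {
        if ($subrange =~ /^(\d+)$/) {
            $seen{$1} = 1;
        } elsif ($subrange =~ /^(\d+)-(\d+)$/) {
            $seen{$_} = 1 for $1 .. $2;
        } else {
            die "bad subrange '$subrange' in '$arg'\n";
        }
    }
    return sort { $a <=> $b } keys %seen;
}

my @envs = parse_environment_ranges('1,3-5,8');
print "@envs\n";    # prints 1 3 4 5 8
```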

seg-lib.pm subroutines

The read_seg_file subroutine.
read_seg_file($seg_file)
Read a segment file (see the http://bmerc-www.bu.edu/needle-doc/latest/ss-formats.html#seg-file-format page), initializing the @segment_foo arrays and the %pdbres_to_segment_start hash. (Presently, @segment_ss_index is only used for error messages.)

seq-lib.pm subroutines

Sequence manipulation routines.
emit_protein_sequence($locus, $sequence, $output_file, $output_format)
Emit the given locus/sequence pair in the selected format. If the output file is empty or not specified, it defaults to the standard output. The output format can be 'ig', 'fa', 'tbl', or 'just-seq'. If the output format is empty or not specified, it defaults to the value of the global $sequence_output_format variable, which is initially 'ig'.
read_sequence_file($file_name)
Read a single sequence from a file (currently must be IG format), returning (locus, sequence).
parse_seq_arg($arg)
not documented

std-pdbres.pm subroutines

Fix the pdbres field to be '####A' format.
standardize_pdbres($pdbres)
Fix the pdbres field to be '####A' format. Append a space (the empty insertion code) if the last character is numeric, and pad on the left to a total length of five.
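
A minimal sketch of the '####A' normalization follows. It reads the rule as: when the residue number carries no insertion code (its last character is a digit), a space is appended in its place, and the result is right-justified to five characters. That reading, the initial whitespace stripping, and the name standardize_pdbres_sketch are assumptions.

```perl
use strict;
use warnings;

# Hypothetical sketch of the '####A' normalization: four columns of residue
# number followed by one column of insertion code (a space if there is none).
sub standardize_pdbres_sketch {
    my ($pdbres) = @_;
    $pdbres =~ s/^\s+|\s+$//g;             # drop any existing padding
    $pdbres .= ' ' if $pdbres =~ /\d$/;    # no insertion code present
    return sprintf('%5s', $pdbres);        # pad on the left to length five
}

print '[', standardize_pdbres_sketch('12'), "]\n";     # prints [  12 ]
print '[', standardize_pdbres_sketch('12A'), "]\n";    # prints [  12A]
```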

vvenv-lib.pm subroutines

Standard definitions for visible-volume-based threading environments.
parse_vvenv_argument($arg)
Parse standard arguments. [This set could be made more complete. -- rgr, 16-Jul-97.]
same_sheet_p($class1, $class2)
Return 1 iff the two segments, named by their designators, are on the same sheet (or barrel). We assume that these two designators are for different segments.
encode_pairwise_environment(??)
Given a series of arguments, from least to most significant (which is the order named below), turn it into a counting environment encoding. All arguments must be 0 or 1 [except we now have to pass $cb_distance directly -- rgr, 4-Aug-97]. [This is now more or less internal to read_line_of_sight_file, except that its hairy knowledge of encoding details means that it belongs here. -- rgr, 5-Aug-98.]

The arguments are:

($total_vv_up_to_14A_res1 > $vv_exp_tr,
$total_vv_up_to_14A_res2 > $vv_exp_tr,
$cb_distance,
($vv_up_to_7_5A_res1_w1 > ($ss1 ? $vv_tr_e : $vv_tr_h)
  && $vv_up_to_7_5A_res2_w2 > ($ss2 ? $vv_tr_e : $vv_tr_h)),
$seg1, $seg2).
decode_environment($environment)
Given an $environment that is one-based, return a list of individual environment components, still encoded, with secondary structure last and most significant. -- rgr, 28-Aug-97.
env_secondary_structure(??)
Extract the encoded secondary structure value from an environment.
vector_equalp($vect2)
Testing hack. Return 1 if the two vectors (passed by reference) are numerically equal, else 0.
read_principal_component_hyperplane_definitions($real_contact_defn_file)
Read AA set hyperplane definitions, loading it into the $plane[][][] array (and initializing $n_coeffs and $flush_everything_coeffs). Sole arg is the file name.
parse_ss_codes(??)
Given $ss1_name and $ss2_name, return the right secondary structure encoding.
find_hyperplane($ss, $aa1_name, $aa2_name)
not documented
core_pdbres_to_cti_and_aa($pdbres)
Given a pdbres, return its CTI and AA letter code -- shorthand for a bunch of array and hash lookups. Only warns if not found (and only once).
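
For the vector_equalp testing hack listed above, an element-by-element comparison of two array references might look like the sketch below. The two-argument signature and the length check are assumptions, since only one argument is shown in the listing.

```perl
use strict;
use warnings;

# Hypothetical sketch: return 1 if two arrays (passed by reference) have the
# same length and numerically equal elements, else 0.
sub vector_equalp_sketch {
    my ($vect1, $vect2) = @_;
    return 0 unless @$vect1 == @$vect2;
    for my $i (0 .. $#$vect1) {
        return 0 unless $vect1->[$i] == $vect2->[$i];
    }
    return 1;
}

print vector_equalp_sketch([1, 2.0, 3], [1, 2, 3]), "\n";    # prints 1
print vector_equalp_sketch([1, 2, 3], [1, 2, 4]), "\n";      # prints 0
```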

vvenv-rules.pm subroutines

Create makefile dependencies for VV environments.
make_vvenv_environment_file($locus, $env_file, $class, $keyword_arg)
not documented
make_vvenv_counting_environment(??)
not documented
make_vvenv_pairwise_environment(??)
not documented
make_vvenv_edge_scores($locus, $es_file)
not documented
make_vvenv_edge_environment(??)
[***kludge***: this will make the edge scores *and* envs by side effect, without duplication if both were requested. -- rgr, 27-Mar-98.]

weighted-scores.pm subroutines

Add-ons for doing weighted scores. This is hairy because it requires two sets of counts.
make_mrf_los_counts_file($locus, $cnt_file)
Make line-of-sight counts for use in weighted scores. [really, this is almost the same as make_mrf_counts_file, except for the $singleton_xval_(core|los)_counts_files kludges at the bottom, and the fact that this version omits singletons. -- rgr, 6-Jul-98.]
make_mrf_weighted_score_file($locus, $score_file, $file_type, $maker_macro, $keyword_arg)
Requires two sets of counts, constructs cross-validated "weighted" pairwise score file. -- rgr, 6-Jul-98.
make_mrf_weighted_pairwise_score_file(??)
not documented

Scripts with subroutines

Any of these that look generally useful might be candidates for splitting out into a .pm file. Some are still in the script file because they are particular to the script's data structures and/or purposes, and some are there because of historical inertia.

aa-thresholds-to-envs.pl subroutines

Given data on stdin defining exposure bins as a function of amino acid, turn them into environments on the standard output.
exposure_bin_max_exposure($env)
Given the zero-based exposure portion of a singleton environment, compute the upper bound exposure. In order to be classed in env, exposures must satisfy
	&exposure_bin_max_exposure($env-1) <= $exp, and
	$exp < &exposure_bin_max_exposure($env).
In a sense, this is the inverse of the compute_exposure_bin function.
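
The inverse relationship between the two functions can be illustrated with a toy version of each. The bin boundaries below are made-up numbers, the lower edge of bin 0 is assumed to be zero, and the dependence on amino acid that the real script handles is ignored; all names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical upper bounds for four exposure bins (not values from the source).
my @bin_upper_bound = (10, 30, 60, 1e9);

# Upper bound of bin $env; bin -1 stands for the lower edge of bin 0.
sub exposure_bin_max_exposure_sketch {
    my ($env) = @_;
    return 0 if $env < 0;
    return $bin_upper_bound[$env];
}

# Inverse direction: find the zero-based bin containing $exposure.
sub compute_exposure_bin_sketch {
    my ($exposure) = @_;
    for my $env (0 .. $#bin_upper_bound) {
        return $env if $exposure < exposure_bin_max_exposure_sketch($env);
    }
    return $#bin_upper_bound;
}

my $exp = 25;
my $env = compute_exposure_bin_sketch($exp);
print "$env\n";    # prints 1, since 10 <= 25 < 30
```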

blast-to-clique.pl subroutines

Converts blast hits in Jim Freeman's .blast file format to Sudeshna Das's clique format. This is done by merging equivalence classes.
locus_canonical_clique($locus)
Given a locus, find its canonical clique number, if defined.
merge_cliques($locus1, $locus2)
Given two loci, put them in the same clique, merging if need be.
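
Merging equivalence classes of this kind is commonly done with a union-find (disjoint-set) structure. The sketch below uses that technique as an illustration; it is not necessarily how blast-to-clique.pl does it, and it identifies each clique by a representative locus rather than by the canonical clique number the script uses. All names are hypothetical.

```perl
use strict;
use warnings;

my %clique_parent;    # locus -> parent locus; a root is its own parent

# Find the representative locus of the clique containing $locus, creating a
# singleton clique for loci we have not seen before.
sub locus_canonical {
    my ($locus) = @_;
    $clique_parent{$locus} = $locus unless defined $clique_parent{$locus};
    while ($clique_parent{$locus} ne $locus) {
        # Path halving: point this node at its grandparent as we walk up.
        $clique_parent{$locus} = $clique_parent{ $clique_parent{$locus} };
        $locus = $clique_parent{$locus};
    }
    return $locus;
}

# Put two loci in the same clique, merging their cliques if they differ.
sub merge_loci {
    my ($locus1, $locus2) = @_;
    my ($root1, $root2) = (locus_canonical($locus1), locus_canonical($locus2));
    $clique_parent{$root2} = $root1 if $root1 ne $root2;
    return $root1;
}

merge_loci('1abc', '2xyz');
merge_loci('3def', '2xyz');
print locus_canonical('3def') eq locus_canonical('1abc') ? "same\n" : "different\n";
# prints same
```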

calculate-vv.pl subroutines

Wrapper script to calculate visible volume files. See the documentation on the http://bmerc-www.bu.edu/needle-doc/new/vvenv-tools.html#calculate-vv page.
execute_command(??)
not documented

check-pdb-seqs.pl subroutines

align_pdb_chains($chain)
Given a single chain ID, look up the atom & seqres versions in pdb_sequences, run $global_s_program_name on them.

compare-segs.pl subroutines

Compare two seg-format files named on the command line. The output to this is way too verbose; see http://bmerc-www.bu.edu/needle-doc/latest/ss-tools.html#compare-segs for gory details.
next_file1_segment(??)
not documented
next_file2_segment(??)
not documented

correct-alignment.pl subroutines

Given an alignment of two "imperfect" sequences and the correct versions of one or both sequences, produce a new alignment of the corrected sequences. See http://bmerc-www.bu.edu/needle-doc/latest/align-progs.html#correct-alignment for details.
update_alignments($locus, $seq_corr)
Given a locus and sequence, update the FSSP alignments.

dssp4.pl subroutines

Convert a DSSP file to a 'smoothed DSSP' file, using the scheme developed by Jim White and Temple Smith <dssp3.m>. Produces a variety of output formats.
  dssp4.pl [-locus name] [-chain L] [-min-strand-length slen]
	[-min-helix-length hlen] [-keep-short-strands]
	[ -t | -pdb | -ss ] [filename]
Where: filename is the name of a dssp-format file (stdin is used if not supplied).

See the http://bmerc-www.bu.edu/needle-doc/latest/ss-tools.html#dssp4 page for detailed documentation.

The -chain and -ss options are mutually exclusive.

Default output is tab-delimited residue ("pdbres"), AA, secondary structure, and exposure, one line per residue.

print_pdb_ss_record(??)
not documented

expand-dssp.pl subroutines

Expand dssp4.pl output to include the full chain.

Given 'smoothed DSSP' -t output on the standard input and the full protein sequence for a given chain, interpolate residues with missing coordinates as loop residues with empty residue numbers (pdbres fields), producing the same format on output. See the http://bmerc-www.bu.edu/needle-doc/latest/misc-tools.html#expand-dssp page for details. Operates on a single chain, which may be specified; the default chain ID is a space. For an explanation of the input format, see the http://bmerc-www.bu.edu/needle-doc/latest/dssp-progs.html#dssp4-default-output-format page.

Usage: expand-dssp.pl [-chain L] full-sequence < dssp-in > dssp-out

Where: filename is the name of an abbreviated-DSSP-format file, and the chain ID is a single letter (defaults to ' ').

fake_residue($i)
Print a line for the $i'th residue in sequence indices, assuming it is a loop.

extract-toc.pl subroutines

Find all section markup, producing .toc file format on the standard output.
standardize_whitespace(??)
Standardize whitespace of our argument. Strip leading and trailing whitespace, and turn embedded whitespace into nonredundant blanks. [should probably put this into html-hacks.pm library. -- rgr, 7-Aug-97.]
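
The three steps described for standardize_whitespace map directly onto three substitutions. This is a minimal sketch under the assumption that "whitespace" means anything matched by \s; the name standardize_whitespace_sketch is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch: strip leading and trailing whitespace, then collapse
# each embedded run of whitespace (including newlines) to a single blank.
sub standardize_whitespace_sketch {
    my ($string) = @_;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    $string =~ s/\s+/ /g;
    return $string;
}

print '[', standardize_whitespace_sketch("  Module \t usage\n hierarchy  "), "]\n";
# prints [Module usage hierarchy]
```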

fa2tbl.pl subroutines

Convert FA files to tbl format.
print_current_sequence(??)
Print the current sequence, if any.

filter-pdb-atoms.pl subroutines

make_interesting_residue_lookup_table(@desired_chains)
Build the %residue_interesting_p table mapping a chain and PDBRES to +1 if it's a start residue, or -1 if it's an end residue. The arguments are ($chain, $start_pdbres, $end_pdbres) structures, such as would be returned by parse_chain_specification. The table values produced are the deltas in the "interestingness" of the chain at this point, except that +1 applies before this residue and -1 applies afterwards. In other words, both endpoints are inclusive, and the caller has to deal with that.

Note that the chain may be considered of interest if the following expression evaluates to true (nonzero):

	  @desired_chains == 0
		|| defined($chain_start_interesting_p{$chain});
This should be the case if we expect any of the chain's residues to pass the interesting_residue_p test. (But $chain_start_interesting_p{$chain} will be zero if all of the subranges on that chain had explicit start PDBRES values.)
print_record($record)
Given a TER or ATOM record, do the standard postprocessing steps, and print it on the standard output.
process_residue($aa_name)
If there's anything in @atoms, print the records in canonical order, checking for missing atoms, and doing the HCB thing if requested.
handle_atom_record($atom_record)
Process a new atom record, checking for new chain/residue transitions, and adding it to the @atoms array in the appropriate place.

fssp-core-corr.pl subroutines

Find correspondences between two cores using FSSP data on stdin. The correspondences are returned (on stdout) as tab-delimited pairs of (cti1, cti2), where each such pair denotes an FSSP-approved correspondence.
make_alignment_ss_string($locus, $align)
Assumes the core file is loaded and the cti-to-alignment map is passed by reference. [this doesn't work if $use_fssp_structural_equivalence_p is true; some alignment values will then be undefined, which should never happen for core residues. -- rgr, 29-Jan-99.]

install.pl subroutines

Installation script that is smarter than the install program about (a) perl scripts and (b) programs/files that have not been changed since the last install (so that the file dates in the bin directory mean something). See the documentation on the http://bmerc-www.bu.edu/needle-doc/latest/random-tools.html#install page.
mtime(??)
Return the modification time of the file given as an argument. This not only hides the magic number, but it gets around the fact that perl doesn't like subscripting of function return values. -- rgr, 22-Oct-96.
x11_install(??)
Use cp to install a given file in a specified directory. This subroutine does not die (or croak, or exit), so that callers may clean up; it just prints a warning and increments $n_errors so that we die later instead. Based in part on the install.sh routine that comes with FSF emacs, which has the following comment:
	install - install a program, script, or datafile
	This comes from X11R5 (mit/util/scripts/install.sh).
	Copyright 1991 by the Massachusetts Institute of Technology
$program is the pathname of the thing where it lives now, $installed_program_name is its "new" name when in place, and $program_pretty_name is for use in messages.
install_perl_script(??)
Install perl script. Note that we do not want to do this for .pm (perl "module", or library) files. [though perhaps we should give those the #ifdef stuff for consistency. -- rgr, 17-Nov-98.]
install_program(??)
Have a real program to install; decide how & whether to do it.
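
For the mtime helper described above, the "magic number" is the index of the modification time in the list returned by stat, which is 9. The sketch below is an illustration with a hypothetical name; it goes through an intermediate array, although Perl 5 also accepts the list slice (stat($file))[9].

```perl
use strict;
use warnings;

# Hypothetical sketch: return the modification time of $file in seconds
# since the epoch, or undef if the file cannot be stat'ed.
sub mtime_sketch {
    my ($file) = @_;
    my @stat = stat($file);
    return @stat ? $stat[9] : undef;
}

my $when = mtime_sketch($0);
print defined $when ? "modified at $when\n" : "cannot stat $0\n";
```

Comparing the mtime values of the source file and the installed copy is one way to decide whether anything has changed since the last install.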

los.pl subroutines

Make a VV line-of-sight file.
format_total_vv($value)
Maybe implement the broken rounding of total visible volume values, depending on the $use_old_format_p flag value. This is only needed for testing.
read_vv_singleton_values($file_name, $dist)
Read a VV singleton file, possibly one of several computed for different radii. This is a little too peculiar to los.pl to make it worth splitting out into a library. To compute the "traditional" BMERC line-of-sight file, the user must arrange that $dist is first 0 (for 14A data) and then 1 (for 7.5A data).

make-align-depends.pl subroutines

Given a list of core/sequence pairs, generate a series of makefile rules that will produce corrected FSSP-style alignments from pima_profile format sequence matches. This is something of a kludge, as it hardwires a number of unfortunate naming conventions. -- rgr, 3-Sep-99.
make_fssp_align($core, $seq, $other_locus)
make one of these godawful things.
name_kludge_make_fssp_align($core, $seq)
[***kludge***: renaming hacks for Jadwiga's conventions. -- rgr, 3-Sep-99.]

make-core.pl subroutines

Make a .core file from PDB and seg files. Documentation is available on the http://bmerc-www.bu.edu/needle-doc/latest/atom-progs.html#make-core page. Note that the new make-domain-core.pl script (see http://bmerc-www.bu.edu/needle-doc/latest/atom-progs.html#make-domain-core) is more powerful.
read_pdb_atom_line(??)
Read the next atom line, skipping all other record types.

make-domain-core.pl subroutines

read_next_pdb_atom_residue($reset_p)
Read and return the next residue represented on PDB 'ATOM' record lines from the <PDB> file handle, skipping all other record types. May be called with a "reset" arg to indicate the file has been reopened. Updates $residue_id, $residue_chain, and $residue_chain_index globals.
generate_segs_in_range($chain, $range_pdbres_start, $range_pdbres_end, $range_name)
Given that we have read up to the first residue (in the global $residue) of the indicated chain and subrange (passed as arguments), emit all segments that fall entirely within that range, as defined by the global segment file database (%pdbres_to_segment_start, @segment_end_pdbres, etc.). Uses & updates the globals $residue and $segment_count, as well as the variables maintained by read_next_pdb_atom_residue ($residue_id, $residue_chain, etc). [this makes the modularity messy; generate_segs_in_range is therefore strictly internal to make-domain-core.pl. -- rgr, 4-Jun-98.]

make-seq-file.pl subroutines

Make a .seq file (IG format) from the PDB SEQRES records.
interesting_chain_p($chain)
Return true if the chain ID (our sole argument) is of interest. [taken from filter-pdb-atoms.pl script. -- rgr, 16-Sep-96.]
add_aa($aa_name)
Add the one letter code for the three-letter name we are given to the $sequence string.
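
The three-letter to one-letter conversion behind add_aa can be sketched with a lookup table for the twenty standard residues. Mapping unrecognized names to 'X' is an assumption, as are the names used here; the real script's handling of nonstandard residues is not documented in this listing.

```perl
use strict;
use warnings;

# Standard three-letter to one-letter amino acid codes.
my %aa_one_letter = (
    ALA => 'A', ARG => 'R', ASN => 'N', ASP => 'D', CYS => 'C',
    GLN => 'Q', GLU => 'E', GLY => 'G', HIS => 'H', ILE => 'I',
    LEU => 'L', LYS => 'K', MET => 'M', PHE => 'F', PRO => 'P',
    SER => 'S', THR => 'T', TRP => 'W', TYR => 'Y', VAL => 'V',
);

my $sequence = '';

# Hypothetical sketch: append the one-letter code for $aa_name to $sequence,
# using 'X' for anything not in the table (an assumption).
sub add_aa_sketch {
    my ($aa_name) = @_;
    my $code = $aa_one_letter{ uc $aa_name };
    $code = 'X' unless defined $code;
    $sequence .= $code;
    return $code;
}

add_aa_sketch($_) for qw(MET LYS VAL);
print "$sequence\n";    # prints MKV
```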

make-toc.pl subroutines

Given a set of .toc file contents, make a hypertext table of contents using <ul> constructs. [Recoded from a 25-Jul-96 emacs-lisp implementation. -- rgr, 1-May-97.]
print_indent($to)
Dumb print-oriented version. Assumes we are starting at the right margin.
pop_levels($level)
Back out of any nesting levels we may be in at present, down to the indicated level.

pdb-domain-seq.pl subroutines

align_seq_and_atom_chains($chain)
Given a single chain ID, look up the atom & seqres versions in pdb_sequences, and use align_sequences_globally to align them, returning two (long string) values. Caches the results for the sake of subsequent calls.
extract_atom_subrange_from_seqres($chain, $start_res, $end_res)
Return the indicated subrange from the chain's SEQRES sequence, given indices in terms of the ATOM record residues. $start_res and $end_res are 0-based (with $end_res exclusive) and in terms of the atom residue numbering. (This is not as hairy as it looks; there's just a lot of index bookkeeping going on.)

pdb-to-dsm.pl subroutines

Create a "DSM" (actually, just the series of encoded states) from PDB-derived information. Assumes filtered DSSP with the -clean option, in default output format. See the documentation on the http://bmerc-www.bu.edu/needle-doc/latest/misc-tools.html#pdb-to-dsm page.
next_dssp_entry(??)
not documented
next_exposure_entry(??)
not documented

pdb-to-seq.pl subroutines

Extract a protein sequence from the PDB SEQRES or ATOM records, and output it in various formats on the standard output.
interesting_chain_p($chain)
Return true if the chain ID (our sole argument) is of interest. Uses $desired_chains, implementing 'first' and 'all' values.
emit_sequence(??)
Emit the given locus/sequence pair in the (globally) selected format & reset for the next. Does nothing if the sequence is empty, and may make a new locus if expecting to output multiple sequences.
add_aa($aa_name, $this_chain_id, $this_record_type)
Add the one-letter code for the three-letter name we are given to the global $sequence string. May emit the sequence if we start a new chain ID or record type, and updates the bookkeeping accordingly.

util-parse-dssp.pl subroutines

Based on a script written by Kathleen Klose. The input is the original dssp output format from DSSP [Sander, et al. 1983].
normalize_exposure(??)
Normalize exposure to the maximum possible for a given amino acid.
pull_dssp_fields(??)
Fix the fortranish fixed length fields in the DSSP output into an array so that fields can be extracted independently. [this now returns only the ones we are interested in; I have no intention of updating the old code that extracts unused fields. -- rgr, 30-Oct-98.]

vvenv-pair-scores.pl subroutines

Hack to create pairwise scores for selected environments from a counts file. A subset of mrf-scores arguments is accepted; see http://bmerc-www.bu.edu/needle-doc/new/mrf-progs.html#mrf-scores for a description of what the args in the full set mean. -- rgr, 7-Oct-97.
compute_filtered_score_tables($pair_scores_ptr)
Given a set of scores based on counting environments, and the line-of-sight contact definitions ("hyperenvironments") in $los_envs, produce the expanded idiosyncratic score matrices on the output, as well as the "per-edge" environment file. [Sloppy with global variables. -- rgr, 14-Oct-97.]


Bob Rogers <rogers@darwin.bu.edu>
Last modified: Mon May 1 21:01:54 EDT 2000