About this document

Really, this is just random stuff I happened to throw together. Any links you make to this are liable to evaporate overnight. It is very much under construction, and is liable to be inconsistent, incoherent, and incomplete. Especially incomplete.

Table of Contents

  1. About this document
  2. Table of Contents
  3. What is needle?
  4. Installing needle
  5. Running needle
    1. Running Lisp
      1. General information about Lisp
        1. Evaluation rules
        2. Lisp packages
        3. Function definitions
        4. Descriptions of functions
      2. Setting up an init.lisp file for CMU-CL at BMERC
      3. Starting needle using CMU-CL and ilisp at BMERC
      4. The CMU-CL debugger
    2. Run parameters
    3. Setup functions
    4. Other useful functions
      1. find-available-cores
  6. Threading with needle
    1. Threading Output Fields
  7. Structure prediction using needle
    1. Overview/theory
    2. The all-sequences-do-mrf-sequence-chooses-core interface

What is needle?

In its most basic usage, needle finds the optimal alignment of a protein sequence with a core model under a given score function. [references . . . ]

Other modes of operation are available . . .

Installing needle

[should include the INSTALL.DOC material here.]

Running needle

[need some overview of cores, sequences, setups, . . .]

Running Lisp

needle is written in very "vanilla" Common Lisp, and should run under any Lisp that is minimally ANSI-compliant. We have run it under Genera (Symbolics Inc), Allegro Common Lisp (Franz Inc), and LispWorks (Harlequin Ltd.), but most of our experience is with CMU Common Lisp, a public domain Common Lisp implementation developed at Carnegie Mellon University. See the
CMU Common Lisp README file for availability and installation instructions. At BMERC, we use the `ilisp' package for emacs as an interactive interface to various Lisp systems. To download ilisp, see ftp://ftp2.cons.org/pub/languages/lisp/ilisp/ or http://www2.cons.org:8000/ftp-area/ilisp/. All of the Lisp examples below use CMU Common Lisp and ilisp.

For a more detailed online introduction to Common Lisp (that leans mildly toward Allegro Common Lisp), see Common LISP Hints by Geoffrey J. Gordon.

The definitive reference to Common Lisp is Common Lisp: the Language, by Guy L. Steele Jr., affectionately known as "CLtL" to Lisp hackers. It is available in two editions: the first is more readable, especially for novices, and the second ("CLtL2") is more complete and up-to-date (it is really just the first edition annotated with changes adopted as part of the ANSI standardization process, hence the loss of readability). The differences are mostly (but not entirely!) of interest to implementors.

General information about Lisp

This describes Common Lisp in general, and so should apply to all implementations on all systems.

Comments are introduced with the semicolon (`;') character and run through the end of the line.

Symbols are used to name things. By default, the Common Lisp reader treats them as case-insensitive: regardless of the case used to type a symbol, its characters are translated to uppercase when read in. Consequently, symbols print in all uppercase. A wide variety of characters is accepted in names, but `-' is conventional for multiword names, as in restore-run-parameters-from-err-file, rather than the underscore (`_') used in most other computer languages.

Strings are delimited by double-quotes (`"' characters) and case is preserved. needle uses strings extensively for core and sequence names; since these are turned into file names (sometimes called "pathnames" in Lisp) and Unix is case-sensitive, one must be careful to type the case of cores and sequences consistently.
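Both rules can be checked at the Lisp Listener with standard functions. A minimal sketch: eq tests whether two symbols are the same object, while string= and string-equal compare strings with and without regard to case. The core name "2mhr" is just an example.

```lisp
;; Symbols: the reader upcases their names, so these are the same symbol.
(eq 'restore-run-parameters-from-err-file
    'Restore-Run-Parameters-From-Err-File)   ; => T
(symbol-name 'loop-scores-p)                  ; => "LOOP-SCORES-P"

;; Strings: case is preserved, so these are different core names
;; (and different Unix file names).
(string= "2mhr" "2MHR")                       ; => NIL
;; STRING-EQUAL ignores case when comparing.
(string-equal "2mhr" "2MHR")                  ; => T
```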

Decimal numbers such as integers and floats are written much as they are in any other language, except that the syntax for floats requires at least one digit after the decimal point; "79." is read as the integer 79. (We'll leave out non-decimal syntax, not to mention exotica such as complex and rational numbers.)
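A quick check of the number syntax at the Listener, using only standard predicates:

```lisp
;; A trailing decimal point with no digits after it reads as an integer.
(integerp 79.)      ; => T
(= 79. 79)          ; => T
;; A digit after the decimal point makes it a float.
(floatp 79.0)       ; => T
(integerp 79.0)     ; => NIL
```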

Lists are typed between parentheses, such as ("2mhr" "2hpr" "1apa"), and can contain any other Lisp object, including another list. The symbol nil denotes the empty list -- it is the only symbol that is valid where a list is required. List constants need to be quoted in expressions in the Lisp Listener since they would otherwise be interpreted as forms (function calls).
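A short sketch of quoting list constants; the core names are the example list from the text, and the functions used are standard Common Lisp.

```lisp
;; Quoting keeps the Listener from treating the list as a form.  Without
;; the quote it would try to call "2mhr" as a function, which is an error.
'("2mhr" "2hpr" "1apa")               ; => ("2mhr" "2hpr" "1apa")
(length '("2mhr" "2hpr" "1apa"))      ; => 3
(first '("2mhr" "2hpr" "1apa"))       ; => "2mhr"
;; Lists can contain other lists.
(second '(1 ("2mhr" "2hpr") 3))      ; => ("2mhr" "2hpr")
;; NIL and () are the same object, the empty list.
(eq nil '())                          ; => T
```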

Symbolic expressions or s-expressions include all of the above data objects (which excludes comments, of course) and their recursive composition into lists and arrays. An s-expression that is to be evaluated is sometimes called a form.

[may want to describe #+foo syntax. -- rgr, 27-Feb-96.]

Lisp variables are named by symbols, not surprisingly. By convention, global variables in Lisp start and end with "*", e.g. *loop-scores-p*. As shown in the next example, the variable * contains the result of the last expression evaluation.

Function calls are denoted in list syntax, with the function name as the first element of the list, followed by the arguments. Note that the function name is not evaluated. For example, one can use Lisp as a desk calculator:

 
* (+ 3 (* 5 79.0))
398.0
* (sqrt *)
19.949938
*
 
Note that "*" appears as a prompt, a symbol naming a function, and a symbol naming a variable in this example.

The truth value "false" is denoted in Lisp by the symbol nil (the same symbol that means the empty list). All other values are "true", though an explicit value of true is usually denoted by the symbol t. By convention, names of variables (such as the aforementioned *loop-scores-p*), parameters (such as :show-alignment-p), and functions (such as valid-aa-index-p) that end in "-P" are predicates (i.e. truth-valued).
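A few Listener-style checks of these rules. short-name-p is a made-up predicate (not a needle function) that follows the "-P" naming convention.

```lisp
;; Only NIL (the empty list) is false; every other value counts as true.
(if 0 'true 'false)      ; => TRUE
(if "" 'true 'false)     ; => TRUE
(if nil 'true 'false)    ; => FALSE
(if '() 'true 'false)    ; => FALSE

;; A made-up predicate following the "-P" naming convention.
(defun short-name-p (name)
  "Return true if the string NAME has at most four characters."
  (<= (length name) 4))

(short-name-p "2mhr")    ; => T
(short-name-p "1abcde")  ; => NIL
```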

In Lisp systems generally, invoking Lisp puts you in what is known as a "read-eval-print" loop, where Lisp expressions (code fragments) are read from the terminal, evaluated (executed), and the resulting values are printed. The conceptual entity that is doing the read-eval-print loop is sometimes called a "Lisp Listener." For CMU-CL specifically, the Lisp Listener prompt is an asterisk.

Evaluation rules

This section describes how Lisp evaluates expressions; it summarizes the core of the language semantics. It is not necessary to understand this in detail to get started, but if you want guidelines for modifying any of the examples below to suit your needs, or need to figure out why you got that "unbound variable" error, these rules should help.

Evaluation is implemented by eval, which is a recursive function. Given an expression to evaluate, eval first classifies its argument according to type:

Some exceptions to the normal evaluation rules:
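The following Listener-style sketch illustrates the standard Common Lisp evaluation rules: self-evaluating objects, symbols as variables, function calls, and special forms that break the pattern. *example-count* is a made-up variable, not part of needle. Evaluating a symbol that has no value, such as a mistyped variable name, is what produces the "unbound variable" error mentioned above.

```lisp
;; Self-evaluating objects: numbers, strings, keywords, T and NIL.
42                   ; => 42
"2mhr"               ; => "2mhr"
:show-alignment-p    ; => :SHOW-ALIGNMENT-P

;; A symbol evaluates to its value as a variable.
(defvar *example-count* 3)     ; made-up global, for illustration only
*example-count*               ; => 3

;; A list whose first element names a function: the arguments are
;; evaluated left to right, then the function is applied to them.
(+ *example-count* (* 2 5))   ; => 13

;; Special forms are exceptions.  QUOTE returns its argument unevaluated,
;; and IF evaluates only the branch it selects.
(quote (+ 1 2))               ; => (+ 1 2)
'*example-count*              ; => *EXAMPLE-COUNT*
(if t 'yes (error "never evaluated"))   ; => YES
```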

Lisp packages

A Lisp "package" is simply a way of managing names so that several systems can be loaded at the same time without stepping on each other's definitions.

If functions and variables seem mysteriously to not be defined, you should try to put yourself in the needle package via (in-package needle).
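A minimal sketch of the idea, using two made-up packages rather than needle itself: each defines its own score function, and package-qualified names reach either one from CL-USER. (in-package needle) works the same way, making needle's names accessible without qualification.

```lisp
;; Two made-up packages (not part of needle), each with its own SCORE.
(defpackage :demo-a (:use :common-lisp))
(defpackage :demo-b (:use :common-lisp))

(in-package :demo-a)
(defun score (x) (* x 2))

(in-package :demo-b)
(defun score (x) (+ x 100))

(in-package :cl-user)
;; Package-qualified names reach into packages we are not "in".
(demo-a::score 5)                    ; => 10
(demo-b::score 5)                    ; => 105
;; Same print name, but three distinct symbols.
(eq 'demo-a::score 'demo-b::score)   ; => NIL
```

The double colon accesses a symbol whether or not its package exports it; a single colon works only for exported symbols.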

For more details, see Chapter 11 of CLtL2.

Function definitions

Functions are defined with defun, as mentioned above. The syntax is as follows:
 
    (defun function-name parameter-list
      [ "documentation string" ]
      [ (declare ...) ]
      expression...)
 
where elements in brackets are optional. The parameter-list is a list that describes the formal parameters to the function. (Officially, this is called a lambda-list in Lisp, and is defined more fully on
page 76 of CLtL2. For brevity, we omit some options and descriptions of semantic details.) The parameter list consists of:
  1. an open parenthesis;
  2. zero or more symbols naming the required positional parameters;
  3. optionally followed by the symbol &optional and one or more optional parameters (which are also positional), each of which can be
    1. a symbol (in which case the default is NIL), or
    2. a list of "(symbol default-expression)", in which case the default-expression is evaluated if a value is not supplied;
  4. optionally followed by the symbol &rest and a single symbol, the rest arg, into which any remaining parameters are gathered into a list;
  5. optionally followed by the symbol &key and zero or more keyword parameter specifications, each of which must be
    1. a symbol (in which case the default is NIL), or
    2. a list of "(symbol default-expression)", in which case the default-expression is evaluated if a value is not supplied,
    optionally followed by &allow-other-keys;
  6. and finally a close parenthesis.
When the function is called, required and optional arguments are processed left-to-right, being assigned to formal parameters in the obvious way. The remaining arguments, if any, are used for &key and &rest parameters; if arguments are left over at this point and the parameter list specifies neither &key nor &rest, it is an error. The difference between optional and keyword arguments is that for keyword arguments, the remaining arguments are expected to be alternating keyword symbols and values; each formal parameter is assigned the value that follows the keyword with the same name as the parameter. (Note that if both &key and &rest are specified, then the same list of actual arguments is used for both.)

Since Lisp is a dynamically typed language, parameter type declarations are optional; if a given variable is not declared, it may hold a value of any type (though that does not necessarily mean that any value will be valid).

To evaluate the function call, the actual arguments are bound to the formal parameters sequentially in the manner described above, and the body is evaluated, returning a result to the caller. If more than one expression appears in the body, the result of the function call is the result of the last expression.
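Here is a small, made-up function (not part of needle) that uses each optional piece of the template: a documentation string, a declaration, an &optional parameter with a default, a &key parameter, and a body of more than one expression, of which only the last supplies the result.

```lisp
(defun scaled-sum (x y &optional (scale 1) &key (offset 0))
  "Return (X + Y) * SCALE + OFFSET.  A made-up example, not part of needle."
  (declare (type number x y scale offset))
  ;; First body form: evaluated only for its side effect (printing).
  (format t "~&Scaling ~A by ~A~%" (+ x y) scale)
  ;; Last body form: its value is what the call returns.
  (+ (* (+ x y) scale) offset))

(scaled-sum 1 2)                ; => 3
(scaled-sum 1 2 10)             ; => 30
(scaled-sum 1 2 10 :offset 5)   ; => 35
```

Because &optional parameters are filled first, (scaled-sum 1 2 :offset 5) would bind the keyword :offset itself to scale and leave the 5 over as a malformed keyword list, which is an error; when a function takes both kinds, pass the optional arguments explicitly before any keywords.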

For example, suppose the following function was defined:

    (defun frobulate (foo bar &optional baz (quux 2)
		      &rest args
		      &key orf (glorf (* quux 8)) (blorf orf))
      (list foo bar baz quux (length args) orf glorf blorf))
The minimum number of args that can be passed to this function is 2, bound to foo and bar. The next two arguments, if supplied, will be bound to the optional variables baz and quux, which will default to nil and 2, respectively, if not passed. Any remaining args (which could be an arbitrarily large number) are both bound as a list to args and also used to extract values for orf, glorf, and blorf using the keyword symbols :orf, :glorf, and :blorf, respectively.

If we call our (rather contrived) example with various sets of arguments, here is what the results would look like:

    * (frobulate 1 2)
    ;; This shows how all of the defaults look.
    (1 2 NIL 2 0 NIL 16 NIL)
    * (frobulate 1 2 'a 8 :orf 1)
    (1 2 A 8 2 1 64 1)
    ;; Note that in this next example, quux is not a number,
    ;; so we must supply a value for :glorf to avoid evaluating
    ;; its default.
    * (frobulate "string1" "string2" 'a 'b :glorf 77 :orf 9)
    ("string1" "string2" A B 4 9 77 9)
    * 

Descriptions of functions

In this documentation, functions are described by an abbreviated parameter list, followed by a detailed description of what the parameters mean. (This looks much like a function definition, but with the defaults described later with the arguments.) Not all parameters may be documented. In fact, it is common for an &rest list to be used for internal purposes; in such cases, the internal parameters will be omitted from the function description. (One should not rely on using undocumented parameters.)

Here is the frobulate example documented according to this style:

frobulate (foo bar &optional baz quux &key orf glorf blorf)

The frobulate function does the following useful thing (summarized briefly) . . . and takes the following parameters:

Here is a more detailed description of frobulate, including output formats, and information about how parameters interact that doesn't fit into the tabular form above . . .

Setting up an init.lisp file for CMU-CL at BMERC

All users who wish to run Lisp on CMU-CL at BMERC should have a file called "init.lisp" in their home directory. CMU-CL loads this file during startup, and it is convenient to have this file define standard setup functions and load patches. Since all of this is done by the init.lisp file in the thread account, all that is really necessary is to load that file:
 
;; Standard init.lisp file.
(load "/huxley3/users/thread/init")
 
This is how the needle-load and needle-dev-load functions get defined.
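If the shared file might not be reachable (for instance, when the file system holding the thread account is not mounted), a variant using the standard :if-does-not-exist argument to load lets Lisp start anyway. This is a sketch, not the file BMERC supplies; the warning text is our own.

```lisp
;; init.lisp: load the shared thread-account init file if it can be found.
;; With :IF-DOES-NOT-EXIST NIL, LOAD returns NIL instead of signalling an
;; error when the file is missing.
(unless (load "/huxley3/users/thread/init" :if-does-not-exist nil)
  (format t "~&Warning: shared init file not found; ~
             needle-load will not be defined.~%"))
```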

Starting needle using CMU-CL and ilisp at BMERC

  1. Start emacs.
  2. Use the "M-x cmulisp" command to start a CMU-CL process running in a buffer of its own. The buffer is displayed in another window, where you will see its startup dialog, followed by the "*" prompt.
  3. Load needle using (needle-load) to get the current released version, or (needle-dev-load) to get the development version (not recommended unless you've been told you need it).
  4. Put yourself in the needle package via (in-package needle).
  5. Select a setup function for doing subsequent analysis, e.g. (set-up-for-true-mrf).

To leave Lisp, invoke the (quit) function.

The CMU-CL debugger

When a Lisp system encounters an error that is not explicitly handled by the program, it enters an interactive debugger to allow the user to figure out what happened and to fix it. (Since debuggers and their interfaces are not covered by the Common Lisp standard, they tend to vary from system to system, especially in detail.) The error could be fairly trivial, or something as severe as a segmentation violation (memory exception). Often it's just a simple typing error, as in the example below, in which case the appropriate response is :a to abort back to the command level.

A common class of error is mistyping a function or variable name. Here is what CMU Lisp does with such a mistake.

 
   * (set-up-for-foo-mrf)
   Warning: This function is undefined:
     SET-UP-FOR-FOO-MRF

   Error in KERNEL:%COERCE-TO-FUNCTION:  the function
	 SET-UP-FOR-FOO-MRF is undefined.

   Restarts:
     0: [ABORT] Return to Top-Level.

   Debug  (type H for help)

   (KERNEL:%COERCE-TO-FUNCTION SET-UP-FOR-FOO-MRF)
   0]
 
The initial warning is a result of a source code analysis step, comparable to the initial phase of compilation, that the CMU evaluator does to expressions before evaluating them. The other elements of the message are generated by the debugger, as follows:
  1. The error message names the function that detected the error, followed by the message string (that describes its notion of what went wrong).
  2. Restart options are presented (in this case only one) that can be invoked with the :restart command. The "[ABORT]" label on restart 0 means that :restart 0 is equivalent to :a; both abort the erring expression and return to command level.
  3. The debugger announces itself (and explains how to get more info on-line).
  4. The current stack frame where the error was detected is shown. This can get very terse depending upon the compiler options.
  5. Finally, the debugger prompt is the line with a number followed by one or more close bracket (`]') characters. The number indicates the number of stack frames below the erring frame, and the count of close brackets tells how deeply nested the debugger is (in case you encounter an error while within the debugger).
Only a few commands are of interest to the casual user; see the online documentation for more details.

For more information, see the "Debugger" section of the CMU Common Lisp User's Manual, which is available online at BMERC in the emacs info browser (type C-h i to get into info, then m cmu RET to get to the CMU manual).
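For contrast, a program can handle an error like this itself, so that the debugger is never entered. A minimal sketch using the standard handler-case macro; try-setup is a made-up helper, and set-up-for-foo-mrf is the same deliberately undefined name as in the transcript above.

```lisp
(defun try-setup (name)
  "Call the function named by the symbol NAME.  If no such function is
defined, print a message and return NIL instead of entering the debugger.
A made-up helper, for illustration only."
  (handler-case (funcall name)
    (undefined-function (condition)
      ;; CELL-ERROR-NAME returns the name that had no definition.
      (format t "~&No function named ~S; check the spelling.~%"
              (cell-error-name condition))
      nil)))

(try-setup 'set-up-for-foo-mrf)   ; prints the message, returns NIL
```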

Run parameters

In its operation, needle is controlled by a set of run parameters, which are implemented in Lisp as global variables. To get a list of their current values, invoke (display-run-parameters) at the Lisp prompt. To reset all parameters to their default values, do (reset-all-run-parameters).
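Since the run parameters are just global variables, the general idea can be sketched in a few lines. Everything here is hypothetical: define-run-parameter, show-run-parameters, reset-run-parameters, and the *example-...* variables are invented names, not needle's implementation of display-run-parameters or reset-all-run-parameters.

```lisp
;; Hypothetical sketch only; none of these names come from needle.
(defvar *run-parameter-defaults* '()
  "Alist of (variable-name . default-value), one entry per run parameter.")

(defmacro define-run-parameter (name default &optional doc)
  "Define NAME as a global variable and record its default value."
  `(progn
     (defparameter ,name ,default ,@(when doc (list doc)))
     (setf *run-parameter-defaults*
           (acons ',name (symbol-value ',name)
                  (remove ',name *run-parameter-defaults* :key #'car)))
     ',name))

(defun show-run-parameters ()
  "Print each recorded run parameter with its current value."
  (dolist (entry (reverse *run-parameter-defaults*))
    (format t "~&~S = ~S~%" (car entry) (symbol-value (car entry)))))

(defun reset-run-parameters ()
  "Restore every recorded run parameter to its default value."
  (dolist (entry *run-parameter-defaults*)
    (setf (symbol-value (car entry)) (cdr entry))))

(define-run-parameter *example-gap-penalty* 2.0 "A made-up parameter.")
(define-run-parameter *example-show-alignment-p* nil)
```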

For detailed information on specific run parameters, see the needle-run-parameters section.

Setup functions

The functions named something like set-up-for-... are used for establishing sets of related run parameters, such as are used for a given scoring scheme . . .

Other useful functions

find-available-cores

After establishing a setup, you can use the find-available-cores function to list the cores which can be threaded under that setup . . .

Threading with needle

Before doing a threading run, one must initialize needle
as described above. Then, one may thread in any of the following ways: All such threading methods leave the variables *sm* and *qm* bound to the search manager and queue manager respectively, and may be used to explore the results further.

Threading Output Fields

[This also includes a certain amount of definition of terms, which could profitably be separated out.]

Structure prediction using needle

Given the set of files output by timing-kludge-cross-predict-list, all-sequences-do-mrf-sequence-chooses-core may be used to generate a structure prediction based on MRF sampling theory [credits to Jim White & Rick? -- rgr, 27-Feb-96.].

[need much more overview/theory here. -- rgr, 27-Feb-96.]

Overview/theory

[Structure prediction, also known as "the seq-choose-core stuff", is considered new and highly experimental code within needle, so is not well documented, nor is the user interface polished to any degree. -- rgr, 5-Mar-96.]

Basically, Jim White's seq-choose-core MRF theory has a numerator that sums scores across all possible threadings, and a denominator that sums scores across all possible sequences. When pairwise scores are used, computing either sum exactly is NP-hard. When only singleton scores are used, a recursive analytic formula yielding the exact value is possible (hence the "analytic" and "nopair" in the option names below).
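To see why singleton-only scores make an exact sum cheap, here is a toy version of the denominator. It rests on assumptions that are ours, not needle's: every sequence has the core's length, each position's score depends only on the residue there, and each sequence contributes exp(total score). Under those assumptions the sum over all sequences factors into a product over positions. This is not log-nopair-normalizing-constant, and the function names are made up.

```lisp
;; Toy model (an assumption): SCORES holds one list per core position,
;; giving the singleton score of each residue type at that position.

(defun log-sum-exp (numbers)
  "Numerically stable log(sum(exp(x))) over a non-empty list NUMBERS."
  (let ((m (reduce #'max numbers)))
    (+ m (log (reduce #'+ numbers :key (lambda (x) (exp (- x m))))))))

(defun log-partition-factored (scores)
  "Exact log of the sum over all sequences of exp(total score).  Each term
is a product of per-position factors, so the sum is a product of
per-position sums: linear in the number of positions."
  (reduce #'+ scores :key #'log-sum-exp))

(defun log-partition-brute-force (scores)
  "The same quantity by enumerating every sequence (exponential time)."
  (labels ((totals (positions)
             ;; Total score of every sequence over POSITIONS.
             (if (null positions)
                 (list 0)
                 (let ((tails (totals (rest positions))))
                   (loop for s in (first positions)
                         nconc (mapcar (lambda (tail) (+ s tail)) tails))))))
    (log-sum-exp (totals scores))))
```

The brute-force version exists only to check the factored one on tiny inputs; with 20 residue types and a 100-position core it would need 20^100 terms.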

Additionally, when only singleton scores are used, the mean and standard deviation over threadings and sequences have only linear and quadratic complexity; with pairwise scores they have quadratic and quartic complexity. The quartic complexity is prohibitive, but in the nopair case the quadratic complexity is fast. Consequently, in the nopair case we can compute the mean and standard deviation cheaply and integrate the resulting log-normal distribution analytically.
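A similarly hedged sketch of the log-normal idea, on a toy model whose assumptions are ours: fixed-length sequences, independent per-position singleton scores, and each sequence contributing exp(total score). Summing exp(total) over all N sequences equals N times its average; if the total score is treated as normal with mean μ and variance σ², that average is exp(μ + σ²/2), the mean of a log-normal variable. The function names are invented; this is not the :analytic-integrated-log-normal code.

```lisp
;; Toy model (an assumption): SCORES holds one list per core position,
;; giving the singleton score of each residue type at that position.

(defun mean-and-variance (numbers)
  "Population mean and variance of a non-empty list of NUMBERS."
  (let* ((n (length numbers))
         (mean (/ (reduce #'+ numbers) n))
         (variance (/ (reduce #'+ numbers
                              :key (lambda (x) (expt (- x mean) 2)))
                      n)))
    (values mean variance)))

(defun log-partition-log-normal (scores)
  "Approximate log of the sum over all sequences of exp(total score).
Positions are independent, so the mean and variance of the total score
are sums of per-position values (linear time).  Treating the total as
normal makes exp(total) log-normal, with expectation exp(mean + var/2)."
  (let ((log-count 0d0) (mean 0d0) (variance 0d0))
    (dolist (position scores)
      (multiple-value-bind (m v) (mean-and-variance position)
        ;; log N accumulates as a sum of logs to avoid huge integers.
        (incf log-count (log (float (length position) 1d0)))
        (incf mean m)
        (incf variance v)))
    (+ log-count mean (/ variance 2))))
```

In this singleton-only toy the exact sum is also cheap, so the sketch only shows how a mean and variance turn into a log-normal estimate; it is exact when every position's scores are constant and approximate otherwise.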

In the top/seq-choose-core.lisp source file, the scc-log-unnormalized-prob-of-model-given-sequence function (scc) computes the numerator, and the scc-log-unnormalized-prob-adjust-by-run-set-up-name function (scc log-numerator) computes the denominator.

When *seq-choose-core-comparison* is :nopair-analytic-partition-fcn, the numerator is computed from the nopair analytic formula by the function log-nopair-partition-function in ext/nopair-analytic.lisp.

When *seq-choose-core-comparison-denominator* is :nopair-analytic-normalizing-constant, the denominator is computed from the nopair analytic formula by the function log-nopair-normalizing-constant in ext/nopair-analytic.lisp.

When *seq-choose-core-comparison-denominator* is :analytic-integrated-log-normal, the denominator is computed from the analytic mean and standard deviation by analytic integration of the log-normal distribution.

[There is a corresponding nopair analytic integrated log-normal planned for the numerator, but I haven't had time to put it in yet. -- rickl, 5-Mar-96.]

The all-sequences-do-mrf-sequence-chooses-core interface

all-sequences-do-mrf-sequence-chooses-core is a function that takes a number of keyword arguments. [finish. -- rgr, 27-Feb-96.] Additionally, there are a few relevant global variables (which eventually should become arguments to the call instead of globals).
Bob Rogers <rogers@darwin.bu.edu>
BioMolecular Engineering Research Center
Boston University
36 Cummington St
Boston MA 02215
Last modified: Tue Jan 18 21:38:36 EST 2000