Why would users at BMERC who are not involved in cross-threading jobs want to know about this server? Because it offers the general user a way of suspending a running cross-threading job temporarily. While the job is suspended, one can get real work done without competing with Lisp for memory and CPU time. (Believe me, you would find it hard to compete with Lisp on that score.) You don't need to know the thread password, and we don't lose time or data (or not much, anyway).
Note that each host may have at most one "current" job. All other
jobs assigned to the host will be idle (or something equivalent); they
cannot be running or suspended. [This restriction may eventually be
lifted, but it doesn't really make sense to try to run more than one
Lisp job at a time, even on a multiprocessor box. Lisp jobs are
intensely memory-hungry. -- rgr, 31-Jan-96.]
More than one user may have a suspension request in effect at a
time; the job will resume when the last request expires (or is revoked).
(You could say that all sentences are served concurrently.) Requests
are kept track of by email address. The rationale behind maintaining
requests from multiple users is that one should not have to know whether
one is the last person to leave before resuming. If other people are
using that machine, then let them suspend it themselves.
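The bookkeeping described above can be pictured with a minimal sketch. This is not ctserv's actual code (the server is written in Emacs Lisp); the class name, method names, and example addresses are all hypothetical, and treating a repeated suspend from the same address as a renewal is an assumption.

```python
from datetime import datetime, timedelta


class HostSuspensions:
    """Suspension requests for one host, keyed by requester email address."""

    def __init__(self):
        self._expires = {}  # email address -> expiry time

    def suspend(self, email, now, duration):
        # Assumption: a repeated request from the same address renews it.
        self._expires[email] = now + duration

    def resume(self, email):
        # Revokes only this user's request; other users' requests stay in effect.
        self._expires.pop(email, None)

    def is_suspended(self, now):
        # Drop expired requests, then see whether any remain.
        self._expires = {e: t for e, t in self._expires.items() if t > now}
        return bool(self._expires)


# Hypothetical usage: two users suspend the same host, then one leaves.
t0 = datetime(1996, 1, 31, 9, 0)
three_hours = timedelta(hours=3)
host = HostSuspensions()
host.suspend("alice@example.edu", t0, three_hours)
host.suspend("bob@example.edu", t0 + timedelta(hours=1), three_hours)
host.resume("alice@example.edu")
```

The host stays suspended after the first user resumes, because the second user's request is still in effect; nobody has to know whether they are the last to leave.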
Fifteen minutes before a suspension request expires, you will receive a
notice by email to that effect, reminding you to renew it if you so
desire. It is better for all concerned to renew before the end of the
suspension, so that the cross-threading job is not fruitlessly restarted
and users are not pointlessly annoyed by being thrust into the
background by a thrashing, paging behemoth.
Since suspensions apply to the host, and not the job, it is possible
to reassign that job to another host, and even to assign another job in
its place, during a suspension. Such operations do not affect the
status or duration of suspension requests. In particular, starting a
job on a host with pending suspensions sends the state directly from
idle to suspended.
Despite the automatic expiration feature, if you should happen to
leave earlier than you expect, we would be grateful if you would send
an explicit resume request to start the cross-threading job up again.
A word to the wise: The server logs all transactions, so
anti-social uses of the server are readily detectable. We know where
your office is.
The M-x ctserv-request Emacs command prompts for one or more
requests on a single line in the minibuffer. Requests are of the form
"verb noun" (though one could argue that "status" is not a verb).
Any number of requests may be submitted at once, separated by
semicolons.
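As an illustration of the request syntax only, here is a sketch of how such a line might be split into individual requests. The function name is hypothetical, and the handling of extra whitespace and empty segments is an assumption rather than documented server behavior.

```python
def parse_requests(line):
    """Split a request line into (verb, noun) pairs.

    Requests are separated by semicolons; within each request the first
    word is taken as the verb and the rest as the noun.
    """
    requests = []
    for part in line.split(";"):
        words = part.split()
        if not words:
            continue  # ignore empty segments, e.g. after a trailing semicolon
        verb = words[0]
        noun = " ".join(words[1:])
        requests.append((verb, noun))
    return requests


parsed = parse_requests("suspend sewall; status sewall")
```

Here `parsed` holds one pair per request, in the order they were typed.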
For each invocation of any of the commands described above, a
response message is generated and sent by email. Each individual
request within the mail message has its response delimited by
"*** verb noun" lines.
NB: No request should take more than a few minutes to answer.
If you do not get a response to your request after this time, it
probably means that the server has crashed. Please send mail to Bob Rogers
<rogers@darwin.bu.edu> to get it restarted (and please
do not flood the queue with messages!).
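Since each request's response is delimited by a "*** verb noun" line, a reply can be pulled apart mechanically. The sketch below assumes that each delimiter line introduces the response that follows it; the function name and the sample reply text are invented for illustration.

```python
def split_response(body):
    """Return a list of (request, response_text) pairs from a reply body."""
    sections = []
    current = None
    for line in body.splitlines():
        if line.startswith("*** "):
            # A delimiter line starts a new section named after the request.
            current = (line[4:].strip(), [])
            sections.append(current)
        elif current is not None:
            current[1].append(line)
        # Lines before the first delimiter (greetings, etc.) are skipped.
    return [(request, "\n".join(lines).strip()) for request, lines in sections]


# Made-up reply text; real ctserv replies will be worded differently.
sample = "*** suspend sewall\nSuspended.\n*** status sewall\nsewall: suspended\n"
pairs = split_response(sample)
```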
The following requests are supported:
The suspend and resume requests always operate on
hosts, though they may affect the hosts' current jobs. The start
and stop requests operate on jobs, except that if a host is
specified, the job is determined implicitly.
Stopping a job is simpler. If a host is specified, then its current
job is clearly meant. [If I ever extend this to handle more than one
job on a host at a time, then stopping a host will mean stop
all jobs on that host. -- rgr, 1-Feb-96.]
When a job is explicitly stopped, ctserv does not try to start the
next job on the job queue.
The set of standard user-defined option identifiers for job files is
described below. Most of them have reasonable defaults. Other options
may be included as well, but are ignored by the server. They may be
used in substitution forms for scripts and
reply messages. [Unfortunately, spelling errors in standard option
names go undetected. -- rgr, 25-Jan-96.]
See also the set of mail message headers [ref? -- rgr, 6-Feb-96].
These may be used in constructing replies.
Suspending and resuming
A suspend request for a given host causes that host's current threading
job (if any) to be stopped temporarily. This period has been
defined by executive fiat as three hours; there is no way to request a
different duration, though one can revoke a suspension before it
expires by sending a resume request. After this period, the job is
automatically restarted. The rationale behind automatic expiration
after a fixed period is that it prevents the "I forgot to restart it on
my way out the door" error, which can cause a whole night (or weekend)
worth of work to be lost. Three hours is a compromise between requiring
requests to be sent too frequently, and losing many hours of potential
computation if the interval is set too long.
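The timing works out as in the following sketch, which uses the three-hour period and the fifteen-minute reminder described in this document. The constant and function names are hypothetical.

```python
from datetime import datetime, timedelta

SUSPENSION_PERIOD = timedelta(hours=3)  # fixed period; cannot be changed by the user
NOTICE_LEAD = timedelta(minutes=15)     # the reminder is mailed this long before expiry


def suspension_schedule(requested_at):
    """Return (notice_at, expires_at) for a suspend request made at requested_at."""
    expires_at = requested_at + SUSPENSION_PERIOD
    notice_at = expires_at - NOTICE_LEAD
    return notice_at, expires_at


notice_at, expires_at = suspension_schedule(datetime(1996, 2, 1, 14, 0))
```

A request sent at 14:00 therefore draws its reminder at 16:45 and lapses at 17:00 unless it is renewed or revoked first.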
Sending requests to the server
There are several ways of submitting requests to ctserv:
Configuring ctserv
This section is for those users in charge of keeping a cross-threading
run going, and who therefore need to create and modify job files.
Security issues
These are summarized simply:
Directory structure
[to be filled in. -- rgr, 26-Jan-96.]
Starting and stopping jobs
The start request accepts a job or host name, or both, but it
must have at least one, and it must be able to find the other from
whichever argument it is given. (Both files are checked to see if a new
version has appeared on disk, and are reverted as early as possible in
the startup sequence so as to reflect their new contents.)
Once ctserv has a job and host, the following conditions must be
met in order for the request to succeed:
If any suspension requests are in effect for the host, the job does not
start right then, but transitions to the suspended state, from which
it will be started subsequently when all suspensions expire.
Host file format
Host files must live in the host directory (currently
~thread/code/ctserv/hosts/) and have file names of the form
"hostname.host". They look suspiciously like email
messages; there is a header section, followed by a body (the body is
ignored). There can be no blank lines before or within the
header. Headers may appear in any order, and are of the form
"identifier: value", but (in accordance with RFC822) the option
identifier can include any character other than whitespace or colon, and
the value can be split over several lines provided that the continuation
lines are indented (tab is conventional) and are not entirely blank.
The identifier must start in column one; there can be no leading
whitespace. The resulting values are all stored internally as
strings (unless otherwise converted into something else), and have all
line breaks removed (i.e. the newline and leading whitespace are turned
into a single blank). This means that you are likely to get bizarre
results if you try to include comments anywhere; alas, there is no way
to include comments in the options.
[finish. -- rgr, 31-Jan-96.]
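To make the header rules concrete, here is a rough sketch of a parser that follows them. It is not the server's own parser; the function name is hypothetical, as is the "owner" option in the example, and stripping trailing whitespace from values is an assumption.

```python
def parse_header(text):
    """Parse the header section of a host or job file into a dict of strings."""
    options = {}
    last = None
    for line in text.splitlines():
        if not line.strip():
            break  # the first blank line ends the header; the body is ignored
        if line[0] in " \t":
            if last is None:
                raise ValueError("continuation line before any option")
            # The newline plus leading whitespace becomes a single blank.
            options[last] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep or not name or any(c.isspace() for c in name):
            raise ValueError("malformed header line: %r" % line)
        last = name
        options[last] = value.strip()
    return options


# Hypothetical host file; only the "jobs" option name comes from this document.
example = "jobs: foo\n\tbar\nowner: thread\n\nthis body is ignored\n"
opts = parse_header(example)
```

Note that a comment line would simply be read as a malformed header or as a continuation of the previous value, which is the bizarre result the text warns about.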
Standard user-defined host options
These are supplied by the user in the host file. [We may want to add
some standard options to describe Unix dependencies, such as which lisp
to run, and to define "policy", such as for restricting users who can
submit certain requests concerning that host. -- rgr, 6-Feb-96.]
System-defined host options
These can be used in substitution forms,
with the limitations mentioned. ctserv defines these when needed;
attempting to define them in the host file could have unexpected
consequences.
Job file format
Job files must live in the job directory (currently
~thread/code/ctserv/jobs/), have file names of the form
"job-name.job", and are in the same format as host
files.
Standard user-defined job options
Options defined by convention
These are not touched by the server (and will remain undefined if not
specified in the job file). Their documentation here is for the purpose
of establishing a convention.
System-defined job options
These can be used in substitution forms,
with the limitations mentioned. ctserv defines these when needed;
attempting to define them in the job file could have unexpected
consequences.
If there is already a job running on the targeted host, you should edit that host file, changing its jobs option so that the new job is enqueued (either by putting the job name at the end of the list, or after the currently running job). The server will notice that the host file has changed on disk, and will revert it before checking to see what the next job should be.
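The edit to the jobs option amounts to a list insertion, sketched below. The function name is hypothetical, and the sketch assumes that the option value is a whitespace-separated list of job names, which this document does not actually specify.

```python
def enqueue_job(jobs_value, new_job, after=None):
    """Return a new jobs option value with new_job added.

    With after=None the job goes at the end of the list; otherwise it is
    placed immediately after the named job (e.g. the one currently running).
    """
    jobs = jobs_value.split()
    if after is None:
        jobs.append(new_job)
    else:
        # list.index raises ValueError if the named job is not in the list.
        jobs.insert(jobs.index(after) + 1, new_job)
    return " ".join(jobs)


at_end = enqueue_job("foo bar", "baz")
after_current = enqueue_job("foo bar", "baz", after="foo")
```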
Note that, through the magic of substitution
forms, you can reuse the same script template for a series of jobs,
using the options to customize the template.
Note that not all state
transitions are instantaneous. Transitions that record the consequences
of user actions (like running -> suspending) are effectively
instantaneous, in the sense that an immediate status query (included as
part of the same request) will show the change. Transitions that happen
as a consequence of external actions (principally the suspending ->
suspended transition, which happens when the process exits) will not be
seen until after the message is processed, and so will appear to happen
later. One cannot resume from the
suspending state, so trying to suspend and resume in the same
request (pointless, but theoretically legal) will not work; the user
gets an error message instead.
A substitution form can be any of the following:
Many conditional expressions are of the form
When inserting buffers and files, use psa-insert-buffer and
psa-insert-file rather than insert-buffer and
insert-file; the psa versions leave point at the end of the
insertion, where you want it.
1. They must do something useful with the enable-file option in
order to suspend when required to.
2. They must output to the file named by the output-file option, so
that post-mortem analyses of unexpected exits can use this output to try
to figure out what happened.
3. They must be csh scripts (there is a kludge that uses the
"/bin/csh -f script-file-name" command line to find the process id).
[more stuff . . . -- rgr, 4-Mar-96.]
An important thing to remember (at least for CMU Common Lisp) is that
the Lisp process's stderr must be redirected somewhere (to
/dev/null if not to a file). If not, then Lisp gets a SIGPIPE error
when (e.g.) the warn function tries to write to the stderr that got
closed when the server exited.
At present, all jobs that do not actually have a process running
come up in the idle state. The server should instead treat those jobs
that still have enable files as if they had randomly exited, and
implicitly restart those that can be started. But the present behavior
is user-friendly.
The server is implemented using psa-request technology, and uses code
in the ~psa/bin/ directory. See the ~psa/bin/README file for more
details on psa-request server maintenance.
The job state machine (detailed)
[Ought to have a graphic here. -- rgr, 25-Jan-96.]
For example, trying to resume a job that is still suspending produces a
message such as "The foo job on sewall is suspending, and cannot be
resumed". Such request combinations therefore make the state machine
details visible, which unfortunately dilutes the point of showing the
simplified state machine.
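Pending the graphic, the transitions mentioned in this document can be written down as a small function. This is a simplified sketch, not the server's real state machine: the function and exception names are hypothetical, the "exit" event stands for the job's process exiting, and any state or transition not named in the text is simply rejected.

```python
class JobStateError(Exception):
    """Raised when an event is not allowed in the current state."""


def next_state(state, event, suspensions_pending=False):
    """Return the state that follows `state` when `event` occurs."""
    if event == "start" and state == "idle":
        # Starting on a host with pending suspensions goes straight to suspended.
        return "suspended" if suspensions_pending else "running"
    if event == "suspend" and state == "running":
        return "suspending"  # recorded at once, before the process has exited
    if event == "exit" and state == "suspending":
        return "suspended"   # happens later, when the process actually exits
    if event == "resume" and state == "suspended":
        return "running"
    raise JobStateError("cannot %s a job that is %s" % (event, state))


state = next_state("running", "suspend")
```

Asking for "resume" while the state is still "suspending" raises the error, which mirrors the "suspending, and cannot be resumed" reply.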
Substitution forms
An important aspect of the psa-server technology is the ability to
create customized output files from a template or form
file. In the form, all "boilerplate" entries appear as plain text,
interspersed with substitution forms that are evaluated as
Emacs Lisp expressions and then inserted (depending on the form and
the result, as detailed below). Substitution forms are of the format
$(foo)
The entire form including the dollar sign is deleted, plus the following
newline if the resulting line is empty, and then the form is evaluated
according to the rules below.
[finish: psa-value, ctserv-value, etc. -- rgr, 31-Jan-96.]
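Because a form may span several lines and contain nested parentheses, locating one means matching its parentheses. The sketch below only finds the forms in a template; it does not evaluate them, and it is not how psa-request itself does the job. The function name is hypothetical, and the scanner understands double-quoted strings but not Lisp comments or character literals.

```python
def find_forms(text):
    """Return (start, end) index pairs for each $( ... ) form in text."""
    spans = []
    i = 0
    while True:
        start = text.find("$(", i)
        if start < 0:
            return spans
        depth = 0
        in_string = False
        j = start + 1  # index of the opening parenthesis
        while j < len(text):
            c = text[j]
            if in_string:
                if c == "\\":
                    j += 1  # skip the escaped character
                elif c == '"':
                    in_string = False
            elif c == '"':
                in_string = True
            elif c == "(":
                depth += 1
            elif c == ")":
                depth -= 1
                if depth == 0:
                    break
            j += 1
        if j >= len(text):
            raise ValueError("unterminated form starting at index %d" % start)
        spans.append((start, j + 1))
        i = j + 1


template = 'A\n$(if (condition)\n    (insert "a ) b")\n  (delete-char -1))\nB $(foo)\n'
forms = [template[s:e] for s, e in find_forms(template)]
```

The parenthesis inside the quoted string is ignored, so the first form ends at its true closing parenthesis.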
$(if (condition)
(insert "something or other")
(delete-char -1))
The delete-char is there because when the form is deleted, it
leaves an empty line (assuming the $ is in the first column). Thus,
if (condition) is true, the insertion leaves "something or
other" on a line of its own. When the condition is false, the usual
intent is to leave out the "something or other" line altogether, so
(delete-char -1) joins the empty line to the previous line.
[This is automatically handled in the case of comments, and should be
generalized. But I have shied away from doing so for compatibility
reasons. -- rgr, 31-Jan-96.]
Script files and Lisp
Scripts run under ctserv are expected to do the following things:
Maintaining ctserv
This section is intended for emergencies, i.e. the server goes haywire
and I'm not around to maintain it. If you don't need to know this, you
don't want to read this.
Starting and stopping ctserv
The server normally runs on gamow. To start it, invoke
"start-server ~/code/ctserv" on gamow (or the machine on
which you wish the server itself to run). To stop it, invoke
"stop-server ~/code/ctserv" (on any machine). You must be
logged in to thread to do either of these (but "su thread" also
works). Also, the full path (without a trailing slash!) must be given
in the "server" argument to these scripts; defaulting from the current
directory will not do (really, this is a psa-request bug). The scripts
may be more reliable if invoked on a Sparcstation; I haven't tested them
on the Alphas.
Known ctserv bugs
Bugs that are still current are marked by "***"; other bugs are kept for
historical reasons.
Bob Rogers
<rogers@darwin.bu.edu>
BioMolecular Engineering Research Center
Boston University
36 Cummington St
Boston MA 02215
Last modified: Fri Nov 7 15:05:07 EST 1997