The serv-comp sub-package
This sub-package has the bulk of the cluster-manager
functionality. It has the following files:
Perl scripts
- gen-services.pl: Script to generate a random service
configuration file.
- num-edges.pl: Script to compute the number of edges in a
graph -- input in the linked-list format described below.
- cdf.pl: Script to work out the CDF of a set of values,
each given on a line in a file.
- avg_se.pl: Script to compute the average and standard
error of a set of values, each given on a line in a file.
- get-free-mill-nodes.pl: Script to get a list of currently
least loaded Millennium machines.
- print-nums.pl: Used by get-free-mill-nodes.pl.
- col-merge.pl: Used by get-free-mill-nodes.pl.
- ssh1MillNodes.pl: Script to test "ssh" into the
Millennium nodes selected for running the cluster-manager
software.
- startMillNodes.pl: Script to run "startServComp" with
appropriate arguments, on all the Millennium nodes.
Service composition software
- ServComp.cpp: This implements the Dijkstra-like
algorithm for service-level path construction.
- TreeInfo.cpp: The set of information required for
service-level path caching, after the graph computation in
ServComp.cpp.
- Bunch.cpp: Data structure to handle the recovery of a
bunch of paths that pass through the same overlay link.
- ServCompIF.cpp: This is probably the most important
file. It implements the interfaces of the cluster-manager
with:
- Peer cluster-managers
- Client machines that send path requests
- Services that run in the cluster
These interfaces are described below.
- SCRecovery.cpp: This is another very important file. It
implements the path recovery algorithms. It has code for
end-to-end recovery, as well as local recovery; however, only the
former has been tested properly.
- Paths.cpp: Data structures to keep track of paths at the
cluster-manager.
- ServCompIFPkt.cpp: This defines the structure that represents
the packets exchanged with the cluster-manager.
- startServComp.cpp: This is the main routine that calls
other routines from other files, including those above.
- MakeConnected.cpp: Prints out a connected version of a
possibly unconnected graph -- input and output graphs are in
linked-list format, described below.
- sgb2gml.cpp: Program to convert a graph in Stanford
graph-base format to GML format, to be drawn using the GML
package.
- sgb2ll.cpp: Converts a graph in Stanford graph-base
format to one in linked-list format.
- ParseLog.cpp: Program to parse the log files written out
by the cluster-managers. When taking input from STDIN, this
produces 5 output files:
- alt-done.out: these are the times taken for alternate path
creation.
- redir.out: these are the times taken for local redirection
based recovery.
- alt-norm.out: these are the times-to-recovery, normalized
to path length.
- paths.out: these are the specifications of the paths
created during this run; this is what figures as
"paths.cfg" in later runs of the cluster-manager software
(see under "configuration files" below).
- pcount.out: list of edges in the graph, and how many paths
each had running through them.
Configuration files
These are the different configuration files:
- paths-*.out, paths.cfg: paths-*.out are
the 'paths.out' output of the ParseLog program, run on the log files
of the cluster-managers. paths.cfg is the file read by the
cluster-manager when the cfg_read_paths flag is set to
true. paths.cfg is usually a soft-link to one of the
paths-*.out files. When the cfg_read_paths flag is set to false,
paths.cfg need not exist.
Format: These files have the specification for a
particular client path on each line. The list of logical
services, list of service instance locations, and other info are
given.
- graphs/*.sgb: These are some example
graphs in Stanford graph-base format -- these are used to
generate the overlay graphs for configuring the overlay. These
files are not actually directly read by the cluster-manager
software, or other programs -- these have to be converted to the
linked-list format using the sgb2ll program.
Format: These files are in Stanford graph-base
format -- refer to the documentation with the download for more
information.
- cfg/graph*.cfg: These are the result of
running sgb2ll on the graphs/*.sgb files. These are in
linked-list format, and this is the format relevant to our
programs. Many programs expect these graph file names as command
line arguments. (In this implementation, the overlay network
graph is hence not dynamic -- although links can go down and
come up).
Format: The linked-list format has a line for each
node in the graph. The node name is the same as its SCID in the
overlay graph. Each line starts with the SCID of the node.
Then it has pairs of values, each representing an arc coming
into the node (this is a directed graph). The first in the pair
is the neighbour node's SCID, and the second in the pair is the
(one-way) cost of that edge.
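For illustration (this is a made-up graph, not one of the files
shipped in the package), a fully connected three-node overlay would
be written as:

    0 1 10 2 25
    1 0 10 2 15
    2 0 25 1 15

Here the first line says that node 0 has an incoming arc from node 1
with cost 10, and an incoming arc from node 2 with cost 25.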
- cfgs/*services*.cfg: These are the
files describing the location of services in the system (service
location is hence not dynamic). These files are to be given as
command line arguments for some programs.
Format: The file has four columns in each line. Each
line describes the location of one service. The first column
gives the service name (service description in our
implementation is just a string). The second gives the SCID of
the cluster where the service is supposed to reside. The third
and fourth columns give the IP-Address and the port on which the
service is listening for interface packets from the cluster-manager.
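For illustration, a couple of made-up lines (the service names are
those of the text-to-audio example used later; the SCIDs, addresses
and ports are placeholders) would look like:

    tts   3  169.229.60.105  5010
    email 7  169.229.60.112  5011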
- cfgs/*scid_mapping*.cfg: These are the
SCID-Mapping files -- also given as arguments for several
programs. An SCID-Mapping file defines which machines are
representing the cluster-managers of which service-clusters.
Format: Each line specifies a single cluster-manager.
The first column gives the SCID of the service-cluster, and the
second column gives the IP-Address of the cluster-manager.
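For illustration, a made-up SCID-Mapping file for a three-cluster
overlay (the addresses are placeholders) would look like:

    1 169.229.60.101
    2 169.229.60.102
    3 169.229.60.103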
Running the software
In general, all programs give the usage, when run without any
arguments. Here are the important ones:
- startMillNodes.pl: This starts up the cluster-manager on
a bunch of machines -- all the machines of the overlay network.
It assumes that you have "ssh" access to these machines. (If
you have set up password-less "ssh" access, you do not have to
type any password at all.) The list of machines is taken from
the SCID-Mapping file given as a command-line argument; the
overlay graph file, in linked-list format, and the
service-location configuration file are also given as arguments.
- ssh1MillNodes.pl: This program tests "ssh" access to the
list of machines given in the SCID-Mapping file. With a "-u"
option, it prints the current load on all these machines. With
a "-k" option, it kills any "startServComp" programs that were
started earlier using the "startMillNodes.pl" program.
- startServComp: Use this if you want to start up the
cluster-manager software individually on machines. The
arguments it requires are self-explanatory.
NOTE: Each cluster-manager writes out its log file in
a directory whose name is the same as its SCID. For instance, a
cluster-manager whose SCID is "1" will write out its log to the
sub-directory named "1" of its current directory. This
sub-directory has to be created before running the
cluster-manager.
Interface specifications
This is probably the most important part if you are trying to add a
service, or a new client, to the software package. The interfaces with
the client, as well as with the service instances, are implemented on
top of the UDP library in the udp-lib sub-package.
The interface between the cluster-manager and a client
A path is identified by a tuple: "pathID:path_dest_scid". The
path_dest_scid is the SCID of the destination overlay node of the
service-level path. The client makes its path construction request to
its nearest overlay node's cluster-manager, and this overlay node
forms the path's destination.
Ideally, the pathID should be chosen by the path_dest_scid
cluster-manager. But in the implementation it was just easier to have
the client choose it, since duplicates were then easier to detect (see
the "at-least once" semantics in the udp-lib
sub-package documentation).
A negative consequence of this is that the clients have to choose
unique pathID's during the lifetime of a run of the software. This
can be difficult if there are multiple clients. But since this
software was written for a design study and evaluation, and not for
production operation, I considered this kludge to be manageable.
The client specifies the following in its request packet to the
cluster-manager (a sketch putting these fields together follows the
list):
- The pathID -- this is actually specified as the app_seq field in
the ServCompIFPkt structure -- it is best to copy the same value
onto both the pathID and app_seq fields, in the request packet
being sent.
- The path_origin_scid -- this argument is required if the source
of the data stream is outside the overlay node. In this case,
the path_origin_scid is the SCID of the overlay node closest to
the data source -- and this has to be learnt by the client by
some means before-hand (may be an initial handshake outside of
our middleware software, between the client and the data
source).
- The specification of the client's destination. This is
application specific, and is given as a string in the
PREV_DEST_SPEC_F key-value pair of the ServCompIFPkt. This
string's format is understood by the next service in the set of
services being composed.
An example is the case of RTP streaming. The client gives an
"IPAddr:port" string to the cluster-manager (to be passed on to
the next service). This specifies the client's IP-address, and
the port on which it is listening to receive the RTP packets.
- The logical list of services to be composed. In our
operational model, the portal provider decides to compose a set
of services beforehand. The client knows about this, and
communicates it to the overlay middleware software, so that the
latter can choose the appropriate set of service instances. The
list of services is
given in the SERVICES_F key-value pair of the ServCompIFPkt
structure. It is a string -- with spaces between the different
services to be composed. The order is -- downstream to
upstream.
An example is the case of the text-to-audio composed service
-- this has the email service, and then the text-to-speech
service. The SERVICES_F string for this looks like: "tts email"
(downstream to upstream of data flow).
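To make the above concrete, the sketch below fills in a request for
the text-to-audio composed service. It is only an illustration: the
field and key names (pathID, app_seq, path_origin_scid,
PREV_DEST_SPEC_F, SERVICES_F) are the ones described in the list
above, but the toy structure, the addresses and the SCIDs are
stand-ins, not the real ServCompIFPkt definition or udp-lib calls.

    // Illustrative stand-in only: the real ServCompIFPkt
    // (ServCompIFPkt.cpp) and the udp-lib send routines have their own
    // definitions; this toy structure just mirrors the fields above.
    #include <map>
    #include <string>

    struct RequestSketch {
        int pathID;            // chosen by the client, unique for this run
        int app_seq;           // copy of pathID (duplicate detection, udp-lib)
        int path_origin_scid;  // overlay node closest to the data source
        std::map<std::string, std::string> kv;  // key-value pairs
    };

    int main() {
        RequestSketch req;
        req.pathID  = 42;
        req.app_seq = 42;          // same value in both fields
        req.path_origin_scid = 7;  // placeholder SCID

        // Where the client listens for RTP packets (application
        // specific); the address is a made-up placeholder.
        req.kv["PREV_DEST_SPEC_F"] = "169.229.60.120:4444";

        // Services to compose, listed downstream to upstream.
        req.kv["SERVICES_F"] = "tts email";

        // In the real code, the packet is now sent to the nearest
        // overlay node's cluster-manager over the at-least-once UDP
        // layer of udp-lib, and the "success" field (1 or 0) of the
        // reply is checked.
        return 0;
    }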
In the reply from the cluster-manager to the client, the field that
is relevant is the "success" field of the ServCompIFPkt structure.
This indicates success with a value of 1, and failure with a value of
0.
The interface between the cluster-manager and a service instance
During path construction, the cluster-manager communicates with
service instances in its cluster -- to "setup" the path for the client
session. The cluster-manager sends a request over the at-least once
UDP layer of the udp-lib sub-package, and
the service instance responds to it. Note that the port on which the
service instance listens for the request is given in the
"cfgs/*services*.cfg" configuration files -- whichever file was given
as argument to the cluster-managers when they were started up.
The cluster-manager specifies the following information in the
request packet:
- The identification of the client path session being set up.
- The PREV_DEST_SPEC_F string obtained from the downstream node (the
next service instance downstream, or the client itself) -- this is
where the service instance has to send its data for this client path
session.
In its reply, the service instance specifies the following to the
cluster-manager:
- The PREV_DEST_SPEC_F string -- this tells the cluster-manager
where this service instance is listening for data packets for this
client path session. This string
is to be used as argument in the upstream cluster-manager's
request to the upstream service instance in the path session.
- The "success" field of the ServCompIFPkt structure indicating
the success of service instance creation for this particular
client path session.
This covers the path creation interface. The interface
also includes ways to redirect the service instance to send data to an
alternate downstream location, and to kill the service instance.
The APP_CHANGE_SERV_INST ServCompIFPkt type is used by the
cluster-manager to ask the service instance to switch its downstream
node. The cluster-manager specifies a new downstream service instance
to direct data to. This is actually used during local recovery, and
not during end-to-end recovery.
The APP_KILL_SERV_INST ServCompIFPkt type is used to terminate the
particular client session in the service instance. This is not
actually used, as far as I know. The client session is terminated
using application-level signaling. Also, there is an
application-level soft-state refresh which, if it times out, can
trigger the service instance to kill the client session.
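Summarizing this interface from the service instance's side, the
handling of cluster-manager packets looks roughly like the sketch
below. The type names APP_CHANGE_SERV_INST and APP_KILL_SERV_INST are
the ones described above; the setup type name, the structure and its
fields are hypothetical stand-ins, not the real ServCompIFPkt
handling code.

    // Sketch of a service instance's handling of cluster-manager
    // packets.  "SETUP" and the structure below are hypothetical.
    #include <string>

    enum PktType { SETUP, APP_CHANGE_SERV_INST, APP_KILL_SERV_INST };

    struct Pkt {
        PktType     type;
        int         success;         // 1 on success, 0 on failure
        std::string prev_dest_spec;  // the PREV_DEST_SPEC_F key-value pair
    };

    Pkt handle(const Pkt& req) {
        Pkt reply{};
        switch (req.type) {
        case SETUP:
            // Create per-session state, remember the downstream
            // destination (req.prev_dest_spec), and report where this
            // instance listens for data for this client path session.
            reply.prev_dest_spec = "169.229.60.105:6000";  // placeholder
            reply.success = 1;
            break;
        case APP_CHANGE_SERV_INST:
            // Local recovery: redirect this session's data to the new
            // downstream service instance given in the request.
            reply.success = 1;
            break;
        case APP_KILL_SERV_INST:
            // Terminate this client session (in practice,
            // application-level signaling or a soft-state timeout does
            // this instead).
            reply.success = 1;
            break;
        }
        return reply;
    }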
The path construction process
This is best explained with an example -- we choose the
text-to-speech composed service. The signaling that happens for
path creation proceeds in the following steps (a rough pseudocode
outline follows the list of steps).
- Step 1: The client sends its request to the cluster-manager,
along with the specification of where it is listening for RTP
audio packets.
- Step 2: The cluster-manager becomes the destination overlay node
for this client path session. It figures out which service
instances to use for this session. It turns out that it has to
implement a no-op service within its own cluster. It sets this
up in this step. The no-op service is given the client's
IP-addr:port to which the no-op service has to forward packets
from upstream. In its response to the cluster-manager, the no-op
service gives the IP-addr:port on which it is listening for this
particular client session.
- Step 3: The cluster-manager passes on the information to the
upstream node.
- Step 4: The upstream node cluster-manager repeats Step 2, but
for a text-to-speech service instance in its cluster.
- Step 5: Repeat of Step 3.
- Step 6: Repeat of Step 2, but with another no-op service.
- Step 7: Repeat of Step 3.
- Step 8: Repeat of Step 2, but with the text-source service.
- Step 9: Cluster-manager returns success to downstream
cluster-manager node.
- Step 10: Repeat of Step 9.
- Step 11: Repeat of Step 9.
- Step 12: Cluster-manager returns success to client.
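In pseudocode form, the downstream-to-upstream setup of Steps 1
through 8, and the success propagation of Steps 9 through 12, look
roughly as follows. Every name in the sketch is hypothetical; the
real signaling is implemented in ServCompIF.cpp.

    #include <string>
    #include <vector>

    struct Hop {
        int         scid;     // cluster chosen for this service instance
        std::string service;  // "noop", "tts", "text-source", ...
    };

    // Stand-in for the request/response with a service instance
    // (Step 2): returns the IP-addr:port where the instance listens
    // for this client session, or an empty string on failure.
    std::string setup_service_instance(const Hop& hop,
                                       const std::string& downstream_dest) {
        (void)hop; (void)downstream_dest;
        return "10.0.0.1:6000";  // placeholder listening address
    }

    // Each call represents one cluster-manager in the chain, starting
    // at the destination overlay node (the one nearest the client).
    bool setup_hop(const std::vector<Hop>& hops, size_t i,
                   std::string downstream_dest) {
        if (i == hops.size())
            return true;  // reached the upstream end of the path
        // Steps 2/4/6/8: set up the instance in this cluster; it
        // replies with its listening address (its PREV_DEST_SPEC_F).
        std::string listen_addr =
            setup_service_instance(hops[i], downstream_dest);
        if (listen_addr.empty())
            return false;
        // Steps 3/5/7: pass that address to the upstream cluster-manager.
        bool ok = setup_hop(hops, i + 1, listen_addr);
        // Steps 9-12: success (or failure) propagates back downstream,
        // and finally to the client.
        return ok;
    }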
At this point, application-level communication begins. For the
text-to-speech service, this consists of several data exchanges,
described below. Communication starts with the upstream node
sending information about itself to the next downstream node. This is
part of the generic_client_info of the app
sub-package. The no-op nodes participate in this. All other
communication is not noticed by the no-op services -- they simply
forward the data either way.
- The request-response protocol: An application specific
request-response protocol starts the data flow. In this case,
the client sends a request for a particular file to be read out,
starting from a particular sentence number, and byte position in
the audio stream. The text-to-audio service retrieves one
sentence after another, and processes it.
- The data flow: This happens using an application specific
protocol as well. In this case, it is the response to the
request sent upstream.
- Keep-alive soft-state refresh: This is sent periodically. On a
timeout, service instances can stop the particular client
session.
- Application soft-state: This is sent periodically downstream,
and consists of the state required to restart the client session
at a new set of service instances. For this application, this
consists of the current sentence being streamed, as well as the
byte position in the audio form of the current sentence (see the
sketch after this list).
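For concreteness, the application soft-state in this text-to-audio
example boils down to a small record like the one sketched below; the
structure and field names are illustrative, not the ones actually
used in the app sub-package.

    // Illustrative only: the real definitions live in the app
    // sub-package.
    struct TextToAudioSoftState {
        int  sentence_num;  // sentence currently being streamed
        long byte_pos;      // byte position within the audio form of
                            // that sentence
    };
    // Sent downstream periodically; after recovery, a new set of
    // service instances can restart the client session from the
    // latest copy.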
The path recovery process
Each path also carries a version number -- this starts at 100, and
is incremented by 100 on each recovery attempt. We have implemented
end-to-end recovery. Here, when a failure happens on an overlay link,
the failure information is sent downstream to the destination overlay
node of the path. Then, the destination overlay node sets up an
alternate service-level path, with an alternate choice of service
instances (or at least a different path in the overlay graph). This
new path is given a version number of 200. If this fails as well, the
next version is 300, and so on.
The path recovery process is the same as path setup, except for the
version number.
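A minimal sketch of this versioning at the destination overlay node,
with hypothetical names, is:

    // Sketch only: "Path" and setup_alternate_path() are hypothetical;
    // the real end-to-end recovery code is in SCRecovery.cpp.
    struct Path { int pathID; int path_dest_scid; int version; };

    void setup_alternate_path(Path& path) {
        // Stand-in: re-runs the same signaling as the original setup,
        // choosing alternate service instances (or at least a
        // different overlay path), under the new version number.
        (void)path;
    }

    // Called at the destination overlay node when a link-failure
    // report arrives from upstream.
    void on_failure_notification(Path& path) {
        path.version += 100;         // versions go 100, 200, 300, ...
        setup_alternate_path(path);  // same procedure as initial setup
    }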
Bhaskaran Raman, bhaskar@cs.berkeley.edu
Last modified: Tue Jan 22 15:33:43 PST 2002