int numOfRetries: The number of attempts to be made
to execute a RIB on a remote machine.
int retryTimeout: Retrying to execute the RIB on a
remote machine continues until this time has expired.
int curLockId: An integer variable that holds the
id of the lock.
void setParam(int timeout, int retry, int localexec)
The function is called by the user program to
set the parameters such as the numOfRetries, retryTimeout and localExeOption.
These values are provided to the function as parameters.
void SetRibParam( int ribId)
This function takes as its argument the id of
the RIB for which the parameters are to be set. The values previously
stored by setParam() are used to set the parameters for
this RIB.
int GetLocks( int num)
The function takes as its argument num, the number
of locks that the user program wants to acquire. The user library then
forwards this request to the coordinator. It is possible that the requested number
of locks cannot be acquired, so the function returns the actual number of
locks that have been acquired. Hence, the number returned can be smaller
than or equal to the value of num. The acquired locks are kept in the array
lockList.
void SetLockFree( lockId)
The function sets the status of the lock, whose
id is supplied as an argument, to FREE. This lock can then be allocated
to other user programs that request the locks.
int GetCurLockId()
The function returns the id of the current lock,
i.e. the lock on which the operations are being performed.
void IncrCurLockId()
The function advances curLockId to the next lock in
the list of locks that is FREE (not allocated). If no such lock exists,
curLockId is set to -1.
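A minimal sketch of the search that IncrCurLockId() performs, assuming that a lock id is simply the index of the lock in lockList. The layout of struct Lock, the names LOCK_FREE and LOCK_ALLOCATED, and the helper next_free_lock() are hypothetical and are used here only for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock record; the field names are assumptions. */
enum LockStatus { LOCK_FREE, LOCK_ALLOCATED };

struct Lock {
    int id;                 /* id of the lock */
    enum LockStatus status; /* FREE or ALLOCATED */
};

/* Return the index of the first FREE lock after position cur,
 * or -1 if no such lock exists. A negative cur starts the search
 * at the beginning of the list. */
int next_free_lock(const struct Lock *list, size_t n, int cur)
{
    assert(list != NULL || n == 0);
    for (size_t i = (cur < 0) ? 0 : (size_t)cur + 1; i < n; i++) {
        if (list[i].status == LOCK_FREE)
            return (int)i;
    }
    return -1;
}
```

Under these assumptions, IncrCurLockId() would amount to curLockId = next_free_lock(lockList, count, curLockId), where count is the number of locks held.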
MagicNumberArr * GetCurMagicNums( int num)
The function returns the array MagicNumberArr*
arr, consisting of the magic numbers. The argument that the function
receives gives the number of locks for which magic numbers are to be
returned.
The following steps are carried out:
1) Sequentially traverse the list of locks contained in lockList.
2) For each lock, if the lock is free, copy the magic number of the
lock into arr.
3) Stop traversing the list when the magic numbers of num locks have
been obtained.
4) If arr now holds num magic numbers, return arr.
5) Otherwise, calculate the magic number for the local coordinator and
store the value in magicNum.
6) Fill up the array arr with (num - length of arr) magic numbers, the
value of each being magicNum. This means that we try to satisfy the
request for the remaining locks from the local machine.
7) Return the array arr.
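The steps above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the magic number is represented as an unsigned long, struct Lock and the LOCK_FREE constant are hypothetical, and the caller supplies the output buffer and the already computed magic number of the local coordinator:

```c
#include <assert.h>
#include <stddef.h>

enum { LOCK_FREE = 0, LOCK_ALLOCATED = 1 };  /* assumed status values */

struct Lock {
    int status;             /* LOCK_FREE or LOCK_ALLOCATED */
    unsigned long magicNum; /* magic number attached to the lock */
};

/* Fill out[0..num-1] with magic numbers. Returns how many of them
 * came from free locks in lockList; the rest are localMagic. */
size_t fill_magic_nums(const struct Lock *lockList, size_t nLocks,
                       unsigned long localMagic,
                       unsigned long *out, size_t num)
{
    size_t got = 0;
    assert(out != NULL || num == 0);

    /* Steps 1-3: take magic numbers from free locks, at most num. */
    for (size_t i = 0; i < nLocks && got < num; i++) {
        if (lockList[i].status == LOCK_FREE)
            out[got++] = lockList[i].magicNum;
    }
    size_t fromLocks = got;

    /* Steps 5-6: pad the remainder with the local magic number. */
    while (got < num)
        out[got++] = localMagic;

    return fromLocks;
}
```

When enough free locks exist, the padding loop does not run, which corresponds to the early return in step 4.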
IntArr* getRibId( int nodeId)
This function is invoked to find out which RIBs are executing
at a particular node. The function receives the nodeId as its
argument and returns an integer array of the ids of all the
RIBs executing on this node. IntArr* arr is the array of ids that
is returned.
For this, the list of locks, lockList, is
traversed and those locks whose node id matches the supplied nodeId and
whose status is ALLOCATED are picked. The RIB id is then taken from
each of these locks and stored in the array arr.
Finally, arr is returned.
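A minimal sketch of this traversal, assuming each lock records the node it belongs to and the RIB it is allocated to. The struct Lock fields and the helper collect_rib_ids() are hypothetical, and a plain int buffer stands in for IntArr:

```c
#include <assert.h>
#include <stddef.h>

enum { LOCK_FREE = 0, LOCK_ALLOCATED = 1 };  /* assumed status values */

struct Lock {
    int nodeId; /* node the lock refers to */
    int status; /* LOCK_FREE or LOCK_ALLOCATED */
    int ribId;  /* RIB currently holding the lock */
};

/* Copy the RIB ids of all ALLOCATED locks on nodeId into out
 * (at most cap entries) and return how many were copied. */
size_t collect_rib_ids(const struct Lock *lockList, size_t nLocks,
                       int nodeId, int *out, size_t cap)
{
    size_t count = 0;
    assert(out != NULL || cap == 0);
    for (size_t i = 0; i < nLocks && count < cap; i++) {
        if (lockList[i].nodeId == nodeId &&
            lockList[i].status == LOCK_ALLOCATED)
            out[count++] = lockList[i].ribId;
    }
    return count;
}
```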
int PostedAlready( int nodeId, int funcType)
The user library provides this function to enable
reuse of the code already posted to a node. If the code has already been
sent to the node and the request is received to execute the code again
with different parameters, then instead of sending the code again only
the parameters can be sent.
The function receives as arguments the nodeId
and the type of code, funcType. It checks whether code of the
supplied type has already been sent to that node and whether that code
is still executing on the remote node. If both hold, it returns 1.
Otherwise it returns 0 to indicate that code reuse is not possible.
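The check can be sketched as a lookup in a table of previously posted code. The source does not say how the library records what has been posted, so the struct PostedCode table and the helper posted_already() below are assumptions made for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of code that has been sent to a node. */
struct PostedCode {
    int nodeId;         /* node the code was posted to */
    int funcType;       /* type of the posted code */
    int stillExecuting; /* nonzero while the code runs remotely */
};

/* Return 1 if code of funcType was posted to nodeId and is still
 * executing there, so only the parameters need to be sent;
 * return 0 otherwise. */
int posted_already(const struct PostedCode *tab, size_t n,
                   int nodeId, int funcType)
{
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tab[i].nodeId == nodeId &&
            tab[i].funcType == funcType &&
            tab[i].stillExecuting)
            return 1;
    }
    return 0;
}
```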
void ArcStartUp()
This is the initialization routine for the user
program and the user program calls this routine to register itself with
the coordinator.
The function first obtains a handle to the message
queue. Then it obtains the pid of the process that invoked this function
and calls the function RegisterProgram(pid) to register the pid of the
user program with the coordinator. Because of this registration, the coordinator
knows the pid of the user program and can use it to send messages
to the program.
void ArcCleanUp()
This function is called by the user program
when the execution of RIB is over and clean up is to be performed. This
function calls the function unregister(int pid) to deregister the
user program from the coordinator.
u_long PostRibServer( int ribId, int pid)
This function posts RIBs for execution.
Its arguments are the id of the RIB that is to
be posted and the pid of the process that wants to post this RIB.
The steps carried out in this function are the following:
1) Initialize the data structure, RibInf
ribInf, that will contain the information related to the RIB.
2) If some lock has already been assigned to
the RIB, set the nodeId field of the corresponding RIB to the nodeId associated
with the lock. Else set nodeId = -1. This means that it has not yet been
decided where this RIB will be executed.
3) In ribInf, set the name of the function that
has to be executed, set the string containing some Directives about the
function, the pid of the user process.
4) If the RIB code has been assigned a nodeId,
i.e. the node on which it is to execute and it has not already been posted
to that node, then create a client connection on the local coordinator.
Invoke the RPC function post_rib_inf_svc() on that coordinator. The
argument to the RPC function call is the ribInf structure.
The value returned by the RPC call is an id, progNum, which
can be used later to collect the result.
5) If the RIB code has been assigned a nodeId
and the code has already been posted to that node, then create a client
handle on that node and call the RPC function process_rib_svc() on
that node.
6) If there is no node associated with a RIB
code, then create a connection with the local coordinator and invoke RPC
call process_rib_svc() on it.
7) The RPC function process_rib_svc() takes
the structure ribInf as an argument and returns an id, progNum.
8) Return progNum.
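The choice among the three cases in steps 4 to 6 can be sketched as a small decision function. The enum PostAction and the helper choose_post_action() are hypothetical names; the actual RPC calls and the filling of ribInf are left out:

```c
#include <assert.h>

/* Hypothetical labels for the three paths described in steps 4-6. */
enum PostAction {
    POST_RIB_INF_VIA_LOCAL, /* step 4: post_rib_inf_svc(), code not yet posted */
    PROCESS_RIB_ON_NODE,    /* step 5: process_rib_svc() on the assigned node */
    PROCESS_RIB_ON_LOCAL    /* step 6: process_rib_svc() on the local coordinator */
};

/* nodeId is -1 when no node has been assigned to the RIB;
 * postedAlready is nonzero when the code is already on that node. */
enum PostAction choose_post_action(int nodeId, int postedAlready)
{
    assert(nodeId >= -1);
    if (nodeId == -1)
        return PROCESS_RIB_ON_LOCAL;
    if (postedAlready)
        return PROCESS_RIB_ON_NODE;
    return POST_RIB_INF_VIA_LOCAL;
}
```

In the library, the postedAlready flag would presumably come from PostedAlready(nodeId, funcType).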
void Unlock()
The function is for releasing the locks when
the execution has terminated. This function creates a client handle with
the local coordinator and calls the RPC function unlock_svc(). The argument
supplied to the RPC function is the whole array of locks lockList,
that was obtained by invoking the function GetLocks(int num).
int messageHandler(void *pos, int size)
The coordinator communicates with a user program
by placing messages for it in the message queue. This function accepts messages
from the message queue, determines the type of each message from its type
field and takes the appropriate action. The type of a message
can be one of:
* RESULT : indicating that the execution of
the RIB that was submitted earlier is complete and the results are available
to be collected.
* ERROR : Some error has occurred while executing
the RIB.
* NODE_FAILURE : The node on which the RIB was
executing has failed.
The function receives as arguments a pointer
to the position where the results, if obtained, are to be placed and the
size of the result.
The function picks the message from the message
queue and determines the message type. If the type of the message is RESULT,
do the following:
1) Extract the result from the message.
2) Invoke the function SetLockFree()
to free the lock that was occupied by this RIB.
3) Copy the result to the memory area specified
by the pointer pos.
If the type of the message is ERROR, terminate the process.
If the type of the message is NODE_FAILURE, all the RIBs that were executing
on that node have to be re-executed on some other machine. For this, getRibId(nodeId)
is called; it returns an array of the ids of all the RIBs
that were executing on the failed node, nodeId. For each of these RIBs,
if the number of retries is less than the number allowed (numOfRetries) and
retryTimeout has not been exceeded, do the following:
1) Increase the retry count by 1.
2) Obtain a free lock.
3) If the free lock is obtained, set the status
of lock as ALLOCATED.
4) Post the RIB by invoking the function PostRibServer()
and obtain the id, progNum.
5) Also post the arguments to that RIB by invoking
the function PostArgs().
The function returns the id of the RIB for which it has received the RESULT. For other messages it returns -1.
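The dispatch on the message type and the retry condition can be sketched as follows. This is a simplified illustration: struct Msg and its fields, the MSG_ constants, and the helpers may_retry() and handle_msg() are hypothetical, times are assumed to be in seconds, and the re-posting steps for NODE_FAILURE are omitted:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum MsgType { MSG_RESULT, MSG_ERROR, MSG_NODE_FAILURE };

/* Hypothetical message layout; the field names are assumptions. */
struct Msg {
    enum MsgType type;
    int ribId;          /* RIB the message refers to */
    int nodeId;         /* node that produced the message */
    const void *result; /* result bytes, valid for MSG_RESULT */
    size_t resultLen;
};

/* A RIB may be retried only while both limits still hold. */
int may_retry(int retriesSoFar, int numOfRetries,
              long elapsedSecs, long retryTimeout)
{
    return retriesSoFar < numOfRetries && elapsedSecs <= retryTimeout;
}

/* Returns the RIB id for a RESULT message and -1 otherwise. */
int handle_msg(const struct Msg *m, void *pos, size_t size)
{
    assert(m != NULL);
    switch (m->type) {
    case MSG_RESULT: {
        /* Copy no more than the caller's buffer can hold. */
        size_t n = m->resultLen < size ? m->resultLen : size;
        if (n > 0)
            memcpy(pos, m->result, n);
        return m->ribId;
    }
    case MSG_ERROR:
        exit(EXIT_FAILURE);   /* the text says to terminate the process */
    case MSG_NODE_FAILURE:
        /* Re-posting of the affected RIBs is omitted in this sketch. */
        return -1;
    }
    return -1;
}
```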
void WaitOnSync(int ribId, void* pos, int size)
The purpose of this function is to wait for the
execution of tasks to finish. A while loop checks for the arrival
of RIB results; it keeps waiting as long as the timeout
has not occurred and there are retries left for the RIB. Once either
condition becomes false, the status of the RIB is checked to find out whether the
results are available. If they are, the results are copied into a structure,
messageHandler() is called with pos pointing to the location of the result,
and the function returns.
If the results are not available, the RIB is executed locally.
For this, the RIB is sent to the local server by invoking
PostRibServer(ribId, pid), which takes the RIB id and the
process id and sends the RIB to the local coordinator. Arguments to the
RIB are sent by calling the function PostArgs(ribId).
The function then waits until the results of the execution become
available. After the results have been obtained, the function copies them
to a particular location and passes this location as an
argument to messageHandler().
MagicNumber* GetNewMagicNum(char* server)
The function takes as argument the name of the
coordinator and invokes the RPC call calculate_cur_magicnum_svc() on
the coordinator. The function returns a pointer to a structure that contains
the magic number for the supplied coordinator.