The User Library



Data Structures:

int numOfRetries: The number of attempts to be made to execute a RIB on a remote machine.
int retryTimeout: The time limit for retries. Attempts to execute the RIB on a remote machine continue until this time has expired.
int curLockId: An integer variable that holds the id of the lock.



Various functions provided to user programs are listed below:

void setParam(int timeout, int retry, int localexec)
    The user program calls this function to set the parameters retryTimeout, numOfRetries and localExeOption. The values are supplied as the arguments timeout, retry and localexec.

void SetRibParam( int ribId)
    This function takes as its argument the id of the RIB whose parameters are to be set. The values set by setParam() are used to set the parameters for this RIB.
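
    A minimal sketch of how setParam() and SetRibParam() could fit together, assuming the library keeps the three values as globals and copies them into a per-RIB record. The RibParam type, the ribParams table and MAX_RIBS are hypothetical; the text does not say how per-RIB parameters are stored.

```c
#include <assert.h>

#define MAX_RIBS 16            /* hypothetical capacity, not from the text */

/* Library-wide values, named as in the description above. */
int numOfRetries;
int retryTimeout;
int localExeOption;

/* Hypothetical per-RIB record; the real layout is not specified. */
typedef struct {
    int numOfRetries;
    int retryTimeout;
    int localExeOption;
} RibParam;

RibParam ribParams[MAX_RIBS];

/* Store the values supplied by the user program. */
void setParam(int timeout, int retry, int localexec)
{
    retryTimeout   = timeout;
    numOfRetries   = retry;
    localExeOption = localexec;
}

/* Copy the current library-wide values into the record of one RIB. */
void SetRibParam(int ribId)
{
    assert(ribId >= 0 && ribId < MAX_RIBS);
    ribParams[ribId].retryTimeout   = retryTimeout;
    ribParams[ribId].numOfRetries   = numOfRetries;
    ribParams[ribId].localExeOption = localExeOption;
}
```

    With this layout, a later call to setParam() affects only the RIBs for which SetRibParam() is called afterwards.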

int GetLocks( int num)
    The function takes as its argument num, the number of locks that the user program wants to acquire, and forwards the request to the coordinator. The coordinator may not be able to grant all the requested locks, so the function returns the number of locks actually acquired, which is less than or equal to num. The acquired locks are kept in the array lockList.

void SetLockFree( int lockId)
    The function sets the status of the lock whose id is supplied as the argument to FREE. This lock can then be allocated to other user programs that request locks.

int GetCurLockId()
    The function returns the id of the current lock, i.e. the lock on which the operations are being performed.

void IncrCurLockId()
    The function advances curLockId to the next lock in the list of locks whose status is FREE (not allocated). If no such lock exists, curLockId is set to -1.
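
    The three lock helpers above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the lock id is taken to be the index of the lock in lockList, the scan starts just after the current lock, and the Lock layout, numLocks and MAX_LOCKS are hypothetical.

```c
#include <assert.h>

#define MAX_LOCKS 8                 /* hypothetical capacity */

typedef enum { FREE, ALLOCATED } LockStatus;

/* Hypothetical lock record; only the status matters for this sketch. */
typedef struct {
    LockStatus status;
} Lock;

Lock lockList[MAX_LOCKS];
int  numLocks  = 0;                 /* number of valid entries in lockList */
int  curLockId = -1;                /* -1 means "no current lock" */

/* Mark one lock as FREE so that it can be handed out again. */
void SetLockFree(int lockId)
{
    assert(lockId >= 0 && lockId < numLocks);
    lockList[lockId].status = FREE;
}

int GetCurLockId(void)
{
    return curLockId;
}

/* Move curLockId to the next FREE lock after the current one, or to -1. */
void IncrCurLockId(void)
{
    for (int i = curLockId + 1; i < numLocks; i++) {
        if (lockList[i].status == FREE) {
            curLockId = i;
            return;
        }
    }
    curLockId = -1;
}
```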

MagicNumberArr * GetCurMagicNums( int num)
    The function returns the array MagicNumberArr* arr, which holds the magic numbers. The argument num gives the number of locks for which magic numbers are to be returned.
    The following steps are carried out:
        1) Sequentially traverse the list of locks contained in lockList.
        2) For each lock that is free, copy its magic number into arr.
        3) Stop traversing the list when the magic numbers of num locks have been obtained.
        4) If arr now holds num magic numbers, return arr.
        5) Otherwise, calculate the magic number for the local coordinator and store the value in magicNum.
        6) Fill the remaining (num - length of arr) entries of arr with magicNum. This means that the remaining lock requests are satisfied from the local machine.
        7) Return the array arr.
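
    The steps above can be sketched in C as shown below. The layouts of Lock and MagicNumberArr are assumptions, and LocalMagicNum() is a hypothetical stand-in for step 5 that returns a fixed value instead of asking the local coordinator.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_LOCKS 8                       /* hypothetical capacity */

typedef enum { FREE, ALLOCATED } LockStatus;

typedef struct {                          /* assumed layout */
    LockStatus    status;
    unsigned long magicNum;
} Lock;

typedef struct {                          /* assumed layout */
    int            len;
    unsigned long *vals;
} MagicNumberArr;

Lock lockList[MAX_LOCKS];
int  numLocks = 0;

/* Hypothetical stand-in for step 5; the real value comes from the coordinator. */
unsigned long LocalMagicNum(void)
{
    return 999UL;
}

/* Returns a heap-allocated array of exactly num magic numbers, or NULL. */
MagicNumberArr *GetCurMagicNums(int num)
{
    assert(num > 0);
    MagicNumberArr *arr = malloc(sizeof *arr);
    if (arr == NULL)
        return NULL;
    arr->vals = malloc((size_t)num * sizeof *arr->vals);
    if (arr->vals == NULL) {
        free(arr);
        return NULL;
    }
    arr->len = 0;

    /* Steps 1-3: collect magic numbers of FREE locks, at most num of them. */
    for (int i = 0; i < numLocks && arr->len < num; i++) {
        if (lockList[i].status == FREE)
            arr->vals[arr->len++] = lockList[i].magicNum;
    }

    /* Step 4: enough magic numbers were found. */
    if (arr->len == num)
        return arr;

    /* Steps 5-6: pad the remainder with the local coordinator's number. */
    unsigned long magicNum = LocalMagicNum();
    while (arr->len < num)
        arr->vals[arr->len++] = magicNum;

    return arr;                           /* step 7 */
}
```

    In this sketch the caller owns the returned array and frees both arr->vals and arr.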

IntArr* getRibId( int nodeId)
    This function is invoked to find out which RIBs are executing at a particular node. It receives the nodeId as its argument and returns an integer array, IntArr* arr, holding the ids of all the RIBs executing on this node.
    To build the array, the list of locks, lockList, is traversed and those locks whose node id matches the supplied nodeId and whose status is allocated are picked up. The RIB id is taken from each of these locks and stored in arr. Finally, arr is returned.
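
    A minimal sketch of this traversal, assuming each lock records the node it belongs to and the RIB it is allocated to. The Lock and IntArr layouts are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_LOCKS 8                       /* hypothetical capacity */

typedef enum { FREE, ALLOCATED } LockStatus;

typedef struct {                          /* assumed layout */
    LockStatus status;
    int        nodeId;
    int        ribId;
} Lock;

typedef struct {                          /* assumed layout */
    int  len;
    int *vals;
} IntArr;

Lock lockList[MAX_LOCKS];
int  numLocks = 0;

/* Returns the ids of all RIBs holding an ALLOCATED lock on nodeId, or NULL. */
IntArr *getRibId(int nodeId)
{
    assert(numLocks >= 0 && numLocks <= MAX_LOCKS);
    IntArr *arr = malloc(sizeof *arr);
    if (arr == NULL)
        return NULL;
    /* At most numLocks ids can match; always allocate at least one slot. */
    size_t cap = (size_t)(numLocks > 0 ? numLocks : 1);
    arr->vals = malloc(cap * sizeof *arr->vals);
    if (arr->vals == NULL) {
        free(arr);
        return NULL;
    }
    arr->len = 0;
    for (int i = 0; i < numLocks; i++) {
        if (lockList[i].nodeId == nodeId && lockList[i].status == ALLOCATED)
            arr->vals[arr->len++] = lockList[i].ribId;
    }
    return arr;
}
```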

int PostedAlready( int nodeId, int funcType)
    The user library provides this function to enable reuse of code already posted to a node. If the code has already been sent to the node and a request is received to execute it again with different parameters, only the parameters need to be sent instead of the code.
    The function receives as arguments the nodeId and the type of code, funcType. It checks whether code of the supplied type has already been sent to that node and whether that code is still executing on the remote node. If both hold, it returns 1. Otherwise it returns 0 to indicate that code reuse is not possible.
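
    One way to picture this check is a table of posted code that is searched by node and type. The PostedCode record, the postedList table and the running flag are hypothetical; the text does not describe how the library tracks posted code.

```c
#include <assert.h>

#define MAX_POSTED 16                 /* hypothetical capacity */

/* Hypothetical record of one piece of code sent to a node. */
typedef struct {
    int nodeId;
    int funcType;
    int running;                      /* non-zero while still executing remotely */
} PostedCode;

PostedCode postedList[MAX_POSTED];
int        numPosted = 0;

/* Returns 1 if code of funcType is on nodeId and still running, else 0. */
int PostedAlready(int nodeId, int funcType)
{
    assert(numPosted >= 0 && numPosted <= MAX_POSTED);
    for (int i = 0; i < numPosted; i++) {
        if (postedList[i].nodeId == nodeId &&
            postedList[i].funcType == funcType &&
            postedList[i].running)
            return 1;
    }
    return 0;
}
```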

void ArcStartUp()
    This is the initialization routine for the user program, which calls it to register itself with the coordinator.
    The function first obtains a handle to the message queue. It then obtains the pid of the process that invoked it and calls RegisterProgram(pid) to register the pid of the user program with the coordinator. Because of this registration, the coordinator knows the pid of the user program and can use it to send messages to the program.

void ArcCleanUp()
    This function is called by the user program when the execution of the RIBs is over and clean up is to be performed. It calls the function unregister(int pid) to deregister the user program from the coordinator.

u_long PostRibServer( int ribId, int pid)
    This function posts a RIB for execution. Its arguments are the id of the RIB that is to be posted and the pid of the process that wants to post it.
    The steps carried out in this function are as follows:
    1) Initialize the data structure, RibInf ribInf, that will contain the information related to the RIB.
    2) If a lock has already been assigned to the RIB, set the nodeId field of the corresponding RIB to the nodeId associated with the lock. Otherwise set nodeId = -1, meaning that it has not yet been decided where this RIB will be executed.
    3) In ribInf, set the name of the function to be executed, the string containing directives about the function, and the pid of the user process.
    4) If the RIB code has been assigned a nodeId, i.e. the node on which it is to execute, and it has not already been posted to that node, then create a client connection on the local coordinator. Invoke the RPC function post_rib_inf_svc() on that coordinator. The argument to the RPC call is the ribInf structure. The value returned by the RPC call is an id, progNum, which can be used later to collect the result.
    5) If the RIB code has been assigned a nodeId and the code has already been posted to that node, then create a client handle on that node and call the RPC function process_rib_svc() on that node.
    6) If there is no node associated with a RIB code, then create a connection with the local coordinator and invoke RPC call process_rib_svc() on it.
    7) The RPC function process_rib_svc() takes the structure ribInf as an argument and returns an id, progNum.
    8) Return progNum.
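
    The branching in steps 4 to 6 can be summarized as a small decision function. This sketch only selects which RPC would be made and where; it makes no RPC calls itself. The PostAction enum and ChoosePostAction() are hypothetical names, and postedAlready stands for the result of PostedAlready() for the RIB's node and code type.

```c
#include <assert.h>

/* Hypothetical labels for the three cases described in steps 4 to 6. */
typedef enum {
    POST_RIB_INF_LOCAL,    /* step 4: post_rib_inf_svc() via the local coordinator */
    PROCESS_RIB_ON_NODE,   /* step 5: process_rib_svc() on the assigned node */
    PROCESS_RIB_LOCAL      /* step 6: process_rib_svc() on the local coordinator */
} PostAction;

/* nodeId is -1 when no lock, and hence no node, has been assigned to the RIB. */
PostAction ChoosePostAction(int nodeId, int postedAlready)
{
    assert(nodeId >= -1);
    if (nodeId == -1)
        return PROCESS_RIB_LOCAL;
    if (postedAlready)
        return PROCESS_RIB_ON_NODE;
    return POST_RIB_INF_LOCAL;
}
```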

void Unlock()
    The function releases the locks when the execution has terminated. It creates a client handle with the local coordinator and calls the RPC function unlock_svc(). The argument supplied to the RPC function is the whole array of locks, lockList, that was obtained by invoking the function GetLocks(int num).

int messageHandler(void *pos, int size)
    The coordinator communicates with a user program by placing messages for it in the message queue. This function accepts a message from the message queue, determines its type from the type field of the message and takes the appropriate action. The message types are:
    * RESULT : indicating that the execution of the RIB that was submitted earlier is complete and the results are available to be collected.
    * ERROR : Some error has occurred while executing the RIB.
    * NODE_FAILURE : The node on which the RIB was executing has failed.

    The function receives as arguments a pointer to the position where the results, if obtained, are to be placed, and the size of the result.
    The function picks a message from the message queue and determines the message type. If the type of the message is RESULT, it does the following:
    1) Extract the result from the message.
    2) Invoke the function SetLockFree() to free the lock that was occupied by this RIB.
    3) Copy the result to the memory area specified by the pointer pos.

    If the type of the message is ERROR, the process is terminated.

    If the type of the message is NODE_FAILURE, all the RIBs that were executing on that node have to be re-executed on some other machine. For this, the function getRibId(nodeId) is called; it returns an array of the ids of all the RIBs that were executing on the failed node, nodeId. For each of these RIBs, if its retry count is less than numOfRetries and retryTimeout has not expired, do the following:
    1) Increase the retry count by 1.
    2) Obtain a free lock.
    3) If the free lock is obtained, set the status of lock as ALLOCATED.
    4) Post the RIB by invoking the function PostRibServer() and obtain the id, progNum.
    5) Also post the arguments to that RIB by invoking the function PostArgs().

    The function returns the id of the RIB for which it has received a RESULT message. For all other messages it returns -1.
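
    The dispatch on the message type can be sketched as follows. To keep the example self-contained, the message is passed in directly instead of being read from the message queue, so the function is named HandleMessage() here. The Message layout is hypothetical, and SetLockFree() and RetryRibsOnNode() are stubs that only record their argument.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_RESULT 64                     /* hypothetical payload limit */

typedef enum { RESULT, ERROR, NODE_FAILURE } MsgType;

/* Hypothetical message layout. */
typedef struct {
    MsgType type;
    int     ribId;
    int     lockId;                       /* lock held by the RIB */
    int     nodeId;                       /* failed node, for NODE_FAILURE */
    int     len;                          /* bytes of result in data */
    char    data[MAX_RESULT];
} Message;

/* Stubs that record their last argument instead of doing real work. */
int lastFreedLock  = -1;
int lastFailedNode = -1;

void SetLockFree(int lockId)     { lastFreedLock = lockId; }
void RetryRibsOnNode(int nodeId) { lastFailedNode = nodeId; }

/* Returns the RIB id for a RESULT message and -1 for any other message. */
int HandleMessage(const Message *msg, void *pos, int size)
{
    assert(msg != NULL && pos != NULL && size >= 0);
    switch (msg->type) {
    case RESULT: {
        assert(msg->len >= 0 && msg->len <= MAX_RESULT);
        /* Never copy more than the caller's buffer can hold. */
        int n = msg->len < size ? msg->len : size;
        SetLockFree(msg->lockId);
        memcpy(pos, msg->data, (size_t)n);
        return msg->ribId;
    }
    case ERROR:
        exit(EXIT_FAILURE);               /* the text says: terminate the process */
    case NODE_FAILURE:
        RetryRibsOnNode(msg->nodeId);
        return -1;
    }
    return -1;
}
```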

void WaitOnSync(int ribId, void* pos, int size)
    The purpose of this function is to wait for the execution of a task to complete. A while loop checks for the arrival of the RIB's results and keeps waiting as long as the timeout has not occurred and there are retries left for the RIB. Once either condition becomes false, the status of the RIB is checked to find out whether the results are available. If they are, the results are copied into a structure, the function messageHandler() is called with pos pointing to the location of the result, and the function returns.
    If the results are not available, the RIB is executed locally. For this, the RIB is sent to the local coordinator by invoking the function PostRibServer(ribId, pid), which takes the RIB id and the process id. The arguments to the RIB are sent by calling the function PostArgs(ribId). The function then waits until the results of the execution are available. Once the results have been obtained, the function copies them to a particular location and passes this location as an argument to the function messageHandler().

 
MagicNumber* GetNewMagicNum(char* server)
    The function takes as argument the name of the coordinator and invokes the RPC call calculate_cur_magicnum_svc() on the coordinator. The function returns a pointer to a structure that contains the magic number for the supplied coordinator.