The final project we chose for Computer Networks 2 is a Distributed Efficient Searching Model. A normal search in a database takes the entry description from the client and iterates through all the entries, which takes O(M) time if the database holds M files. Our idea is to partition the M files across N workers, each of which searches its own share of the database simultaneously, reducing the search time to O(M/N).
The structure of the model is as shown in the figure. The client interacts with and transfers data to the server directly, while the server is connected to all the workers (three in this illustration), each of which handles its own share of the data. We also plan to have a backup server in case the primary server fails.
Coming to the system design: to access the database, a client has to log in using their username and password. After that, they can request different things, e.g. file data, file details, or a list of files. If the client asks for the files list, a list of all the files in the database is sent back to the client. To get file data or file information, the server, on request, calls the send_file function, which connects the server to the corresponding worker's port number and IP address.
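As a concrete illustration, here is a minimal sketch of the login step and the three request types, assuming the client and server talk over TCP and the credentials travel as a plain "username:password" string; the constant names, the wire format, and the error handling are our own placeholders rather than a fixed design:

    import socket

    REQ_FILE_DATA = 0    # request the contents of a file
    REQ_FILE_INFO = 1    # request details about a file
    REQ_FILES_LIST = 2   # request the list of all files in the database

    def login(server_ip: str, server_port: int, username: str, password: str) -> socket.socket:
        """Connect to the server, send the credentials, and return the open socket."""
        sock = socket.create_connection((server_ip, server_port))
        sock.sendall(f"{username}:{password}".encode())
        if sock.recv(1024) != b"OK":
            sock.close()
            raise PermissionError("login rejected by the server")
        return sock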
If we want to add or remove workers from our system, we can do so through the corresponding worker-management requests: the add_worker function adds a new worker with the given IP address and port number, and rem_worker removes the worker with the given IP address. The remaining requests concern the files stored in the database; how the server handles sending a file, file information, and the file list is something we have already gone over.
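A possible implementation of these two functions, assuming the server keeps its workers in a simple in-memory list of (IP, port) pairs (the list itself is an assumption, since the report does not fix how workers are stored):

    workers = []  # (ip, port) pairs currently known to the server

    def add_worker(ip: str, port: int) -> None:
        """Register a new worker with the given IP address and port number."""
        if (ip, port) not in workers:
            workers.append((ip, port))

    def rem_worker(ip: str) -> None:
        """Remove every worker registered under the given IP address."""
        workers[:] = [(w_ip, w_port) for (w_ip, w_port) in workers if w_ip != ip]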
On receiving a send-file or send-file-information request, the worker calls the send_port_number() function and then calls the send_data() function with the file name as its argument. This function looks through the worker's share of the database and returns the desired file. The worker then establishes a TCP connection to send the requested data.
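The two worker-side functions could look roughly like this; FILES_DIR and DATA_PORT are illustrative names, and we assume the worker streams the file to whoever connects on its data port:

    import os
    import socket

    FILES_DIR = "worker_files"   # this worker's share of the database
    DATA_PORT = 6000             # port on which the worker serves file transfers

    def send_port_number(server_sock: socket.socket) -> None:
        """Report the data-transfer port back to the server."""
        server_sock.sendall(str(DATA_PORT).encode())

    def send_data(file_name: str) -> None:
        """Wait for the client on DATA_PORT and stream the requested file to it."""
        path = os.path.join(FILES_DIR, file_name)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
            listener.bind(("", DATA_PORT))
            listener.listen(1)
            conn, _ = listener.accept()   # the client connects after receiving the port
            with conn, open(path, "rb") as f:
                while chunk := f.read(4096):
                    conn.sendall(chunk)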
The client initially checks whether the server is running, then sends a username and password to log in to the server. The client is then given a list of available actions, from which they choose one and send the corresponding request. Next, they listen for a port number so they can establish a TCP connection to receive the required data. The recv_file function takes two arguments, the file name and request_type, which denotes whether we are asking for the file itself or for file information. A separate function, receive_files_list(), fetches the files list from the server.
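A rough client-side sketch under the same assumptions, where the server replies with the chosen worker's IP and port as an "ip:port" string (this layout, like the rest of the wire format here, is our own placeholder):

    import socket

    def recv_file(server_sock: socket.socket, file_name: str, request_type: int) -> bytes:
        """Ask for file data (request_type 0) or file information (request_type 1)."""
        server_sock.sendall(f"{request_type}:{file_name}".encode())
        worker_ip, worker_port = server_sock.recv(1024).decode().split(":")
        with socket.create_connection((worker_ip, int(worker_port))) as worker_sock:
            chunks = []
            while chunk := worker_sock.recv(4096):   # read until the worker closes
                chunks.append(chunk)
            return b"".join(chunks)

    def receive_files_list(server_sock: socket.socket) -> list[str]:
        """Request the merged list of all files held by the workers."""
        server_sock.sendall(b"2:")   # request type 2 = files list
        return server_sock.recv(65536).decode().splitlines()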
The pseudocode of the server is fairly self-explanatory. It has to listen for any requests coming from the client, take the username and password supplied by the client, validate them, and let the client know the result of the validation. If the request is related to workers, e.g. adding or removing a worker, the server can execute those functions directly.
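In code, the server loop described above might look like the following sketch; the in-memory credential table and the message format are placeholders for illustration:

    import socket

    CREDENTIALS = {"alice": "secret"}   # illustrative username/password store

    def serve(host: str = "", port: int = 5000) -> None:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
            listener.bind((host, port))
            listener.listen()
            while True:
                conn, _ = listener.accept()
                with conn:
                    user, _, password = conn.recv(1024).decode().partition(":")
                    ok = CREDENTIALS.get(user) == password
                    conn.sendall(b"OK" if ok else b"DENIED")   # report the validation result
                    if not ok:
                        continue
                    request = conn.recv(1024).decode()
                    # dispatch here: worker-management requests (add_worker /
                    # rem_worker) run directly, file requests go through
                    # file_data or get_files_list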
The major functions behind the workings of the server are as follows. A function to get a list of all files asks every worker to return the list of files under it, merges the returned lists, and sends the result back to the client. The file_data function fetches either file data or file information depending on whether request_type is 0 or 1: it asks each worker to search for the file, receives the port number from whichever worker finds it, and forwards that port number to the client. The workings of manage_workers have been discussed earlier.
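Sketches of these two lookup functions, assuming each worker answers a short "type:filename" query with its data port when it holds the file and simply closes the connection otherwise (again, the message format is our own assumption):

    import socket

    def get_files_list(workers: list[tuple[str, int]]) -> list[str]:
        """Ask every worker for its file list and merge the results."""
        merged = []
        for ip, port in workers:
            with socket.create_connection((ip, port)) as sock:
                sock.sendall(b"2:")          # type 2 = list the files under this worker
                data = b""
                while chunk := sock.recv(4096):
                    data += chunk
                merged.extend(data.decode().splitlines())
        return merged

    def file_data(workers: list[tuple[str, int]], file_name: str, request_type: int) -> str | None:
        """Return 'ip:port' of the worker holding file_name, or None if nobody has it."""
        for ip, port in workers:
            with socket.create_connection((ip, port)) as sock:
                sock.sendall(f"{request_type}:{file_name}".encode())
                reply = sock.recv(1024).decode()
                if reply:                    # the worker found the file and sent its data port
                    return f"{ip}:{reply}"
        return None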
The worker mainly waits for requests from the server, searches for the required file, and, upon finding it, returns its port number to the server, which passes it on to the client. The worker then establishes a TCP connection with the client and, based on the value of msg.type, sends either the file data or the file information. If msg.type is 2, however, the worker simply sends back the list of files under it.
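Under the same assumed message format, the worker's main loop could be sketched as below; the actual transfer to the client is then handled by send_data() from the earlier sketch, and the file-information case is abbreviated to keep the example short:

    import os
    import socket

    FILES_DIR = "worker_files"   # this worker's share of the database
    DATA_PORT = 6000             # port on which the client fetches the data

    def worker_loop(listen_port: int) -> None:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
            listener.bind(("", listen_port))
            listener.listen()
            while True:
                conn, _ = listener.accept()
                with conn:
                    msg_type, _, file_name = conn.recv(1024).decode().partition(":")
                    if msg_type == "2":
                        # files-list request: return the names of the local files
                        conn.sendall("\n".join(os.listdir(FILES_DIR)).encode())
                    elif os.path.exists(os.path.join(FILES_DIR, file_name)):
                        # found: report the data port; the server forwards it to
                        # the client, and send_data() then serves the transfer
                        conn.sendall(str(DATA_PORT).encode())
                    # otherwise send nothing; closing the connection signals "not found"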
Something we are planning to add to this model is a caching system. At present we are thinking of storing recently requested files on the server. We will try to improve the caching algorithm further down the line.
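One possible shape for such a cache, keeping the most recently requested files in server memory and evicting the least recently used entry when full (the size limit and eviction policy here are only a starting point, not a settled design):

    from collections import OrderedDict

    class FileCache:
        def __init__(self, max_entries: int = 32) -> None:
            self.max_entries = max_entries
            self._files: OrderedDict[str, bytes] = OrderedDict()

        def get(self, file_name: str) -> bytes | None:
            """Return cached file data and mark it as recently used, or None on a miss."""
            if file_name not in self._files:
                return None
            self._files.move_to_end(file_name)
            return self._files[file_name]

        def put(self, file_name: str, data: bytes) -> None:
            """Store file data, evicting the least recently used entry when full."""
            self._files[file_name] = data
            self._files.move_to_end(file_name)
            if len(self._files) > self.max_entries:
                self._files.popitem(last=False)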