CS 386: Lab Assignment 3

(TAs in charge: Akshay Arora and Sagar Tikore)

The traveling salesperson problem (TSP) asks the following question: "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city and returns to the origin city?" It is an NP-hard problem in combinatorial optimization, important in operations research and theoretical computer science. In this assignment you will implement hill climbing to solve TSP. We will consider only the Euclidean version of TSP, in which the cities all lie on a 2-dimensional plane. Each city will have (x,y) coordinates, and the euclidean distance between two cities will be the weight of the edge connecting them.

(image source: https://xkcd.com/399/.)

Code

Download this code base.

hill climb

You have to make changes in hillClimb.py. This file contains helper functions such as getRandomTour() and getTourLength() which will be useful while implementing TSP solutions. Running python3 hillClimb.py -h will list parameters, as shown here.

Parameters to `hillClimb.py`
--file	TSP instance file.
--cities	Number of cities.
--r1seed	Random seed to generate TSP instance file with number of cities provided in '--cities' option. (use --cities and --r1seed options only if '--file' option not used).
--r2seed	Random seed to generate random tour which is used for Task 1 and Task 2.
--start_city	Starting city for nearest neighbour algorithm (Task 3) and Minimum Spanning Tree-based initialisation (Task 4).
--submit	Option to generate graphs that you need to submit.

An input instance of a TSP problem (for example, see berlin52.tsp in the data folder) has one line corresponding to each city. This line specifies the index of the city, the x coordinate, and the y coordinate. In hillClimb.py , you will find the Node data structure. A node contains the index of a city, along with its x and y coordinates.

A tour is defined as a permutation of indices of nodes a₁, a₂, ..., a_n (where a_i is the index of node provided in the input file). Suppose b₁, b₂, ..., b_n is a tour: it means the salesperson starts the tour at b₁, then goes to b₂, and so on until finally returning to b₁ from b_n. So a tour is just a list of node indices.

For example, some tours in the instance shown below (left) are [1, 2, 3, 4, 5] (middle) and [2, 1, 3, 4, 5] (right). Any rotated version or mirror image of a tour is the same tour (e.g. 1,2,3,4,5 is the same as 2,3,4,5,1 which is the same as 3,4,5,1,2 and so on. Similarly 1,2,3,4,5 is same as 5,4,3,2,1).

In the code, the following variables, functions and data structures are implemented for your convenience.

Node: Class which has three member variables: index of city, x-coordinate and y-coordinate.
cities: denotes the number of cities in the graph.
nodeDict: This is a dictionary with its key as index of a node and the Node instance as its value.
getTourLength: This function takes a tour as an argument and returns the length of the tour.
getDistance: This function takes as input two nodes and returns the euclidean distance between them.
Tour: a tour is just a list of indices (not Nodes) of cities.

As presented in class, you will consider the 2-neighbourhood of nodes to perform hill climbing. You will also implement two non-trivial initialisation routines, which should help hill climbing find better solutions.

Not that although we apply "hill climbing" in this assignment, we only maintain tour lengths, which we aim to minimise. It would be unnatural to consider the negative of tour lengths and find a maximum. So we'll stick with minimising tour lengths. Consequently, although we call our method hill climbing, we'll be performing valley-descending.

Task 1: Generate 2-opt neighbours (2 marks)

Two tours are called 2-neighbours if one can be obtained from the other by deleting two edges and adding two edges. For example, the two tours shown below are 2-neighbours of each other.

For a given tour, generate a list of all its 2-neighbours (also sometimes called 2-opt neighbours). You will be implementing the function generate2optNeighbours(tour). Here tour is a list of indices of cities/nodes. When you fill out the function, it should return a list of tours. For example, generate2optNeigbours called on [1, 2, 3, 4] should return [ [2, 1, 3, 4], [3, 2, 1, 4], [1, 3, 2, 4]]. The order of the tours does not matter. In principle it is a good idea not to have rotated/mirrored copies of tours in the list returned, as they add unnecessary complexity to the algorithm. However, your code will still work if the list returned by generate2optNeighbours(tour) has such copies. For example, the above call might as well have returned ( [[4,1,3,2], [3,2,1,4], [2,1,3,4],[1,3,2,4]], where [4,1,3,2] is just the rotated version of [1,3,2,4]).

Hint: there is a simple trick to generate 2-neighbours quickly. See if reversing some substring of the tour list gives you something useful.

Use python3 hillClimb.py -h to get familiar with different command line options that you can use. After making changes to generate2optNeighbours(tour) in hillClimb.py, use the following command to test the function.

python3 hillClimb.py --file tsp4.tsp --task 1

This call should return [[4, 1, 3, 2], [1, 4, 3, 2], [3, 1, 4, 2], [4, 3, 1, 2],[1, 4, 3, 2], [4, 3, 1, 2]] or some variation involving rotations and mirror images. Also run tests with other instances, if you would like to make sure your code is correct.

Generating your own TSP instance

You can generate your own instance and use it for printing the 2-neighbours, using this command.

python3 hillClimb.py --cities 'number of cities' --r1seed 'random seed for generating city distances' --r2seed 'random seed for tour generation' --task 1

Note that the '--r1seed' random seed will be used to generate random coordinates for the given number of cities, and the '--r2seed' random seed will be used for creating a random initial tour. For example,

python3 hillClimb.py --cities 10 --r1seed 2 --r2seed 3 --task 1

will generate the 'tsp10' file for 10 cities with random coordinates in your current directory and will use it to generate 2-neighbours. If you are using an existing file, then there is no need to pass the '--cities' and '--r1seed' options. Just use this command.

python3 hillClimb.py --file 'filename' --r2seed 'seed value' --task 1

These command line options for generating new TSP instances or using existing ones are common to all tasks.

For submission, run this command.

python3 hillClimb.py --file data/tsp10.tsp --r2seed 1 --task 1

This command will generate 2optNeighbours.txt in your directory, which you need to submit for Task 1.

Task 2: Implement hill climbing [2 marks]

You will implementing the function hillClimbFull(initial_tour), which must return two things:

a list of tour lengths, one for each tour visited by hill climbing, until a local minimum is reached, and
the final tour found (that is, the local minimum).

For example, hillClimbFull([1, 2, 3, 4, 5]) on some instance might return something like [12.2, 11, 7.7, 5.6, 5.5, 3.8], [2, 3, 1, 5, 4].

As an intermediate step, you might find it useful to implement a hillClimbUtil(Tour) function that undertakes a single step of hill climbing: that is, takes as input a tour, generates all its 2-neighbours, and chooses the minimum length one. You will use the function that you have implemented in Task 1 to generate 2-neighbours. (In other tasks, too, feel free to modularise your code by creating your own functions.)

By running

python3 hillclimb.py --file data/berlin52.tsp --r2seed 1 --task 2

you can generate task2.png, which will be a graph of tour lengths against the number of hill climbing iterations. You can compare this with the berlin_task2.png already present in your directory to verify your graph is correct.

For submission, run python3 hillClimb.py --submit --task 2, which will generate task2_submit.png. This experiment runs your hill climbing code on st70.tsp, and uses 5 different random tour initialisations. The graph is plotted for each of the initialisations. This graph will be used for your evaluation.

The choice of initial tour before applying hill climbing can make a significant difference in the final tour found. In the next two tasks, we will investigate different approaches to pick a 'good' initial tour.

Task 3: Nearest Neighbour [2 marks]

Under this approach, given a starting node A, a tour is constructed in the following manner. You find the node closest to A, say B. Then you find the node closest to node B, say which is C. This process is continued until all the nodes are covered. The last edge of the tour is from the last node found to the starting node.

nearestNeighbourTour(start_city) returns the tour found out by applying this method. This function will be called within hillClimbWithNearestNeighbour, which will generate the graph just as it was done in Task 2.

Use command line option '--start_city' to give the start city. The command to test Task 3 is this.

python3 hillclimb.py --file data/berlin52.tsp --start_city 1 --task 3

This should generate task3.png. You can compare this graph with berlin_task3.png present in your directory to verify correctness. To prepare a graph for submission, run this command.

python3 hillClimb.py --submit --task 3

After executing this command you should find task3_submit.png in your directory, which shows the graph of 'tour length vs. hill climbng'. This experiment is based on the st70.tsp instance, and uses 5 different starting cities for the nearest neighbour algorithm. The difference from Task 2 is that the initial tour is chosen by the nearest neighbour construction, rather than at random. The graph is plotted for each of this initialisations.

Task 4: Euclidean Approximation [4 marks]

There's a well-known approximation algorithm to quickly find a route for Euclidean TSP that is guaranteed to be no more than twice as long as the optimal tour. This tour can be taken as an initial tour for hillclimb, and further improved. Here's the process to generate the initial tour.

Generate the complete graph for the nodes.
Construct a minimum spanning tree for the graph using Prim's or Kruskal's algorithm.
Do a preorder traversal starting from the initial node.

2 is already done for you; you are only supposed to solve 1 and 3. For 1, you will be implementing a part of the euclideanTour function. An empty list called edgeList is initialized for you at the start of euclideanTour. You have to add all the edges of the complete instance graph to edgeList. Each element in edgeList (that is, an edge) consists of a list [x,y,dist(x,y)], where x and y are indices of cities, and dist(x,y) is the euclidean distance between them. For instance, a partial edgeList might look like [ [1, 2, 2.4], [1, 3, 4.3] ]. For constructing the edgeList, you can use given global variables, such as are nodeDict and cities. You can also use the getDistance to get the distance between two cities.

Using your edgeList set, Kruskal's algorithm is applied to find a Minimum Spanning Tree. You can treat this portion as a blackbox for the purpose of this assignment. The Minimum Spanning Tree is stored as a list of edges in a list mst . An edge here is just [x,y] where x and y are two indices of cities. The list mst is to be used to generate a legal tour.

For 3, you will implement the finalOrder function. This function takes mst and start_city as input. The start_city is the parameter passed to the euclideanTour function. finalOrder should return the preorder traversal (a list of city indices) starting from start_city. This preorder traversal is your final Euclidean tour.

Your function hillclimbing function will be called in hillClimbWithEuclideanMST where the initial tour is the one obtained from euclideanTour. Use command line option '--start_city' to give the start city. The command to test Task 4 is as follows.

python3 hillclimb.py --file data/berlin52.tsp --start_city 1 --task 4

This should generate task4.png. You can compare this with the berlin_task4.png already present in your directory to verify your graph is coming correct.

To submit your graph, run:

python3 hillClimb.py --submit --task 4

After executing this command you should find task4_submit.png file which shows the graph of 'tour length vs. hill climbing iterations'. This graph runs your hill climbing code on st70.tsp. The difference from Task 3 is that initial tour is chosen by Euclidean approximation instead of random and is run on only one city.

How does the tour generated by the MST approach compare with those generated in Task 3 by the nearest neighbour approach? Is one always a better approach than the other? Run the two approaches on other instances provided in the data directory to come to a conclusion. Write down your observations on the comparison between the random, nearest neighbour, and MST approaches in a text file called observations.txt, which you will submit along with your code.

Function to visualise the tour

You can use the drawTour(nodeDict, tour) function to visualise the final tours you get after applying different algorithms. You just have to change the second parameter of the function, and specify the tour you want it to draw.

Submission

You are not done yet! Place following files in a directory named your roll number (say 12345678). Tar and Gzip the directory to produce a single compressed file (12345678.tar.gz). It must contain the following files.

hillClimb.py
2OptNeighbours.txt
task2_submit.png
task3_submit.png
task4_submit.png
observations.txt

Submit this compressed file on Moodle, under Lab Assignment 3.

Your submission will be evaluated based these files, but we will also examine and run your code to verify correctness.