Course project: Understanding network performance in dense WiFi settings
Guidelines for final report submission and viva (Updated April 7, 2014)
Please note that your timely progress at the monthly reviews has so far accounted for 10% of your course grade. Your final project report and viva will account for 10% each (adding up to a total of 30% for the project).
Here is a summary of the project so far. A trace of 12 WiFi clients connecting to a web server and downloading a large video file was provided to you. Most of you have analyzed the file correctly, parsed the HTTP requests, and computed that the average time to download the file was around 150 seconds. The task was then to recreate this scenario in simulation using ns-3. Here are the steps you should have followed in simulation:
- Set up 12 clients close to the AP, such that each client is capable of transmitting at the 54 Mbps raw physical layer rate (as was the case in the real experiment). In this baseline configuration, the TCP-level throughput is about 24 Mbps (after accounting for MAC overheads), which translates to around 2 Mbps per client. At this throughput, downloading the 10 MB video file takes around 40 seconds. All teams are required to reach this baseline first, to make sure all your simulation parameters are set correctly.
- As most of you have rightly observed, the difference in download times (between 40 s in the baseline and 150 s in the real experiment) is due to the added uplink traffic from the clients (low-rate chatter in the form of TCP SYNs, HTTP GET requests, etc.), which increases contention on the channel and leads to more collisions. The collision losses further cause the rate adaptation algorithm (SampleRate in real life; AARF or any other algorithm in simulation) to lower the bit rates, which increases contention on the channel even further. All of this leads to TCP losses and raises the download time to the value observed in the trace.
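The baseline figures above follow from straightforward arithmetic. A minimal sketch, assuming the roughly 24 Mbps TCP-level aggregate is shared evenly among the clients and that 1 MB is taken as 10^6 bytes (the function name is illustrative):

```python
def baseline_download_time(file_mb: float, aggregate_mbps: float, n_clients: int) -> tuple[float, float]:
    """Return (per-client throughput in Mbps, download time in seconds).

    Assumes the aggregate TCP throughput is split evenly across clients
    and that 1 MB = 10**6 bytes (so 1 MB = 8 megabits).
    """
    per_client_mbps = aggregate_mbps / n_clients
    file_megabits = file_mb * 8
    return per_client_mbps, file_megabits / per_client_mbps


per_client, seconds = baseline_download_time(file_mb=10, aggregate_mbps=24, n_clients=12)
print(f"{per_client:.1f} Mbps per client, {seconds:.0f} s per download")
# prints: 2.0 Mbps per client, 40 s per download
```

If the per-client throughput in your own baseline run is far from this figure, it is worth rechecking your simulation parameters before adding any uplink traffic.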
Most teams have reached this point in the project, as seen in the third project review in the first week of April. Going forward, each team will submit one joint report of under 8 pages (including any figures and graphs). Please submit one hard copy of your report at my office. Your report should have the following components:
- Names of your team members, and the individual contributions of the team members. Please provide a 2-3 sentence description of the specific work done by each member of the team.
- A short description of the trace analysis, mentioning any interesting points you found.
- A description of your simulation scenario and setup, confirming that you have started with the baseline mentioned above. Please mention any relevant configuration changes you have made (e.g., which rate adaptation algorithm you picked).
- A list of the changes you made to the simulation to go from the baseline to matching the download time seen in the real experiment. Please describe in detail the traffic models you have used (i.e., what the download and upload traffic at each client is). Please be clear on the exact parameters used. For example, if you have added the On-Off traffic helper to generate uplink traffic at the clients, then you must state the traffic rate, the total amount of traffic sent in the uplink, how long the uplink transfer lasts, and so on. Please mention all your simulation parameters very clearly.
- The final download time you have obtained in simulation. You may or may not match the final number from the trace. If you did match it, please explain why the match happened. If you could not match the download time in the trace, please explain why not. There is no one right answer here, but clearly justify whichever answer you choose to provide.
- If you have looked at, analyzed, and tried to match (between simulation and traces) any other parameters (e.g., TCP retransmission rates, time-sequence graphs, RTT, cwnd), please mention these in the report as well. If you have instrumented your simulator to print out any interesting statistics (e.g., link layer collision rates), please mention them. Different teams will have different answers here, depending on how deeply you have looked into the traces.
- Once you have described the simulation for the case of 12 clients, re-run your simulation with a varying number of clients (say, between 5 and 50), and observe how the average download time changes. That is, provide a graph with the number of clients on the x-axis and the average download time on the y-axis, while keeping the traffic model and everything else similar to the 12-client case that you analyzed. This graph should help you answer questions such as: what is the maximum number of WiFi clients per AP such that the download of a 10 MB file completes in under 5 minutes?
- Optionally, you can suggest and/or implement strategies to reduce the contention and download time in high user-density WiFi scenarios. Please suggest solutions that are sound and practical enough to work in real life. Even if you don't have the time to implement them in simulation, it would be good to conduct the thought experiment and note down your conclusions.
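Once you have the client-count sweep, reading off the 5-minute answer is a small computation. A minimal sketch, assuming you have collected the average download time for each client count into a dictionary (the numbers must come from your own runs; the function name is hypothetical). It conservatively stops at the first client count that misses the limit, since simulation noise can make the curve non-monotonic:

```python
def max_clients_under(results: dict[int, float], limit_s: float = 300.0):
    """Largest client count N such that every swept N' <= N finishes under limit_s.

    results maps number of clients -> average download time in seconds.
    Returns None if even the smallest configuration misses the limit.
    """
    best = None
    for n in sorted(results):
        if results[n] >= limit_s:
            break  # stop at the first client count that misses the limit
        best = n
    return best
```

Because the sweep is discrete, the true threshold lies somewhere between the returned count and the next one you simulated; a finer sweep around that point narrows it down.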
As you can see, there are a lot of things to fit in within 8 pages. So please be clear and concise when you write your report.
After submitting the report, each team should attend the viva by signing up for a 15-minute slot. Please bring your laptops with the simulation and trace analysis code (please charge your laptops, and have them ready to use, so you don't have to plug them in during the 15-minute slot). You need not make any slides (any content you want to present should already be in your report, which I will have with me during the viva). The goal of the viva is to assess the contributions of the individual team members to the project. Some of the things you will be tested for in the viva:
- You may be asked to run your simulation code to print out the download times. Please keep two versions of the simulation script: one baseline version (without any uplink traffic, where the download time is around 40s), and the final version (with a higher download time). Please have the scripts ready to run and print out the download times.
- You may be asked to show and explain specific parts of the code (e.g., which part of your trace analysis identifies the download time, which part of your simulation script sets the bit rate adaptation algorithm). You will be asked these questions individually, and only on the parts of the work that you claim to have done in your report.
- You may be asked some conceptual questions about your understanding of various aspects of the project (e.g., why does download time increase when uplink traffic is added?).
Description of the project (to be read at the start of the project)
The goal of this project is to get first-hand experience in doing
simple wireless networking research. We will consider the scenario of
a number of clients connected over WiFi to a web server via an access
point (AP), and downloading content from the server. Those of you who
took CS 699 (Software Lab) last semester will remember
downloading quizzes and lectures this way during the course. We will
analyze and understand wireless network performance in such a
scenario, using a combination of simulations and analysis of real
traces. Our particular emphasis will be on understanding how and why
WiFi performance degrades when there are a large number of users.
Here is a link to the trace used for the project: 12clients.pcap.
Logistics
You will do the project in teams of 3 students (2 students is also
fine if you so desire). Please decide on your teams and send me email
within the first two weeks of the semester.
Components of the project. The project will consist of analyzing networking traces, running
simulations, and putting together real network data and simulations to
understand the performance of wireless networks. The project will have roughly three parts, plus an optional fourth.
- We will provide you with a network trace of a real experiment of multiple WiFi clients downloading content from a server via an access point. You will analyze the trace to understand network performance seen in downloads in such settings.
- You will use a network simulator to simulate a WiFi network (a certain number of clients connected to an HTTP server via an AP) and recreate the trace experiment in simulation. You will run various simulations to understand WiFi performance at various layers (link layer, TCP, HTTP etc.).
- The network performance seen in real life and in simulations will likely be very different. You will then proceed to understand the differences between simulations and real life, and enhance the simulation models to better reflect real life.
- Optional: If you have completed the above steps, you can also suggest improvements to network protocols at various layers to improve the network performance seen in simulations. You can implement and test these enhancements in simulation.
Each of these parts is explained in more detail later on.
Monthly reviews. We will hold three monthly reviews to monitor
the progress of the project (please see the class homepage for
dates). You can divide the work between the reviews as you see
fit. For example, you can do the simulations first, and trace analysis
next, or you can start simulations and trace analysis both at once. I
leave it to the individual teams to plan their activities. Your
deliverable for each monthly review is a short (under 4 pages) report
by each team. During the review, I will meet with each team
individually. We will go over your report and you must explain your
results. You will also be asked to explain what each member of the
team has done. Appropriate feedback will be provided to ensure that
the project is going on the right track.
Final report, viva. Each team will submit one joint final
report (under 8 pages) at the end of the semester. We will also have a
viva, where each member of the team has to answer questions
individually, and must be able to explain the work they did.
Goals of the project. Ideally, you should have learnt the following things by the end of the project.
- Learning the basic building blocks of working in the field of wireless networks, like using network simulators, and analyzing network traces using tools and scripts.
- Learning how to understand and analyze complex systems. The situation we are dealing with (dense WiFi setting) has many complex interactions between various protocols at various layers of the networking stack. Understanding what is happening at each layer, and being able to clearly explain the end-result requires careful and clear thinking. Developing the skill of making sense of data from a complex system will come in useful in whatever job you do in the future. Please spend enough time looking at your data, graphs, and results to convince yourself and me that you understand everything. Just generating a bunch of graphs and throwing them into a report is not enough.
- Learning how to work in teams. You will need to decide how to split the project into manageable chunks, how to distribute work, and how to coordinate amongst your team members.
- Learning how to present your results. You must be able to clearly articulate your understanding and analysis in the reports and presentations as part of the course.
Grading. Your project grade (30% of the final grade) will be decided as follows.
- 10% will be based on your timely progress during monthly reviews.
- 10% will be based on your final report.
- 10% will be based on your individual viva at the end of the project.
During each of these stages (monthly reviews, final
report, and viva), you will be judged on how well you
have met the four goals listed above. That is, in addition to just
doing the actual work, you will also be required to demonstrate other
meta-skills like clarity of thought, planning in a team, and good
communication.
Trace analysis
Experimental scenario. You are provided with a trace (a pcap
file) that was captured during the following experiment (see the top
of the page for a link to the trace). 12 users (WiFi clients) connect
to a WiFi access point (AP). The AP is bridged via ethernet to an
Apache web server that hosts some content. The users first go to a URL
on the web server and authenticate themselves (by typing in a username
and password). After authentication, the users are shown a directory
listing of all the files on the web server. The users then proceed to
download a video file ("raw.mp4", approx 9 MB) from the server.
The clients are all using 802.11g, on a clean channel without
external interference. A bit rate adaptation scheme at the client and
AP was adjusting the transmit bit rate between the minimum and maximum
values of 1 Mbps and 54 Mbps. That is, the maximum raw throughput to
be expected on the wireless link was 54 Mbps. All clients were close
enough to the AP that they could send and receive at the 54 Mbps bit
rate when operating alone. The client download was bottlenecked only
by the wireless network, i.e., the wired network was
congestion-free. You may also assume that the wired part of the
network (between the AP and server) was essentially loss-free.
The trace was collected by running tcpdump on the web server. Trace
collection began after all clients had associated with the AP but
before any download began. The trace ends when all clients have
finished downloading the file. (Note that the client behavior described
above is only what the users were asked to do; some users may have
deviated slightly from it.)
Tools. You can use tools such
as wireshark (which has a
graphical interface)
or tshark
(a command-line interface) to analyze the traces. Wireshark gives you
an easy way to visualize the data. Tshark gives you a way to extract
text from the trace. You can then proceed to write scripts in your
preferred scripting language (shell/perl/python) to extract various
numbers from the traces. Please spend some time familiarising yourself
with these tools and the trace before you jump in and generate graphs.
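As an illustration of this workflow, one could export per-packet fields with tshark's fields output mode, for example `tshark -r 12clients.pcap -T fields -e frame.time_relative -e ip.src -e ip.dst -e tcp.len > fields.tsv`, and then summarize them in Python. The sketch below assumes that tab-separated four-column format and a hypothetical server address `SERVER_IP`; it computes per-client downlink payload, transfer duration, and throughput:

```python
from collections import defaultdict

SERVER_IP = "10.0.0.1"  # hypothetical: replace with the web server's address in the trace


def downlink_summary(lines, server_ip=SERVER_IP):
    """Summarize server-to-client TCP payload per client.

    Each input line is expected to be tab-separated: time, ip.src, ip.dst, tcp.len.
    Returns {client_ip: (payload_bytes, duration_s, throughput_mbps)}.
    """
    stats = defaultdict(lambda: {"bytes": 0, "first": float("inf"), "last": float("-inf")})
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4 or not fields[3]:
            continue  # non-TCP packet (empty tcp.len) or malformed line
        t, src, dst, tcp_len = float(fields[0]), fields[1], fields[2], int(fields[3])
        if src != server_ip or tcp_len == 0:
            continue  # keep only downlink packets that carry payload
        s = stats[dst]
        s["bytes"] += tcp_len
        s["first"] = min(s["first"], t)
        s["last"] = max(s["last"], t)

    summary = {}
    for client, s in stats.items():
        duration = s["last"] - s["first"]
        mbps = s["bytes"] * 8 / duration / 1e6 if duration > 0 else float("nan")
        summary[client] = (s["bytes"], duration, mbps)
    return summary
```

For example, `downlink_summary(open("fields.tsv"))` returns one entry per client. Since `tcp.len` also counts retransmitted segments, these byte counts include retransmissions, and the duration runs from the first to the last data segment, which only approximates the download time measured from the HTTP request to the final ACK.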
Steps in the analysis. Below is a rough guideline on how you
must proceed with the project, what graphs you can generate, and what
to look for in the analysis. Note that you have considerable freedom in
shaping your project, so please feel free to modify this plan as you
see fit.
- Identify all the clients in the trace by their IP
addresses. Note the HTTP requests and TCP connections between the
client and web server. How many transactions (TCP/HTTP) took place between
each client and the server? How long did each of these transactions last?
How much traffic was generated in the uplink (from client to AP) and
downlink (from AP to client) directions? Convince yourself that the
trace indeed matches the experiment description given above.
- How long did the download of the video file take for each client?
How can you explain the difference in behavior between the client that
took the shortest time and the client that took the longest time? You
can start by looking at the TCP behaviour of these clients. Wireshark
gives you a way of looking at how the TCP connection progressed
(google for "wireshark TCP time sequence graphs"). TCP throughput will
be key in understanding how long each client's download took to
complete.
- What was the average downlink throughput seen by each client
during the experiment? You can compute the downlink throughput as the
total bytes (at the HTTP or TCP layer) downloaded divided by the total
time taken. What was the aggregate downlink throughput of the AP across
all clients?
- How do the per-client and aggregate throughput numbers computed
above vary with time? That is, try to compute throughput over smaller
windows of time, instead of over the entire experiment. Was there
unfairness across clients? Was the system throughput roughly constant with time,
or did it vary a lot? If it did vary, why did it vary?
- Do the throughput numbers you see make sense, given the expected
bit rates of the clients? Do you think the throughput should have been
higher/lower than what you see? If so, try to understand the reason
for the difference.
- What were the loss rates seen by the TCP connections? TCP
retransmissions (both fast retransmits and timeout-based retransmissions)
are flagged by Wireshark's TCP analysis when it reads the trace, and these
retransmissions will give you an idea of how prevalent losses are.
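For the question of how throughput varies with time, one approach is to bin the downlink payload into fixed-length windows. The sketch below assumes you already have (time, client, payload bytes) tuples, for instance from a tshark fields export; the one-second default window and the `"ALL"` key for the aggregate are arbitrary, illustrative choices:

```python
from collections import defaultdict


def windowed_throughput(packets, window_s=1.0):
    """Bin (time_s, client, payload_bytes) samples into fixed-length windows.

    Returns {client: {window_index: Mbps}}, plus an "ALL" entry holding
    the aggregate across all clients. Windows with no traffic are absent.
    """
    bins = defaultdict(lambda: defaultdict(int))
    for t, client, nbytes in packets:
        w = int(t // window_s)  # index of the window this packet falls into
        bins[client][w] += nbytes
        bins["ALL"][w] += nbytes
    return {
        client: {w: b * 8 / window_s / 1e6 for w, b in sorted(per_window.items())}
        for client, per_window in bins.items()
    }
```

Missing windows should be treated as zero throughput when plotting. Putting the per-client series on one graph makes unfairness visible, while the `"ALL"` series shows whether system throughput stays roughly constant.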
The bottom line of your trace analysis will be to understand and
explain the network behavior at each layer of the networking stack
while a relatively large number of clients were downloading a big
file over WiFi.
Simulation
Tools. While there are many wireless network simulators out
there, I would strongly suggest you
use ns-3 to run your simulations,
because it has the best models available for WiFi amongst the
simulators I am familiar with. You can also go with ns-2 (or any other
simulator) if you are more comfortable with it, but please do your
research to make sure the simulator provides all the models required
to do what is needed for the project. Please take some time to
familiarize yourself with the simulator and its documentation before
you start.
Simulation scenario. Your simulations will try to recreate
the trace collection experiment described above as far as possible. A
certain number N of clients (N can vary across simulations) connect to
a WiFi AP, which is in turn bridged to a node that is running an HTTP
web server. The WiFi clients also run HTTP client modules and issue
HTTP requests. All clients download a large video file from the web
server. You can decide the exact details of link bandwidths, delays,
traffic model choices, and other simulation parameters as you see
fit. I am not specifying the simulation config in great detail on
purpose: part of the puzzle is also to figure out how to model a real
life system in simulations. Whenever in doubt about the simulation
parameters, please use the trace as your guide.
Note that ns-3 offers several bit rate selection
algorithms at the MAC layer. You can choose any of them, or you can
use fixed rate. Again, the choice is dictated by what you see in your
traces.
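To see why collisions can drag bit rates down, it helps to look at how a loss-driven rate adaptation scheme reacts. The sketch below is a toy, simplified ARF-style model (ARF is the scheme AARF builds on), not ns-3's implementation: it steps up one rate after 10 consecutive acknowledged frames and steps down after 2 consecutive failures, the thresholds commonly described for ARF. It omits ARF's timer and its rule of falling back immediately when the first frame after a rate increase fails. The sender cannot tell a collision from a channel error, so a run of collisions lowers the rate exactly as poor signal would:

```python
# IEEE 802.11g ERP-OFDM rates in Mbps; the 1-11 Mbps DSSS/CCK rates are omitted for brevity
RATES = [6, 9, 12, 18, 24, 36, 48, 54]


def arf_trace(outcomes, success_threshold=10, failure_threshold=2):
    """Toy ARF-style rate adaptation.

    outcomes: iterable of booleans, True if the frame was acknowledged.
    Returns the rate (Mbps) used for each frame.
    """
    idx = len(RATES) - 1  # start at the highest rate
    successes = failures = 0
    used = []
    for acked in outcomes:
        used.append(RATES[idx])
        if acked:
            successes += 1
            failures = 0
            if successes >= success_threshold and idx < len(RATES) - 1:
                idx += 1  # enough consecutive successes: try the next higher rate
                successes = 0
        else:
            failures += 1
            successes = 0
            if failures >= failure_threshold and idx > 0:
                idx -= 1  # consecutive losses (collision or noise): fall back one rate
                failures = 0
    return used
```

Feeding it a burst of failures followed by successes shows the rate dropping quickly and recovering slowly, since one step down needs only 2 failures while one step up needs 10 successes.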
ns-3 provides a way for you to collect pcap traces at various
nodes, in addition to various levels of debug logging. You can use
some combination of logging and trace collection to understand the
network behavior.
Steps in the analysis. Here is a rough guideline on how you should proceed with the simulations.
- First, try to simulate the exact experiment that is captured in the trace file provided. Your analysis can proceed in a similar fashion to what you did with the trace file. The results to look at are similar: time taken for each client to complete the download, per-client throughput, aggregate throughput of the AP, TCP loss rates, and so on.
- Vary the number of clients in the experiment, and see how the completion times and throughput numbers vary.
- Experiment with various bit rate adaptation algorithms. Is fixed rate giving you better results, or is some particular rate adaptation algorithm giving you better results?
Reconciling simulation and real-life experiments
Compare and contrast the results from the trace with the results from simulations. Do your simulation and trace analysis results above match? That is, when you simulate a setting identical to the experiment from which the trace was collected, do results such as the download completion time and aggregate network throughput agree between the simulations and the trace analysis? If they do, well, I would suggest you make sure that there is nothing wrong with your results. If you have double-checked, and they still match, well, great!
In all likelihood, though, simulations and experiments will produce different results in practice. The first step would be to identify why the numbers are different. Here are some questions to ponder. These are only a small sample; I am sure we can identify more issues when we discuss your results in the monthly reviews.
- Is the traffic pattern the same? For example, how much traffic is each node sending in the uplink and downlink directions?
- What are the loss rates at the TCP layer in each case? Why are the loss rates different?
- How many concurrent TCP/HTTP connections were present in each case? In other words, what was the contention on the wireless channel like in each case?
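To quantify concurrency, one option is to extract each TCP connection's start and end time (for example from tshark's `-z conv,tcp` conversation statistics, which list a relative start time and duration per conversation) and count how many overlap. A minimal sweep-line sketch; the `(start_s, end_s)` input format is an assumption:

```python
def max_concurrent(intervals):
    """Peak number of simultaneously open connections.

    intervals: list of (start_s, end_s) pairs, one per TCP connection.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # connection opens
        events.append((end, -1))    # connection closes
    # At equal timestamps, process closes (-1) before opens (+1) so that
    # back-to-back connections are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```

Running the same function on connection times from the trace and from your simulation gives a like-for-like comparison of how many flows were contending for the channel.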
Once you have identified the source of the discrepancy, you can modify the simulation setup to reflect reality better. For example, you can change the traffic model, or the wireless channel loss model, or something else, to better capture the real-life effects in simulation. At this stage, the project is basically open-ended. There is no one right answer, as long as you can convince me of your answer.
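Whatever traffic model you choose, you will need to report its rate, duration, and total volume. For an on-off uplink source, such as ns-3's `OnOffApplication` (configured through `OnOffHelper`, with `DataRate`, `OnTime`, `OffTime` and `MaxBytes` attributes), the expected offered load is the sending rate during on periods multiplied by the fraction of time spent on. The function below is a hypothetical bookkeeping helper, not part of ns-3; it returns an expectation that ignores packetization and the randomness of the on/off periods:

```python
def onoff_offered_load(data_rate_bps, mean_on_s, mean_off_s, duration_s, max_bytes=0):
    """Expected average rate and total volume of an on-off source.

    Returns (average rate in bit/s, expected total bytes over duration_s).
    max_bytes=0 means no cap on the total amount sent.
    """
    duty_cycle = mean_on_s / (mean_on_s + mean_off_s)  # fraction of time spent sending
    avg_bps = data_rate_bps * duty_cycle
    total_bytes = avg_bps * duration_s / 8
    if max_bytes > 0:
        total_bytes = min(total_bytes, max_bytes)  # source stops once the cap is reached
    return avg_bps, total_bytes
```

For instance, a source sending at 100 kbit/s with equal mean on and off times offers 50 kbit/s on average, or about 625 kB over 100 seconds per client, which is the kind of figure the report should state explicitly.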
Improving network performance in dense WiFi settings
This is the most challenging part of the project. I suggest you
attempt this only if you have completed the above work
thoroughly. That is, you have fully analyzed the trace file provided. You
have recreated the experiment in simulation, and you have reproduced
the network performance seen in real-life in your simulations as
well. Now, you can take the next bold step of trying to fix this
problem (in simulations of course!).
Once you have understood the performance bottleneck that is causing
a degradation in performance in a dense WiFi deployment, you can come
up with a fix to the bottleneck. The exact fix will depend on what
you identify as the problematic part of the network. Again, there is
no one right answer, and you can fix this problem in many
ways. Whatever fix you come up with, you will be required to implement
this fix, and demonstrate the improvement in performance before and
after your fix. For example, you can show that downloads are finishing
faster, or that network throughput has improved, or losses are lower,
or whatever. Please talk to me about your plans during monthly
reviews, and get feedback on your ideas.
Finally...
Good luck! Hope you have fun and/or learn something useful during the project!