CS 681: Performance Evaluation of Computer Systems and Networks
Course Syllabus (Updated Jan 2021)
Performance measures of computing and networking systems; open- and closed-loop load testing and performance measurement; performance modeling using Markovian queuing systems (M/M/c/k); asymptotic analysis of queues; basic queuing system laws (Utilization Law, Little’s Law); non-Markovian queues (M/G/1); open and closed queueing networks; design, implementation and statistical analysis of discrete event simulation models; application of data-driven models for performance prediction (machine learning models, e.g. support vector regression, clustering-based models, Bayesian optimization, neural network models). Case studies and applications of these methods through paper readings and projects in a spectrum of distributed and networked systems, e.g., web systems, virtualization platforms, clouds, distributed data processing platforms and learning systems, distributed applications, networking protocols, network backbone architectures, etc.
Detailed topics, with changes from previous years shown:
- Introduction to resources of contention, parameters and metrics of performance.
- Introduction to queuing systems (open): basic elements of an open queueing system, Kendall notation, operational laws, low-load and high-load asymptotes of queue performance metrics, Little’s Law. Brief, intuitive introduction to the exponential distribution and its memoryless property, the Poisson distribution, and PASTA.
- Closed queuing system: operational and low/high-load asymptotic analysis of a single-server, multiple-user closed system; closed tandem queuing networks.
- Jacksonian closed queuing networks: asymptotic analysis and MVA. Motivation: further analysis requires formal probabilistic methods.
- Basic probability review: conditional probability, law of total probability; random variables, expectation, variance, moments; common distributions; special properties of the Poisson and exponential distributions (memorylessness).
- Discrete Event Simulation and its sound statistical analysis
- Functions of random variables: order statistics, Random Sums
- Stochastic processes: continuous-time Markov chains (CTMCs), covering only 'birth-death CTMCs' as required to derive formulae for queuing systems.
- M/G/1 queue - derivations will be done without using DTMC background
- Data-driven models: paper readings on the use of AI/ML-based models and methods (e.g. support vector regression, clustering-based models, Bayesian optimization, neural network models) for various systems issues: predicting system performance, classifying workloads, conducting efficient load tests, and collecting training data efficiently.
- For each topic, examples of applying the theory to systems and networking are provided throughout:
  - From networking: Aloha, TCP, IEEE 802.11, Ethernet, TDMA/FDMA.
  - From systems: multithreaded servers such as web servers, multi-server back-ends, etc.; power-managed systems; virtualization; mobile phones; Big Data processing platforms.
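As a rough illustration of the closed queuing network topics listed above (asymptotic analysis and MVA), the following Python sketch implements the standard exact MVA recursion for a single-class closed network of single-server stations with a think time, alongside the high/low-load throughput asymptote. It is not course material: the function names `mva` and `throughput_upper_bound`, and the example service demands and think time, are assumptions chosen only for illustration.

```python
def mva(demands, think_time, n_users):
    """Exact Mean Value Analysis for a single-class closed network.

    demands    -- service demand (seconds) at each single-server queueing station
    think_time -- average user think time (seconds)
    n_users    -- number of users circulating in the closed system
    Returns (throughput, response_time, per-station mean queue lengths).
    """
    queue = [0.0] * len(demands)  # mean queue lengths with zero users
    throughput = 0.0
    response = 0.0
    for n in range(1, n_users + 1):
        # Arrival theorem: an arriving job sees the network as it is
        # in steady state with n - 1 users.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        # Little's Law applied to the whole system (stations plus think time).
        throughput = n / (think_time + response)
        # Little's Law applied to each station.
        queue = [throughput * r for r in residence]
    return throughput, response, queue


def throughput_upper_bound(demands, think_time, n_users):
    """Asymptotic bound: the low-load line N / (Z + sum D) and the
    high-load line 1 / D_max, whichever is smaller."""
    return min(1.0 / max(demands), n_users / (think_time + sum(demands)))


if __name__ == "__main__":
    demands = [0.05, 0.03]  # hypothetical service demands in seconds
    think_time = 1.0        # hypothetical think time in seconds
    for n in (1, 10, 20, 50, 200):
        x, r, _ = mva(demands, think_time, n)
        bound = throughput_upper_bound(demands, think_time, n)
        print(f"N={n:3d}  X={x:7.3f}  R={r:7.3f}  bound={bound:7.3f}")
```

Running the script prints the MVA throughput next to the asymptotic bound for each user population, which makes it easy to see the throughput flatten as the bottleneck station saturates.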
References
Textbooks
- Harchol-Balter, Mor. Performance Modeling and Design of Computer Systems: Queueing Theory in Action. Cambridge University Press, 2013.
- Kleinrock, L. Queueing Systems, vol. I & II. John Wiley and Sons, 1975.
- Law, Averill M. Simulation Modeling and Analysis. Tata McGraw-Hill, 2007.
Representative Papers
- Chiang, R. C., Hwang, J., Huang, H. H., & Wood, T. Matrix: Achieving predictable virtual machine performance in the clouds. In 11th International Conference on Autonomic Computing (ICAC 14) (pp. 45-56). 2014.
- Kundu, Sajib, Raju Rangaswami, Ajay Gulati, Ming Zhao, and Kaushik Dutta. "Modeling virtualized applications using machine learning techniques." In Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments, pp. 3-14. 2012.
- E. Vianna, G. Comarela, T. Pontes, J. Almeida, V. Almeida, K. Wilkinson, H. Kuno, and U. Dayal. Analytical performance models for MapReduce workloads. International Journal of Parallel Programming, 41(4):495-525, 2013.
- Alipourfard, O., Liu, H. H., Chen, J., Venkataraman, S., Yu, M., & Zhang, M. (2017). Cherrypick: Adaptively unearthing the best cloud configurations for big data analytics. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17) (pp. 469-482).
- Venkataraman, S., Yang, Z., Franklin, M., Recht, B., & Stoica, I. (2016). Ernest: Efficient performance prediction for large-scale advanced analytics. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16) (pp. 363-378).
- Peng, Y., Bao, Y., Chen, Y., Wu, C., & Guo, C. (2018, April). Optimus: an efficient dynamic resource scheduler for deep learning clusters. In Proceedings of the Thirteenth EuroSys Conference (pp. 1-14).
- Mathew, A., Srinivasan, M., & Murthy, C. S. R. (2019, September). Packet Generation Schemes and Network Latency Implications in SDN-enabled 5G C-RANs: Queuing Model Based Analysis. In 2019 IEEE 30th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC) (pp. 1-7). IEEE.
- E. Balevi and R. D. Gitlin, "Unsupervised machine learning in 5G networks for low latency communications," 2017 IEEE 36th International Performance Computing and Communications Conference (IPCCC), San Diego, CA, 2017, pp. 1-2, doi: 10.1109/PCCC.2017.8280492.
- Jo, C., Cho, Y., & Egger, B. (2017, September). A machine learning approach to live migration modeling. In Proceedings of the 2017 Symposium on Cloud Computing (pp. 351-364).