CS 215 - Data Interpretation and Analysis

Instructors: Ajit Rajwade and Suyash Awate
Office: SIA-218, KReSIT Building
Email:

Lecture Venue: LH-101
Lecture Timings: Slot 10, Tuesday and Friday 2:00 to 3:25 pm

Instructor Office Hours (in room SIA-218): Tuesday and Friday 6:30 pm to 7:30 pm, or after class, or by appointment via email (also feel free to send queries over email)

Teaching Assistants: Vitobha M, Abhishek Chakraborty, Amal Dani, Divyanshu Grover, Ravi Mishra, Rajeev Kumar
TA Office Hours (in KReSIT library):
  • Vitobha: Mon, 5:30 to 6:30 pm
  • Amal: Thurs, 5:30 to 6:30 pm
  • Divyanshu: Mon, 5:30 to 6:30 pm
  • Ravi: Tue, 5:30 to 6:30 pm
  • Rajeev: Wed, 6:30 to 7:30 pm
  • Abhishek: Wed, 11:00 am to 12:00 pm




Topics to be covered (tentative list)


Intended Audience

2nd year BTech students from CSE

Learning Materials and Textbooks

Computational Resources


Grading Policy (tentative)


Other Policies


Tutorials

Homework Solutions

HW1, HW2

Quizzes


Lecture Schedule:


(Each entry below lists the date, the content of the lecture, and any assignments/readings/notes.)

21/07 (Tue)
  • Introduction, course overview and course policies
  • Descriptive statistics: key terminology
  • Methods to represent data: frequency tables, bar/line graphs, frequency polygon, pie-chart
  • Concept of frequency and relative frequency
  • Cumulative frequency plots
  • Interesting examples of histograms of intensity values in an image (see the sketch at the end of this entry)
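  • A small MATLAB sketch (not part of the course materials) of a frequency table, relative frequencies and a cumulative frequency plot for 8-bit intensity values; the "image" here is simulated rather than read from a file:
      % Simulate an 8-bit grayscale image as a matrix of intensities in 0..255
      I = randi([0 255], 256, 256);
      % Frequency (counts) and relative frequency of each intensity value
      edges = 0:256;                       % one bin per intensity level
      counts = histcounts(I(:), edges);    % frequency table
      relfreq = counts / numel(I);         % relative frequencies sum to 1
      % Plot the histogram and the cumulative relative frequency
      figure; bar(0:255, counts); title('Histogram of intensity values');
      figure; plot(0:255, cumsum(relfreq)); title('Cumulative relative frequency');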
24/07 (Fri)
  • Mean, median and their properties including behavior under outliers
  • Quantiles
  • Variance and standard deviation: applications
  • Chebyshev's inequality: proof of two-sided version; examples (see the sketch at the end of this entry)
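  • A minimal MATLAB sketch (an illustration, not course material) that checks the two-sided Chebyshev inequality empirically; the exponential samples and the values of k are arbitrary choices:
      % Empirical check of the two-sided Chebyshev inequality
      x = -log(rand(1e6, 1));              % Exp(1) samples via inverse CDF (any distribution works)
      m = mean(x);  s = std(x);
      for k = 2:4
          frac  = mean(abs(x - m) >= k*s); % observed fraction at least k std devs from the mean
          bound = 1/k^2;                   % Chebyshev upper bound
          fprintf('k = %d: observed %.4f <= bound %.4f\n', k, frac, bound);
      end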
28/07 (Tue)
  • Chebyshev's inequality: proof of one-sided version; another variant of one-sided version
  • Correlation coefficient: definition, geometric meaning, properties, positive correlation, negative correlation, lack of correlation, examples including one from image processing; uncentered correlation coefficient and its problems; correlation versus causation (see the sketch at the end of this entry)
  • Proofs that the median minimizes the total absolute deviation: one using calculus (which has problems!) and one without calculus (taken from a one-page paper published in the journal The American Statistician)
  • Slides: Descriptive statistics
  • Readings: section 2.3, 2.4, 2.6 from the textbook by Sheldon Ross
  • Optional Suggested Exercises
    1. Proof of the variant of one-sided Chebyshev's inequality
    2. Verify all the properties of the correlation coefficient
    3. In the proof of the one-sided Chebyshev's inequality, verify that b = s/k minimizes the term on the RHS in the proof
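  • A short MATLAB sketch (illustrative, with made-up data) computing the correlation coefficient from its definition and with the built-in corrcoef:
      n = 1000;
      x = randn(n, 1);
      y = 2*x + 0.5*randn(n, 1);           % positively correlated with x (assumed model)
      xc = x - mean(x);  yc = y - mean(y); % centering (contrast with the uncentered version)
      r  = sum(xc .* yc) / sqrt(sum(xc.^2) * sum(yc.^2));
      R = corrcoef(x, y);                  % built-in; R(1,2) should match r
      fprintf('r (definition) = %.4f, r (corrcoef) = %.4f\n', r, R(1,2));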
31/07 (Fri)
  • Extensive MATLAB tutorial: code vectorization, various operations on vectors and matrices, plots of different types, some differences from C, solving simultaneous linear equations (see the sketch at the end of this entry)
  • See list of tutorials on this page under "Computational Resources"
  • The MATLAB examples we did in class are here (download each file and open it in the MATLAB editor; the indentation is not visible if you open it in the browser).
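  • In the same spirit (but not a copy of the in-class files), a tiny sketch of vectorization and of solving simultaneous linear equations; the vector and the system are arbitrary:
      % Loop version: sum of squares of the entries of a vector
      v = rand(1, 1e6);
      total = 0;
      for i = 1:numel(v)
          total = total + v(i)^2;
      end
      % Vectorized version: same result, no explicit loop
      total_vec = sum(v.^2);
      % Solving A*x = b with the backslash operator
      A = [2 1; 1 3];  b = [3; 5];
      x = A \ b;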
03/08 (Tue)
  • Concept of discrete probability (frequentist view), sample space, event
  • Composition of events: union, intersection, complement; Basic set theory
  • Concept of conditional probability and joint probability and the difference between the two
  • Concept of mutually exclusive and independent events
  • The false positive paradox (see the worked sketch at the end of this entry)
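  • A worked MATLAB sketch of the false positive paradox via Bayes' rule; the prevalence and test accuracies are made-up numbers, not figures from the lecture:
      p_disease      = 0.001;              % prior P(disease) (assumed)
      p_pos_given_d  = 0.99;               % sensitivity P(+ | disease) (assumed)
      p_pos_given_nd = 0.05;               % false positive rate P(+ | no disease) (assumed)
      % Total probability of a positive test
      p_pos = p_pos_given_d*p_disease + p_pos_given_nd*(1 - p_disease);
      % Bayes' rule: P(disease | +) is small despite a fairly accurate test
      p_d_given_pos = p_pos_given_d * p_disease / p_pos;
      fprintf('P(disease | positive test) = %.4f\n', p_d_given_pos);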
06/08 (Fri)
  • Birthday paradox (under discrete probability; see the sketch at the end of this entry)
  • Random variables: definition and examples, discrete and continuous
  • Probability mass function (PMF), cumulative distribution function (CDF) and probability density function (PDF)
  • Expected value and its properties, in particular: linearity
  • Mean and median: minimizers of average squared error and average absolute error
  • Variance and its properties
  • Markov's inequality and its proof; Chebyshev's inequality and its proof using Markov's inequality
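  • A short MATLAB sketch of the birthday paradox: the exact probability that at least two of n people share a birthday (assuming 365 equally likely days), checked against a simulation:
      n = 23;
      p_no_match = prod((365 - (0:n-1)) / 365);  % all n birthdays distinct
      p_match = 1 - p_no_match;                  % about 0.507 for n = 23
      % Monte Carlo check
      trials = 1e5;  hits = 0;
      for t = 1:trials
          bdays = randi(365, n, 1);
          hits = hits + (numel(unique(bdays)) < n);
      end
      fprintf('exact %.4f, simulated %.4f\n', p_match, hits/trials);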
11/08 (Tue)
  • Weak law of large numbers: proof using Chebyshev's inequality (see the sketch at the end of this entry)
  • Strong law of large numbers: only statement
  • Joint PDF, PMF, CDF; marginal PDF/PMF/CDF
  • Independence of random variables
  • Concept of covariance and its properties
  • Concept of conditional PDF, conditional expectation and conditional variance
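  • A minimal MATLAB sketch of the weak law of large numbers: the running sample mean of i.i.d. draws settling near the true mean; the Uniform(0,1) distribution is an arbitrary choice:
      n = 1e5;
      x = rand(n, 1);                        % Uniform(0,1) samples, true mean 0.5
      running_mean = cumsum(x) ./ (1:n)';
      plot(1:n, running_mean); hold on;
      plot([1 n], [0.5 0.5], 'r--');         % true mean for reference
      xlabel('number of samples'); ylabel('running sample mean');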
14/08 (Fri)
  • Bernoulli and Binomial distributions: definition, mean, mode, variance and other properties
  • Examples using the Bernoulli and Binomial distributions
  • Poisson distribution, Poisson limit theorem, properties of the Poisson distribution (see the sketch at the end of this entry)
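  • A small MATLAB sketch of the Poisson limit theorem: Binomial(n, p) probabilities for large n and small p compared with Poisson(lambda = n*p); the particular n and p are illustrative:
      n = 1000;  p = 0.005;  lambda = n*p;
      k = 0:15;
      pmf_binom   = arrayfun(@(kk) nchoosek(n, kk) * p^kk * (1-p)^(n-kk), k);
      pmf_poisson = exp(-lambda) * lambda.^k ./ factorial(k);
      disp([k' pmf_binom' pmf_poisson']);    % the last two columns are close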
18/08 (Tue)
  • Poisson distribution (continued): Poisson limit theorem; properties: mean, variance, mode
  • Examples of Poisson distribution; Image shot noise as an example of a Poisson random variable (not on exam)
  • Gaussian distribution: mean, variance, median, mode
  • Central limit theorem: demonstration, applications, variant using independent but not identically distributed variables (see the sketch at the end of this entry)
  • Central limit theorem versus weak law of large numbers
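  • A minimal MATLAB sketch of the central limit theorem demonstration: histograms of standardized sums of i.i.d. Uniform(0,1) variables approach a Gaussian shape; the distribution and sample sizes are choices made for illustration:
      trials = 1e5;
      for n = [1 2 10 50]
          s = sum(rand(n, trials), 1);               % sums of n uniforms
          z = (s - n*0.5) / sqrt(n/12);              % standardize: mean n/2, variance n/12
          figure; histogram(z, 50, 'Normalization', 'pdf');
          title(sprintf('n = %d', n));
      end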
21/08 (Fri)
  • More discussion about the central limit theorem
  • (Bounded) Uniform random variables: application to sampling from discrete distributions and to generating random k-subsets of a set of size n > k (see the sketch at the end of this entry)
  • Exponential distribution: motivation; properties: mean, variance, mode, median
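  • A short MATLAB sketch of sampling from a discrete distribution using a Uniform(0,1) variable and the CDF, plus a random k-subset via randperm; the example PMF is made up:
      pmf = [0.2 0.5 0.3];                   % example PMF over outcomes 1, 2, 3 (assumed)
      cdf = cumsum(pmf);
      N = 1e5;
      u = rand(N, 1);                        % uniform samples
      samples = arrayfun(@(uu) find(uu <= cdf, 1, 'first'), u);
      disp(histcounts(samples, 0.5:1:3.5) / N);   % empirical frequencies approximate the PMF
      % A random k-subset of {1, ..., n}
      n = 10;  k = 3;
      subset = randperm(n, k);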
25/08 (Tue)
  • Exponential distribution (continued): memoryless property; minimum of exponential random variables
  • More discussion about the central limit theorem
  • Distribution of the sample mean and sample variance; Bessel's correction in the sample variance (see the sketch at the end of this entry)
  • Chi-square distribution (not in detail)
  • Multinomial distribution: mean and covariance matrix
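  • A small MATLAB sketch of Bessel's correction: over many small samples from a known Gaussian (an assumed choice), dividing by n-1 gives an approximately unbiased variance estimate while dividing by n does not:
      true_var = 4;  n = 5;  trials = 1e5;
      v_biased = zeros(trials, 1);  v_unbiased = zeros(trials, 1);
      for t = 1:trials
          x = 2*randn(n, 1);                          % N(0, 4) samples
          v_biased(t)   = mean((x - mean(x)).^2);     % divide by n
          v_unbiased(t) = var(x);                     % MATLAB's var divides by n-1
      end
      fprintf('biased avg %.3f, unbiased avg %.3f, true %.1f\n', ...
              mean(v_biased), mean(v_unbiased), true_var);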
28/08 (Fri)
  • Quiz 1
01/09 (Tue)
  • Concept of parameter estimation (also called parametric density estimation), concept of maximum likelihood estimation (MLE)
  • MLE for parameters of Bernoulli, Gaussian, Poisson, Uniform distributions
  • Concept of biased and unbiased estimators
  • MLE for linear regression: estimating the slope and intercept of a line approximating a set of points with accurate x coordinates but noisy y coordinates (Gaussian noise case; see the sketch at the end of this entry)
  • Slides
  • Sections 7.1 and 7.2 of the textbook by Sheldon Ross; Section 9.2 for the regression problem
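  • A minimal MATLAB sketch of two of the MLEs above, on synthetic data with assumed true parameters: the Gaussian mean and variance, and the slope/intercept of a line under Gaussian noise in y (which reduces to least squares):
      % MLE for a Gaussian: sample mean and (biased) sample variance
      x = 3 + 2*randn(1e4, 1);                    % true mean 3, std 2 (assumed)
      mu_hat     = mean(x);
      sigma2_hat = mean((x - mu_hat).^2);         % MLE divides by n, not n-1
      % MLE for linear regression with Gaussian noise in y = least squares
      xs = linspace(0, 1, 100)';
      ys = 2*xs + 1 + 0.1*randn(100, 1);          % true slope 2, intercept 1 (assumed)
      A = [xs ones(100, 1)];
      theta = A \ ys;                             % [slope; intercept] via least squares
      fprintf('mu %.3f, sigma^2 %.3f, slope %.3f, intercept %.3f\n', ...
              mu_hat, sigma2_hat, theta(1), theta(2));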
04/09 (Fri)
  • Estimator bias, variance and mean squared error; relationship between bias, variance and mean-squared error
  • Example of different estimators and a comparison of their bias, variance and mean squared error (for the case of uniform distribution)
  • Confidence intervals: for the mean of a Gaussian with known variance, for the variance of a Gaussian, for the mean of a Bernoulli random variable (see the sketch at the end of this entry)
  • Slides
  • Sections 7.1, 7.2, 7.3 (skip subsection 7.3.1), 7.5, 7.7 of the textbook by Sheldon Ross; Section 9.2 for the regression problem
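  • A short MATLAB sketch of a 95% confidence interval for the mean of a Gaussian with known variance; the data, true mean and variance are assumptions, and 1.96 is the standard normal quantile giving 95% coverage:
      sigma = 2;  n = 50;
      x = 5 + sigma*randn(n, 1);                 % synthetic data, true mean 5 (assumed)
      xbar = mean(x);
      z = 1.96;                                  % approx. the 0.975 quantile of N(0,1)
      halfwidth = z * sigma / sqrt(n);
      fprintf('95%% CI for the mean: [%.3f, %.3f]\n', xbar - halfwidth, xbar + halfwidth);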