CS 215 - Data Interpretation and Analysis

Instructors: Ajit Rajwade and Suyash Awate
Office: SIA-218, KReSIT Building
Email:

Lecture Venue: CC 103 (New CSE Building, 1st Floor)
Lecture Timings: Slot 6, Wednesday and Friday, 11:05 am to 12:30 pm

Instructor Office Hours (at the Lecture Venue): Wednesday and Friday, 12:30 to 1:00 pm, i.e. immediately after class, or by appointment via email. Queries are also welcome over email or Moodle.

Teaching Assistants: Bharat Khandelwal, Rohit Jena, Devansh Shah, Sheshansh Agrawal, Ananya Bahadur, Polakampalli Sai Balaji
Emails: (bharatk, rohitrango, devansh, sheshansh, ananya, psbalaji ) AT CSE DOT iitb DOT ac DOT in

Topics to be covered (tentative list)


Intended Audience

2nd year BTech students from CSE

Learning Materials and Textbooks

Computational Resources


Grading Policy (tentative)


Other Policies


Tutorials

Quizzes

Quiz

Lecture Schedule:


Date | Content of the Lecture | Assignments/Readings/Notes

18/07 (Wed)
  • Introduction, course overview and course policies
Descriptive Statistics
  • Descriptive statistics: key terminology
  • Methods to represent data: frequency tables, bar/line graphs, frequency polygon, pie-chart
  • Concept of frequency and relative frequency
  • Cumulative frequency plots
  • Interesting examples of histograms of intensity values in an image
  • Data summarization: mean and median
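
A minimal sketch of the data-summarization ideas above, assuming Python with numpy purely for illustration (the course does not prescribe a language here); the sample values are hypothetical:

    import numpy as np

    data = np.array([2, 3, 3, 5, 7, 7, 7, 9, 11])       # hypothetical sample
    print("mean   =", np.mean(data))                     # arithmetic mean
    print("median =", np.median(data))                   # middle value of the sorted sample

    # frequency and relative frequency, as in a frequency table
    values, counts = np.unique(data, return_counts=True)
    rel_freq = counts / counts.sum()                     # relative frequencies sum to 1
    for v, c, r in zip(values, counts, rel_freq):
        print(f"value {v}: frequency {c}, relative frequency {r:.2f}")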
20/7 (Fri)
  • Data summarization: mean and median
  • Proofs that median minimizes the sum of absolute deviations: with and without using calculus
  • Concept of quantile/percentile
  • Standard deviation and variance, some applications
  • Two-sided Chebyshev inequality with proof; one-sided Chebyshev inequality (Chebyshev-Cantelli inequality); see the reference statements after this entry
  • Concept of correlation coefficient and formula for it
  • Slides: Descriptive statistics
  • Readings: section 2.1, 2.2, 2.3, 2.4, 2.6 from the textbook by Sheldon Ross
  • Non-calculus proof to show that the median minimizes the sum of absolute deviations
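
For quick reference, the two inequalities and the correlation coefficient from this lecture can be stated as follows (standard formulations; X has mean \mu and standard deviation \sigma, and k > 0):

    P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}
    \qquad\text{(two-sided Chebyshev)}

    P(X - \mu \ge k\sigma) \le \frac{1}{1 + k^2}
    \qquad\text{(one-sided Chebyshev / Chebyshev-Cantelli)}

    r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}
                  {\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}
    \qquad\text{(sample correlation coefficient)}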
25/7 (Wed)
  • Correlation coefficient: properties; uncentered correlation coefficient; limitations of the correlation coefficient and Anscombe's quartet (see the sketch after this entry)
  • Correlation and causation
  • Proof of one-sided Chebyshev's inequality
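
A small sketch of the limitation mentioned above, assuming Python with numpy for illustration: a variable that is a deterministic but nonlinear function of another can still have zero correlation with it.

    import numpy as np

    x = np.arange(-5, 6)           # symmetric around 0
    y = x ** 2                     # deterministic function of x, yet not linearly related
    r = np.corrcoef(x, y)[0, 1]
    print(r)                       # approximately 0: correlation only measures linear association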

27/7 (Fri)
Discrete Probability
  • Discrete probability: sample space, event, composition of events: union, intersection, complement, exclusive or, De Morgan's laws
  • Boole's and Bonferroni's inequalities
  • Conditional probability, Bayes rule, False Positive Paradox
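
The false positive paradox listed above can be made concrete with a short computation; the numbers below are hypothetical, and Python is assumed only for illustration.

    # Bayes rule with hypothetical numbers for a rare condition and an imperfect test
    p_disease = 0.001              # prior: 1 in 1000 people has the condition
    p_pos_dis = 0.99               # sensitivity: P(test positive | disease)
    p_pos_no  = 0.05               # false positive rate: P(test positive | no disease)

    p_pos = p_pos_dis * p_disease + p_pos_no * (1 - p_disease)    # total probability
    p_dis_given_pos = p_pos_dis * p_disease / p_pos               # Bayes rule
    print(p_dis_given_pos)         # about 0.019: a positive test still rarely means disease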
1/8 (Wed)
  • Conditional probability, Bayes rule, False Positive Paradox
  • Birthday paradox
  • Independent and mutually exclusive events

Random Variables
  • Random variable: concept, discrete and continuous random variables
  • Probability mass function (pmf), cumulative distribution function (cdf) and probability density function (pdf)
  • Expected value for discrete and continuous random variables
  • Expected value of a function of a random variable
  • The mean and the median as minimizers of squared and absolute losses respectively (with proof for the former)
  • Variance and standard deviation, with alternate expressions
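
For reference, the expectation and variance from this lecture in the discrete and continuous cases (standard statements, reproduced here as a quick reference):

    E[X] = \sum_{x} x\, p_X(x) \ \ \text{(discrete)}, \qquad
    E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\, dx \ \ \text{(continuous)}

    \mathrm{Var}(X) = E\!\left[(X - E[X])^2\right] = E[X^2] - \left(E[X]\right)^2,
    \qquad \sigma_X = \sqrt{\mathrm{Var}(X)}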
3/8 (Fri)
  • Proof: the median as the minimizer of the absolute loss
  • Markov's and Chebyshev's inequalities, with proofs
  • Weak law of large numbers: proof using Chebyshev's inequality (see the simulation sketch after this entry)
  • Statement of the strong law of large numbers
  • Gambler's fallacy
  • Concept of joint PMF, PDF, CDF
  • Concept of covariance, concept of mutual independence and pairwise independence
  • Properties of covariance
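
A short simulation of the weak law of large numbers referenced above, assuming Python with numpy for illustration:

    import numpy as np

    rng = np.random.default_rng(0)                       # fixed seed for reproducibility
    flips = rng.integers(0, 2, size=100_000)             # Bernoulli(0.5) coin flips
    running_mean = np.cumsum(flips) / np.arange(1, flips.size + 1)
    for n in (10, 100, 1_000, 100_000):
        print(n, running_mean[n - 1])                    # sample mean drifts toward 0.5 as n grows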
8/8 (Wed)
  • Covariance: properties, correlation versus independence
  • Concept of moment generating function, two different proofs of uniqueness of the moment generating function for discrete random variables, properties of moment generating functions
  • PDF/PMF of a sum of random variables
  • Concept of conditional PDF, CDF, PMF; conditional expectation and variance with examples
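
For reference, the moment generating function and the properties mentioned above (standard statements):

    M_X(t) = E\!\left[e^{tX}\right], \qquad
    E[X^n] = \left.\frac{d^n}{dt^n} M_X(t)\right|_{t=0}, \qquad
    M_{X+Y}(t) = M_X(t)\, M_Y(t) \ \ \text{for independent } X, Y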
10/8 (Fri)
Families of Random Variables
  • Concept of families of random variables
  • Bernoulli PMF: mean, median, mode, variance, MGF
  • Binomial PMF: relation to Bernoulli PMF, mean, median, mode, variance, plots, MGF, difference between binomial and geometric distribution
  • Gaussian (normal) PDF: motivation from the central limit theorem, illustration of central limit theorem (see the sketch after this entry)
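
A minimal illustration of the central limit theorem noted in the last bullet, assuming Python with numpy purely for illustration: standardized Binomial(n, 0.5) sums behave like a standard Gaussian for large n.

    import numpy as np

    rng = np.random.default_rng(0)
    n, trials = 1_000, 50_000
    sums = rng.binomial(n, 0.5, size=trials)             # each sum is n Bernoulli(0.5) variables
    z = (sums - n * 0.5) / np.sqrt(n * 0.25)             # standardize: mean 0, variance 1
    print(np.mean(np.abs(z) < 1))                        # close to 0.683, the standard Gaussian value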
17/8 (Fri)
  • Gaussian (normal) PDF: motivation from the central limit theorem, illustration of central limit theorem
  • Derivation of mean, variance, MGF, median, mode; CDF of a Gaussian and its relation to the error function; probability of a Gaussian random variable taking values between mu +/- k sigma
  • Statement of the central limit theorem and its extensions; proof of the CLT; application of the CLT and its relation to the binomial distribution - the de Moivre-Laplace theorem (without proof); one application of the CLT; relation between the CLT and the law of large numbers
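
The relations in the second bullet, written out as a reference (standard identities for X ~ N(mu, sigma^2)):

    F_X(x) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right],
    \qquad
    P(\mu - k\sigma \le X \le \mu + k\sigma) = \operatorname{erf}\!\left(\frac{k}{\sqrt{2}}\right)
    \approx 0.683,\ 0.954,\ 0.997 \ \text{ for } k = 1, 2, 3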
18/8 (Sat)
  • Gaussian tail bounds
  • Distribution of the sample mean and the sample variance, Bessel's correction
  • Chi-squared distribution: definition, genesis, MGF, properties; use of the chi-squared distribution toward defining the PDF of the sample variance
  • Uniform distribution: mean, variance, median, MGF; applications in sampling from a pre-specified PMF; application in generating a random permutation of a given set (see the sketch after this entry)
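
A sketch of the two uniform-distribution applications in the last bullet, assuming Python with numpy for illustration; the PMF and the set size are hypothetical.

    import numpy as np

    rng = np.random.default_rng(0)

    # (1) sampling from a pre-specified PMF using Uniform(0,1) draws and the cumulative distribution
    values = np.array([10, 20, 30])                      # hypothetical support
    pmf    = np.array([0.2, 0.5, 0.3])                   # hypothetical PMF
    cdf    = np.cumsum(pmf)
    idx    = np.searchsorted(cdf, rng.random(10_000))    # index of the sampled value
    print(np.bincount(idx) / idx.size)                   # empirical frequencies, close to the PMF

    # (2) Fisher-Yates shuffle: every permutation of the set is equally likely
    perm = np.arange(8)
    for i in range(perm.size - 1, 0, -1):
        j = rng.integers(0, i + 1)                       # uniform over positions 0..i
        perm[i], perm[j] = perm[j], perm[i]
    print(perm)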
20/8 (Mon)
  • Poisson distribution: mean, variance, MGF, mode, addition of Poisson random variables, examples; derivation of Poisson from binomial
  • Relation between Poisson and Gaussian distributions, examples
  • Multinomial PMF - generalization of the binomial, mean vector and covariance matrix for a multinomial random variable, MGF for multinomial
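
A quick numerical check of the binomial-to-Poisson derivation mentioned above, assuming Python with scipy.stats for illustration:

    from scipy.stats import binom, poisson

    lam, n = 3.0, 1_000                                  # hypothetical rate and a large n
    for k in range(6):
        # Binomial(n, lam/n) PMF approaches the Poisson(lam) PMF as n grows
        print(k, binom.pmf(k, n, lam / n), poisson.pmf(k, lam))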
24/8 (Fri)
  • Exponential distribution: mean, median, MGF, variance, property of memorylessness, minimum of exponential random variables
  • Overview of hypergeometric distribution - capture-recapture problem in ecology

Parameter Estimation
  • Concept of parameter estimation (or parametric PDF/PMF estimation)
  • Maximum likelihood estimation (MLE)
  • MLE for parameters of Bernoulli, Poisson, Gaussian and uniform distributions (the Bernoulli case is worked out after this entry)
  • Concept of estimator bias
  • Least squares line fitting as an MLE problem
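
As a worked instance of the MLE bullet above (the Bernoulli case, standard derivation): for i.i.d. observations x_1, ..., x_n in {0, 1} with parameter p,

    \ell(p) = \log \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i}
            = \left(\sum_{i} x_i\right)\log p + \left(n - \sum_{i} x_i\right)\log(1-p)

    \frac{d\ell}{dp} = \frac{\sum_i x_i}{p} - \frac{n - \sum_i x_i}{1-p} = 0
    \quad\Longrightarrow\quad
    \hat{p}_{\mathrm{ML}} = \frac{1}{n}\sum_{i=1}^{n} x_i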
29/8 (Wed)
  • Concept of estimator bias, mean squared error, variance
  • Estimators for the interval of a uniform distribution: example of bias
  • Least squares line fitting as an MLE problem
  • Concept of a two-sided confidence interval
  • Confidence interval for mean of a Gaussian with known standard deviation (reference formula after this entry)
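
The two-sided confidence interval in the last bullet, stated for reference (standard result): for i.i.d. N(mu, sigma^2) samples with known sigma, a (1 - alpha) interval for mu is

    \bar{x} \;\pm\; z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},
    \qquad \text{e.g. } z_{0.025} \approx 1.96 \text{ for a 95\% interval}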
31/8 (Fri)
Quiz
5/9 (Wed)
  • Confidence interval for mean of a Gaussian with known standard deviation
  • Confidence interval for variance of a Gaussian
  • Approximate confidence interval for mean of Bernoulli
  • Application of MLE - capture-recapture method for counting animals, using the hypergeometric distribution
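
For reference, the two confidence intervals listed above (standard results): with sample variance s^2 and \chi^2_{q,\,n-1} denoting the value exceeded with probability q by a chi-squared variable with n - 1 degrees of freedom, a (1 - alpha) interval for the Gaussian variance is

    \left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right)

and the approximate (large-sample) interval for the Bernoulli mean p is

    \hat{p} \;\pm\; z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}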