CS 215 - Data Interpretation and Analysis

Instructors: Ajit Rajwade and Suyash Awate
Office: SIA-218, KReSIT Building
Email:

Lecture Venue: EEG-401 (Girish Gaitonde Building, 4th Floor)
Lecture Timings: Slot 10, Tuesday and Friday 2:00 to 3:25 pm

Instructor Office Hours (in room SIA-218): Monday 12 noon to 1 pm, Thursday 12 noon to 1 pm, after class, or by appointment via email (also feel free to send queries over email)

Teaching Assistants: Kratika Gupta, Varre Aditya Vardhan, Saurabh Garg, Vishwanadh Rapolu, Himanshu Gupta, Vihari Piratla, Shravan Kumar Telang
Emails: ( kratikag, adityavardhan, saurabhgarg, rapoluvishu, himanshu, vihari, deadmouse ) AT cse DOT ac DOT in




Topics to be covered (tentative list)


Intended Audience

2nd year BTech students from CSE

Learning Materials and Textbooks

Computational Resources


Grading Policy (tentative)


Other Policies


Tutorials

Quizzes

Lecture Schedule:


Date | Content of the Lecture | Assignments/Readings/Notes

18/07 (Tue)
  • Introduction, course overview and course policies
Descriptive Statistics
  • Descriptive statistics: key terminology
  • Methods to represent data: frequency tables, bar/line graphs, frequency polygon, pie-chart
  • Concept of frequency and relative frequency
  • Cumulative frequency plots
  • Interesting examples of histograms of intensity values in an image
  • Data summarization: mean and median
21/07 (Fri)
  • Data summarization: mean and median
  • Proofs that the median minimizes the sum of absolute deviations: with and without using calculus
  • Concept of quantile
  • Standard deviation and variance, some applications
  • Two-sided Chebyshev inequality with proof; one-sided Chebyshev inequality (Chebyshev-Cantelli inequality)
  • Concept of correlation coefficient, proof that its value lies between -1 and +1
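
The claim that the median minimizes the sum of absolute deviations can be checked numerically. The sketch below is only an illustration and not part of the course material: the sample values, the candidate grid and the helper name sum_abs_dev are assumptions chosen for the example.

```python
import statistics

def sum_abs_dev(data, c):
    """Sum of absolute deviations of the data from a candidate centre c."""
    return sum(abs(x - c) for x in data)

# Made-up sample with an odd number of points, so the median is unique.
data = [2, 3, 5, 8, 13, 21, 34]
med = statistics.median(data)
mean = statistics.mean(data)

# Evaluate the objective on a grid of candidate centres: 0, 0.5, 1, ..., 40.
candidates = [i / 2 for i in range(81)]
best = min(candidates, key=lambda c: sum_abs_dev(data, c))

print("median:", med, "grid minimizer:", best)
print("objective at median:", sum_abs_dev(data, med))
print("objective at mean:  ", sum_abs_dev(data, mean))
```

On this sample the grid minimizer coincides with the median, and the objective at the mean is strictly larger.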
25/07 (Tue)
  • Correlation coefficient: properties; uncentered correlation coefficient; limitations of correlation coefficient and Anscombe's quartet
  • Correlation and causation
  • Proof of one-sided Chebyshev's inequality

MATLAB/SciLab demo
28/07 (Fri)
  • MATLAB demo: code vectorization, vector and matrix manipulation; graphical plots: plots, surface plots, boxplots, scatterplots; statistics functions; code snippets
  • SciLab demo code

Discrete Probability
  • Discrete probability: sample space, event, composition of events: union, intersection, complement, exclusive or, De Morgan's laws
  • Boole's and Bonferroni's inequalities
  • Conditional probability, Bayes rule, False Positive Paradox
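
A small worked example of the false positive paradox, using Bayes' rule. The prevalence, sensitivity and false positive rate below are hypothetical numbers picked for illustration, and the function name is an assumption.

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(condition | positive test), computed with Bayes' rule."""
    # Total probability of a positive test (true positives + false positives).
    p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
    return sensitivity * prevalence / p_positive

# Hypothetical test: rare condition, accurate test, modest false positive rate.
p = posterior_given_positive(prevalence=0.001,
                             sensitivity=0.99,
                             false_positive_rate=0.05)
print(f"P(condition | positive) = {p:.4f}")
```

With these assumed numbers fewer than 2% of positive results correspond to the condition, because false positives from the large unaffected group outnumber the true positives.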
1/8 (Tue)
  • Birthday paradox in discrete probability
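
The birthday paradox can be computed directly. This is a minimal sketch that assumes 365 equally likely birthdays and independence between people; the function name is hypothetical.

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_distinct *= (days - k) / days
    return 1.0 - p_distinct

# Smallest group size for which a shared birthday is more likely than not.
smallest = next(n for n in range(1, 366) if prob_shared_birthday(n) > 0.5)
print(smallest, round(prob_shared_birthday(smallest), 4))
```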

Random Variables
  • Random variable: concept, discrete and continuous random variables
  • Probability mass function (pmf), cumulative distribution function (cdf) and probability density function (pdf)
  • Expected value for discrete and continuous random variables
  • Expected value of a function of a random variable
  • The mean and the median as minimizers of squared and absolute losses respectively (with proof for the former)
  • Variance and standard deviation, with alternate expressions
  • Markov's and Chebyshev's inequality: with proofs
4/8 (Fri)
  • Weak law of large numbers: proof using Chebyshev's inequality
  • Statement of strong law of large numbers
  • Gambler's fallacy
  • Concept of joint PMF, PDF, CDF
  • Concept of covariance, concept of mutual independence and pairwise independence
  • Properties of covariance
8/8 (Tue)
  • Concept of conditional PDF, CDF, PMF; conditional expectation and variance with examples
  • Concept of moment generating function, two different proofs of uniqueness of moment generating function for discrete random variables, properties of moment generating functions

Families of Random Variables
18/8 (Fri)
  • Bernoulli PMF: mean, median, mode, variance, MGF
  • Binomial PMF: relation to Bernoulli PMF, mean, median, mode, variance, plots, MGF, difference between binomial and geometric or negative binomial distribution
  • Gaussian (normal) PDF: motivation from the central limit theorem, illustration of central limit theorem
22/8 (Tue)
  • Gaussian PDF: derivation of MGF, median, mode; expression for CDF and its relation to the error function
  • Gaussian (normal) PDF: motivation from the central limit theorem, illustration of central limit theorem
  • Statement of central limit theorem and its extensions; proof of CLT; application of CLT and its relation to the binomial distribution - de Moivre-Laplace theorem (without proof)
  • Derivation of PDF of mean of different random variables; Bessel's correction for standard deviation
28/8 (Mon: extra)
  • PDF of sample mean and sample variance of a Gaussian
  • Chi square distribution: mean, variance, MGF; and its use towards deriving the PDF of the sample variance of a Gaussian
  • Uniform distribution: mean, variance, median, MGF; applications in sampling from a pre-specified PMF; application in generating a random permutation of a given set
  • Poisson distribution: mean, variance, MGF, mode, addition of Poisson random variables, examples; derivation of Poisson from binomial
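
The two applications of the uniform distribution listed above can be sketched as follows. This is an illustrative sketch, not necessarily the method presented in the lecture: sampling from a PMF is done by comparing one uniform variate against the cumulative sums, and the permutation uses the Fisher-Yates shuffle. The function names, the example PMF and the seed are assumptions.

```python
import random
from itertools import accumulate

def sample_from_pmf(values, probs, rng):
    """Draw one value using a single Uniform(0,1) variate and the CDF."""
    u = rng.random()
    for v, c in zip(values, accumulate(probs)):
        if u < c:
            return v
    return values[-1]  # guard against rounding in the last cumulative sum

def random_permutation(items, rng):
    """Fisher-Yates shuffle: each of the n! orderings is equally likely."""
    a = list(items)
    for i in range(len(a) - 1, 0, -1):
        j = rng.randrange(i + 1)  # uniform integer in 0..i
        a[i], a[j] = a[j], a[i]
    return a

rng = random.Random(0)  # fixed seed so the run is repeatable
values, probs = ["a", "b", "c"], [0.2, 0.5, 0.3]
draws = [sample_from_pmf(values, probs, rng) for _ in range(20000)]
freq_b = draws.count("b") / len(draws)
perm = random_permutation(range(10), rng)

print("empirical frequency of 'b':", freq_b)
print("permutation:", perm)
```

The empirical frequency of "b" should be close to its specified probability of 0.5, and the permutation contains each of the ten items exactly once.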
29/8 (Tue)
  • Poisson distribution: mean, variance, MGF, mode, addition of Poisson random variables, examples; derivation of Poisson from binomial
  • Exponential distribution: relation to Poisson distribution, mean, median, CDF, MGF, memorylessness property, minimum of exponential random variables
  • Multinomial PMF - generalization of the binomial, mean vector and covariance matrix for a multinomial random variable, MGF for multinomial

Parameter Estimation
29/8 (Tue: extra lecture)
  • Concept of parameter estimation (or parametric PDF/PMF estimation)
  • Maximum likelihood estimation (MLE)
  • MLE for parameters of Bernoulli, Poisson, Gaussian and uniform distributions
  • Least squares line fitting as an MLE problem
  • Concept of estimator bias
1/9 (Fri)
  • Concept of estimator bias, variance and MSE, proof that MSE = variance + squared bias
  • Examples of biased estimators: variance of Gaussian when mean is unknown, parameter of uniform distribution
  • Interval estimates: two-sided confidence intervals
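
The decomposition MSE = variance + squared bias can be illustrated by simulation for the variance estimator that divides by n when the mean is unknown. The sample size, number of trials, seed and the standard Gaussian are assumptions made for this sketch.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
n, trials, true_var = 5, 10000, 1.0

estimates = []
for _ in range(trials):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # pvariance divides by n and uses the sample mean: the biased estimator.
    estimates.append(statistics.pvariance(sample))

bias = statistics.fmean(estimates) - true_var
variance = statistics.pvariance(estimates)
mse = statistics.fmean([(e - true_var) ** 2 for e in estimates])

print("bias:", bias)                  # theory: -true_var / n = -0.2
print("variance + bias^2:", variance + bias ** 2)
print("MSE:", mse)
```

Over the simulated estimates the identity holds up to floating-point rounding, and the empirical bias should be near the theoretical value of -0.2 for n = 5.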
5/9 (Tue)
  • Quiz
  • Quiz solutions up on moodle
8/9 (Fri)
  • Interval estimates: two-sided and one-sided confidence intervals
  • Confidence interval for mean of a Gaussian with known standard deviation
  • Confidence interval for variance of a Gaussian
  • Approximate confidence interval for mean of Bernoulli
  • Application of MLE - capture-recapture method for counting of animals
  • Aside: proof (using MGFs) that the sum of two Gaussian random variables is also Gaussian distributed
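
A sketch of the capture-recapture idea as a maximum likelihood problem. It assumes the standard hypergeometric model (n1 animals marked, n2 caught later, m of them found marked); the lecture's exact formulation may differ, and the counts below are made up.

```python
from math import comb

def likelihood(N, n1, n2, m):
    """P(m marked animals in a second catch of n2 | population N, n1 marked)."""
    return comb(n1, m) * comb(N - n1, n2 - m) / comb(N, n2)

n1, n2, m = 50, 40, 12  # hypothetical counts

# The population must contain at least the n1 + n2 - m distinct animals seen.
candidates = range(n1 + n2 - m, 1001)
n_hat = max(candidates, key=lambda N: likelihood(N, n1, n2, m))

print("MLE of population size:", n_hat)
print("floor(n1 * n2 / m):    ", (n1 * n2) // m)
```

For these counts the brute-force maximizer agrees with floor(n1 * n2 / m), the closed-form estimate under this model.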