CS 215 - Data Interpretation and Analysis

Instructor: Ajit Rajwade
Office: SIA-218, KReSIT Building
Email:

Lecture Timings: Slot 3: Monday 10:35 am to 11:30 am, Tuesday 11:35 am to 12:30 pm, Thursday 8:30 am to 9:25 am

Lecture Venue: LA 002

Instructor Office Hours: Thursday 2:30 pm to 3:30 pm in KR 118, or after class on Tuesdays in LA 002. Feel free to send queries over email or Moodle.

Teaching Assistants:
  • Srijan Das
  • Ayush Pratap Singh
  • Manivannan N
  • Abhay Raj
  • Mohammad Kashif Khan
  • Anirban Paul
  • Kumar Rajnish
  • Sabil Ahmad
  • Badisa Chennakesava Venkata Vignesh
  • Sameer Arvind Patil
  • S Ramachandran

Topics to be covered (tentative list)


Intended Audience

2nd year BTech students from CSE

Learning Materials and Textbooks

Computational Resources


Grading Policy (tentative)


Other Policies


Tutorials

Quizzes

Lecture Schedule:


Number | Date | Content of the Lecture | Assignments/Readings/Notes

1 28/07
  • Introduction, course overview and course policies
2 29/07 Descriptive Statistics
  • Terminology: population, sample, discrete and continuous valued attributes
  • Frequency tables, frequency polygons, line diagrams, pie charts, relative frequency tables
  • Histograms with examples for image intensity histograms, image gradient histograms
  • Histogram binning problem
  • Data summarization: Mean and Median
3 31/07
  • Data summarization: mean and median
  • "Proof" that median minimizes the sum of absolute deviations - using calculus
  • Proof that median minimizes the sum of absolute deviations, without using calculus
  • Concept of quantile/percentile
  • Calculation of mean and median in different ways from histogram or cumulative plots
  • Standard deviation and variance, some applications
  • Two-sided Chebyshev inequality with proof; one-sided Chebyshev inequality (Chebyshev-Cantelli inequality)
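
The claim that the median minimizes the sum of absolute deviations can be checked numerically. The sketch below is only an illustration and not course material: the data set, the candidate grid, and the helper name sum_abs_dev are all made up for the example.

```python
from statistics import mean, median

def sum_abs_dev(data, c):
    """Sum of absolute deviations of the data from a candidate centre c."""
    return sum(abs(x - c) for x in data)

# Hypothetical data with an odd number of points, so the median is unique.
data = [2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0]
m = median(data)

# Brute-force search over a grid of candidate centres 0.0, 0.1, ..., 40.0.
candidates = [i / 10 for i in range(0, 401)]
best = min(candidates, key=lambda c: sum_abs_dev(data, c))

print("median:", m, "grid minimizer:", best)
print("cost at median:", sum_abs_dev(data, m))
print("cost at mean:  ", sum_abs_dev(data, mean(data)))
```

On this data the grid minimizer coincides with the median, and the cost at the mean is strictly larger, which matches the statement proved in the lecture.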
4 4/8
  • Two-sided Chebyshev inequality with proof; one-sided Chebyshev inequality (Chebyshev-Cantelli inequality)
  • Concept of correlation coefficient and formula for it; proof that its value lies between -1 and +1
  • Correlation coefficient: properties; uncentered correlation coefficient; limitations of correlation coefficient and Anscombe's quartet
  • Correlation and causation
5 5/8 Discrete Probability
  • Discrete probability: sample space, event, composition of events: union, intersection, complement, exclusive or, De Morgan's laws
  • Boole's and Bonferroni's inequalities
  • Conditional probability, Bayes rule, False Positive Paradox
  • Independent and mutually exclusive events
  • Birthday paradox
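
As an illustration of the birthday paradox, the following sketch computes the probability that at least two people in a group share a birthday. It assumes 365 equally likely birthdays and independence between people; the function names are hypothetical and chosen only for this example.

```python
def birthday_collision_probability(n, days=365):
    """P(at least two of n people share a birthday), assuming equally likely days."""
    if n > days:
        return 1.0  # pigeonhole principle: a shared birthday is certain
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

def smallest_group_for(threshold=0.5, days=365):
    """Smallest group size whose collision probability reaches the threshold."""
    n = 1
    while birthday_collision_probability(n, days) < threshold:
        n += 1
    return n

print(smallest_group_for(0.5))             # 23
print(birthday_collision_probability(23))  # roughly 0.507
```

The computation works with the complement event (all birthdays distinct), which is the usual route because that probability is a simple product.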
6 7/8
  • Independent and mutually exclusive events
  • Birthday paradox
MATLAB Tutorial
  • Code vectorization: vectors and matrix operations
  • Plotting graphs, scatterplots, images in MATLAB
  • Some functions for computing statistical quantities
7 11/8 Random Variables
  • Random variable: concept, discrete and continuous random variables
  • Probability mass function (pmf), cumulative distribution function (cdf) and probability density function (pdf)
  • Expected value for discrete and continuous random variables; Law of the Unconscious Statistician
  • Standard deviation, Markov's inequality, Chebyshev's inequality; proofs of these inequalities
  • Concept of covariance and its properties
8 12/8
  • Proof of the law of the unconscious statistician
  • Weak law of large numbers and its proof using Chebyshev's inequality; statement of strong law
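
The connection between the weak law of large numbers and Chebyshev's inequality can be illustrated with a small simulation. The sketch below assumes Bernoulli(p) samples and compares the observed frequency of large deviations of the sample mean with the Chebyshev bound p(1-p)/(n eps^2). The function names, the seed, and the parameter values are arbitrary choices for the example.

```python
import random

def fraction_of_large_deviations(n, eps, p=0.5, trials=200, seed=0):
    """Empirical frequency of |sample mean - p| >= eps over repeated experiments."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        sample_mean = sum(rng.random() < p for _ in range(n)) / n
        if abs(sample_mean - p) >= eps:
            count += 1
    return count / trials

def chebyshev_bound(n, eps, p=0.5):
    """Chebyshev upper bound on P(|sample mean - p| >= eps), capped at 1."""
    return min(1.0, p * (1 - p) / (n * eps ** 2))

for n in (10, 100, 1000):
    print(n, fraction_of_large_deviations(n, 0.05), chebyshev_bound(n, 0.05))
```

The bound shrinks like 1/n, which is exactly the step used in the proof; the empirical frequencies are typically much smaller, since Chebyshev's inequality is loose.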
9 14/8
  • Joint PMF, PDF, CDF with examples; marginals obtained from joint PDFs (by integration), joint PMFs (by summation) and joint CDFs (by taking limits)
  • Concept of independence of random variables
10 18/8
  • Conditional CDF, PDF, PMF; conditional expectation; examples
  • Moment generating functions: definition, genesis, properties
11 19/8
  • Conditional CDF, PDF, PMF; conditional expectation; examples
  • Moment generating functions: properties, uniqueness proofs, connection to Laplace transforms; mention of characteristic functions
Families of Random Variables
  • Bernoulli random variables: mean, median, mode, variance, MGF
12 21/8
  • Binomial random variables: mean, median, mode, variance, MGF
13 25/8
  • Gaussian distribution: definition, mean, variance, verification of integration to 1, MGF, error functions
  • Introduction to and basic statement of the central limit theorem, with examples
14 26/8
  • Properties of Gaussian: CDF and error function, MGF
  • Relation between CLT and Law of Large Numbers
  • Gaussian tail bounds
  • Distribution of sample mean and sample variance, Bessel's correction
15 28/8
  • Proof of central limit theorem
  • Chi-square distribution
  • Distribution of sample variance given Gaussian random variables
16 1/9
  • Uniform distribution: mean, mode, median, MGF, sampling from a PMF, probability integral transform
  • Hypergeometric distribution: mean, variance
17 2/9
  • Hypergeometric distribution: capture-recapture method in ecology
  • Multinomial distribution: mean vector, covariance matrix, MGF
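
The capture-recapture idea can be sketched with the hypergeometric PMF: M animals are tagged, n are later recaptured, and k of those carry tags. The code below searches for the population size N that maximizes the hypergeometric likelihood. It is an illustrative sketch under these assumptions, not necessarily the derivation used in class; the function names, the search limit N_max, and the example numbers are hypothetical.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, N, M, n):
    """P(k tagged among n drawn without replacement from N, of which M are tagged)."""
    return Fraction(comb(M, k) * comb(N - M, n - k), comb(N, n))

def mle_population_size(M, n, k, N_max=2000):
    """Population size N that maximizes the hypergeometric likelihood (needs k > 0)."""
    N_min = M + n - k  # at least this many distinct animals have been seen
    return max(range(N_min, N_max + 1), key=lambda N: hypergeom_pmf(k, N, M, n))

# Hypothetical numbers: 50 tagged, 40 recaptured, 7 of those tagged.
print(mle_population_size(50, 40, 7))  # 285
print(50 * 40 // 7)                    # 285, the floor of M*n/k
```

Exact rational arithmetic (Fraction) is used so that the comparison between neighbouring values of N is not affected by rounding. When M*n/k is not an integer, the maximizer agrees with the floor of M*n/k.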
18 4/9
  • Poisson distribution: genesis and examples, Poisson limit theorem, mean, variance, MGF, Poisson thinning, relation to normal distribution
  • Exponential distribution: genesis, and relevance to Poisson distribution
19 8/9
  • Exponential distribution: mean, variance, MGF, property of memorylessness
Parameter Estimation
  • Concept of parameter estimation (or parametric PDF/PMF estimation)
  • Maximum likelihood estimation (MLE)
  • MLE for parameters of Bernoulli, Poisson, Gaussian and uniform distributions
  • Least squares line fitting as an MLE problem
  • MLE for parameters of uniform distributions
20 9/9
  • Least squares line fitting as an MLE problem
  • MLE for parameters of uniform distributions
  • Concept of ML estimate as a random variable, notion of confidence interval
21 11/9
  • Concept of estimator bias, mean squared error, variance
  • Estimators for interval of uniform distribution: example of bias
  • Slides: Parameter estimation
  • MLE derivations
  • Readings: Section 5.6 from the textbook by Sheldon Ross
  • Readings: Sections 7.1, 7.2, 7.3, 7.5, 7.7, 9.2 (for least squares line fitting) of the textbook by Sheldon Ross
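
The bias of the maximum likelihood estimator for the upper end of a uniform distribution can be illustrated by simulation. The sketch below assumes samples from Uniform(0, theta), for which the MLE of theta is the sample maximum with expected value n*theta/(n+1). The function name, the seed, and the parameter values are arbitrary choices for the example.

```python
import random

def simulate_uniform_mle(theta=1.0, n=5, trials=20000, seed=0):
    """Average of the MLE (sample maximum) and of a bias-corrected version."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.uniform(0.0, theta) for _ in range(n))
    mean_mle = total / trials
    # Scaling by (n + 1) / n removes the bias, since E[max] = n * theta / (n + 1).
    mean_corrected = (n + 1) / n * mean_mle
    return mean_mle, mean_corrected

mean_mle, mean_corrected = simulate_uniform_mle()
print("average MLE:      ", mean_mle)        # close to 5/6 for n = 5, theta = 1
print("average corrected:", mean_corrected)  # close to 1
```

The sample maximum can never exceed theta, so it underestimates theta on average; the simulated averages make that systematic shortfall visible.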