1. Data and its use. The Thane taluka case study. The main Thane
census data. Explanation of the fields.

Exercise: Loading the sheet into Scilab (readxls.sci) and separating
the alphanumeric data from the numeric data.
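
The course does this step in Scilab with readxls.sci. As a rough Python equivalent, the sketch below assumes the sheet has been exported to CSV and splits its columns into numeric and text columns. The function name `split_columns`, the column names and the values are hypothetical, invented for illustration; they are not the actual census fields.

```python
import csv
import io


def split_columns(rows):
    """Split a table (header row + data rows) into numeric and text columns.

    A column is treated as numeric only if every non-empty cell parses as a
    float; empty cells become None. All other columns are kept as strings.
    """
    header, body = rows[0], rows[1:]
    numeric, text = {}, {}
    for j, name in enumerate(header):
        cells = [row[j] for row in body]
        try:
            numeric[name] = [float(c) if c.strip() else None for c in cells]
        except ValueError:
            text[name] = cells
    return numeric, text


# Hypothetical CSV export of a sheet; column names and values are made up.
sample_csv = """taluka,area_type,population,literate
Taluka A,Rural,1000,700
Taluka B,Urban,800,
"""
rows = list(csv.reader(io.StringIO(sample_csv)))
numeric, text = split_columns(rows)
print("numeric:", numeric)
print("text:", text)
```

Deciding column by column, rather than cell by cell, keeps a code-like field such as a numeric ID from being split across the two groups. A real sheet may need a stricter rule.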

2. Elementary statistics. Representing data by histograms. The mean,
the variance and their computation. The variance as a measure of
randomness and its implications. Examples: bus times and life
insurance. 2-D scatter plots.

Exercise: Computing means and variances for the Thane data set.
Scatter plots for various attributes.
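
For the computation of the mean and variance, here is a minimal Python sketch (the course itself uses Scilab). It uses Welford's one-pass update, one standard way to avoid the cancellation error of the naive "sum of squares minus square of sum" formula. The function name `mean_and_variance` and the data values are hypothetical.

```python
import statistics


def mean_and_variance(values):
    """Return (mean, sample variance) in a single pass (Welford's method)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n < 2:
        raise ValueError("need at least two values for a sample variance")
    return mean, m2 / (n - 1)  # divide by n - 1 for the unbiased sample variance


# Hypothetical values standing in for one attribute (not real census data).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
m, v = mean_and_variance(data)
print(m, v)
# Cross-check against the standard library's two-pass results.
print(statistics.mean(data), statistics.variance(data))
```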

3. Population and sampling. Example of sub-sampling the Thane data
and the (i) parameter estimation and (ii) hypothesis testing
problems. The need for random variables. The basics of a discrete RV:
the set of outcomes and their probabilities. The axioms of
probability. The coin toss and the binomial distribution. The Poisson
RV and time to failure.

Exercise: The random number generator and its use for sampling.
Coding and plotting the binomial density function.
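
A Python sketch of this exercise, given as an assumption about how it might be done rather than the course's Scilab code, compares the binomial density (strictly, the probability mass function) with simulated coin tosses driven by a seeded random number generator. It also draws a simple random sample of row indices. The parameters n = 10 and p = 0.5, the seed, the trial count and the 100-row table size are arbitrary choices, and a text histogram stands in for the plot.

```python
import math
import random


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def simulate_heads(n, p, trials, rng):
    """Count heads in n tosses of a p-coin, repeated `trials` times."""
    counts = [0] * (n + 1)
    for _ in range(trials):
        heads = sum(1 for _ in range(n) if rng.random() < p)
        counts[heads] += 1
    return counts


n, p, trials = 10, 0.5, 20000
rng = random.Random(1)  # fixed seed so the run is repeatable
counts = simulate_heads(n, p, trials, rng)
for k in range(n + 1):
    exact = binomial_pmf(k, n, p)
    observed = counts[k] / trials
    bar = "#" * round(observed * 100)  # crude text histogram in place of a plot
    print(f"{k:2d}  exact={exact:.4f}  observed={observed:.4f}  {bar}")

# Using the same generator for sampling: a simple random sample (without
# replacement) of 5 row indices from a hypothetical 100-row table.
sample_rows = rng.sample(range(100), 5)
print("sampled rows:", sample_rows)
```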

4. The continuous RV and the density function. The uniform and the
normal random variables. Checking the density function of an unknown
source through repeated trials. The ubiquity of the normal RV,
illustrated through examples. An informal statement of the law of
large numbers.

Exercise: Repeated trials and histogram plots for the uniform and
normal RVs. The sum of repeated uniform RVs and its closeness to the
normal. Plots of taluka-wise Thane attributes and their closeness to
the normal. The problem of estimating the mean and variance.
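
The "sum of repeated uniforms" part can be sketched in Python as follows. The choice of 12 uniforms is a classic convenience, assumed here: each Uniform(0, 1) has mean 1/2 and variance 1/12, so the sum of 12 minus 6 has mean 0 and variance 1. The seed, trial count and bin width are arbitrary.

```python
import random
import statistics


def sum_of_uniforms(m, rng):
    """Sum of m independent Uniform(0, 1) draws."""
    return sum(rng.random() for _ in range(m))


rng = random.Random(2)
m, trials = 12, 50000
# The sum of 12 has mean 6 and variance 1; subtracting 6 standardises it.
z = [sum_of_uniforms(m, rng) - 6.0 for _ in range(trials)]

z_mean = statistics.fmean(z)
z_var = statistics.pvariance(z)
print("mean", z_mean, "variance", z_var)
# For a standard normal, about 68.3% of values lie in [-1, 1] and about
# 95.4% lie in [-2, 2]; the sum of uniforms should come out close to these.
within1 = sum(1 for x in z if abs(x) <= 1) / trials
within2 = sum(1 for x in z if abs(x) <= 2) / trials
print("within 1:", within1, "within 2:", within2)

# A crude text histogram over bins of width 0.5 from -3 to 3.
edges = [-3 + 0.5 * i for i in range(13)]
for lo, hi in zip(edges, edges[1:]):
    count = sum(1 for x in z if lo <= x < hi)
    print(f"[{lo:+.1f}, {hi:+.1f})  {'#' * (count // 250)}")
```

Rerunning with m = 1 or m = 2 (and adjusting the centring and scaling) shows how quickly the bell shape appears as more uniforms are added.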

5. Two random variables and the joint probability density.
Independent RVs. Testing for independence. The sum of two RVs and
convolution. Functions of an RV; expectation, variance and various
identities. Examples using the binomial.

Exercise: Scatter plots of two variables and a test for independence.
Demonstration of the sum of two RVs. Demonstration of the sum of many
identically distributed RVs.
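
For independent discrete RVs, the density of the sum is the convolution of the two densities. A Python sketch, using fair dice as an assumed illustration and `Fraction` for exact arithmetic, computes the sum of two dice and then, by repeated convolution, the sum of ten.

```python
from fractions import Fraction


def convolve(p, q):
    """pmf of X + Y for independent X, Y with pmfs p, q on 0, 1, 2, ...

    (p * q)[s] = sum over k of p[k] * q[s - k].
    """
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r


# A fair die stored on 0..5 (face value minus 1).
die = [Fraction(1, 6)] * 6
two_dice = convolve(die, die)
for s, prob in enumerate(two_dice):
    print(s + 2, prob)  # shift back: the sum of two faces runs from 2 to 12

# Sum of many identically distributed RVs: repeat the convolution.
ten_dice = die
for _ in range(9):
    ten_dice = convolve(ten_dice, die)
# Index k of ten_dice corresponds to a total of k + 10.
mean10 = sum((k + 10) * prob for k, prob in enumerate(ten_dice))
var10 = sum((k + 10 - mean10) ** 2 * prob for k, prob in enumerate(ten_dice))
print("ten dice: mean", mean10, "variance", var10)  # 35 and 175/6
```

The mean and variance of the ten-dice total are exactly ten times those of one die (7/2 and 35/12), as the expectation and variance identities for independent RVs predict.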

6. Repeated trials and their mean. The law of large numbers, and the
mean and variance. The coin-toss parameter estimation problem and the
use of the normal distribution. The parameter estimation and
hypothesis testing problems. Estimation of the mean with known
variance. Type I and Type II errors.

Exercise: Estimation of the literacy fraction of various talukas.
Sampling, confidence intervals and their outcomes.
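
A hedged Python sketch of the sampling-and-confidence exercise uses the normal-approximation (Wald) interval for a proportion, p̂ ± z·sqrt(p̂(1 − p̂)/n). Because the Thane figures are not reproduced here, the population is synthetic, with a true literacy fraction of exactly 0.7. Repeated sampling shows how often the 95% interval actually covers the true value. The sample size, repeat count and seed are arbitrary.

```python
import math
import random
from statistics import NormalDist


def proportion_ci(successes, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Synthetic population (NOT census data): 1 = literate, 0 = not literate.
population = [1] * 7000 + [0] * 3000
true_p = sum(population) / len(population)

rng = random.Random(3)
n, repeats = 200, 2000
covered = 0
for _ in range(repeats):
    sample = rng.sample(population, n)  # simple random sample without replacement
    lo, hi = proportion_ci(sum(sample), n)
    covered += lo <= true_p <= hi
print("fraction of intervals covering the true value:", covered / repeats)
```

Roughly 5% of the intervals miss the true fraction. That is the expected outcome of a 95% procedure, not a failure of any particular sample.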

7. The estimation of the variance. The estimator and its
distribution. The chi^2 density function. Confidence intervals and hypothesis
testing.

Exercise: The use of chi^2 for the Thane literacy data.
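
The fact behind section 7 is that, for normal samples of size n, the statistic (n − 1)S²/σ² follows a chi^2 distribution with n − 1 degrees of freedom, which has mean n − 1 and variance 2(n − 1). The Python sketch below checks this by simulation on synthetic normal data, not on the literacy data, and builds the 95% interval for σ² from it. The chi^2 quantiles for 9 degrees of freedom are taken from a standard table. With SciPy available they could instead come from `scipy.stats.chi2.ppf`.

```python
import random

rng = random.Random(4)
n, sigma, repeats = 10, 2.0, 20000
df = n - 1

# Tabulated chi^2 quantiles for 9 degrees of freedom: the 2.5% and 97.5%
# points. They must be looked up again if n changes.
CHI2_LO, CHI2_HI = 2.700, 19.023

scaled = []   # values of (n - 1) S^2 / sigma^2, one per simulated sample
covered = 0   # how many 95% intervals for sigma^2 contain the true value
for _ in range(repeats):
    sample = [rng.gauss(0.0, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / df  # unbiased estimator S^2
    scaled.append(df * s2 / sigma**2)
    # Interval from P(CHI2_LO <= (n-1) S^2 / sigma^2 <= CHI2_HI) = 0.95.
    lo, hi = df * s2 / CHI2_HI, df * s2 / CHI2_LO
    covered += lo <= sigma**2 <= hi

mean_scaled = sum(scaled) / repeats
var_scaled = sum((v - mean_scaled) ** 2 for v in scaled) / repeats
# A chi^2 variable with k degrees of freedom has mean k and variance 2k.
print("mean:", mean_scaled, "expected:", df)
print("variance:", var_scaled, "expected:", 2 * df)
print("coverage of the 95% interval:", covered / repeats)
```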

8. The t-distribution and its use in estimating the mean. Interval
estimation and hypothesis testing versions.

Exercise: The use of the t-distribution for small sample sizes.
Clinical and questionnaire outcomes.
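
For a small sample, the mean is tested with t = (x̄ − μ₀)/(s/√n), which has n − 1 degrees of freedom when the data are normal. The Python sketch below shows both the interval and the hypothesis versions. The ten data values and the hypothesised mean μ₀ = 5.0 are invented for illustration, and the critical value 2.262 is the tabulated two-sided 5% point for 9 degrees of freedom.

```python
import math
import statistics

# Hypothetical small sample (n = 10), e.g. questionnaire scores;
# these are made-up illustration values, not real outcomes.
sample = [5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.9, 4.9, 5.2, 5.5]
mu0 = 5.0  # hypothesised population mean

n = len(sample)
xbar = statistics.fmean(sample)
s = statistics.stdev(sample)   # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)          # estimated standard error of the mean
t_stat = (xbar - mu0) / se     # t with n - 1 df under H0, for normal data

T_CRIT = 2.262  # tabulated two-sided 5% critical value for 9 degrees of freedom
ci = (xbar - T_CRIT * se, xbar + T_CRIT * se)

print(f"mean={xbar:.3f}  s={s:.3f}  t={t_stat:.3f}")
print(f"95% CI for the mean: ({ci[0]:.3f}, {ci[1]:.3f})")
print("reject H0 at 5%" if abs(t_stat) > T_CRIT else "do not reject H0 at 5%")
```

Using the normal value 1.96 here would give a narrower interval. The larger t critical value reflects the extra uncertainty of estimating σ from only ten observations.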

9. Linear regression. Statement of the problem as a minimization
problem, and its solution. Error and its measurement. Some conundrums.
Connections with other linear estimation methods such as PCA.
Higher-dimensional regression (examples only).

Exercise: Regressions with various models on the Thane data set.
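
For the one-variable case, minimizing Σ(yᵢ − a − b·xᵢ)² and setting both partial derivatives to zero gives the closed form b = Sxy/Sxx and a = ȳ − b·x̄. The Python sketch below implements this and measures the error with R². The function names and the x, y values are hypothetical illustrations, not Thane attributes.

```python
def fit_line(x, y):
    """Least-squares fit of y = a + b x; returns (a, b).

    Minimising sum (y_i - a - b x_i)^2 and setting both partial derivatives
    to zero gives b = Sxy / Sxx and a = ybar - b * xbar.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all equal; slope is undefined")
    b = sxy / sxx
    return ybar - b * xbar, b


def r_squared(x, y, a, b):
    """Fraction of the variance of y explained by the fitted line."""
    ybar = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical paired attributes (illustration only, not Thane data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(x, y)
print(f"y = {a:.3f} + {b:.3f} x,  R^2 = {r_squared(x, y, a, b):.4f}")
```

"Various models" can often be fitted with the same function by transforming the variables first, for example regressing log y on x for an exponential trend.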

10. Analysing the output of a typical research paper and presenting
your own research.




