Tuesday, 22 January 2013

Session 3 (22.01.2013)

Q1.(a) Fit a regression for the data set (mileage vs groove) and comment on whether linear regression is applicable.


> z<-read.csv(file.choose(),header=TRUE)
> z
  Mileage Groove  X
1       0 394.33 NA
2       4 329.50 NA
3       8 291.00 NA
4      12 255.17 NA
5      16 229.33 NA
6      20 204.83 NA
7      24 179.00 NA
8      28 163.83 NA
9      32 150.33 NA
> x<-z$Mileage
> y<-z$Groove
> reg1<-lm(x~y)
> reg1

Call:
lm(formula = x ~ y)

Coefficients:
(Intercept)            y 
    47.9446      -0.1308 

> res<-resid(reg1)
> res
         1          2          3          4          5          6          7
 3.6502499 -0.8322206 -1.8696280 -2.5576878 -1.9386386 -1.1442614 -0.5239038
         8          9
 1.4912269  3.7248633
> plot(y,res)


> sres<-rstandard(reg1)
> sres
         1          2          3          4          5          6          7
 2.0960030 -0.3763146 -0.7964870 -1.0654888 -0.8084405 -0.4840129 -0.2284168
         8          9
 0.6674135  1.7170098
> plot(y,sres)



> qqnorm(res)
> qqline(res)

Result: The residual plot is not random. The residuals are positive at both ends and negative in the middle, a parabolic pattern, so the relationship is non-linear and a straight-line regression is not appropriate for this data.
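
The parabolic pattern can also be read off the signs of the residuals, without looking at the plot. A minimal sketch, assuming the data are typed in from the listing above instead of being loaded with read.csv; the names mileage and groove are introduced here only for illustration:

```r
# Data copied from the listing above (the empty X column is dropped)
mileage <- c(0, 4, 8, 12, 16, 20, 24, 28, 32)
groove  <- c(394.33, 329.50, 291.00, 255.17, 229.33,
             204.83, 179.00, 163.83, 150.33)

# Same model as in the session: mileage regressed on groove
reg1 <- lm(mileage ~ groove)
res  <- resid(reg1)

# Sign of each residual, in observation order.
# A random scatter would mix the signs; a long run of one sign
# flanked by the other sign points to curvature.
res_sign <- sign(res)
res_sign
```

A run of negative residuals in the middle with positive ones at both ends is what a curved relationship looks like when a straight line is forced through it.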

Q1.(b) Fit a regression for the data set (alpha vs pluto) and comment on whether linear regression is applicable.

> z<-read.csv(file.choose(),header=TRUE)
> z
   Alpha Pluto
1  0.150    20
2  0.004     0
3  0.069    10
4  0.030     5
5  0.011     0
6  0.004     0
7  0.041     5
8  0.109    20
9  0.068    10
10 0.009     0
11 0.009     0
12 0.048    10
13 0.006     0
14 0.083    20
15 0.037     5
16 0.039     5
17 0.132    20
18 0.004     0
19 0.006     0
20 0.059    10
21 0.051    10
22 0.002     0
23 0.049     5
24 0.049     5
> x<-z$Alpha
> y<-z$Pluto
> reg1<-lm(y~x)
> summary(reg1)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-4.0830 -0.8682 -0.3614  0.4530  6.9820

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -0.6893     0.6627   -1.04     0.31   
x           165.1490    10.9977   15.02  4.8e-13 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.187 on 22 degrees of freedom
Multiple R-squared: 0.9111,     Adjusted R-squared: 0.9071
F-statistic: 225.5 on 1 and 22 DF,  p-value: 4.804e-13

> res<-resid(reg1)
> res
          1           2           3           4           5           6
-4.08300561  0.02874929 -0.70593611  0.73487513 -1.12729375  0.02874929
          7           8           9          10          11          12
-1.08176394  2.68810364 -0.54078710 -0.79699574 -0.79699574  2.76219302
         13          14          15          16          17          18
-0.30154872  6.98197780 -0.42116791 -0.75146592 -1.11032350  0.02874929
         19          20          21          22          23          24
-0.30154872  0.94555395  2.26674600  0.35904730 -2.40295599 -2.40295599
> plot(x,res)

> sres<-rstandard(reg1)
> sres
          1           2           3           4           5           6
-2.26945649  0.01373201 -0.33242731  0.34427383 -0.53463647  0.01373201
          7           8           9          10          11          12
-0.50545136  1.33090528 -0.25449470 -0.37869994 -0.37869994  1.29061750
         13          14          15          16          17          18
-0.14372047  3.32737186 -0.19690489 -0.35120473 -0.58062905  0.01373201
         19          20          21          22          23          24
-0.14372047  0.44295830  1.05953921  0.17189232 -1.12288360 -1.12288360
> plot(x,sres)

> qqnorm(res)
> qqline(res)

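Result: The residual plot shows no systematic pattern, which indicates linearity, so linear regression is applicable here. The points of the Q-Q plot lie close to the Q-Q line, suggesting that the residuals are approximately normally distributed. Observation 14 (standardized residual 3.33) stands out and is worth checking as a possible outlier.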
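
Unusually large standardized residuals can be listed directly. A minimal sketch, assuming the data are typed in from the listing above; the cut-off of 2 is a common rule of thumb chosen here for illustration, not something set in the session:

```r
# Data copied from the listing above
alpha <- c(0.150, 0.004, 0.069, 0.030, 0.011, 0.004, 0.041, 0.109,
           0.068, 0.009, 0.009, 0.048, 0.006, 0.083, 0.037, 0.039,
           0.132, 0.004, 0.006, 0.059, 0.051, 0.002, 0.049, 0.049)
pluto <- c(20, 0, 10, 5, 0, 0, 5, 20, 10, 0, 0, 10,
           0, 20, 5, 5, 20, 0, 0, 10, 10, 0, 5, 5)

# Same model as in the session: pluto regressed on alpha
reg1 <- lm(pluto ~ alpha)
sres <- rstandard(reg1)

# Row numbers whose standardized residual exceeds 2 in absolute value
# (the threshold 2 is an assumption, used only as a rule of thumb)
flagged <- unname(which(abs(sres) > 2))
flagged
```

Flagged rows are candidates for a second look, not automatic deletions.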

Q2. Using ANOVA, test the null hypothesis that the 3 types of chairs have the same mean comfort.

> z<-read.csv(file.choose(),header=TRUE)
> z
   Chair Comfort Chair1
1      1       2      a
2      1       3      a
3      1       5      a
4      1       3      a
5      1       2      a
6      1       3      a
7      2       5      b
8      2       4      b
9      2       5      b
10     2       4      b
11     2       1      b
12     2       3      b
13     3       3      c
14     3       4      c
15     3       4      c
16     3       5      c
17     3       1      c
18     3       2      c
> x<-z$Comfort
> x
 [1] 2 3 5 3 2 3 5 4 5 4 1 3 3 4 4 5 1 2
> y<-z$Chair1
> anova<-aov(x~y)
> summary(anova)
            Df Sum Sq Mean Sq F value Pr(>F)
y            2  1.444  0.7222   0.385  0.687
Residuals   15 28.167  1.8778              

Conclusion: p-value = 0.687
Since the p-value is far above the usual 0.05 significance level, we fail to reject the null hypothesis. The data give no evidence that mean comfort differs between the three types of chairs.
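
The F value and p-value in the ANOVA table can be reproduced by hand from the between-group and within-group sums of squares. A minimal sketch, assuming the data are typed in from the listing above; the variable names are introduced here only for illustration:

```r
# Data copied from the listing above: 6 comfort scores per chair type
comfort <- c(2, 3, 5, 3, 2, 3,   # chair a
             5, 4, 5, 4, 1, 3,   # chair b
             3, 4, 4, 5, 1, 2)   # chair c
chair <- factor(rep(c("a", "b", "c"), each = 6))

group_means <- tapply(comfort, chair, mean)
group_sizes <- tapply(comfort, chair, length)
grand_mean  <- mean(comfort)

# Between-group sum of squares: spread of group means around the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
# Within-group sum of squares: spread of each score around its own group mean
ss_within  <- sum((comfort - ave(comfort, chair))^2)

df_between <- nlevels(chair) - 1
df_within  <- length(comfort) - nlevels(chair)

# F statistic = ratio of the two mean squares
f_stat  <- (ss_between / df_between) / (ss_within / df_within)
# p-value = upper tail of the F distribution
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

c(F = f_stat, p = p_value)
```

The between-group mean square is smaller than the within-group mean square, so the F value falls below 1 and the p-value is large.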




Tuesday, 8 January 2013

ITBAL-1st Lecture


Assignment 0: Plot a histogram for a simple vector.
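
A minimal sketch of what this assignment could look like; the values in v are arbitrary and chosen only for illustration:

```r
# An arbitrary small numeric vector (illustrative values)
v <- c(2, 3, 3, 4, 4, 4, 5, 5, 6, 8)

# hist() draws the histogram and invisibly returns the bin information
h <- hist(v, main = "Histogram of v", xlab = "Value")

# Number of values that fell in each bin
h$counts
```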

Assignment 1: Plot a histogram from the "High" column of the NSE indices data.


Assignment 2: Plot a point-and-line graph of the "High" column, with a title for the graph and labels for the axes.

Assignment 3: Draw a scatterplot for two columns of the NSE indices data.


Assignment 4: Find the volatility in the data.
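
The assignment does not say how volatility is to be measured. One common reading, assumed here, is the standard deviation of day-to-day log returns. The prices below are made-up placeholder values, not NSE data; with the real file, the "High" column would take their place.

```r
# Placeholder prices (hypothetical, NOT taken from the NSE file)
high <- c(100, 102, 101, 105, 104, 108)

# Day-to-day log returns: log(P_t / P_{t-1})
log_ret <- diff(log(high))

# Volatility taken as the sample standard deviation of the log returns
# (this definition is an assumption, see the note above)
volatility <- sd(log_ret)
volatility
```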