Sunday, 24 March 2013

Infographics

Visual.ly is a one-stop shop for creating data visualizations and infographics, and it brings together people with shared interests.

The tool fetches data about activities over a given period up to the present.

As part of this assignment, I came across many sites that help create a good infographic resume. Out of this plethora of sites, I chose visual.ly for a detailed study.

For those of us who want a resume that looks different from everyone else's, it helps a lot.

I tried it out and built a distinctive-looking resume.

The steps to create a resume with visual.ly are:

1. Go to the following link: http://visual.ly/

2. Click on the Create option → http://create.visual.ly/

3. I chose the resume design by Kelly.

4. I chose Helen Wheels, black gradient, to create my resume.

5. I imported my details from my LinkedIn profile → http://www.linkedin.com

Pros:

- Offers a choice of 4 to 5 themes.

- Options to tweet, share on Facebook, pin on Pinterest, and share on other social media sites.

- Lets you download the resume as a PDF or mail it to your email address.

- Easy to access.

- Several gradient variants.

- Data is pulled in automatically, so there is no need to enter or edit it.

Cons:

- Does not allow changing the format of the resume.

- Fewer options to customise the graphics.

- Only a limited number of themes to choose from.

Friday, 15 March 2013

IT Lab session 8

We will carry out a panel data analysis of the "Produc" data set.

We will analyse three types of model:
      Pooled model
      Fixed effects model
      Random effects model

Then we will determine which model fits best using three tests:
       pFtest: to choose between the fixed effects and pooled models
       plmtest: to choose between the pooled and random effects models
       phtest: to choose between the random and fixed effects models (Hausman test)
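The selection procedure above can be summarised as a small decision rule. This is a minimal sketch, assuming a 5% significance level; `choose_panel_model()` is a hypothetical helper (not part of plm) that simply takes the p-values reported by the three tests and returns the matching plm `model` name.

```r
# Hypothetical helper (not part of plm): turn the p-values of the three tests
# into a model choice. alpha = 0.05 is an assumed significance level.
choose_panel_model <- function(p_f, p_lm, p_hausman, alpha = 0.05) {
  fe_better <- p_f < alpha   # pFtest: individual effects matter (fixed vs pooled)
  re_better <- p_lm < alpha  # plmtest: random effects matter (random vs pooled)
  if (!fe_better && !re_better) return("pooling")
  if (fe_better && !re_better) return("within")
  if (!fe_better && re_better) return("random")
  # both panel models beat pooled OLS: the Hausman test decides between them
  if (p_hausman < alpha) "within" else "random"
}

# Usage with illustrative p-values
choose_panel_model(p_f = 1e-16, p_lm = 1e-16, p_hausman = 1e-16)
```

The rule only applies the Hausman test when both panel models beat pooled OLS, which is the usual order of checks.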

Commands:

Loading the package and data:
> library(plm)
> data(Produc, package ="plm")
> head(Produc)

Pooled Model:

> pool <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model=("pooling"), index = c("state","year"))
> summary(pool)

Fixed Effects Model:

> fixed <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model=("within"), index = c("state","year"))

> summary(fixed)

Random Effects Model:

> random <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model=("random"), index = c("state","year"))

> summary(random)
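The fixed effects ("within") model subtracts each state's own average from every variable before running OLS, which gives the same slopes as adding one dummy per state. The sketch below checks this equivalence in base R on a small synthetic panel; the data, panel size and true slope of 2 are made up for illustration, and plm is not used.

```r
set.seed(1)
n_id <- 10; n_t <- 5                       # 10 "states", 5 "years" (synthetic)
d <- data.frame(id = factor(rep(1:n_id, each = n_t)), x = rnorm(n_id * n_t))
a_i <- rnorm(n_id)                         # unobserved state-specific effects
d$y <- 2 * d$x + a_i[as.integer(d$id)] + rnorm(nrow(d), sd = 0.1)

# (1) Least-squares dummy variables: one intercept per state
lsdv <- lm(y ~ x + id, data = d)

# (2) Within transformation: subtract each state's own mean, then OLS
d$y_dm <- d$y - ave(d$y, d$id)
d$x_dm <- d$x - ave(d$x, d$id)
within_fit <- lm(y_dm ~ x_dm - 1, data = d)

# Both approaches give the same slope estimate
c(lsdv = coef(lsdv)[["x"]], within = coef(within_fit)[["x_dm"]])
```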

Comparison

The models are compared with hypothesis tests based on the following idea:

H0 (null hypothesis): the individual (state) and time effects are all zero

H1 (alternative hypothesis): at least one of the individual or time effects is non-zero

Pooled vs Fixed

Null Hypothesis: Pooled Model

Alternative Hypothesis: Fixed Effects Model

Command:

> pFtest(fixed,pool)

Result:

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)
F = 56.6361, df1 = 47, df2 = 761, p-value < 2.2e-16
alternative hypothesis: significant effects

Since the p-value is negligible (< 2.2e-16), we reject the null hypothesis and accept the alternative, i.e. the fixed effects model.
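pFtest compares the residual sums of squares (SSR) of the pooled and fixed effects fits. For a balanced panel with N individuals, T periods and K regressors the statistic is F = ((SSR_pooled - SSR_fixed) / (N - 1)) / (SSR_fixed / (NT - N - K)); with N = 48 states, T = 17 years and K = 7 this gives df1 = 47 and df2 = 816 - 48 - 7 = 761, as in the output above. A minimal base-R sketch on synthetic data (one regressor, made-up coefficients) computes it by hand and cross-checks it against the nested-model F test from anova():

```r
set.seed(2)
n_id <- 8; n_t <- 6; k <- 1                # synthetic panel, one regressor
d <- data.frame(id = factor(rep(1:n_id, each = n_t)), x = rnorm(n_id * n_t))
d$y <- 1.5 * d$x + rnorm(n_id, sd = 2)[as.integer(d$id)] + rnorm(nrow(d))

pooled <- lm(y ~ x, data = d)              # common intercept (restricted model)
lsdv   <- lm(y ~ x + id, data = d)         # one intercept per individual

ssr_p <- sum(resid(pooled)^2)
ssr_f <- sum(resid(lsdv)^2)
df1 <- n_id - 1                            # number of restrictions
df2 <- n_id * n_t - n_id - k               # residual df of the fixed effects model
F_stat <- ((ssr_p - ssr_f) / df1) / (ssr_f / df2)
p_val  <- pf(F_stat, df1, df2, lower.tail = FALSE)

anova(pooled, lsdv)                        # nested-model F test: same F and p-value
```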

Pooled vs Random

Null Hypothesis: Pooled Model

Alternative Hypothesis: Random Effects Model

Command:

> plmtest(pool)

Result:

Lagrange Multiplier Test - (Honda)

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)

normal = 57.1686, p-value < 2.2e-16
alternative hypothesis: significant effects

Since the p-value is negligible (< 2.2e-16), we reject the null hypothesis and accept the alternative, i.e. the random effects model.
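Honda's version of the Lagrange multiplier test, reported above as "normal", only needs the pooled OLS residuals e. A minimal sketch, assuming a balanced panel with N individuals and T periods and using the textbook formula Honda = sqrt(NT / (2(T - 1))) * (sum_i (sum_t e_it)^2 / sum_it e_it^2 - 1), compared with the upper tail of a standard normal. `honda_stat()` is a hypothetical helper and the data are synthetic.

```r
# Hypothetical helper: Honda LM statistic for individual effects,
# computed from pooled OLS residuals of a balanced panel.
honda_stat <- function(e, id) {
  nT  <- length(e)
  n   <- length(unique(id))
  n_t <- nT / n                            # periods per individual (balanced panel)
  s_i <- tapply(e, id, sum)                # sum of residuals within each individual
  sqrt(nT / (2 * (n_t - 1))) * (sum(s_i^2) / sum(e^2) - 1)
}

set.seed(3)
n_id <- 20; n_t <- 5
d <- data.frame(id = factor(rep(1:n_id, each = n_t)), x = rnorm(n_id * n_t))
d$y <- d$x + rnorm(n_id, sd = 2)[as.integer(d$id)] + rnorm(nrow(d))

e <- resid(lm(y ~ x, data = d))            # pooled OLS residuals
h <- honda_stat(e, d$id)
p <- pnorm(h, lower.tail = FALSE)          # one-sided: large values favour random effects
c(normal = h, p_value = p)
```

Squaring the statistic gives the Breusch-Pagan LM statistic, which is compared with a chi-square distribution with one degree of freedom instead.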

Random vs Fixed

Null Hypothesis: No correlation between the state effects and the regressors (Random Effects Model is consistent and efficient)

Alternative Hypothesis: Fixed Effects Model

Command:

> phtest(fixed,random)

Result:

Hausman Test

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)

chisq = 93.546, df = 7, p-value < 2.2e-16
alternative hypothesis: one model is inconsistent

Since the p-value is negligible (< 2.2e-16), we reject the null hypothesis: the random effects estimates are inconsistent, so the fixed effects model is preferred.
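The Hausman statistic compares the two coefficient vectors: H = (b_FE - b_RE)' (V_FE - V_RE)^(-1) (b_FE - b_RE), which follows a chi-square distribution with as many degrees of freedom as compared coefficients (7 here). A minimal sketch of the formula; `hausman_stat()` is a hypothetical helper and the numbers in the usage line are made up.

```r
# Hypothetical helper: Hausman test from two coefficient vectors and
# their covariance matrices (same coefficients, same order).
hausman_stat <- function(b_fe, b_re, v_fe, v_re) {
  d    <- b_fe - b_re
  stat <- as.numeric(t(d) %*% solve(v_fe - v_re) %*% d)
  df   <- length(d)
  c(chisq = stat, df = df, p_value = pchisq(stat, df = df, lower.tail = FALSE))
}

# Usage with illustrative numbers
h <- hausman_stat(b_fe = c(1, 2), b_re = c(0.5, 1.5),
                  v_fe = diag(c(0.2, 0.2)), v_re = diag(c(0.1, 0.1)))
h   # chisq = 5, df = 2, p_value about 0.082
```

With the fitted models, one would pass `coef()` and `vcov()` of `fixed` and `random` restricted to the slope coefficients (the random effects model also has an intercept); the result should be close to the `phtest()` output. In finite samples V_FE - V_RE is not always positive definite, which is one reason to prefer the packaged test.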

Conclusion:

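After all three comparisons, we conclude that the fixed effects model is best suited for the panel data analysis of the "Produc" data set.

Hence, the states differ in unobserved, time-invariant characteristics (state-specific intercepts) that are correlated with the regressors, so the within estimator, which uses only the variation within each state over time, gives the more reliable estimates.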

Thursday, 14 February 2013

IT LAB ASSIGNMENT 6

Assignment 1: Find the historical volatility and the log returns of the data

Assignment 2: Create an ACF plot of the log returns and perform an Augmented Dickey-Fuller test
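Neither assignment lists its commands, so here is a minimal base-R sketch. It assumes a synthetic price series (start 5000, 1% daily moves) in place of the NSE file, and 252 trading days per year, matching the deltat = 1/252 used in Assignment 5. The ADF part fits the Dickey-Fuller regression with a constant and one lagged difference by hand; `adf_stat()` is a hypothetical helper, and its t-value must be compared with Dickey-Fuller critical values (about -2.86 at 5% for this specification in large samples), not the usual t table. The tseries package offers a ready-made `adf.test()`, which by default also includes a linear trend.

```r
set.seed(4)
# Synthetic daily prices standing in for the NSE data
price <- 5000 * exp(cumsum(rnorm(250, mean = 0, sd = 0.01)))

# Assignment 1: log returns and historical volatility
log_ret    <- diff(log(price))             # r_t = ln(P_t / P_{t-1})
daily_vol  <- sd(log_ret)
annual_vol <- daily_vol * sqrt(252)        # annualised with 252 trading days

# Assignment 2: ACF of log returns (plot = TRUE draws the usual ACF plot)
acf_ret <- acf(log_ret, lag.max = 20, plot = FALSE)

# Hypothetical helper: ADF regression with a constant and p lagged differences
#   dy_t = a + g * y_{t-1} + b_1 * dy_{t-1} + ... + b_p * dy_{t-p} + e_t
# Returns the t-value of g (reject a unit root if it is below the DF critical value).
adf_stat <- function(y, p = 1) {
  dy  <- diff(y)                           # dy[t] = y[t + 1] - y[t]
  idx <- (p + 1):length(dy)
  X   <- data.frame(dy = dy[idx], y_lag = y[idx])
  for (j in seq_len(p)) X[[paste0("dy_lag", j)]] <- dy[idx - j]
  fit <- lm(dy ~ ., data = X)
  summary(fit)$coefficients["y_lag", "t value"]
}

c(annual_vol = annual_vol, adf_returns = adf_stat(log_ret, p = 1))
```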

Thursday, 7 February 2013

Assignment 5 - Session 5

Assignment 1: To find and plot returns for NSE data of more than months.

sol:

> z<-read.csv(file.choose(),header=T)
> head(z)
Date Open High Low Close Shares.Traded Turnover..Rs..Cr.
1 02-Jul-2012 5283.85 5302.15 5263.35 5278.60 126161441 4991.57
2 03-Jul-2012 5298.85 5317.00 5265.95 5287.95 133117055 5161.82
3 04-Jul-2012 5310.40 5317.65 5273.30 5302.55 155995887 5750.10
4 05-Jul-2012 5297.05 5333.65 5288.85 5327.30 118915392 4709.79
5 06-Jul-2012 5324.70 5327.20 5287.75 5316.95 113300726 4760.51
6 09-Jul-2012 5283.70 5300.60 5257.75 5275.15 101169926 4189.25
> open<-z$Open[10:95]
> open.ts<-ts(open,deltat=1/252)
> open.ts
Time Series:
Start = c(1, 1)
End = c(1, 86)
Frequency = 252
[1] 5242.75 5232.35 5228.05 5199.10 5249.85 5233.55 5163.25 5128.80 5118.40
[10] 5126.30 5124.30 5129.75 5214.85 5220.70 5233.10 5195.60 5260.85 5295.40
[19] 5345.25 5348.30 5308.20 5316.35 5343.25 5385.95 5368.60 5368.70 5395.75
[28] 5426.15 5392.60 5387.85 5348.05 5343.85 5268.60 5298.20 5276.50 5249.15
[37] 5243.90 5217.65 5309.45 5343.65 5361.90 5336.10 5404.45 5435.20 5528.35
[46] 5631.75 5602.40 5536.95 5577.00 5691.95 5674.90 5653.40 5673.75 5684.80
[55] 5704.75 5727.70 5751.55 5815.00 5751.85 5708.15 5671.15 5663.50 5681.70
[64] 5674.25 5705.60 5681.10 5675.30 5703.30 5667.60 5715.65 5688.80 5683.55
[73] 5665.20 5656.35 5596.75 5609.85 5696.35 5693.05 5694.10 5718.60 5709.00
[82] 5731.10 5688.45 5689.70 5650.35 5624.80
> summary(open.ts)
Min. 1st Qu. Median Mean 3rd Qu. Max.
5118 5281 5431 5474 5682 5815
> z.diff<-diff(open.ts)
> z.diff
Time Series:
Start = c(1, 2)
End = c(1, 86)
Frequency = 252
[1] -10.40 -4.30 -28.95 50.75 -16.30 -70.30 -34.45 -10.40 7.90 -2.00
[11] 5.45 85.10 5.85 12.40 -37.50 65.25 34.55 49.85 3.05 -40.10
[21] 8.15 26.90 42.70 -17.35 0.10 27.05 30.40 -33.55 -4.75 -39.80
[31] -4.20 -75.25 29.60 -21.70 -27.35 -5.25 -26.25 91.80 34.20 18.25
[41] -25.80 68.35 30.75 93.15 103.40 -29.35 -65.45 40.05 114.95 -17.05
[51] -21.50 20.35 11.05 19.95 22.95 23.85 63.45 -63.15 -43.70 -37.00
[61] -7.65 18.20 -7.45 31.35 -24.50 -5.80 28.00 -35.70 48.05 -26.85
[71] -5.25 -18.35 -8.85 -59.60 13.10 86.50 -3.30 1.05 24.50 -9.60
[81] 22.10 -42.65 1.25 -39.35 -25.55
> returns<-cbind(open.ts,z.diff,lag(open.ts,k=-1))
> returns
Time Series:
Start = c(1, 1)
End = c(1, 87)
Frequency = 252
open.ts z.diff lag(open.ts, k = -1)
1.000000 5242.75 NA NA
1.003968 5232.35 -10.40 5242.75
1.007937 5228.05 -4.30 5232.35
1.011905 5199.10 -28.95 5228.05
1.015873 5249.85 50.75 5199.10
1.019841 5233.55 -16.30 5249.85
1.023810 5163.25 -70.30 5233.55
1.027778 5128.80 -34.45 5163.25
1.031746 5118.40 -10.40 5128.80
1.035714 5126.30 7.90 5118.40
1.039683 5124.30 -2.00 5126.30
1.043651 5129.75 5.45 5124.30
1.047619 5214.85 85.10 5129.75
1.051587 5220.70 5.85 5214.85
1.055556 5233.10 12.40 5220.70
1.059524 5195.60 -37.50 5233.10
1.063492 5260.85 65.25 5195.60
1.067460 5295.40 34.55 5260.85
1.071429 5345.25 49.85 5295.40
1.075397 5348.30 3.05 5345.25
1.079365 5308.20 -40.10 5348.30
1.083333 5316.35 8.15 5308.20
1.087302 5343.25 26.90 5316.35
1.091270 5385.95 42.70 5343.25
1.095238 5368.60 -17.35 5385.95
1.099206 5368.70 0.10 5368.60
1.103175 5395.75 27.05 5368.70
1.107143 5426.15 30.40 5395.75
1.111111 5392.60 -33.55 5426.15
1.115079 5387.85 -4.75 5392.60
1.119048 5348.05 -39.80 5387.85
1.123016 5343.85 -4.20 5348.05
1.126984 5268.60 -75.25 5343.85
1.130952 5298.20 29.60 5268.60
1.134921 5276.50 -21.70 5298.20
1.138889 5249.15 -27.35 5276.50
1.142857 5243.90 -5.25 5249.15
1.146825 5217.65 -26.25 5243.90
1.150794 5309.45 91.80 5217.65
1.154762 5343.65 34.20 5309.45
1.158730 5361.90 18.25 5343.65
1.162698 5336.10 -25.80 5361.90
1.166667 5404.45 68.35 5336.10
1.170635 5435.20 30.75 5404.45
1.174603 5528.35 93.15 5435.20
1.178571 5631.75 103.40 5528.35
1.182540 5602.40 -29.35 5631.75
1.186508 5536.95 -65.45 5602.40
1.190476 5577.00 40.05 5536.95
1.194444 5691.95 114.95 5577.00
1.198413 5674.90 -17.05 5691.95
1.202381 5653.40 -21.50 5674.90
1.206349 5673.75 20.35 5653.40
1.210317 5684.80 11.05 5673.75
1.214286 5704.75 19.95 5684.80
1.218254 5727.70 22.95 5704.75
1.222222 5751.55 23.85 5727.70
1.226190 5815.00 63.45 5751.55
1.230159 5751.85 -63.15 5815.00
1.234127 5708.15 -43.70 5751.85
1.238095 5671.15 -37.00 5708.15
1.242063 5663.50 -7.65 5671.15
1.246032 5681.70 18.20 5663.50
1.250000 5674.25 -7.45 5681.70
1.253968 5705.60 31.35 5674.25
1.257937 5681.10 -24.50 5705.60
1.261905 5675.30 -5.80 5681.10
1.265873 5703.30 28.00 5675.30
1.269841 5667.60 -35.70 5703.30
1.273810 5715.65 48.05 5667.60
1.277778 5688.80 -26.85 5715.65
1.281746 5683.55 -5.25 5688.80
1.285714 5665.20 -18.35 5683.55
1.289683 5656.35 -8.85 5665.20
1.293651 5596.75 -59.60 5656.35
1.297619 5609.85 13.10 5596.75
1.301587 5696.35 86.50 5609.85
1.305556 5693.05 -3.30 5696.35
1.309524 5694.10 1.05 5693.05
1.313492 5718.60 24.50 5694.10
1.317460 5709.00 -9.60 5718.60
1.321429 5731.10 22.10 5709.00
1.325397 5688.45 -42.65 5731.10
1.329365 5689.70 1.25 5688.45
1.333333 5650.35 -39.35 5689.70
1.337302 5624.80 -25.55 5650.35
1.341270 NA NA 5624.80
> plot(returns)
> returns<-z.diff/lag(open.ts,k=-1)
> returns
Time Series:
Start = c(1, 2)
End = c(1, 86)
Frequency = 252
[1] -1.983692e-03 -8.218105e-04 -5.537437e-03 9.761305e-03 -3.104851e-03
[6] -1.343256e-02 -6.672154e-03 -2.027765e-03 1.543451e-03 -3.901449e-04
[11] 1.063560e-03 1.658950e-02 1.121796e-03 2.375160e-03 -7.165925e-03
[16] 1.255870e-02 6.567380e-03 9.413831e-03 5.706001e-04 -7.497710e-03
[21] 1.535360e-03 5.059862e-03 7.991391e-03 -3.221344e-03 1.862683e-05
[26] 5.038464e-03 5.634064e-03 -6.183021e-03 -8.808367e-04 -7.386991e-03
[31] -7.853330e-04 -1.408161e-02 5.618191e-03 -4.095731e-03 -5.183360e-03
[36] -1.000162e-03 -5.005816e-03 1.759413e-02 6.441345e-03 3.415269e-03
[41] -4.811727e-03 1.280898e-02 5.689756e-03 1.713828e-02 1.870359e-02
[46] -5.211524e-03 -1.168249e-02 7.233224e-03 2.061144e-02 -2.995458e-03
[51] -3.788613e-03 3.599604e-03 1.947566e-03 3.509358e-03 4.022963e-03
[56] 4.163975e-03 1.103181e-02 -1.085985e-02 -7.597556e-03 -6.481960e-03
[61] -1.348933e-03 3.213561e-03 -1.311227e-03 5.524959e-03 -4.294027e-03
[66] -1.020929e-03 4.933660e-03 -6.259534e-03 8.478015e-03 -4.697628e-03
[71] -9.228660e-04 -3.228616e-03 -1.562169e-03 -1.053683e-02 2.340644e-03
[76] 1.541931e-02 -5.793183e-04 1.844354e-04 4.302699e-03 -1.678733e-03
[81] 3.871081e-03 -7.441852e-03 2.197435e-04 -6.916006e-03 -4.521844e-03

> plot(returns)
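The returns above are simple returns, (P_t - P_{t-1}) / P_{t-1}. A minimal base-R sketch using the first five opening prices printed above shows the same calculation on a plain vector and how it relates to log returns, log(1 + simple return); the variable names are illustrative.

```r
# First five opening prices from open.ts above
open_px <- c(5242.75, 5232.35, 5228.05, 5199.10, 5249.85)

simple_ret <- diff(open_px) / head(open_px, -1)   # (P_t - P_{t-1}) / P_{t-1}
log_ret    <- diff(log(open_px))                  # ln(P_t / P_{t-1})

simple_ret[1]                           # about -1.983692e-03, as in returns above
all.equal(log_ret, log1p(simple_ret))   # TRUE: log return = log(1 + simple return)
```

For daily moves of a fraction of a percent, as here, the two kinds of return are almost identical.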

Assignment 2: Fit a logit model on 700 data points and then predict for the remaining 150 data points.

sol:

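z<-read.csv(file.choose(),header=T)

head(z)

# rows 1-700 (columns 1-9, including "default") are used for estimation
z.data<-z[1:700,1:9]

sapply(z.data,mean)

logit.est<-glm(default~age+employ+address+income+debtinc+creddebt+othdebt,data=z.data,family="binomial")

summary(logit.est)

confint.default(logit.est)

# probability of default for an applicant with average characteristics
# ("ed" is not in the model, so it is left out of the new data)
logit.eg2<-with(z[701:850,1:8],data.frame(age=mean(age),employ=mean(employ),address=mean(address),income=mean(income),debtinc=mean(debtinc),creddebt=mean(creddebt),othdebt=mean(othdebt)))

logit.eg2$prob<-predict(logit.est,newdata=logit.eg2,type="response")

logit.eg2

# probability of default for each of the 150 remaining rows (701-850)
z.test<-z[701:850,1:8]

z.test$prob<-predict(logit.est,newdata=z.test,type="response")

head(z.test)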
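`predict()` with `type = "response"` returns probabilities by passing the linear predictor through the logistic function, p = 1 / (1 + exp(-eta)). A minimal sketch on synthetic data (850 rows split 700/150 as in the assignment, one made-up predictor `x`) checks this and turns the probabilities into class predictions with an assumed 0.5 cut-off.

```r
set.seed(5)
n <- 850
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))  # synthetic defaults
train <- data.frame(x = x[1:700], y = y[1:700])          # estimation sample
test  <- data.frame(x = x[701:850])                      # 150 rows to predict

fit  <- glm(y ~ x, data = train, family = binomial)
eta  <- predict(fit, newdata = test, type = "link")      # linear predictor
prob <- predict(fit, newdata = test, type = "response")  # plogis(eta)
pred_default <- ifelse(prob > 0.5, 1, 0)                 # 0.5 cut-off is an assumption

table(pred_default)
```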

Tuesday, 22 January 2013

ASSIGNMENT 1a:

Fit ‘lm’ and comment on the applicability of ‘lm’

Plot 1: Residuals vs independent variable

Plot 2: Standardized residuals vs independent variable

> file<-read.csv(file.choose(),header=T)

> file

mileage groove

1 0 394.33

2 4 329.50

3 8 291.00

4 12 255.17

5 16 229.33

6 20 204.83

7 24 179.00

8 28 163.83

9 32 150.33

> x<-file$groove

> x

[1] 394.33 329.50 291.00 255.17 229.33 204.83 179.00 163.83 150.33

> y<-file$mileage

> y

[1] 0 4 8 12 16 20 24 28 32

> reg1<-lm(y~x)

> res<-resid(reg1)

> res

1 2 3 4 5 6 7 8 9

3.6502499 -0.8322206 -1.8696280 -2.5576878 -1.9386386 -1.1442614 -0.5239038 1.4912269 3.7248633

> plot(x,res)
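The transcript stops at Plot 1. A minimal sketch of Plot 2, re-entering the nine data points printed above: `rstandard()` divides each residual by s * sqrt(1 - h_ii), where s is the residual standard error and h_ii the leverage. The residuals printed above are positive at both ends and negative in the middle; that curved pattern suggests a straight line does not fully describe the relationship, which bears on whether `lm` is applicable here.

```r
groove  <- c(394.33, 329.50, 291.00, 255.17, 229.33, 204.83, 179.00, 163.83, 150.33)
mileage <- c(0, 4, 8, 12, 16, 20, 24, 28, 32)

reg1 <- lm(mileage ~ groove)               # same model as lm(y ~ x) above
res  <- resid(reg1)
sres <- rstandard(reg1)                    # standardized residuals

# Same values by hand: e_i / (s * sqrt(1 - h_ii))
s <- summary(reg1)$sigma
h <- hatvalues(reg1)
sres_manual <- res / (s * sqrt(1 - h))

plot(groove, sres, xlab = "groove", ylab = "Standardized residual")
abline(h = 0, lty = 2)
```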

Assignment 1 (b): Alpha-Pluto Data

Fit ‘lm’ and comment on the applicability of ‘lm’

Plot 1: Residuals vs independent variable

Plot 2: Standardized residuals vs independent variable

Also do:

Q-Q plot

Q-Q line

> file<-read.csv(file.choose(),header=T)

> file

alpha pluto

1 0.150 20

2 0.004 0

3 0.069 10

4 0.030 5

5 0.011 0

6 0.004 0

7 0.041 5

8 0.109 20

9 0.068 10

10 0.009 0

11 0.009 0

12 0.048 10

13 0.006 0

14 0.083 20

15 0.037 5

16 0.039 5

17 0.132 20

18 0.004 0

19 0.006 0

20 0.059 10

21 0.051 10

22 0.002 0

23 0.049 5

> x<-file$alpha

> y<-file$pluto

> x

[1] 0.150 0.004 0.069 0.030 0.011 0.004 0.041 0.109 0.068 0.009 0.009 0.048

[13] 0.006 0.083 0.037 0.039 0.132 0.004 0.006 0.059 0.051 0.002 0.049

> y

[1] 20 0 10 5 0 0 5 20 10 0 0 10 0 20 5 5 20 0 0 10 10 0 5

> reg1<-lm(y~x)

> res<-resid(reg1)

> res

1 2 3 4 5 6 7

-4.2173758 -0.0643108 -0.8173877 0.6344584 -1.2223345 -0.0643108 -1.1852930

8 9 10 11 12 13 14

2.5653342 -0.6519557 -0.8914706 -0.8914706 2.6566833 -0.3951747 6.8665650

15 16 17 18 19 20 21

-0.5235652 -0.8544291 -1.2396007 -0.0643108 -0.3951747 0.8369318 2.1603874

22 23

0.2665531 -2.5087486

> plot(x,res)
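The assignment also asks for Plot 2, a Q-Q plot and a Q-Q line, which the transcript does not show. A minimal sketch using the 23 data points printed above; drawing the Q-Q plot of the standardized residuals (rather than the raw ones) is an assumption.

```r
alpha <- c(0.150, 0.004, 0.069, 0.030, 0.011, 0.004, 0.041, 0.109, 0.068, 0.009,
           0.009, 0.048, 0.006, 0.083, 0.037, 0.039, 0.132, 0.004, 0.006, 0.059,
           0.051, 0.002, 0.049)
pluto <- c(20, 0, 10, 5, 0, 0, 5, 20, 10, 0, 0, 10, 0, 20, 5, 5, 20, 0, 0, 10, 10, 0, 5)

reg1 <- lm(pluto ~ alpha)                  # same model as lm(y ~ x) above
res  <- resid(reg1)
sres <- rstandard(reg1)

plot(alpha, sres, ylab = "Standardized residual")   # Plot 2
abline(h = 0, lty = 2)

qqnorm(sres)   # sample quantiles vs theoretical normal quantiles
qqline(sres)   # reference line through the first and third quartiles
```

Points that bend away from the line at the ends, such as the large residual for observation 14, point to departures from normality.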

Assignment 2: Justify the null hypothesis using ANOVA

> file<-read.csv(file.choose(),header=T)

> file

Chair Comfort.Level Chair1

1 I 2 a

2 I 3 a

3 I 5 a

4 I 3 a

5 I 2 a

6 I 3 a

7 II 5 b

8 II 4 b

9 II 5 b

10 II 4 b

11 II 1 b

12 II 3 b

13 III 3 c

14 III 4 c

15 III 4 c

16 III 5 c

17 III 1 c

18 III 2 c

> file.anova<-aov(file$Comfort.Level~file$Chair1)

> summary(file.anova)

Df Sum Sq Mean Sq F value Pr(>F)

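file$Chair1 2 1.444 0.7222 0.385 0.687

Since the p-value (0.687) is well above 0.05, we fail to reject the null hypothesis: there is no significant difference in mean comfort level between the three chairs.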
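The F value in the table is the between-chair mean square divided by the within-chair mean square. A base-R sketch that recomputes it from the 18 ratings listed above (groups a, b, c as in Chair1) and cross-checks it against `aov()`:

```r
comfort <- c(2, 3, 5, 3, 2, 3,      # chair a (I)
             5, 4, 5, 4, 1, 3,      # chair b (II)
             3, 4, 4, 5, 1, 2)      # chair c (III)
chair <- factor(rep(c("a", "b", "c"), each = 6))

grand    <- mean(comfort)
grp_mean <- ave(comfort, chair)            # each observation's group mean
ssb <- sum((grp_mean - grand)^2)           # between-group sum of squares
ssw <- sum((comfort - grp_mean)^2)         # within-group (residual) sum of squares
k <- nlevels(chair); n <- length(comfort)

F_stat <- (ssb / (k - 1)) / (ssw / (n - k))
p_val  <- pf(F_stat, k - 1, n - k, lower.tail = FALSE)
c(SSB = ssb, F = F_stat, p = p_val)        # about 1.444, 0.385, 0.687 as in the table

summary(aov(comfort ~ chair))              # same F value and Pr(>F)
```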