Lakshmi Priya Vijay

Saturday, 30 March 2013

IT and Business Applications Lab - Day9 Assignments

Assignment 1: Create three vectors x, y and z of equal length, filled with any random values, and bind them column-wise:
T<- cbind(x,y,z)
Create 3-dimensional plots of the result (all 3 types as taught).

Commands :

3D plots (plot3d comes from the rgl package, so load it first with library(rgl)):

Normal Plot: plot3d(T[, 1:3])

Colour Plot: plot3d(T[, 1:3], col = rainbow(1000))

Color Plot of spheres: plot3d(T[, 1:3], col = rainbow(1000), type = 's')
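
As a rough end-to-end sketch of Assignment 1, the steps above can be put together as follows. The data are simulated, and the names n and pts are assumptions for illustration (pts is used instead of T only because T is also R's shorthand for TRUE). The plots are drawn only in an interactive session where rgl is installed.

```r
# Hypothetical data: three numeric vectors of equal length.
set.seed(42)
n <- 100
x <- rnorm(n)
y <- runif(n)
z <- x + y + rnorm(n, sd = 0.2)

# Bind the vectors column-wise into an n x 3 matrix.
pts <- cbind(x, y, z)

# Draw the three plot types only when a display and rgl are available.
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::plot3d(pts[, 1:3])                                # normal plot
  rgl::plot3d(pts[, 1:3], col = rainbow(n))              # colour plot
  rgl::plot3d(pts[, 1:3], col = rainbow(n), type = "s")  # colour plot of spheres
}
```

Using rainbow(n) gives exactly one colour per point, whereas rainbow(1000) generates more colours than there are points.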

Assignment 2:

Choose 2 random variables
Create 3 plots:
1. X-Y
2. X-Y|Z (introducing a variable z with 5 different categories and cbind-ing it to x and y)
3. Color code and draw the graph
4. Smooth and best fit line for the curve

>qplot(x,y)

>qplot(x,z)

Semi-transparent plot

> qplot(x,z, alpha=I(2/10))

Colour plot

> qplot(x,y, color=z)

Logarithmic colour plot

> qplot(log(x),log(y), color=z)

Best fit and smooth curve using the "geom" argument

> qplot(x,y,geom=c("path","smooth"))

> qplot(x,y,geom=c("point","smooth"))

> qplot(x,y,geom=c("boxplot","jitter"))
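
The qplot calls above assume x, y and z already exist. A minimal sketch with simulated data is given below. Since qplot() is marked as deprecated in recent ggplot2 releases, the sketch uses ggplot() instead; the data frame df and the plot names p_xy, p_cond, p_col and p_fit are made up for illustration.

```r
# Hypothetical data: y depends linearly on x; z is a factor with 5 categories.
set.seed(1)
n <- 200
groups <- paste0("g", 1:5)
x <- runif(n, 1, 10)
y <- 2 * x + rnorm(n)
z <- factor(sample(groups, n, replace = TRUE), levels = groups)
df <- data.frame(x, y, z)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_xy   <- ggplot(df, aes(x, y)) + geom_point()              # 1. X-Y
  p_cond <- p_xy + facet_wrap(~ z)                            # 2. X-Y | Z, one panel per category
  p_col  <- ggplot(df, aes(x, y, colour = z)) + geom_point()  # 3. colour coded by z
  p_fit  <- p_xy +                                            # 4. best-fit line and smooth curve
    geom_smooth(method = "lm", formula = y ~ x) +
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE, colour = "red")
  if (interactive()) print(p_fit)
}
```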

Saturday, 23 March 2013

IT and Business Applications Lab - Day8 Assignments

Data Visualization

Visualization can be defined as the graphical representation of information, aimed at providing the viewer with a qualitative understanding of the contents of the information. This may be data, processes, relations, or concepts. Graphical presentation may entail manipulation of graphical entities (points, lines, shapes, images, text) and attributes (color, size, position, shape). Understanding may involve detection, measurement, and comparison, and is enhanced via interactive techniques and provision of the information from multiple views and with multiple techniques.

Why go for data visualization?

Effective: the viewer gets it (ease of interpretation)
Accurate: sufficient for correct quantitative evaluation. Lie factor = size of visual effect/size of data effect
Efficient: minimize chart-junk, show the data, maximize the data-ink ratio, erase non-data-ink, erase redundant data-ink
Aesthetics: must not offend viewer's senses (e.g. moire patterns)
Adaptable: can adjust to serve multiple needs
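
The lie factor mentioned under "Accurate" can be computed directly. The sketch below assumes that the "size of effect" is measured as a relative change, (second value - first value) / first value; the chart numbers and the function names rel_change and lie_factor are hypothetical.

```r
# Relative change between two values.
rel_change <- function(from, to) (to - from) / from

# Lie factor = size of effect shown in the graphic / size of effect in the data.
lie_factor <- function(data_from, data_to, visual_from, visual_to) {
  rel_change(visual_from, visual_to) / rel_change(data_from, data_to)
}

# Hypothetical chart: the data doubles (10 -> 20),
# but the bar is drawn four times as tall (1 -> 4).
lf <- lie_factor(10, 20, 1, 4)
lf  # 3
```

A lie factor close to 1 means the graphic represents the data faithfully; here the visual effect is three times the effect in the data.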

Wolfram Alpha

Wolfram Alpha is currently popular as a new kind of search engine. It is not just a search engine but a "computational knowledge engine". When conceptualizing it, the creators of Wolfram Alpha were keen to offer something beyond what Google provides: mathematical graphs, unit conversions, historical events, and scientific formulas returned directly in response to general searches.

What makes Wolfram Alpha better than other visualization tools? Unlike Google, it does not just search for web pages; it computes exact answers from a vast and ever-expanding collection of knowledge and lets the user discover much more.

As managers, we are interested in business research, i.e., using data to aid business decision making. Wolfram Alpha is an excellent data analysis tool.

Wolfram Alpha’s Facebook Analytics Tool

Who’s your oldest friend on Facebook? What’s the most popular photo you've ever posted? When are you most likely to be found updating your status?

I actually know the answers to those questions—and a lot of other arcane facts about my life in social-space, thanks to a new feature on Wolfram Alpha. That’s the ultra-geeky search engine that specializes in providing factual responses to questions rather than directing you to relevant web pages.

My Data Visualization Project - Wolfram Alpha using Facebook data:

This is a video that takes you through the analysis results produced using my Facebook data as the input.

Other Features:

Interactively manipulate parameters: Move controls and sliders to fully interact with and understand your results.
Animate dynamic processes: Input physical system parameters and see the associated animation evolve in time.
Rotate 3D graphics: Click and drag 3D images to rotate and view from any angle.
Control enhanced visualizations: Click to reveal special controls for many types of visualizations.
Instantly build interactive interfaces: Use Wolfram|Alpha linguistics to create custom interactive interfaces.
Resize images: Click maps, plots, or any image and drag the corner to expand the image size.
Read plot values: Interact with plots with dynamic coordinate labels and crosshairs on the axes.

In other words, there’s little chance that the gatekeepers of our digital lives will stop compiling minutiae about us. But we can hope for—or insist upon—more chances to look at that data ourselves and hopefully gain some insights. In that sense, Wolfram Alpha’s new Facebook reports are an instructive example. And hey, who wouldn’t want to learn more about how to get their friends’ attention?

Friday, 15 March 2013

IT and Business Applications Lab - Day7 Assignments

Panel Data Analysis - "Produc" data

Here we analyze three types of model:

Pooled effect model
Fixed effect model
Random effect model

Then we will determine the best model by using functions:

pFtest: for choosing between fixed and pooled
plmtest: for choosing between pooled and random
phtest: for choosing between random and fixed
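
Before turning to plm, the difference between a pooled and a fixed effect model can be illustrated in base R. This is only a sketch on simulated data (all variable names are made up): the fixed effect slope is obtained once with one dummy per unit and once by subtracting unit means (the "within" transformation), and the two estimates coincide.

```r
# Hypothetical panel: 6 units observed over 10 periods.
set.seed(7)
n_id <- 6
n_t  <- 10
id    <- rep(seq_len(n_id), each = n_t)
alpha <- rnorm(n_id, sd = 3)[id]           # unit-specific intercepts
x     <- rnorm(n_id * n_t) + 0.5 * alpha   # regressor correlated with the unit effect
y     <- alpha + 2 * x + rnorm(n_id * n_t, sd = 0.5)

pooled <- lm(y ~ x)                # pooled model: one common intercept
lsdv   <- lm(y ~ x + factor(id))   # fixed effects via one dummy per unit

# Within transformation: subtract the unit mean from each variable.
x_w <- x - ave(x, id)
y_w <- y - ave(y, id)
demeaned <- lm(y_w ~ x_w - 1)

c(pooled = coef(pooled)[["x"]],
  lsdv   = coef(lsdv)[["x"]],
  within = coef(demeaned)[["x_w"]])
```

Because x is correlated with the unit effect in this simulation, the pooled slope generally differs from the other two, which is the situation the tests listed above are meant to detect.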

Code :

> data(Produc , package ="plm")
> head(Produc)

Data

Pooled Effect Model

> pool <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = ("pooling"), index = c("state", "year"))
> summary(pool)

Fixed Effect Model:

> fixed <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = ("within"), index = c("state", "year"))
> summary(fixed)

Random Effect Model:
> random <- plm(log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp), data = Produc, model = ("random"), index = c("state", "year"))
> summary(random)

The comparison between the models is a hypothesis test based on the following concept:

H0: Null Hypothesis: the individual (index) and time-based parameters are all zero
H1: Alternate Hypothesis: at least one of the individual (index) and time-based parameters is non-zero

Pooled vs Fixed

H0: Pooled Effect Model
H1 : Fixed Effect Model

Code:
> pFtest(fixed,pool)

Result:

data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)
F = 56.6361, df1 = 47, df2 = 761, p-value < 2.2e-16
alternative hypothesis: significant effects
Since the p-value is negligible, we reject the null hypothesis and accept the alternate hypothesis, i.e., the Fixed Effect Model.

Random vs Fixed

Null Hypothesis: No correlation between the individual effects and the regressors (Random Effect Model)
Alternate Hypothesis: Fixed Effect Model

Command:
> phtest(fixed,random)

Result:

Hausman Test
data: log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp)
chisq = 93.546, df = 7, p-value < 2.2e-16
alternative hypothesis: one model is inconsistent

Since the p-value is negligible, we reject the null hypothesis and accept the alternate hypothesis, i.e., the Fixed Effect Model.
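
The p-values printed by both tests can be cross-checked in base R from the reported statistics and degrees of freedom. This is a small sketch; the variable names p_f and p_h are made up.

```r
# F test (pooled vs fixed): F = 56.6361 with df1 = 47 and df2 = 761.
p_f <- pf(56.6361, df1 = 47, df2 = 761, lower.tail = FALSE)

# Hausman test (random vs fixed): chisq = 93.546 with df = 7.
p_h <- pchisq(93.546, df = 7, lower.tail = FALSE)

# Both upper-tail probabilities are below the 2.2e-16 that R prints.
c(F_test = p_f, hausman = p_h)
```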

Inference:

We infer that Fixed effect model is best suited to do the panel data analysis for "Produc" data set.

Hence, we conclude that the individual effects are significant: each "state" has its own intercept, which stays the same over time within that state.
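
The pooled versus random comparison with plmtest is listed among the functions but not run in this post. A possible sketch is given below, assuming the plm package is installed; the choice of the Breusch-Pagan variant (type = "bp") and of individual effects is an assumption, not something fixed by the assignment.

```r
bp <- NULL
if (requireNamespace("plm", quietly = TRUE)) {
  data("Produc", package = "plm")
  form <- log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) +
    log(gsp) + log(emp) + log(unemp)
  # Pooled model, as in the assignment above.
  pool <- plm::plm(form, data = Produc, model = "pooling",
                   index = c("state", "year"))
  # Lagrange multiplier test: H0 = no individual effects (pooled model is adequate).
  bp <- plm::plmtest(pool, effect = "individual", type = "bp")
  print(bp)
}
```

A small p-value here would speak against the pooled model, in the same way as the pFtest result above.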

Wednesday, 13 February 2013

IT and Business Applications Lab - Day6 Assignments

Assignment 1: Create a log of returns data and calculate its historical volatility. Create ACF plot for log returns, do ADF test and analyse it.

We use the formula for log returns: r_t = log(S_t) - log(S_(t-1)) = log(S_t / S_(t-1))

Commands:

library(tseries)   # provides adf.test
data<-read.csv(file.choose(),header=T)
close<-data$Close
close.ts<-ts(close,frequency=252)
# log return: difference of successive log closing prices
returns<-diff(log(close.ts))
plot(returns,main="Log Returns;NIFTY 1 Jan 2012 to 31 Jan 2013")
acf(returns,main="Auto Correlation Function on log returns")
adf.test(returns)
# annualise the daily standard deviation, assuming 252 trading days per year
histvol<-sd(returns)*sqrt(252)
histvol

Plots :

1. Values

2. log returns:

3. ACF plot of log returns

Autocorrelation measures the correlation between values of the same variable at different time steps (lags). Since the autocorrelations at non-zero lags lie within the 95% confidence band (the 2 dotted lines) and show no systematic pattern, the time series appears to be stationary.
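
The two dotted lines can also be reproduced numerically. The sketch below uses simulated returns (hypothetical, not the NIFTY data) and assumes the default band of plus or minus qnorm(0.975)/sqrt(n) that R draws for an ACF plot.

```r
# Hypothetical i.i.d. daily returns.
set.seed(123)
n <- 250
r <- rnorm(n, mean = 0, sd = 0.01)

# Autocorrelations for lags 0 to 20, without drawing the plot.
a <- acf(r, lag.max = 20, plot = FALSE)

# Half-width of the 95% band (the dotted lines in the ACF plot).
bound <- qnorm(0.975) / sqrt(n)

# Non-zero lags whose autocorrelation falls outside the band (lag 0 is always 1).
lags_outside <- which(abs(a$acf[-1]) > bound)
bound
lags_outside
```

With a 95% band, roughly one lag in twenty can fall outside by chance even for a purely random series, so an isolated spike is not by itself evidence of structure.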

Assignment 2:
Create ACF plot for the above log of returns data and perform the ADF test and comment on it

The ACF plot can be created using the command below

acf(returns)

ADF test and Historical Volatility:

Confidence level = 95%
implies, Alpha = 0.05
p-value obtained after ADF test = 0.01 which is < alpha
Hence, we reject the null hypothesis of a unit root, i.e., the log returns series is stationary.
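
To see how the ADF test separates the two cases, the sketch below applies it to a simulated random walk (which has a unit root) and to its increments (which are stationary). The data are hypothetical and the code assumes the tseries package is installed. As far as we understand, adf.test interpolates its p-value from a table of critical values and warns when the true value lies outside it, so a reported 0.01 may mean "0.01 or smaller".

```r
# Hypothetical series: a random walk and its first differences.
set.seed(99)
walk  <- cumsum(rnorm(300))   # non-stationary: has a unit root
steps <- diff(walk)           # stationary increments

res <- NULL
if (requireNamespace("tseries", quietly = TRUE)) {
  # suppressWarnings hides the "p-value smaller than printed p-value" notice.
  res <- suppressWarnings(list(
    walk  = tseries::adf.test(walk)$p.value,
    steps = tseries::adf.test(steps)$p.value
  ))
  print(res)
}
```

A p-value below alpha = 0.05 leads to rejecting the unit-root null hypothesis, as in the assignment above.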

Thursday, 7 February 2013

IT and Business Applications Lab - Day5 Assignments

Assignment 1: To find and plot the returns for NSE data for more than months.

Solution :

> z<-read.csv(file.choose(),header=T)
> head(z)
Date Open High Low Close Shares.Traded Turnover..Rs..Cr.
1 02-Jul-2012 5283.85 5302.15 5263.35 5278.60 126161441 4991.57
2 03-Jul-2012 5298.85 5317.00 5265.95 5287.95 133117055 5161.82
3 04-Jul-2012 5310.40 5317.65 5273.30 5302.55 155995887 5750.10
4 05-Jul-2012 5297.05 5333.65 5288.85 5327.30 118915392 4709.79
5 06-Jul-2012 5324.70 5327.20 5287.75 5316.95 113300726 4760.51
6 09-Jul-2012 5283.70 5300.60 5257.75 5275.15 101169926 4189.25
> open<-z$Open[10:95]
> open.ts<-ts(open,deltat=1/252)
> open.ts
Time Series:
Start = c(1, 1)
End = c(1, 86)
Frequency = 252
[1] 5242.75 5232.35 5228.05 5199.10 5249.85 5233.55 5163.25 5128.80 5118.40
[10] 5126.30 5124.30 5129.75 5214.85 5220.70 5233.10 5195.60 5260.85 5295.40
[19] 5345.25 5348.30 5308.20 5316.35 5343.25 5385.95 5368.60 5368.70 5395.75
[28] 5426.15 5392.60 5387.85 5348.05 5343.85 5268.60 5298.20 5276.50 5249.15
[37] 5243.90 5217.65 5309.45 5343.65 5361.90 5336.10 5404.45 5435.20 5528.35
[46] 5631.75 5602.40 5536.95 5577.00 5691.95 5674.90 5653.40 5673.75 5684.80
[55] 5704.75 5727.70 5751.55 5815.00 5751.85 5708.15 5671.15 5663.50 5681.70
[64] 5674.25 5705.60 5681.10 5675.30 5703.30 5667.60 5715.65 5688.80 5683.55
[73] 5665.20 5656.35 5596.75 5609.85 5696.35 5693.05 5694.10 5718.60 5709.00
[82] 5731.10 5688.45 5689.70 5650.35 5624.80
> summary(open.ts)
Min. 1st Qu. Median Mean 3rd Qu. Max.
5118 5281 5431 5474 5682 5815
> z.diff<-diff(open.ts)
> z.diff
Time Series:
Start = c(1, 2)
End = c(1, 86)
Frequency = 252
[1] -10.40 -4.30 -28.95 50.75 -16.30 -70.30 -34.45 -10.40 7.90 -2.00
[11] 5.45 85.10 5.85 12.40 -37.50 65.25 34.55 49.85 3.05 -40.10
[21] 8.15 26.90 42.70 -17.35 0.10 27.05 30.40 -33.55 -4.75 -39.80
[31] -4.20 -75.25 29.60 -21.70 -27.35 -5.25 -26.25 91.80 34.20 18.25
[41] -25.80 68.35 30.75 93.15 103.40 -29.35 -65.45 40.05 114.95 -17.05
[51] -21.50 20.35 11.05 19.95 22.95 23.85 63.45 -63.15 -43.70 -37.00
[61] -7.65 18.20 -7.45 31.35 -24.50 -5.80 28.00 -35.70 48.05 -26.85
[71] -5.25 -18.35 -8.85 -59.60 13.10 86.50 -3.30 1.05 24.50 -9.60
[81] 22.10 -42.65 1.25 -39.35 -25.55
> returns<-cbind(open.ts,z.diff,lag(open.ts,k=-1))
> returns
Time Series:
Start = c(1, 1)
End = c(1, 87)
Frequency = 252
open.ts z.diff lag(open.ts, k = -1)
1.000000 5242.75 NA NA
1.003968 5232.35 -10.40 5242.75
1.007937 5228.05 -4.30 5232.35
1.011905 5199.10 -28.95 5228.05
1.015873 5249.85 50.75 5199.10
1.019841 5233.55 -16.30 5249.85
1.023810 5163.25 -70.30 5233.55
1.027778 5128.80 -34.45 5163.25
1.031746 5118.40 -10.40 5128.80
1.035714 5126.30 7.90 5118.40
1.039683 5124.30 -2.00 5126.30
1.043651 5129.75 5.45 5124.30
1.047619 5214.85 85.10 5129.75
1.051587 5220.70 5.85 5214.85
1.055556 5233.10 12.40 5220.70
1.059524 5195.60 -37.50 5233.10
1.063492 5260.85 65.25 5195.60
1.067460 5295.40 34.55 5260.85
1.071429 5345.25 49.85 5295.40
1.075397 5348.30 3.05 5345.25
1.079365 5308.20 -40.10 5348.30
1.083333 5316.35 8.15 5308.20
1.087302 5343.25 26.90 5316.35
1.091270 5385.95 42.70 5343.25
1.095238 5368.60 -17.35 5385.95
1.099206 5368.70 0.10 5368.60
1.103175 5395.75 27.05 5368.70
1.107143 5426.15 30.40 5395.75
1.111111 5392.60 -33.55 5426.15
1.115079 5387.85 -4.75 5392.60
1.119048 5348.05 -39.80 5387.85
1.123016 5343.85 -4.20 5348.05
1.126984 5268.60 -75.25 5343.85
1.130952 5298.20 29.60 5268.60
1.134921 5276.50 -21.70 5298.20
1.138889 5249.15 -27.35 5276.50
1.142857 5243.90 -5.25 5249.15
1.146825 5217.65 -26.25 5243.90
1.150794 5309.45 91.80 5217.65
1.154762 5343.65 34.20 5309.45
1.158730 5361.90 18.25 5343.65
1.162698 5336.10 -25.80 5361.90
1.166667 5404.45 68.35 5336.10
1.170635 5435.20 30.75 5404.45
1.174603 5528.35 93.15 5435.20
1.178571 5631.75 103.40 5528.35
1.182540 5602.40 -29.35 5631.75
1.186508 5536.95 -65.45 5602.40
1.190476 5577.00 40.05 5536.95
1.194444 5691.95 114.95 5577.00
1.198413 5674.90 -17.05 5691.95
1.202381 5653.40 -21.50 5674.90
1.206349 5673.75 20.35 5653.40
1.210317 5684.80 11.05 5673.75
1.214286 5704.75 19.95 5684.80
1.218254 5727.70 22.95 5704.75
1.222222 5751.55 23.85 5727.70
1.226190 5815.00 63.45 5751.55
1.230159 5751.85 -63.15 5815.00
1.234127 5708.15 -43.70 5751.85
1.238095 5671.15 -37.00 5708.15
1.242063 5663.50 -7.65 5671.15
1.246032 5681.70 18.20 5663.50
1.250000 5674.25 -7.45 5681.70
1.253968 5705.60 31.35 5674.25
1.257937 5681.10 -24.50 5705.60
1.261905 5675.30 -5.80 5681.10
1.265873 5703.30 28.00 5675.30
1.269841 5667.60 -35.70 5703.30
1.273810 5715.65 48.05 5667.60
1.277778 5688.80 -26.85 5715.65
1.281746 5683.55 -5.25 5688.80
1.285714 5665.20 -18.35 5683.55
1.289683 5656.35 -8.85 5665.20
1.293651 5596.75 -59.60 5656.35
1.297619 5609.85 13.10 5596.75
1.301587 5696.35 86.50 5609.85
1.305556 5693.05 -3.30 5696.35
1.309524 5694.10 1.05 5693.05
1.313492 5718.60 24.50 5694.10
1.317460 5709.00 -9.60 5718.60
1.321429 5731.10 22.10 5709.00
1.325397 5688.45 -42.65 5731.10
1.329365 5689.70 1.25 5688.45
1.333333 5650.35 -39.35 5689.70
1.337302 5624.80 -25.55 5650.35
1.341270 NA NA 5624.80
> returns<-z.diff/lag(open.ts,k=-1)
> returns
Time Series:
Start = c(1, 2)
End = c(1, 86)
Frequency = 252
[1] -1.983692e-03 -8.218105e-04 -5.537437e-03 9.761305e-03 -3.104851e-03
[6] -1.343256e-02 -6.672154e-03 -2.027765e-03 1.543451e-03 -3.901449e-04
[11] 1.063560e-03 1.658950e-02 1.121796e-03 2.375160e-03 -7.165925e-03
[16] 1.255870e-02 6.567380e-03 9.413831e-03 5.706001e-04 -7.497710e-03
[21] 1.535360e-03 5.059862e-03 7.991391e-03 -3.221344e-03 1.862683e-05
[26] 5.038464e-03 5.634064e-03 -6.183021e-03 -8.808367e-04 -7.386991e-03
[31] -7.853330e-04 -1.408161e-02 5.618191e-03 -4.095731e-03 -5.183360e-03
[36] -1.000162e-03 -5.005816e-03 1.759413e-02 6.441345e-03 3.415269e-03
[41] -4.811727e-03 1.280898e-02 5.689756e-03 1.713828e-02 1.870359e-02
[46] -5.211524e-03 -1.168249e-02 7.233224e-03 2.061144e-02 -2.995458e-03
[51] -3.788613e-03 3.599604e-03 1.947566e-03 3.509358e-03 4.022963e-03
[56] 4.163975e-03 1.103181e-02 -1.085985e-02 -7.597556e-03 -6.481960e-03
[61] -1.348933e-03 3.213561e-03 -1.311227e-03 5.524959e-03 -4.294027e-03
[66] -1.020929e-03 4.933660e-03 -6.259534e-03 8.478015e-03 -4.697628e-03
[71] -9.228660e-04 -3.228616e-03 -1.562169e-03 -1.053683e-02 2.340644e-03
[76] 1.541931e-02 -5.793183e-04 1.844354e-04 4.302699e-03 -1.678733e-03
[81] 3.871081e-03 -7.441852e-03 2.197435e-04 -6.916006e-03 -4.521844e-03
> plot(returns)
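
The same calculation can be checked by hand on the first few opening prices from the output above. This is a minimal sketch; the names open_prices, simple_ret and log_ret are made up, and the log returns are shown only for comparison.

```r
# First five values of open.ts from the output above.
open_prices <- c(5242.75, 5232.35, 5228.05, 5199.10, 5249.85)

# Simple return: change in price divided by the previous price.
simple_ret <- diff(open_prices) / head(open_prices, -1)

# Log return for comparison: difference of successive log prices.
log_ret <- diff(log(open_prices))

round(cbind(simple_ret, log_ret), 6)
```

For daily moves of this size the simple and log returns are nearly identical.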

Assignment 2:

Perform LOGIT analysis for 700 data points and then predict for 150 data points.

Solution :

z<-read.csv(file.choose(),header=T)

head(z)

z.data<-z[1:700,1:9]

sapply(z.data,mean)

z.data$ed<-factor(z.data$ed)

logit.est<-glm(default~age+employ+address+income+debtinc+creddebt+othdebt,data=z.data,family="binomial")

summary(logit.est)

confint.default(logit.est)

logit.eg2<-z[701:850,c("age","employ","address","income","debtinc","creddebt","othdebt")]

logit.eg2$prob<-predict(logit.est,newdata=logit.eg2,type="response")

head(logit.eg2)
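
Since the CSV file is not included in this post, the same fit-on-700, predict-on-150 workflow can be sketched on simulated data. Everything below is hypothetical (the data frame sim, the coefficients used to generate it, and the 0.5 cut-off); only the glm and predict calls mirror the solution above.

```r
# Hypothetical data: 850 customers with two predictors and a 0/1 default flag.
set.seed(2013)
n <- 850
sim <- data.frame(
  income  = rlnorm(n, meanlog = 3.5, sdlog = 0.5),
  debtinc = runif(n, 0, 30)
)
p_true <- plogis(-3 + 0.15 * sim$debtinc - 0.01 * sim$income)
sim$default <- rbinom(n, 1, p_true)

# Fit on the first 700 rows, predict for the remaining 150.
train   <- sim[1:700, ]
holdout <- sim[701:850, c("income", "debtinc")]

fit <- glm(default ~ income + debtinc, data = train, family = binomial)

# type = "response" returns predicted probabilities rather than log-odds.
holdout$prob <- predict(fit, newdata = holdout, type = "response")
holdout$pred <- as.integer(holdout$prob > 0.5)
head(holdout)
```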

Tuesday, 22 January 2013

IT and Business Applications Lab - Day3 Assignments

Assignment 1: Grooves Impact Mileage: What is the relation between the groove of a tyre and its mileage? Fit ‘lm’ and comment on the applicability of ‘lm’ in this case.

Answer :