Big and small daycares in Toronto by building type, mapped using RGoogleMaps and Toronto Open Data

Before my daughter was born, I thought that my wife and I would have to send her to a licensed child care centre somewhere in Toronto.  I had heard over and over how long of a waiting list I should expect the centre to have, and so we’d better get her registered nice and early!  Well, it turns out that we found an excellent unlicensed home day care which she’s been at for two years now.  So when I recently went on Toronto Open Data’s website and found a dataset of licensed child care centres throughout Toronto, I thought I might have a fun time analyzing a topic that I thankfully have not had to deal with thus far!

If you look in the dataset (or in the documentation for the dataset) you’ll see that it contains names, addresses, phone info, building type, number of spaces in the daycare (broken down into age categories, and then totalled up) and unprojected latitude/longitude coordinates.  This dataset literally begged me to map it, but it also begged me to use one of the number of spaces variables in a map as well!

The process which I used to create the maps is very similar to the maps I made when I was analyzing the Toronto Casino Feedback Form (Survey), except in the maps I’ve put in this post, the dots are bigger or smaller depending on the percentiles of a quantitative variable (in this case the total number of spaces in the child care centres of a particular building type).  You can find the R code I used to generate these maps and stats at the bottom of this post.
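To make the percentile-based dot sizing concrete, here is a minimal sketch in base R.  The `spaces` vector is made-up illustration data rather than values from the dataset, and the multiplier of 4 simply mirrors the `cex` scaling in the code at the bottom of this post.

```r
# Hypothetical capacities for a handful of centres (illustration only)
spaces <- c(15, 40, 62, 74, 120, 217)

# ecdf() returns a function that gives each value's empirical percentile
Fn <- ecdf(spaces)
pct <- Fn(spaces)

# Scale the percentiles into plotting sizes: the largest centre gets cex = 4
dot.size <- pct * 4

# Inspect the mapping from capacity to dot size
print(data.frame(spaces, pct, dot.size))
```

Because the sizes come from percentiles rather than raw counts, one unusually large centre can't shrink all of the other dots into specks.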

This is meant strictly as an exploratory exercise.  To provide further informative clarity for this exercise, I’ve created multiple maps where each map shows locations of child care centres from one building type (e.g. Places of Worship, Public Elementary Schools, High Rise Buildings, etc.).  As I describe each map, I’ll also refer to descriptive stats that I’ve calculated and displayed at the bottom of this post (after all of the R code).  If you look at the table at the bottom of this post, you’ll notice there were more building types than I’ve mapped here.  That’s because I didn’t feel like mapping everything, only some of the most popular ones 🙂

Public Elementary Schools are by far the most popular type of building in which the licensed child care centres in this dataset are found (279 centres, according to the data).  Looking at the map below, you can see that there is a very dense cluster of public elementary school child care centres in the core of the GTA (downtown Toronto and North York).  As you go west towards Etobicoke and Rexdale, you definitely see fewer centres, and then in Scarborough you also see few centres, but they appear to be more dispersed and less clustered than in the other areas.  There’s a lot of variability in the number of spaces in these child care centres, ranging from a minimum of 15 to a maximum of 217, with an average of around 74 spaces per centre.

Public Elementary Schools

Places of Worship, as you can see below, are far less numerous than their public elementary school counterparts, with only 116 registered in this dataset.  The first thing that I noticed with this map is that most of the small dots (thus the smaller child care centres in places of worship) seem to fall in the south of the GTA, rather than the north.  I suppose that makes sense to me in an analogical kind of a way.  In the north of the GTA (where I live) a lot of the businesses are big chains that seek to serve as many people as possible, whereas downtown there are a lot of smaller businesses that serve a niche market.  Perhaps it’s a similar story with child care centres in places of worship as well.

On a side note, the vast majority of the places of worship mentioned in this dataset were of Christian or Catholic denominations.  I was perhaps surprised not to find too many synagogues in there, but that might just be my bias speaking!

Child care centres in places of worship ranged from having a minimum of 8 spaces to a maximum of 167 with an average of about 48 spaces per centre.
places of worship

High Rise child care centres seem to show a pretty distinctive geographical pattern, as you can see below!  They seem to either be in the east of Toronto, near or beyond highway 404, or the west of Toronto, most of them beyond Allen Road/Dufferin Street.  I wonder what accounts for what looks like this hole in the middle!?  Also, you’ll notice that many of the smaller high rise child care centres are in the east of Toronto, rather than the west.  High rise child care centres range from having a minimum of 20 spaces to a maximum of 145, with an average of about 69 spaces per centre.  You’ll notice that the minimum number of spaces is higher than in other categories, probably because they are serving many residents in their own high rise building!
High Rises

Purpose Buildings, or buildings that were created with the child care centre in mind, are fairly sparse throughout Toronto, with only 58 in the dataset.  In terms of clusters here, it almost looks like you could delineate 4 clusters of buildings: North, South, East, and West.  Purpose buildings range from having a minimum of 20 spaces to a maximum of 165 with an average of about 72 spaces per centre (however median is 60, suggesting that there are a few really big ones in there, relative to all the rest).  Outliers aside, it seems that purpose buildings are like high rise daycares, in that they are meant for higher capacity than other centres.
Purpose Buildings

Community and Recreation Centres with child care seem to almost show a circle pattern in how they are laid out around the GTA.  The obvious exceptions are in Scarborough, which seems to have very few community and recreation centres with child care compared to what’s going on in the west.  Perhaps mirroring the phenomenon we saw with child care centres in places of worship, a lot of the smaller community and recreation centres with child care are in the south, whereas the north is the domain of bigger centres.  These centres range from having a minimum of 13 spaces to a maximum of 146, with an average of about 64 spaces per centre.
Community and Recreation Centres

Although child care centres in Houses seem to show a very random looking pattern, I can’t help but notice that there are more than a few centres within close proximity to the GO Train tracks emanating from Union Station.  Perhaps there’s an interesting story there, or maybe I’m just seeing patterns that don’t exactly mean anything (after all, there are just 38 of these places registered in the dataset!).  Child care centres in houses range from having a minimum of 10 spaces to a maximum of 116, with an average of about 50 spaces per centre.  I do have to wonder how exactly these houses fit so many kids.  Seeing as how we are living in a post Google Street View world, you can just look at whatever house you want in living colour by typing in the address.  You can see how big one of these houses is (it has 87 spaces!) below the following map.
Houses

Selection_030
Wow!  It’s not a full picture, but you get the sense that the house really is quite big!

Well, so concludes my foray into the world of licensed child care centres.  If you have any commentary to add regarding these results, or can show me a better way of mapping them (although I do like RGoogleMaps), then by all means leave me a comment!  R code is shared below.


library(ff)
library(ffbase)
library(RgoogleMaps)
library(plyr)
addTrans <- function(color,trans)
{
  # This function adds transparency to a color.
  # Define transparency with an integer between 0 and 255,
  # 0 being fully transparent and 255 being fully visible.
  # Works with either color and trans a vector of equal length,
  # or one of the two of length 1.
  if (length(color)!=length(trans)&!any(c(length(color),length(trans))==1)) stop("Vector lengths not correct")
  if (length(color)==1 & length(trans)>1) color <- rep(color,length(trans))
  if (length(trans)==1 & length(color)>1) trans <- rep(trans,length(color))
  # Convert an integer between 0 and 255 into a two-digit hex string
  num2hex <- function(x)
  {
    hex <- unlist(strsplit("0123456789ABCDEF",split=""))
    return(paste(hex[(x-x%%16)/16+1],hex[x%%16+1],sep=""))
  }
  # Stack the red, green and blue values on top of the alpha value
  rgb <- rbind(col2rgb(color),trans)
  res <- paste("#",apply(apply(rgb,2,num2hex),2,paste,collapse=""),sep="")
  return(res)
}
childcare = read.csv.ffdf(file="child-care.csv", first.rows=500,next.rows=500,colClasses=NA,header=TRUE)
pcodes = read.csv.ffdf(file="zipcodeset.txt", first.rows=50000, next.rows=50000, colClasses=NA, header=FALSE)
childcare$PCODE_R = as.ff(as.factor(sub(" ","", childcare[,"PCODE"])))
names(pcodes) = c("PCODE","Lat","Long","City","Prov")
childcare = merge(childcare, as.ffdf(pcodes[,1:3]), by.x="PCODE_R", by.y="PCODE", all.x=TRUE)
childcare.gc = subset(childcare, !is.na(Lat))
childcare.worship = subset(childcare.gc, bldg_type == "Place of Worship")
childcare.house = subset(childcare.gc, bldg_type == "House")
childcare.community = subset(childcare.gc, bldg_type == "Community/Recreation Centre")
childcare.pschool = subset(childcare.gc, bldg_type == "Public Elementary School")
childcare.highrise = subset(childcare.gc, bldg_type == "High Rise Apartment")
childcare.purpose = subset(childcare.gc, bldg_type == "Purpose Built")
Fn = ecdf(childcare.worship[,"TOTSPACE"])
childcare.worship$TOTSPACE.pct = as.ff(Fn(childcare.worship[,"TOTSPACE"]))
mymap = MapBackground(lat=childcare.worship[,"Lat"], lon=childcare.worship[,"Long"])
PlotOnStaticMap(mymap, childcare.worship[,"Lat"], childcare.worship[,"Long"], cex=childcare.worship[,"TOTSPACE.pct"]*4, pch=21, bg=addTrans("purple",100))
Fn = ecdf(childcare.house[,"TOTSPACE"])
childcare.house$TOTSPACE.pct = as.ff(Fn(childcare.house[,"TOTSPACE"]))
mymap = MapBackground(lat=childcare.house[,"Lat"], lon=childcare.house[,"Long"])
PlotOnStaticMap(mymap, childcare.house[,"Lat"], childcare.house[,"Long"], cex=childcare.house[,"TOTSPACE.pct"]*4, pch=21, bg=addTrans("purple",100))
Fn = ecdf(childcare.community[,"TOTSPACE"])
childcare.community$TOTSPACE.pct = as.ff(Fn(childcare.community[,"TOTSPACE"]))
mymap = MapBackground(lat=childcare.community[,"Lat"], lon=childcare.community[,"Long"])
PlotOnStaticMap(mymap, childcare.community[,"Lat"], childcare.community[,"Long"], cex=childcare.community[,"TOTSPACE.pct"]*4, pch=21, bg=addTrans("purple",100))
Fn = ecdf(childcare.pschool[,"TOTSPACE"])
childcare.pschool$TOTSPACE.pct = as.ff(Fn(childcare.pschool[,"TOTSPACE"]))
mymap = MapBackground(lat=childcare.pschool[,"Lat"], lon=childcare.pschool[,"Long"])
PlotOnStaticMap(mymap, childcare.pschool[,"Lat"], childcare.pschool[,"Long"], cex=childcare.pschool[,"TOTSPACE.pct"]*4, pch=21, bg=addTrans("purple",100))
Fn = ecdf(childcare.highrise[,"TOTSPACE"])
childcare.highrise$TOTSPACE.pct = as.ff(Fn(childcare.highrise[,"TOTSPACE"]))
mymap = MapBackground(lat=childcare.highrise[,"Lat"], lon=childcare.highrise[,"Long"])
PlotOnStaticMap(mymap, childcare.highrise[,"Lat"], childcare.highrise[,"Long"], cex=childcare.highrise[,"TOTSPACE.pct"]*4, pch=21, bg=addTrans("purple",100))
Fn = ecdf(childcare.purpose[,"TOTSPACE"])
childcare.purpose$TOTSPACE.pct = as.ff(Fn(childcare.purpose[,"TOTSPACE"]))
mymap = MapBackground(lat=childcare.purpose[,"Lat"], lon=childcare.purpose[,"Long"])
PlotOnStaticMap(mymap, childcare.purpose[,"Lat"], childcare.purpose[,"Long"], cex=childcare.purpose[,"TOTSPACE.pct"]*4, pch=21, bg=addTrans("purple",100))
space.by.bldg_type = ddply(as.data.frame(childcare.gc), .(bldg_type), function (x) c(min.space = min(x[,"TOTSPACE"], na.rm=TRUE), average.space = mean(x[,"TOTSPACE"], na.rm=TRUE), median.space = median(x[,"TOTSPACE"], na.rm=TRUE), max.space = max(x[,"TOTSPACE"], na.rm=TRUE), tot_daycares = sum(!is.na(x[,"TOTSPACE"]))))
space.by.bldg_type = space.by.bldg_type[order(-space.by.bldg_type$tot_daycares),]
   bldg_type                              min.space average.space median.space max.space tot_daycares
18 Public Elementary School                      15      74.19355         69.0       217          279
17 Place of Worship                               8      48.46552         44.0       167          116
16 Other                                         14      51.17647         48.5       160          102
 1 Catholic Elementary School                    16      51.50000         49.5       112           76
 9 High Rise Apartment                           20      68.56522         62.0       145           69
22 Purpose Built                                 20      72.48276         59.5       165           58
 8 Community/Recreation Centre                   13      63.73333         60.0       146           45
11 House                                         10      49.84211         44.5       116           38
 6 Commercial Building                           16      55.95833         51.5       129           24
15 Office Building                               20      69.69565         64.0       162           23
20 Public High School                            16      42.36842         41.0        60           19
21 Public School (Closed)                        22      70.26667         56.0       180           15
 4 Church                                        13      51.90909         46.0       148           11
19 Public Elementary School (French)             36      84.71429         70.0       167            7
23 Synagogue                                     24      64.00000         61.0       108            7
 7 Community College/University                  15      55.16667         59.5        78            6
14 Low Rise Apartment                            15      56.00000         62.0        92            6
 2 Catholic Elementary School(French)            39      81.20000         76.0       130            5
 5 City owned Community/Recreation Centre        28      65.80000         62.0       103            5
 3 Catholic High School                          36      51.50000         54.0        62            4
12 HUMSRV                                        45      52.00000         52.0        59            2
13 Industrial Building                           45     109.00000        109.0       173            2
26 Private Elementary School                     20     154.50000        154.5       289            2
10 Hospital/Health Centre                        25      25.00000         25.0        25            1
24                                              109     109.00000        109.0       109            1
25 Coomunity/Recreation Centre                  156     156.00000        156.0       156            1
27 Public Middle School                          10      10.00000         10.0        10            1


Do Torontonians Want a New Casino? Survey Analysis Part 1

Toronto City Council is in the midst of a very lengthy process of considering whether or not to allow the OLG to build a new casino in Toronto, and where.  The process started in November of 2012, and set out to answer this question through many and varied consultations with the public, and key stakeholders in the city.

One of the methods of public consultation that they used was a “Casino Feedback Form“, or survey that was distributed online and in person.  By the time the deadline had passed to collect responses on this survey (January 25, 11:59pm), they had collected a whopping 17,780 responses.  The agency seemingly responsible for the survey is called DPRA, and from what I can tell they seemed to do a pretty decent job of creating and distributing the survey.

In a surprisingly modern and democratic move, Toronto City Council made the response data for the survey available on the Toronto Open Data website, which I couldn’t help but download and analyze for myself (with R of course!).

For a relatively small survey, it’s very rich in information.  I love having hobby data sets to work with from time to time, and so I’m going to dedicate a few posts to the analysis of this response data file.  This post will not show too much that’s different from the report that DPRA has already released, as it contains largely univariate analyses.  In later posts however, I will get around to asking and answering those questions that are of a more multivariate nature!  To preserve flow of the post, I will post the R code at the end, instead of interspersing it throughout like I normally do.  Unless otherwise specified, all numerical axes represent the % of people who selected a particular response on the survey.

Without further ado, I will start with some key findings:

Key Findings

  1. With 17,780 responses, Toronto City Council obtained for themselves a hefty data set with pretty decent geographical coverage of the core areas of Toronto (Toronto, North York, Scarborough, Etobicoke, East York, Mississauga).  This is much better than Ipsos Reid’s Casino Survey response data set of 906 respondents.
  2. Only 25.7% of respondents were somewhat or strongly in favour of having a new casino in Toronto.  I’d say that’s overwhelmingly negative!
  3. Ratings of the suitability of a casino in three different locations by type of casino indicate that people are more favourable towards an Integrated Entertainment Complex (basically a casino with extra amenities) vs. a standalone casino.
  4. Of the three different locations, people were most favourable towards an Integrated Entertainment Complex at the Exhibition Place.  However, bear in mind that only 27.4% of respondents thought it was suitable at all.  This is a ‘best of the worst’ result!
  5. When asked to rate the importance of a list of issues surrounding the building of a new casino in Toronto, respondents rated as most important the following issues: safety, health, addiction, public space, traffic, and integration with surrounding areas.

Geographic Distribution of Responses

In a relatively short time, City Council managed to collect many responses to their survey.  I wanted to look at the geographic distribution of all of these responses.  Luckily, the survey included a question that asked for the first 3 characters of the respondents’ postal code (or FSA).  If you have a file containing geocoded postal codes, you then can plot the respondents on a map.  I managed to find such a file on a website called geocoder.ca, with latitude and longitude coordinates for over 90,000 postal codes.  Once I got the file into R, I made sure that all FSA codes in the survey data were capitalized, created an FSA column in the geocoded file, and then merged the geocoded dataset into the survey dataset.  This isn’t a completely valid approach, but when looking at a broad area, I don’t think the errors in plotting points on a map are going to look that serious.
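The merge steps described above can be sketched in base R roughly as follows.  All of the data here is hypothetical, and the column names are stand-ins.  Collapsing each FSA to the mean of its postal code coordinates is an assumption made for this sketch (one FSA covers many postal codes), and the merge behind the maps may have handled that differently.

```r
# Hypothetical survey responses with inconsistently capitalized FSAs
survey <- data.frame(id = 1:4,
                     fsa = c("m5v", "M4C", "l4j", "M5V"),
                     stringsAsFactors = FALSE)

# Hypothetical geocoded postal codes (coordinates are illustrative only)
pcodes <- data.frame(PCODE = c("M5V1A1", "M5V2B2", "M4C3C3", "L4J4D4"),
                     Lat   = c(43.64, 43.65, 43.69, 43.81),
                     Long  = c(-79.39, -79.40, -79.31, -79.45),
                     stringsAsFactors = FALSE)

# Capitalize the survey FSAs; take the FSA from the first 3 characters
# of each full postal code
survey$fsa <- toupper(survey$fsa)
pcodes$fsa <- substr(pcodes$PCODE, 1, 3)

# Assumption: use the mean coordinates of each FSA's postal codes
fsa.centres <- aggregate(cbind(Lat, Long) ~ fsa, data = pcodes, FUN = mean)

# Left join so that every survey response is kept, geocoded or not
geo <- merge(survey, fsa.centres, by = "fsa", all.x = TRUE)
print(geo)
```

Every respondent in the same FSA lands on the same point, which is exactly why the approach is only good enough for a broad view.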

For a survey about Toronto, the geographic distribution was actually pretty wide.  Have a look at the complete distribution:

Total Geo Distribution of Responses

Obviously there seem to be a whole lot of responses in Southern Ontario, but we even see a smattering of responses in neighbouring provinces as well.  However, let’s look at a way of zooming in on the large cluster of Southern Ontario cities.  From the postal codes, I was able to get the city in which each response was made.  From that I pulled out what looked like a good cluster of top Southern Ontario cities:

          City	 # Responses
       Toronto	8389
    North York	1553
   Scarborough	1145
     Etobicoke	936
     East York	462
   Mississauga	201
       Markham	149
      Brampton	111
 Richmond Hill	79
     Thornhill	62
          York	59
         Maple	58
        Milton	30
      Oakville	30
    Woodbridge	30
    Burlington	28
        Oshawa	25
     Pickering	22
        Whitby	19
      Hamilton	17
        Bolton	14
        Guelph	13
      Nobleton	12
        Aurora	11
          Ajax	10
       Caledon	10
   Stouffville	10
        Barrie	9

Lots of people in Toronto, obviously, a fair amount in North York, Scarborough, and Etobicoke, and then it leaps downwards in frequency from there. However, these city labels are from the geocoding, and who knows if some people it places in Toronto are actually from North York (the tree, then one of its apples). So, I filtered the latitude and longitude coordinates based on this top list to get the following zoomed-in map:
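One plausible reading of that filtering step, sketched in base R: take the bounding box spanned by the responses geocoded to the top cities, then keep every response that falls inside it, whatever its city label.  The data frame, the shortened city list and the coordinates below are all hypothetical.

```r
# Hypothetical geocoded responses (cities and coordinates are illustrative)
responses <- data.frame(
  City = c("Toronto", "North York", "Oshawa", "Ottawa", "Barrie"),
  Lat  = c(43.65, 43.77, 43.90, 45.42, 44.39),
  Long = c(-79.38, -79.41, -78.85, -75.70, -79.69),
  stringsAsFactors = FALSE)

# A shortened stand-in for the list of top cities in the table above
top.cities <- c("Toronto", "North York", "Oshawa", "Barrie")

# Bounding box spanned by the responses geocoded to those cities
in.top <- responses$City %in% top.cities
lat.range  <- range(responses$Lat[in.top])
long.range <- range(responses$Long[in.top])

# Keep every response inside the box, regardless of its city label
zoomed <- subset(responses,
                 Lat >= lat.range[1] & Lat <= lat.range[2] &
                 Long >= long.range[1] & Long <= long.range[2])
print(zoomed)
```

Filtering on coordinates rather than on the city label sidesteps the question of whether the geocoder put someone in Toronto or North York.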

Toronto Geo Distribution of Responses

Much better than a table!  I used transparency on the colours of the circles to help better distinguish dense clusters of responses from sparse ones.  Based on the map, I can see 3 patterns:

1) It looks like a huge cluster of responses came from an area of Toronto approximately bounded by Dufferin on the west, highway 404 on the east, the Gardiner on the south, and the 401 on the north.

2) There’s also an interesting vertical cluster that seems to go from well south of highway 400 and the 401, and travels north to the 407.

3) I’m not sure I would call this a cluster per se, but there definitely seems to be a pattern where you find responses all the way along the Gardiner Expressway/Queen Elizabeth Way/Kingston Road/401 from Burlington to Oshawa.
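As for the transparency on the circles mentioned before the list of patterns, base R can produce a see-through colour in one call.  This is a minimal sketch using `adjustcolor()` from grDevices, not necessarily the helper used for the maps; the alpha of 100 out of 255 and the simulated points are arbitrary choices for illustration.

```r
# adjustcolor() ships with base R (grDevices); alpha.f runs from 0 to 1
see.through <- adjustcolor("purple", alpha.f = 100 / 255)

# Hypothetical points: one dense clump plus a sparse scatter
set.seed(1)
x <- c(rnorm(200, 0, 0.2), runif(50, -2, 2))
y <- c(rnorm(200, 0, 0.2), runif(50, -2, 2))

# Overlapping semi-transparent circles stack up, so the clump looks darker
if (interactive()) plot(x, y, pch = 21, cex = 2, bg = see.through)
```

The darker a region looks, the more circles are piled on top of each other there, which is what makes dense clusters stand out from sparse ones.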

Now for the survey results!

Demographics of Respondents


As you can see, about 80% of the respondents disclosed their gender, with a noticeable bias towards men.  Also, most of the respondents who disclosed their age were between 25 and 64 years of age.  This might be a disadvantage, according to a recent report by Statistics Canada on gambling.  If you look at page 6 on the report, you will see that of all age groups of female gamblers, those 65 and older are spending the most amount of money on Casinos, Slot Machines, and VLTs per 1 person spending household.  However, I guess it’s better having some information than no information.

Feelings about the new casino

Feelings about the new casino

Well, isn’t that something?  Only about a quarter of all people surveyed actually have positive feelings about a new casino!  I have to say this is pretty telling.  You would think this would be damning information, but here’s where we fall into the trap of whether or not to trust a survey result.

Here we have this telling response, but then again, Ipsos Reid conducted a poll that gathered 906 responses and concluded that 52% of Torontonians “either strongly or somewhat support a new gambling venue within its borders”.  People were asked about their support of a new casino at the beginning and at the end.  At the end, after they supplied people with all the various arguments supplied by both sides of the debate, they asked the question again.  Apparently the proportion supporting the casino was 54% when analyzing the second instance of the question.  They don’t even link to the original question form, so I’m left to wonder exactly how it was phrased, and what preceded it.  The only hint is in this phrase: “if a vote were held tomorrow on the idea of building a casino in the city of Toronto…”.  Does that seem comparable to you?

A Casino for “Toronto The Good”?

Casino fit image of toronto

This question seems to be pretty similar to the first question.  If a new casino fits your image of Toronto perfectly, then you’re probably going to be strongly in favour of one!  Obviously, most people seem pretty sure that a new casino just isn’t the kind of thing that would fit in with their image of “Toronto the Good”.

Where to build a new casino

Where casino built

In the response pattern here, we seem to see a kind of ‘not in/near my backyard’ mentality going on.  A slight majority of respondents seem to be saying that if a new casino is to be built, it should be somewhere decently far away from Toronto, perhaps so that they don’t have to deal with the consequences.  I’ll eat my sock if the “Neither” folks aren’t those who also were strongly opposed to the casino.

Casino Suitability in Downtown Area

Casino suitability at Exhibition Place

Casino Suitability at Port Lands

They also asked respondents to rate the suitability of a new casino in three different locations:

  1. A downtown area (bounded by Spadina Avenue, King Street, Jarvis Street and Queens Quay)
  2. Exhibition Place (bounded by Gardiner Expressway, Lake Shore Boulevard, Dufferin Street and Strachan Avenue)
  3. Port Lands (located south of the Don Valley and Gardiner/Lake Shore, east of the downtown core)

Looking at the above 3 graphs, you see right away that a kind of casino called an Integrated Entertainment Complex (kind of a smorgasbord of casino, restaurant, theatre, hotel, etc.) is more favourable than a standalone casino at any location.  That being said, the responses are still largely negative!  Out of the 3 options for location of an Integrated Entertainment Complex (IEC), it was Exhibition Place that rated the most positive by a small margin (18.1% said highly suitable, vs. 16.2% for downtown Toronto).  There are definitely those at the Exhibition Place who want the Toronto landmark to be chosen!

Desired Features of IEC by Location


These charts indicate that those who can imagine an Integrated Entertainment Complex in any of the 3 locations would like features at those locations that allow them to sit/stand and enjoy themselves.  Restaurants, Cultural and Arts Facilities, and Theatre are tops in all 3 locations (but still bear in mind that less than half opted for those choices).  A quick Google search reveals that restaurants and theatres are mentioned in a large number of search results.  An article in the Toronto Sun boasts that an Integrated Entertainment Complex would catapult Toronto into the stars as even more of a tourist destination.  Interestingly, the article also mentions the high monetary value of convention visitors and how much that would add to the revenues generated for the city.  I find it funny that the popularity of having convention centre space in this survey is at its highest when people are asked about the Exhibition Place.  The Exhibition Place already has convention centre space!!  I don’t understand the logic, but maybe someone will explain it to me.

Issues of Importance Surrounding a New Casino

Issues of Importance re the New Casino

Unlike the previous graphs, this one charts the % who gave a particular response on each item.  In this case, the graph shows the % of respondents who gave the answer “Very Important” when asked to rate each issue surrounding the new casino.  Unlike some of the previous questions, this one did not include a “No Casino” option, so already more people can contribute positively to the response distribution.  You can already see that people are pretty riled up about some serious social and environmental issues.  They’re worried about safety, health, addiction, public space (sounds like a worry about clutter to me), traffic, and integration with surrounding areas.  I’ll bet that the people worried about these top issues are the people most likely to say that they don’t want a casino anywhere.  It will be interesting to uncover some factor structure here and then find out what the pro and anti casino folks are concerned with.

For my next post, I have in mind to investigate a few simple questions so far:

  1. Who exactly wants or doesn’t want a new casino, and where?  What are they most concerned with (those who do and don’t want a casino)?
  2. Is there a “not in my backyard” effect going on, where those who are closest to the proposed casino spots are the least likely to want it there, but more likely to want a casino elsewhere?  I have latitude/longitude coordinates, and can convert them into distances from the proposed casino spots.  I think that will be interesting to look at!
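For the distance idea in the second question, one common way to turn latitude/longitude pairs into distances is the haversine formula.  The sketch below is only an illustration: the function name is made up, the site coordinates are a rough guess at one proposed location rather than surveyed values, and the respondent points are hypothetical.

```r
# Great-circle distance in km between points given in decimal degrees
haversine.km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to.rad <- pi / 180
  dlat <- (lat2 - lat1) * to.rad
  dlon <- (lon2 - lon1) * to.rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to.rad) * cos(lat2 * to.rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Rough, illustrative coordinates for one proposed site (an assumption)
site <- c(lat = 43.63, lon = -79.42)

# Hypothetical respondent locations
resp.lat <- c(43.65, 43.77)
resp.lon <- c(-79.38, -79.41)

# Distance from each respondent to the site, in km
dist.km <- haversine.km(resp.lat, resp.lon, site[["lat"]], site[["lon"]])
print(dist.km)
```

With a distance column like this for each proposed spot, the "not in my backyard" question becomes a matter of relating distance to the suitability ratings.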


library(ff)
library(ffbase)
library(stringr)
library(ggplot2)
library(ggthemes)
library(reshape2)
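library(RgoogleMaps)
# scales provides percent, which is passed to scale_y_continuous(labels=percent) below
library(scales)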
# Loading 2 copies of the same data set so that I can convert one and have the original for its text values
casino = read.csv("/home/inkhorn/Downloads/casino_survey_results20130325.csv")
casino.orig = read.csv("/home/inkhorn/Downloads/casino_survey_results20130325.csv")
# Here's the dataset of canadian postal codes and latitude/longitude coordinates
pcodes = read.csv.ffdf(file="/home/inkhorn/Downloads/zipcodeset.txt", first.rows=50000, next.rows=50000, colClasses=NA, header=FALSE)
# I'm doing some numerical recoding here. If you can tell me a cleaner way of doing this
# then by all means please do. I found this process really annoyingly tedious.
casino$Q1_A = ifelse(casino.orig$Q1_A == "Neutral or Mixed Feelings", 3,
              ifelse(casino.orig$Q1_A == "Somewhat in Favour", 4,
              ifelse(casino.orig$Q1_A == "Somewhat Opposed", 2,
              ifelse(casino.orig$Q1_A == "Strongly in Favour", 5,
              ifelse(casino.orig$Q1_A == "Strongly Opposed", 1,NA)))))
casino$Q2_A = ifelse(casino.orig$Q2_A == "Does Not Fit My Image At All", 1,
              ifelse(casino.orig$Q2_A == "Neutral / I am Not Sure",2,
              ifelse(casino.orig$Q2_A == "Fits Image Somewhat", 3,
              ifelse(casino.orig$Q2_A == "Fits Image Perfectly", 4, NA))))
for (i in 8:24) {
  casino[,i] = ifelse(casino.orig[,i] == "Not Important At All", 1,
               ifelse(casino.orig[,i] == "Somewhat Important", 2,
               ifelse(casino.orig[,i] == "Very Important", 3,NA)))
}
for (i in c(31:32,47,48,63,64)) {
  casino[,i] = ifelse(casino.orig[,i] == "Highly Suitable",5,
               ifelse(casino.orig[,i] == "Neutral or Mixed Feelings",3,
               ifelse(casino.orig[,i] == "Somewhat Suitable",4,
               ifelse(casino.orig[,i] == "Somewhat Unsuitable",2,
               ifelse(casino.orig[,i] == "Strongly Unsuitable",1,NA)))))
}
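# A possibly cleaner alternative to the nested ifelse() calls above. This is
# only a sketch and was not used for the results in this post: keep each scale
# in a named vector and index it by the response text. Blank or unexpected
# responses come back as NA. recode.likert and q1.map are illustrative names.
recode.likert = function(x, map) unname(map[as.character(x)])
q1.map = c("Strongly Opposed"=1, "Somewhat Opposed"=2, "Neutral or Mixed Feelings"=3, "Somewhat in Favour"=4, "Strongly in Favour"=5)
# It could be applied as: casino$Q1_A = recode.likert(casino.orig$Q1_A, q1.map)
# Small demonstration on a made-up vector of responses:
recode.demo = recode.likert(c("Strongly Opposed", "", "Strongly in Favour"), q1.map)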
# There tended to be blank responses in the original dataset. When seeking to
# plot the responses in their original text option format, I got rid of them in some cases,
# or coded them in "Did not disclose" in others.
casino.orig$Q1_A[casino.orig$Q1_A == ""] = NA
casino.orig$Q1_A = factor(casino.orig$Q1_A, levels=c("Strongly Opposed","Somewhat Opposed","Neutral or Mixed Feelings","Somewhat in Favour","Strongly in Favour"))
# Here's the graph showing how people feel about a new casino
ggplot(subset(casino.orig, !is.na(Q1_A)), aes(x=Q1_A,y=..count../sum(..count..))) + geom_bar(fill="forest green") + coord_flip() + ggtitle("How do you feel about having a new casino in Toronto?") + scale_x_discrete(name="") + theme_wsj() + theme(title=element_text(size=22),plot.title=element_text(hjust=.8)) + stat_bin(aes(label = sprintf("%.02f %%", ..count../sum(..count..)*100)), geom="text") + scale_y_continuous(labels=percent)
# How does the casino fit into your image of toronto…
ggplot(subset(casino.orig, Q2_A!= ''), aes(x=Q2_A,y=..count../sum(..count..))) + geom_bar(fill="forest green") + coord_flip() + ggtitle("How does a new casino in Toronto fit your image of the City of Toronto?") + scale_x_discrete(name="") + theme_wsj() + theme(title=element_text(size=22),plot.title=element_text(hjust=.8)) + stat_bin(aes(label = sprintf("%.02f %%", ..count../sum(..count..)*100)),geom="text") + scale_y_continuous(labels=percent)
# Where you'd prefer to see it located
ggplot(subset(casino.orig, Q6!= ''), aes(x=Q6,y=..count../sum(..count..))) + geom_bar(fill="forest green") + coord_flip() + ggtitle("If a casino is built, where would you prefer to see it located?") + scale_x_discrete(name="") + theme_wsj() + theme(title=element_text(size=22),plot.title=element_text(hjust=.8)) + stat_bin(aes(label = sprintf("%.02f %%", ..count../sum(..count..)*100)), geom="text") + scale_y_continuous(labels=percent)
# Here I reorder the text labels from the questions asking about suitability of the downtown location
casino.orig$Q7_A_StandAlone = reorder(casino.orig$Q7_A_StandAlone, casino$Q7_A_StandAlone)
casino.orig$Q7_A_Integrated = reorder(casino.orig$Q7_A_Integrated, casino$Q7_A_Integrated)
# Reshaping the downtown ratings data for graphing..
stand.and.integrated.ratings.downtown = cbind(prop.table(as.matrix(table(casino.orig$Q7_A_StandAlone)[1:5])),
prop.table(as.matrix(table(casino.orig$Q7_A_Integrated)[1:5])))
colnames(stand.and.integrated.ratings.downtown) = c("Standalone Casino","Integrated Entertainment Complex")
stand.and.integrated.ratings.downtown.long = melt(stand.and.integrated.ratings.downtown, varnames=c("Rating","Casino Type"), value.name="Percentage")
# Graphing ratings of casino suitability for the downtown location
ggplot(stand.and.integrated.ratings.downtown.long, aes(x=stand.and.integrated.ratings.downtown.long$"Casino Type", fill=Rating, y=Percentage,label=sprintf("%.02f %%", Percentage*100))) + geom_bar(position="dodge") + coord_flip() + ggtitle("Ratings of Casino Suitability \nin Downtown Toronto by Casino Type") + scale_x_discrete(name="") + theme(title=element_text(size=22),plot.title=element_text(hjust=.8)) + scale_y_continuous(labels=percent) + geom_text(aes(x=stand.and.integrated.ratings.downtown.long$"Casino Type", y=Percentage, ymax=Percentage, label=sprintf("%.01f%%",Percentage*100), hjust=.75),position = position_dodge(width=1),size=4) + scale_fill_few(palette="light") + theme_wsj()
# Reshaping the exhibition place ratings for graphing
stand.and.integrated.ratings.exhibition = cbind(prop.table(as.matrix(table(casino.orig$Q7_B_StandAlone)[2:6])),
prop.table(as.matrix(table(casino.orig$Q7_B_Integrated)[2:6])))
colnames(stand.and.integrated.ratings.exhibition) = c("Standalone Casino","Integrated Entertainment Complex")
stand.and.integrated.ratings.exhibition.long = melt(stand.and.integrated.ratings.exhibition, varnames=c("Rating","Casino Type"), value.name="Percentage")
# Reordering the rating text labels for the graphing.
stand.and.integrated.ratings.exhibition.long$Rating = factor(stand.and.integrated.ratings.exhibition.long$Rating, levels=levels(casino.orig$Q7_A_StandAlone)[1:5])
# Graphing ratings of casino suitability for the exhibition place location
ggplot(stand.and.integrated.ratings.exhibition.long, aes(x=stand.and.integrated.ratings.exhibition.long$"Casino Type", fill=Rating, y=Percentage,label=sprintf("%.02f %%", Percentage*100))) + geom_bar(position="dodge") + coord_flip() + ggtitle("Ratings of Casino Suitability \nat Exhibition Place by Casino Type") + scale_x_discrete(name="") + theme(title=element_text(size=22),plot.title=element_text(hjust=.8)) + scale_y_continuous(labels=percent) + geom_text(aes(x=stand.and.integrated.ratings.exhibition.long$"Casino Type", y=Percentage, ymax=Percentage, label=sprintf("%.01f%%",Percentage*100), hjust=.75), position = position_dodge(width=1),size=4) + scale_fill_few(palette="light") + theme_wsj()
# Reshaping the Port Lands ratings for graphing
stand.and.integrated.ratings.portlands = cbind(prop.table(as.matrix(table(casino.orig$Q7_C_StandAlone)[2:6])),
prop.table(as.matrix(table(casino.orig$Q7_C_Integrated)[2:6])))
colnames(stand.and.integrated.ratings.portlands) = c("Standalone Casino", "Integrated Entertainment Complex")
stand.and.integrated.ratings.portlands.long = melt(stand.and.integrated.ratings.portlands, varnames=c("Rating","Casino Type"), value.name="Percentage")
# Reordering the rating text labels for the graphing.
stand.and.integrated.ratings.portlands.long$Rating = factor(stand.and.integrated.ratings.portlands.long$Rating, levels=levels(casino.orig$Q7_A_StandAlone)[1:5])
# Graphing ratings of casino suitability for the port lands location
ggplot(stand.and.integrated.ratings.portlands.long, aes(x=stand.and.integrated.ratings.portlands.long$"Casino Type", fill=Rating, y=Percentage,label=sprintf("%.02f %%", Percentage*100))) + geom_bar(position="dodge") + coord_flip() + ggtitle("Ratings of Casino Suitability \nat Port Lands by Casino Type") + scale_x_discrete(name="") + theme(title=element_text(size=22),plot.title=element_text(hjust=.8)) + scale_y_continuous(labels=percent) + geom_text(aes(x=stand.and.integrated.ratings.portlands.long$"Casino Type", y=Percentage, ymax=Percentage, label=sprintf("%.01f%%",Percentage*100), hjust=.75), position = position_dodge(width=1),size=4) + scale_fill_few(palette="light") + theme_wsj()
# This was the part in my analysis where I looked at postal codes (FSAs really) and their coordinates
# Sorry I'm not more linear in how I do my analysis vs. write about it 🙂
# You'll notice that I've imported the geocode file as ffdf. This led to faster merging with the
# original casino data set. This meant that I had to coerce the casino.orig data frame into ffdf format
# But I work with it every day at work, so I'm used to it by now, despite its idiosyncrasies.
casino.orig$PostalCode = toupper(casino.orig$PostalCode)
pcodes = read.csv.ffdf(file="/home/inkhorn/Downloads/zipcodeset.txt", first.rows=50000, next.rows=50000, colClasses=NA, header=FALSE)
names(pcodes) = c("Postal","Lat","Long","City","Prov")
pcodes$FSA = as.ff(as.factor(toupper(substr(pcodes[,"Postal"], 1,3))))
casino.orig = as.ffdf(casino.orig)
casino.orig$PostalCode = as.ff(as.factor(toupper(casino.orig[,"PostalCode"])))
casino.orig = merge(casino.orig, pcodes, by.x="PostalCode", by.y="FSA", all.x=TRUE)
# This is the code for the full map I generated
casino.gc = casino.orig[which(!is.na(casino.orig[,"Lat"])),] # making sure only records with coordinates are included…
mymap = MapBackground(lat=casino.gc$Lat, lon=casino.gc$Long)
PlotOnStaticMap(mymap, casino.gc$Lat, casino.gc$Long, cex=1.5, pch=21, bg="orange")
# Here I'm getting a list of cities, winnowing it down, and using it to filter the
# geocode coordinates to zoom in on the map I generated.
cities = data.frame(table(casino.orig[,"City"]))
cities = cities[cities$Freq > 0,]
cities = cities[order(cities$Freq, decreasing=TRUE),]
cities = cities[cities$Var1 != '',]
cities.filter = cities[1:28,] # Here's my top cities variable (I set an arbitrary dividing line…)
names(cities.filter) = c("City","# Responses")
# Here's where I filtered the original casino ffdf so that it only contained the cities
# that I wanted to see in Southern Ontario
casino.top.so = casino.orig[which(casino.orig[,"City"] %in% cities.filter$City),] # the Var1 column was renamed to "City" above
# here's a transparency function that I used for the southern ontario map
addTrans <- function(color,trans)
{
# This function adds transparency to a color.
# Define transparency with an integer between 0 and 255
# 0 being fully transparent and 255 being fully visible
# Works with either color and trans a vector of equal length,
# or one of the two of length 1.
if (length(color)!=length(trans)&!any(c(length(color),length(trans))==1)) stop("Vector lengths not correct")
if (length(color)==1 & length(trans)>1) color <- rep(color,length(trans))
if (length(trans)==1 & length(color)>1) trans <- rep(trans,length(color))
num2hex <- function(x)
{
hex <- unlist(strsplit("0123456789ABCDEF",split=""))
return(paste(hex[(x-x%%16)/16+1],hex[x%%16+1],sep=""))
}
rgb <- rbind(col2rgb(color),trans)
res <- paste("#",apply(apply(rgb,2,num2hex),2,paste,collapse=""),sep="")
return(res)
}
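# A hedged aside (not part of my original analysis): base R's rgb() can build the same kind of
# semi-transparent colour string, so here's a minimal sketch of a shorter alternative.
# add.alpha is a hypothetical helper name that I made up for illustration.
add.alpha <- function(color, alpha)
{
# col2rgb gives a 3 x 1 matrix (red, green, blue); rgb() wants the three values as columns
return(rgb(t(col2rgb(color)), alpha=alpha, maxColorValue=255))
}
add.alpha("orange", 10) # "#FFA5000A", i.e. orange with an alpha of 10 out of 255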
# Finally here's the southern ontario map code
mymap = MapBackground(lat=casino.top.so$Lat, lon=casino.top.so$Long)
PlotOnStaticMap(mymap, casino.top.so$Lat, casino.top.so$Long, cex=1.5, pch=21, bg=addTrans("orange",10))
# Here's some code for summarizing and plotting the response data to the question
# around issues of importance regarding the new casino (question 3)
q3.summary = matrix(NA, 16,1,dimnames=list(c("Design of the facility",
"Employment opportunities","Entertainment and cultural activities",
"Expanded convention facilities", "Integration with surrounding areas",
"New hotel accommodations","Problem gambling & health concerns",
"Public safety and social concerns","Public space",
"Restaurants","Retail","Revenue for the City","Support for local businesses",
"Tourist attraction","Traffic concerns","Training and career development"),c("% Very Important")))
for (i in 8:23) {
q3.summary[i-7] = mean(casino[,i] == 3, na.rm=TRUE)}
q3.summary = as.data.frame(q3.summary[order(q3.summary[,1], decreasing = FALSE),])
names(q3.summary)[1] = "% Very Important"
q3.summary$Concern = rownames(q3.summary)
q3.summary = q3.summary[order(q3.summary$"% Very Important", decreasing=FALSE),]
q3.summary$Concern = factor(q3.summary$Concern, levels=q3.summary$Concern)
ggplot(q3.summary, aes(x=Concern, y=q3.summary$"% Very Important")) + geom_point(size=5, colour="forest green") + coord_flip() + ggtitle("Issues of Importance Surrounding\nthe New Casino") + scale_x_discrete(name="Issues of Importance") + theme(title=element_text(size=22),plot.title=element_text(hjust=.8)) + scale_y_continuous(labels=percent) + theme_wsj()
# This chunk of code deals with summarizing and plotting the questions surrounding
# what features people might want if a new Integrated Entertainment Complex is built
q7a.summary = matrix(NA, 9,1, dimnames=list(c("No Casino","Casino Only", "Convention Centre Space", "Cultural and Arts Facilities",
"Hotel","Nightclubs","Restaurants","Retail","Theatre"),c("% Include")))
for (i in 36:44) {
q7a.summary[i-35] = mean(casino[,i], na.rm=TRUE)}
q7a.summary = as.data.frame(q7a.summary[order(q7a.summary[,1], decreasing = FALSE),])
names(q7a.summary)[1] = "% Include"
q7a.summary$feature = rownames(q7a.summary)
q7a.summary$feature = factor(q7a.summary$feature, levels=q7a.summary$feature)
ggplot(q7a.summary, aes(x=feature, y=q7a.summary$"% Include")) + geom_point(size=5, colour="forest green") + coord_flip() + ggtitle("What People Would Want in an Integrated\nEntertainment Complex in Downtown Toronto") + scale_x_discrete(name="Features") + theme(title=element_text(size=22),plot.title=element_text(hjust=.8)) + scale_y_continuous(labels=percent,name="% Wanting the Feature") + theme_wsj()
q7b.summary = matrix(NA, 9,1, dimnames=list(c("No Casino","Casino Only", "Convention Centre Space", "Cultural and Arts Facilities",
"Hotel","Nightclubs","Restaurants","Retail","Theatre"),c("% Include")))
for (i in 52:60) {
q7b.summary[i-51] = mean(casino[,i], na.rm=TRUE)}
q7b.summary = as.data.frame(q7b.summary[order(q7b.summary[,1], decreasing = FALSE),])
names(q7b.summary)[1] = "% Include"
q7b.summary$feature = rownames(q7b.summary)
q7b.summary$feature = factor(q7b.summary$feature, levels=q7b.summary$feature)
ggplot(q7b.summary, aes(x=feature, y=q7b.summary$"% Include")) + geom_point(size=5, colour="forest green") + coord_flip() + ggtitle("What People Would Want in an Integrated\nEntertainment Complex at the Exhibition Place") + scale_x_discrete(name="Features") + theme(title=element_text(size=22),plot.title=element_text(hjust=.8)) + scale_y_continuous(labels=percent,name="% Wanting the Feature") + theme_wsj()
q7c.summary = matrix(NA, 9,1, dimnames=list(c("No Casino","Casino Only", "Convention Centre Space", "Cultural and Arts Facilities",
"Hotel","Nightclubs","Restaurants","Retail","Theatre"),c("% Include")))
for (i in 68:76) {
q7c.summary[i-67] = mean(casino[,i], na.rm=TRUE)}
q7c.summary = as.data.frame(q7c.summary[order(q7c.summary[,1], decreasing = FALSE),])
names(q7c.summary)[1] = "% Include"
q7c.summary$feature = rownames(q7c.summary)
q7c.summary$feature = factor(q7c.summary$feature, levels=q7c.summary$feature)
ggplot(q7c.summary, aes(x=feature, y=q7c.summary$"% Include")) + geom_point(size=5, colour="forest green") + coord_flip() + ggtitle("What People Would Want in an Integrated\nEntertainment Complex in Port Lands") + scale_x_discrete(name="Features") + theme(title=element_text(size=22),plot.title=element_text(hjust=.8)) + scale_y_continuous(labels=percent,name="% Wanting the Feature") + theme_wsj()
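# A hedged aside (not part of my original analysis): each for-loop above takes the mean of one
# column at a time. Base R's colMeans() should give the same numbers in a single call.
# toy.answers is a made-up stand-in for a block of (presumably 0/1) columns such as casino[,36:44].
toy.answers = data.frame(Hotel=c(1,0,1,NA), Retail=c(0,0,1,1), Theatre=c(1,1,1,0))
toy.summary = colMeans(toy.answers, na.rm=TRUE) # one mean per column, ignoring missing answers
toy.summary = sort(toy.summary) # ascending, like order(..., decreasing = FALSE)
toy.summary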
# It sucks, but I imported yet another version of the casino dataset so that I wouldn't have to use
# the annoying ffdf indexing notation (e.g. df[,"variable1"])
casino.orig2 = read.csv("/home/inkhorn/Downloads/casino_survey_results20130325.csv")
# Finally, here's some code where I processed and plotted the Gender and Age demographic variables
casino$Gender = casino.orig$Gender
casino$Gender = ifelse(!(casino.orig2$Gender %in% c("Female","Male","Transgendered")), "Did not disclose",
ifelse(casino.orig2$Gender == "Female","Female",
ifelse(casino.orig2$Gender == "Male","Male","Transgendered")))
casino$Gender = factor(casino$Gender, levels=c("Transgendered","Did not disclose","Female","Male"))
ggplot(casino, aes(x=Gender,y=..count../sum(..count..))) + geom_bar(fill="forest green") + coord_flip() + ggtitle("Gender Distribution of Respondents") + scale_x_discrete(name="") + theme_wsj() + theme(title=element_text(size=22),plot.title=element_text(hjust=.8)) + stat_bin(aes(label = sprintf("%.02f %%", ..count../sum(..count..)*100)),
geom="text") + scale_y_continuous(labels=percent)
casino$Age = ifelse(casino.orig2$Age == "", "Did not disclose",
ifelse(casino.orig2$Age == "Under 15", "Under 15",
ifelse(casino.orig2$Age == "15-24", "15-24",
ifelse(casino.orig2$Age == "25-34", "25-34",
ifelse(casino.orig2$Age == "35-44", "35-44",
ifelse(casino.orig2$Age == "45-54","45-54",
ifelse(casino.orig2$Age == "55-64","55-64",
ifelse(casino.orig2$Age == "65 or older", "65 or older","Did not disclose"))))))))
casino$Age = factor(casino$Age, levels=c("Did not disclose","Under 15","15-24","25-34","35-44","45-54","55-64","65 or older"))
ggplot(casino, aes(x=Age,y=..count../sum(..count..))) + geom_bar(fill="forest green") + coord_flip() + ggtitle("Age Distribution of Respondents") + scale_x_discrete(name="") + theme_wsj() + theme(title=element_text(size=22),plot.title=element_text(hjust=.8)) + stat_bin(aes(label = sprintf("%.02f %%", ..count../sum(..count..)*100)), geom="text") + scale_y_continuous(labels=percent)
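# A hedged aside (not part of my original analysis): the nested ifelse() calls for Age keep a value
# if it's one of the known categories and otherwise call it "Did not disclose". A minimal sketch of
# the same idea with %in% follows. toy.age is made-up data, and if your column is a factor you'd
# want to wrap it in as.character() first.
valid.ages = c("Under 15","15-24","25-34","35-44","45-54","55-64","65 or older")
toy.age = c("25-34", "", "65 or older", "Prefer not to say")
toy.clean = ifelse(toy.age %in% valid.ages, toy.age, "Did not disclose")
toy.clean = factor(toy.clean, levels=c("Did not disclose", valid.ages))
table(toy.clean)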