A project examining the relationship between 311 calls in New York City, the weather, and stock market performance.
The purpose of the project is to see whether 311 call volume is correlated with other observable factors. If call volume and complaint types can be accurately predicted, this would have several potential applications. The most immediate application is for the City itself. If the City knows to expect a rush of calls or, even better, a specific type of call, it can better staff its call center and prepare response teams. Private companies might also be able to improve their targeted marketing. For example, I found that rodent complaints spike at regular intervals. By finding what predicts these spikes, companies that provide extermination services or sell products that address rodent problems could increase their marketing efforts precisely when demand is higher, making better use of their marketing budgets. It might also be a time for pet adoption agencies to increase their efforts, as more people might be interested in adopting cats from shelters to deal with mice and rats.
I gathered data from three sources. The first is the set of all 311 calls in New York City in 2013, available for download at NYC OpenData as a 124 MB .csv file. Looking through the 311 data, I noticed that the most common call type concerned heating, so I thought it important to collect weather data. I downloaded daily weather reports for New York City for 2013 from NOAA's National Centers for Environmental Information, which provided a 0.54 MB .csv file. Finally, I thought that economic performance might influence how people feel about their current situation. If the economy is doing well, perhaps they will be slightly less cranky or less easily annoyed, and therefore less likely to call 311. To capture economic conditions, I used daily information on the performance of the Dow Jones Industrial Average (DJIA). To get this data, I used Quandl's native R package and the data described here.
Once I had all of my data, I subset the 311 calls to those based in Manhattan and created daily counts of calls to use as my dependent variable. I then fit a simple regression model using the change in the day's DJIA, the day's DJIA volume, the minimum observed temperature, and the amount of precipitation as predictors. While I found no effect for the change in DJIA or for DJIA volume, I did find that lower temperatures significantly increased the number of 311 calls. Serial autocorrelation is of course an issue in time series data like these, and I checked the robustness of my results accordingly. Although I found no effect for the economic indicator I used, I believe that this project is still worth pursuing for the following reasons. First, I did find a significant predictor of 311 call volume. Second, I did not disaggregate complaint types in my analysis; heating complaints are the single most common type of 311 complaint, and it is likely that my model results are being driven by this complaint type. A more nuanced investigation might reveal that different classes of complaints are predicted by different identifiable events and circumstances.
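
The steps described above (subsetting to Manhattan, building daily counts, fitting a regression, and checking for serial autocorrelation) can be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not the analysis I actually ran, which was done in R. The column names "Created Date" and "Borough", the timestamp layout, and the tiny sample file are assumptions made for the example. The fit uses a single predictor rather than the four in my model, and the Durbin-Watson statistic stands in as one common autocorrelation check among several.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Made-up sample rows for illustration only; not real 311 records.
SAMPLE_CSV = """Created Date,Borough,Complaint Type
01/01/2013 08:00:00 AM,MANHATTAN,HEATING
01/01/2013 09:30:00 PM,MANHATTAN,Rodent
01/01/2013 10:00:00 AM,BROOKLYN,HEATING
01/02/2013 07:45:00 AM,MANHATTAN,HEATING
"""


def daily_counts(csv_text, borough="MANHATTAN"):
    """Count calls per calendar day for one borough."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Borough"].strip().upper() != borough:
            continue
        # Assumed timestamp layout, e.g. "01/02/2013 07:45:00 AM".
        created = datetime.strptime(row["Created Date"], "%m/%d/%Y %I:%M:%S %p")
        counts[created.date()] += 1
    return dict(sorted(counts.items()))


def ols_fit(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, residuals)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, residuals


def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 suggests little first-order
    autocorrelation, near 0 positive and near 4 negative autocorrelation."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den
```

In practice, `x` would hold the daily minimum temperatures and `y` the daily call counts, aligned by date. The residuals must be kept in date order before being passed to `durbin_watson`, since the statistic compares each residual with the one from the previous day.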