Crime_Prediction

R scripts for predict future crime incidents in Chicago city area, by applying Logistic regression, Logistic regression and Random Forest, from Twitter-derived information (originally saved in PostgresSQL database), weather data and standard crime prediction approach based on kernel density estimation (KDE)

Utility Functions

CrimePredUtil.R is a script for all utility functions. The discription of parameters and outputs are inlcuded in the script.

Create Senitment Lexicon Dictionary

Go to http://www.cs.uic.edu/~liub/FBS/opinion-lexicon-English.rar to download a list of positive and negative opinion words or sentiment words for English (around 6800 words). The list was compiled by Hu and Liu Minqing Hu and Bing Liu. Use SentimentLexicon.R is a script for adding your own positive/negative words and creating a sentiment dataframe:

POLKEY

for furture use in polarity function.

Connnect to PostgresSQL Database

Make sure to have PostgresSQL database configured. If not, download/configured postgresSQL database management system according to http://www.postgresql.org/download/. Then, import twitter data in to a database you created. TwitterSentiment.R has exampel code for connecting PostgreSQL (you will need to modified the code):

drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, host = 'localhost', port='5432', 
                 dbname = 'postgis_tweet',user = 'postgres', 
                 password='secret')

Clean up twitter data and calculate sentiment score

In TwitterLexicon.R script, use tweet.qry to draw query for twitter posts within chicago area. You can set different time period. Then you can clean up twitter posts to replace emoticans, remove punctuation, etc. using tweet.clean function. Next, aggregate tweets within the same time period (in this case, 6 hours) and the same neighborhood together as a single document. Calaculate the snetiment score for each document using polarity function. Furthermore, the trend index for sentiment score is derived by trend_ind function. The script will retune a datafrme:

Jan.2014.pol.trend.6hour

which contains the time period, grid_id(neighborhood) and sentiment polarity and its trend.

Get crime density using KDE

Before calsulating crime density, make sure you have historical crime data from Chicago data portal, https://data.cityofchicago.org/. In this case, we use theft incidents data happened in 2013-2014, which contains timestamp and longitude/latitude. CrimeDensity.R is a script for calculating historical crime density in each neighborhood in chicago area.
The script will retune a datafrme:

theft.training.density

which contains the time period, grid_id(neighborhood) and density.

Clean up weather data

Get weather data in Chicago from weather underground, http://www.wunderground.com/history/airport/KORD/2014/1/1/CustomHistory.html?dayend=31&monthend=12&yearend=2014&req_city=NA&req_state=NA&req_statename=NA&format=1. It contains high/low/min tempreture, dew points, windspeed, etc. WeatherDensity_6h.R is a script for cleaning up weather data by assigning min/max/mean data to different period of the day. Also, you can use the script to merge the weather data with crime density data.

Prepare trainingand test data

In script ModelingPrediction.R, you can first merge crime density, weather data and sentiment features together according to their grid_id and period number (sixhr_n). Then mark the crime points as positive points and non-crime points as negative points. Since the response is unbalanced, use the ovun.sample to under-sample negative response. The returned data frame is:

Jan.train.balanced

Model and Evaluation

ModelingPrediction.R shows three ways to build predicting model. glm is for logistic regression; glmnet with pararmeter family = binomial is for the lasso logistic regression and randomForest is for randomforest model. Then use predict to get the predicted value:

theft.threat

Lastly, user plot.surveillance.curve function to evaluate the performance of the prediction.

Version

Version 1.0

Thanks

This code was created under Mathew Gerber's guide within Capstone course at UVa during Spring 2015. Respect and gratitude is due to Professor Gerber. Also, appreciate is due to Northrop Grumman's sponsorship.

Contact

Shin Xinyu Chen

GitHub: https://github.com/ShinXinyuChen
e-mail: [email protected]; [email protected]
Twitter: @chen_xinyu

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
CrimeDensity.R		CrimeDensity.R
CrimePredictionUtil.R		CrimePredictionUtil.R
ModelingPrediction.R		ModelingPrediction.R
README.md		README.md
SentimentLexicon.R		SentimentLexicon.R
TwitterSentiment_6h.R		TwitterSentiment_6h.R
WeatherDensity_6h.R		WeatherDensity_6h.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Crime_Prediction

Utility Functions

Create Senitment Lexicon Dictionary

Connnect to PostgresSQL Database

Clean up twitter data and calculate sentiment score

Get crime density using KDE

Clean up weather data

Prepare trainingand test data

Model and Evaluation

Version

Thanks

Contact

Shin Xinyu Chen

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Crime_Prediction

Utility Functions

Create Senitment Lexicon Dictionary

Connnect to PostgresSQL Database

Clean up twitter data and calculate sentiment score

Get crime density using KDE

Clean up weather data

Prepare trainingand test data

Model and Evaluation

Version

Thanks

Contact

Shin Xinyu Chen

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages