- Python IDE
- R
- Microsoft Excel
- Crime events 2019: https://github.com/stccenter/crime-data-analysis/tree/master/Crime%20events%202019
- Crime events 2020: https://github.com/stccenter/crime-data-analysis/tree/master/Crime%20events%202020
- Crime events with coordinates 0515-0615, 2020: https://github.com/stccenter/crime-data-analysis/tree/master/Crime%20events%20with%20coordinates%200515-0615%2C%202020
- Crime number statistics 2019: https://github.com/stccenter/crime-data-analysis/tree/master/Crime%20number%20statistics%202019
- Crime number statistics 2020: https://github.com/stccenter/crime-data-analysis/tree/master/Crime%20number%20statistics%202020
Run the scripts below after installing the packages each script requires.
- geocoding.py: This script converts addresses (such as street addresses) into geographic coordinates (latitude and longitude).
- packages: geocoder, pandas, csv
- variables that can be changed: a. input_file – input file path. b. output_file – output file path. c. date – the date of the crime events, in the form mm/dd/yy.
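A minimal sketch of the geocoding step, assuming the input CSV has hypothetical `Date` and `Address` columns. The geocoding call is passed in as a function (the name `geocode_csv` is also hypothetical), so the date filtering and output logic can be tried without network access. This is an illustration, not the code in geocoding.py.

```python
import csv
from typing import Callable, Optional, Sequence

# Hypothetical column names; adjust them to match the actual input file.
DATE_COL = "Date"
ADDRESS_COL = "Address"

LatLng = Optional[Sequence[float]]


def geocode_csv(input_file: str, output_file: str, date: str,
                geocode: Callable[[str], LatLng]) -> int:
    """Geocode the crime events of one day (date in mm/dd/yy form).

    Writes the matching rows plus 'lat' and 'lng' columns to output_file
    and returns the number of rows written.
    """
    with open(input_file, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames or []) + ["lat", "lng"]
        rows = [r for r in reader if r[DATE_COL] == date]

    with open(output_file, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            latlng = geocode(row[ADDRESS_COL])
            # Leave the coordinates empty when the address cannot be resolved.
            row["lat"], row["lng"] = latlng if latlng else ("", "")
            writer.writerow(row)
    return len(rows)
```

With the geocoder package, the callable could be, for example, `lambda address: geocoder.arcgis(address).latlng`, which returns `[lat, lng]` for a resolved address. Keeping the lookup pluggable makes it easy to swap providers or cache results.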
- statistic.py: This script counts the number of crimes per day in each city/county from the crime event data.
- package: pandas
- variables that can be changed: a. input_file – input file path. b. output_file – output file path.
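The per-day, per-city count can be sketched with `collections.Counter`, assuming hypothetical `Date` and `City` columns and hypothetical helper names `daily_counts` and `write_counts`; statistic.py itself uses pandas and may differ.

```python
import csv
from collections import Counter
from typing import Iterable, Mapping

# Hypothetical column names for the crime event data.
DATE_COL = "Date"
CITY_COL = "City"


def daily_counts(rows: Iterable[Mapping[str, str]]) -> Counter:
    """Count crime events per (date, city/county) pair."""
    return Counter((row[DATE_COL], row[CITY_COL]) for row in rows)


def write_counts(counts: Counter, output_file: str) -> None:
    """Write one row per (date, city) with its crime count.

    Keys are sorted as strings, which orders mm/dd/yy dates correctly
    only within a single year.
    """
    with open(output_file, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([DATE_COL, CITY_COL, "Count"])
        for (date, city), n in sorted(counts.items()):
            writer.writerow([date, city, n])
```

Usage: `with open(input_file, newline="") as f: counts = daily_counts(csv.DictReader(f))`. In pandas the same grouping is `df.groupby([DATE_COL, CITY_COL]).size()`.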
- boxplot.py: This script generates a boxplot of the crime rates of all counties/cities.
- packages: os, numpy, pandas, matplotlib
- variables that can be changed: a. input_path – root directory of the crime input files. b. pop_file – input file with population data.
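A boxplot of crime rates first needs counts normalized by population. The sketch below assumes a rate per 100,000 residents (an assumption; the script may use another base) and computes the quartiles a boxplot draws. The helper names are hypothetical.

```python
import statistics
from typing import Dict, List


def crime_rates(counts: Dict[str, int], population: Dict[str, int],
                per: int = 100_000) -> Dict[str, float]:
    """Crimes per `per` residents for every city/county present in both inputs."""
    return {place: counts[place] * per / population[place]
            for place in counts if place in population}


def box_stats(values: List[float]) -> Dict[str, float]:
    """Five-number summary; quartiles use linear interpolation,
    the same scheme as numpy's default percentile method."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}
```

Note that matplotlib's `boxplot` draws whiskers at 1.5 times the interquartile range by default, so the whisker ends can differ from the min and max when outliers are present.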
- linechart.py: This script generates a line chart of seven crime types.
- packages: os, numpy, pandas, matplotlib
- variables that can be changed: a. input_path – root directory of the input files. b. output_path – root directory of the output image.
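A line chart with one line per crime type needs series of equal length. This sketch (hypothetical function `series_by_type`) builds daily counts per type from `(date, type)` pairs and fills days without events with zero.

```python
from collections import Counter
from datetime import date, datetime, timedelta
from typing import Dict, Iterable, List, Tuple


def series_by_type(events: Iterable[Tuple[str, str]], crime_types: List[str],
                   start: date, end: date) -> Tuple[List[date], Dict[str, List[int]]]:
    """Build one daily count series per crime type for a line chart.

    events: (date string in mm/dd/yy form, crime type) pairs.
    Returns the list of days and, for each requested type, its counts on those days.
    Days without events get a count of 0, so every series has the same length.
    """
    counts = Counter((datetime.strptime(d, "%m/%d/%y").date(), t) for d, t in events)
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    return days, {t: [counts[(day, t)] for day in days] for t in crime_types}
```

The series can then be drawn with matplotlib, e.g. `for crime_type, ys in series.items(): plt.plot(days, ys, label=crime_type)` followed by `plt.legend()`.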
- pcc.py: This script calculates the Pearson correlation coefficient between different variables.
- package: pandas
- variables that can be changed: a. input_file – input file path. b. output_file – output file path.
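For reference, the Pearson correlation coefficient is the covariance of two variables divided by the product of their standard deviations. A plain-Python sketch (hypothetical function name `pearson`):

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r = cov(x, y) / (std(x) * std(y)).

    Raises ZeroDivisionError if either variable is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

With pandas, `df.corr(method="pearson")` computes the same value for every pair of numeric columns at once.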
- ANN
- ANN.py: This script calculates the Average Nearest Neighbor (ANN) value of the crime cases for each day over the target region.
- packages: math, pandas, numpy, scipy
- variables that can be changed: a. input_file – input file path. b. output_file – output file path. c. area – area of the county/city (square kilometers). d. crime_type – the type of crime, including Total, Arrest, Arson, Assault, Burglary, Robbery, Shooting, Theft, Vandalism, and Other.
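A sketch of the standard Average Nearest Neighbor statistic (the Clark and Evans ratio, as used by ArcGIS), assuming the points are already projected to planar coordinates in kilometers so they share units with `area` in square kilometers. It uses a brute-force O(n²) search for clarity; ANN.py may compute distances differently.

```python
import math
from typing import Dict, List, Tuple


def average_nearest_neighbor(points: List[Tuple[float, float]],
                             area: float) -> Dict[str, float]:
    """Average Nearest Neighbor statistics for planar points.

    points: (x, y) coordinates in kilometers (already projected).
    area: study-region area in square kilometers.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Observed mean distance from each point to its nearest neighbor (brute force).
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    # Expected mean distance under complete spatial randomness.
    expected = 0.5 / math.sqrt(n / area)
    # Standard error of the expected mean distance.
    se = 0.26136 / math.sqrt(n * n / area)
    return {
        "observed": observed,
        "expected": expected,
        "ratio": observed / expected,  # < 1 suggests clustering, > 1 dispersion
        "z_score": (observed - expected) / se,
    }
```

For larger daily point sets, `scipy.spatial.cKDTree(points).query(points, k=2)` returns each point's nearest other neighbor in its second distance column and avoids the quadratic loop.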
- barchart.py: This script visualizes the results of ANN.py as a bar chart.
- packages: pandas, numpy, matplotlib
- variables that can be changed: a. input_file – input file path.
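A possible bar chart of ANN output, assuming (hypothetically) that the chart shows one ANN ratio per day; a dashed line at 1 marks the value expected for a random pattern. The function name `plot_ann_bars` is illustrative only.

```python
from typing import List

import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt


def plot_ann_bars(days: List[str], ratios: List[float], output_file: str) -> None:
    """Bar chart of daily ANN ratios with a reference line at 1 (random pattern)."""
    fig, ax = plt.subplots(figsize=(10, 4))
    # Red bars for days that look clustered (< 1), blue for dispersed or random.
    colors = ["tab:red" if r < 1 else "tab:blue" for r in ratios]
    ax.bar(days, ratios, color=colors)
    ax.axhline(1.0, color="black", linestyle="--", linewidth=1)
    ax.set_xlabel("Date")
    ax.set_ylabel("ANN ratio")
    ax.tick_params(axis="x", labelrotation=90)
    fig.tight_layout()
    fig.savefig(output_file)
    plt.close(fig)
```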
- hotspot
- hotspot.py: This script analyzes the crime hotspots of a given day and creates a heat map layer.
- packages: pandas, folium
- variables that can be changed:
- input_file – input file path.
- output_file – output file path.
- date – the date of the crime events, in the form mm/dd/yy.
- crime_type – the type of crime, including Total, Arrest, Arson, Assault, Burglary, Robbery, Shooting, Theft, Vandalism, and Other
- parameters:
- location – Latitude and Longitude of Map (Northing, Easting).
- zoom_start – Initial zoom level for the map.
- tiles – Map tileset to use.
- control_scale – Whether to add a control scale on the map.
- data – List of points of the form [lat, lng] or [lat, lng, weight].
- max_val – Maximum point intensity.
- min_opacity – The minimum opacity the heat will start at.
- radius – Radius of each “point” of the heatmap.
- blur – Amount of blur.
- gradient – Color gradient config.
- max_zoom – Zoom level where the points reach maximum intensity (as intensity scales with zoom).
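A sketch of how the parameters above fit together with folium's `HeatMap` plugin. It assumes hypothetical columns `Date`, `Type`, `lat`, and `lng`, and assumes that `Total` means "all crime types"; the function names are also hypothetical. folium is imported inside `save_heatmap` so the filtering step can be tested on its own.

```python
from typing import Dict, Iterable, List, Sequence

# Hypothetical column names; adjust to match the geocoded input file.
DATE_COL, TYPE_COL, LAT_COL, LNG_COL = "Date", "Type", "lat", "lng"


def heat_points(rows: Iterable[Dict[str, str]], date: str,
                crime_type: str) -> List[List[float]]:
    """[lat, lng] points for one day (mm/dd/yy) and crime type; 'Total' keeps every type."""
    points = []
    for row in rows:
        if row[DATE_COL] != date:
            continue
        if crime_type != "Total" and row[TYPE_COL] != crime_type:
            continue
        if row[LAT_COL] and row[LNG_COL]:  # skip rows that failed to geocode
            points.append([float(row[LAT_COL]), float(row[LNG_COL])])
    return points


def save_heatmap(points: List[List[float]], output_file: str,
                 location: Sequence[float], zoom_start: int = 10) -> None:
    """Render the points as a folium heat map layer and save it as HTML."""
    import folium  # imported here so heat_points() works without folium
    from folium.plugins import HeatMap

    m = folium.Map(location=list(location), zoom_start=zoom_start,
                   tiles="OpenStreetMap", control_scale=True)
    HeatMap(points, min_opacity=0.3, radius=15, blur=10, max_zoom=13).add_to(m)
    m.save(output_file)
```

The `min_opacity`, `radius`, `blur`, and `max_zoom` values are placeholders to tune. Recent folium versions compute the maximum intensity automatically, so the sketch does not pass `max_val`.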
- hotspot_withtime.py: This script analyzes crime hotspots over a number of days and creates a dynamic heat map layer with a time slider.
- packages: pandas, folium
- variables that can be changed:
- input_file – input file path.
- output_file – output file path.
- crime_type – the type of crime, including Total, Arrest, Arson, Assault, Burglary, Robbery, Shooting, Theft, Vandalism, and Other
- parameters:
- location – Latitude and Longitude of Map (Northing, Easting).
- zoom_start – Initial zoom level for the map.
- tiles – Map tileset to use.
- control_scale – Whether to add a control scale on the map.
- data – List of lists of points of the form [lat, lng] or [lat, lng, weight]; the outer list holds one list per time step, in sequential order.
- index – Index giving the label (or timestamp) of the elements of data.
- max_opacity – The maximum opacity for the heatmap.
- min_opacity – The minimum opacity the heat will start at.
- radius – Radius of each “point” of the heatmap.
- auto_play – Automatically play the animation across time.
- display_index – Whether to display the index (usually the time) in the time control.
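`HeatMapWithTime` expects `data` and `index` to line up: `data[i]` holds the points shown at time step `index[i]`. The sketch below, under the same hypothetical column names and `Total` convention as a plain heat map, groups points by day in chronological order. Days with no events get no frame. Function names are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical column names; adjust to match the geocoded input file.
DATE_COL, TYPE_COL, LAT_COL, LNG_COL = "Date", "Type", "lat", "lng"


def heat_frames(rows: Iterable[Dict[str, str]],
                crime_type: str) -> Tuple[List[List[List[float]]], List[str]]:
    """Group [lat, lng] points into one frame per day, oldest first.

    Returns (data, index) where data[i] holds the points of day index[i].
    """
    frames: Dict[datetime, List[List[float]]] = {}
    for row in rows:
        if crime_type != "Total" and row[TYPE_COL] != crime_type:
            continue
        if not (row[LAT_COL] and row[LNG_COL]):  # skip rows that failed to geocode
            continue
        # Parse the date so frames sort chronologically, even across years.
        day = datetime.strptime(row[DATE_COL], "%m/%d/%y")
        frames.setdefault(day, []).append([float(row[LAT_COL]), float(row[LNG_COL])])
    days = sorted(frames)
    return [frames[d] for d in days], [d.strftime("%m/%d/%y") for d in days]


def save_heatmap_with_time(data: List[List[List[float]]], index: List[str],
                           output_file: str, location: Sequence[float],
                           zoom_start: int = 10) -> None:
    """Render the frames as a folium heat map with a time slider."""
    import folium  # imported here so heat_frames() works without folium
    from folium.plugins import HeatMapWithTime

    m = folium.Map(location=list(location), zoom_start=zoom_start, control_scale=True)
    HeatMapWithTime(data, index=index, radius=15, min_opacity=0, max_opacity=0.6,
                    auto_play=False, display_index=True).add_to(m)
    m.save(output_file)
```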
- Lasso.R: This script builds a Lasso logistic regression model and exports the coefficients of the independent variables.
- package: glmnet
- variables that can be changed:
- setwd – the working directory.
- loadx – input file name with the independent variables.
- loady – input file name with the dependent variable.
- parameters:
- x – matrix of predictor variables
- y – the response or outcome variable, which is a binary variable.
- family – the response type. Use “binomial” for a binary outcome variable.
- alpha – the elastic-net mixing parameter. Allowed values include:
- “1”: for lasso regression
- “0”: for ridge regression
- a value between 0 and 1 (say 0.3) for elastic net regression.
- type.measure – the loss used for cross-validation.
- lambda – a numeric value defining the amount of shrinkage. It should be specified by the analyst.
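How `alpha` and `lambda` interact is easiest to see in the penalized objective that the glmnet documentation gives for a binomial response with y_i coded as 0 or 1. It is reproduced here as a reference sketch, not taken from Lasso.R itself:

$$
\min_{\beta_0,\,\beta}\; -\frac{1}{N}\sum_{i=1}^{N}\Big[\,y_i\,(\beta_0 + x_i^{\top}\beta) - \log\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big)\Big] \;+\; \lambda\Big[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Big]
$$

With alpha = 1 only the L1 term remains, which can shrink some coefficients exactly to zero and so acts as variable selection; alpha = 0 leaves only the ridge term. For any alpha, a larger lambda shrinks all coefficients more strongly.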