Crime Data Analysis

What are the software requirements?

Python IDE
R
Microsoft Excel

Where to download the data?

Crime events 2019: https://github.com/stccenter/crime-data-analysis/tree/master/Crime%20events%202019
Crime events 2020: https://github.com/stccenter/crime-data-analysis/tree/master/Crime%20events%202020
Crime events with coordinates 0515-0615, 2020: https://github.com/stccenter/crime-data-analysis/tree/master/Crime%20events%20with%20coordinates%200515-0615%2C%202020
Crime number statistics 2019: https://github.com/stccenter/crime-data-analysis/tree/master/Crime%20number%20statistics%202019
Crime number statistics 2020: https://github.com/stccenter/crime-data-analysis/tree/master/Crime%20number%20statistics%202020

How to get the results?

Install the packages required by each script, then run the scripts below.

geocoding.py This script is used to convert addresses (like a street address) into geographic coordinates (like latitude and longitude).
1. package: geocoder, pandas, csv
2. variables that can be changed: a. input_file – input file path. b. output_file – output file path. c. date – the date of crime events of the form mm/dd/yy.
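The following is a minimal sketch of the geocoding step, not the repository's own code. It assumes the input CSV has `date` and `address` columns (hypothetical names) and uses the `geocoder` package's OpenStreetMap provider; the actual script may use a different provider and pandas for file handling. The lookup function is passed in as a parameter so the filtering logic can be tried without network access.

```python
import csv


def osm_geocode(address):
    """Look up one address via OpenStreetMap; returns (lat, lng) or None."""
    import geocoder  # third-party; imported here so the helpers below work without it

    result = geocoder.osm(address)  # needs network access
    return tuple(result.latlng) if result.ok else None


def geocode_rows(rows, date, lookup=osm_geocode):
    """Keep rows matching `date` and add 'lat'/'lng' fields to each."""
    out = []
    for row in rows:
        if row["date"] != date:  # 'date' and 'address' are assumed column names
            continue
        coords = lookup(row["address"])
        lat, lng = coords if coords else ("", "")
        out.append({**row, "lat": lat, "lng": lng})
    return out


def run(input_file, output_file, date):
    """Read the input CSV, geocode the rows for one day, write the result."""
    with open(input_file, newline="") as src:
        rows = list(csv.DictReader(src))
    geocoded = geocode_rows(rows, date)
    if not geocoded:
        return
    with open(output_file, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=list(geocoded[0].keys()))
        writer.writeheader()
        writer.writerows(geocoded)
```

Addresses that cannot be resolved are kept with empty coordinates so they can be reviewed by hand afterwards.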
statistic.py This script is used to count the number of crimes per day in each city/county based on crime event data.
1. package: pandas
2. variables that can be changed: a. input_file – input file path. b. output_file – output file path.
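As an illustration of the counting step, the sketch below groups event rows by city and date with pandas. The column names `city` and `date` are assumptions; the real crime event files may label them differently.

```python
import pandas as pd


def count_crimes_per_day(events: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (city, date) with the number of events on that day."""
    return (
        events.groupby(["city", "date"])  # assumed column names
        .size()
        .reset_index(name="crime_count")
    )
```

A typical call would be `count_crimes_per_day(pd.read_csv(input_file)).to_csv(output_file, index=False)`.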
boxplot.py This script is used to generate a boxplot of crime rates of all counties/cities.
1. package: os, numpy, pandas, matplotlib
2. variables that can be changed: a. input_path – root directory of crime input file. b. pop_file – input file with population.
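A crime rate combines the daily counts with the population file. The sketch below assumes a rate per 100,000 residents and the column names `city`, `crime_count`, and `population`; none of these are confirmed by the repository, so adjust them to match the actual files.

```python
import pandas as pd


def crime_rates(counts: pd.DataFrame, population: pd.DataFrame,
                per: int = 100_000) -> pd.DataFrame:
    """Join counts with population and add a 'rate' column (events per `per` residents)."""
    merged = counts.merge(population, on="city", how="inner")
    merged["rate"] = merged["crime_count"] * per / merged["population"]
    return merged


def plot_rates(rates: pd.DataFrame, output_file: str) -> None:
    """Draw one box per city from the daily rates (requires matplotlib)."""
    ax = rates.boxplot(column="rate", by="city")
    ax.set_ylabel("crime rate")
    ax.figure.savefig(output_file)
```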
linechart.py This script is used to generate a line chart of seven crime types.
1. package: os, numpy, pandas, matplotlib
2. variables that can be changed: a. input_path – root directory of input file. b. output_path – root directory of output image.
pcc.py This script is used to calculate the Pearson correlation coefficient (PCC) between pairs of variables.
1. package: pandas
2. variables that can be changed: a. input_file – input file path. b. output_file – output file path.
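With pandas, pairwise Pearson coefficients can be obtained from `DataFrame.corr`. The sketch below is one possible way to do it and assumes every numeric column of the input file is a variable of interest; non-numeric columns are dropped first.

```python
import pandas as pd


def pearson_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Return the matrix of pairwise Pearson coefficients for numeric columns."""
    numeric = df.select_dtypes(include="number")
    return numeric.corr(method="pearson")
```

Values close to 1 or -1 indicate a strong linear relationship, and values near 0 indicate little or none.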
ANN
1. ANN.py This script is used to calculate the Average Nearest Neighbor (ANN) value of the crime events for each day over the target region.
  1. package: math, pandas, numpy, scipy
  2. variables that can be changed: a. input_file – input file path. b. output_file – output file path. c. area – area of the county/city (in square kilometers). d. crime_type – the type of crime, including Total, Arrest, Arson, Assault, Burglary, Robbery, Shooting, Theft, Vandalism, and Other.
2. barchart.py This script is used to visualize the results of ANN.py through a bar graph.
  1. package: pandas, numpy, matplotlib
  2. variables that can be changed: a. input_file – input file path.
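The Average Nearest Neighbor ratio compares the observed mean distance from each event to its closest neighbor with the mean distance expected for a random pattern of the same density, 0.5 / sqrt(n / area). The sketch below is a brute-force illustration, not the code in ANN.py. It assumes the points are already in planar coordinates (kilometers) so that they match the `area` variable; latitude/longitude pairs would need to be projected first.

```python
import math


def average_nearest_neighbor(points, area):
    """Return (observed_mean, expected_mean, ratio) for planar (x, y) points.

    `area` must use the squared unit of the coordinates (e.g. km and km^2).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to its closest other point.
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed = total / n
    expected = 0.5 / math.sqrt(n / area)
    return observed, expected, observed / expected
```

A ratio below 1 suggests clustering and a ratio above 1 suggests dispersion. The pairwise loop is quadratic in the number of points; for large inputs a spatial index such as a k-d tree would be a better fit.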
hotspot
1. hotspot.py This script is used to analyze crime hotspots of a given day. A heat map layer will be created.
  1. package: pandas, folium
  2. variables that can be changed:
    1. input_file – input file path.
    2. output_file – output file path.
    3. date – the date of crime events of the form mm/dd/yy
    4. crime_type – the type of crime, including Total, Arrest, Arson, Assault, Burglary, Robbery, Shooting, Theft, Vandalism, and Other
  3. parameters:
    1. location – Latitude and Longitude of Map (Northing, Easting).
    2. zoom_start – Initial zoom level for the map.
    3. tiles – Map tileset to use.
    4. control_scale – Whether to add a control scale on the map.
    5. data – List of points of the form [lat, lng] or [lat, lng, weight].
    6. max_val – Maximum point intensity.
    7. min_opacity – The minimum opacity the heat will start at.
    8. radius – Radius of each “point” of the heatmap.
    9. blur – Amount of blur.
    10. gradient – Color gradient config.
    11. max_zoom – Zoom level where the points reach maximum intensity (as intensity scales with zoom).
2. hotspot_withtime.py This script is used to analyze crime hotspots of a number of days. A dynamic heat map layer with time slider will be created.
  1. package: pandas, folium
  2. variables that can be changed:
    1. input_file – input file path.
    2. output_file – output file path.
    3. crime_type – the type of crime, including Total, Arrest, Arson, Assault, Burglary, Robbery, Shooting, Theft, Vandalism, and Other
  3. parameters:
    1. location – Latitude and Longitude of Map (Northing, Easting).
    2. zoom_start – Initial zoom level for the map.
    3. tiles – Map tileset to use.
    4. control_scale – Whether to add a control scale on the map.
    5. data – List of lists of points of the form [lat, lng] or [lat, lng, weight], one inner list per time step.
    6. index – Index giving the label (or timestamp) of the elements of data.
    7. max_opacity – The maximum opacity for the heatmap.
    8. min_opacity – The minimum opacity the heat will start at.
    9. radius – Radius of each “point” of the heatmap.
    10. auto_play – Automatically play the animation across time.
    11. display_index – Whether to display the index (usually the time) in the time control.
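The two hotspot scripts can be sketched with folium's `HeatMap` and `HeatMapWithTime` plugins as follows. This is an illustration under assumptions, not the repository's code: the column names `date`, `type`, `lat`, and `lng` are hypothetical, "Total" is taken to mean all crime types, and the numeric settings are placeholder values. `max_val` is left out because recent folium releases compute the maximum intensity automatically.

```python
import pandas as pd


def heat_points(events, date=None, crime_type="Total"):
    """Return [[lat, lng], ...] for the selected date and crime type."""
    selected = events
    if date is not None:
        selected = selected[selected["date"] == date]
    if crime_type != "Total":  # assumption: "Total" keeps every type
        selected = selected[selected["type"] == crime_type]
    return selected[["lat", "lng"]].dropna().values.tolist()


def static_heatmap(events, date, crime_type, output_file, center):
    """Heat map of one day, saved as an HTML file."""
    import folium
    from folium.plugins import HeatMap

    m = folium.Map(location=center, zoom_start=11,
                   tiles="OpenStreetMap", control_scale=True)
    HeatMap(heat_points(events, date, crime_type),
            min_opacity=0.3, radius=15, blur=10).add_to(m)
    m.save(output_file)


def animated_heatmap(events, crime_type, output_file, center):
    """Heat map with a time slider, one frame per date in the data."""
    import folium
    from folium.plugins import HeatMapWithTime

    dates = sorted(events["date"].unique())
    frames = [heat_points(events, d, crime_type) for d in dates]
    m = folium.Map(location=center, zoom_start=11, control_scale=True)
    HeatMapWithTime(frames, index=dates, radius=15,
                    min_opacity=0.3, max_opacity=0.8,
                    auto_play=True, display_index=True).add_to(m)
    m.save(output_file)
```

Here `center` is a `[lat, lng]` pair for the initial map view. Sorting mm/dd/yy strings puts the dates in calendar order only within a single year; parse them with `pd.to_datetime` if the data spans several years.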
Lasso.R This script is used to build a Lasso logistic regression model and export the coefficients of the independent variables.
1. package: glmnet
2. variables that can be changed:
  1. setwd – current working path.
  2. loadx – input file name with independent variables.
  3. loady – input file name with dependent variable.
3. parameters:
  1. x – matrix of predictor variables
  2. y – the response or outcome variable, which is a binary variable.
  3. family – the response type. Use “binomial” for a binary outcome variable.
  4. alpha – the elasticnet mixing parameter. Allowed values include:
    1. “1”: for lasso regression
    2. “0”: for ridge regression
    3. a value between 0 and 1 (say 0.3) for elastic net regression.
  5. type.measure – the loss used for cross-validation.
  6. lambda – a numeric value defining the amount of shrinkage. Should be specified by the analyst.
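For reference, the roles of alpha and lambda can be read from the penalized objective that glmnet minimizes for a binomial response, as given in the glmnet documentation. Here N is the number of observations, x_i is the i-th row of x, and y_i is the corresponding 0/1 outcome:

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}
\left[\, y_i\,(\beta_0 + x_i^{\top}\beta)
      - \log\!\left(1 + e^{\beta_0 + x_i^{\top}\beta}\right) \right]
+ \lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^{2}
      + \alpha\,\lVert\beta\rVert_1\right]
```

With alpha = 1 only the L1 term remains, which is what drives some coefficients exactly to zero; a larger lambda shrinks more coefficients to zero.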

Tutorial video
