---
title: "Crime Trend and Spatial Analysis in Chicago (2010 - 2023)"
author: Satya
date: Oct 22, 2024
date-format: long
---
# Introduction
# Welcome to the Crime Analysis Project
This website presents a detailed analysis of crime trends and spatial distributions in Chicago from 2010 to 2023.
# Materials and methods

## Data Cleaning and Preparation

I am using the dplyr and tidyr packages in RStudio to clean and organize the dataset, removing duplicate rows, inconsistent values, and missing values to improve data accuracy.
## Time-Series Analysis

I am using ggplot2 to plot trends in crime counts over time and to analyze how crime levels evolve throughout the study period.
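
A minimal sketch of the time-series step, assuming the cleaned data has a numeric `Year` column (the same column used in the Results histogram). The `toy` data frame and its values are hypothetical stand-ins for the real dataset.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for the cleaned crime table
toy <- data.frame(
  ID = 1:6,
  Year = c(2010, 2010, 2011, 2011, 2011, 2012)
)

# Count incidents per year
yearly_counts <- toy %>%
  count(Year, name = "n_crimes")

# Line plot of yearly totals
trend_plot <- ggplot(yearly_counts, aes(x = Year, y = n_crimes)) +
  geom_line() +
  geom_point() +
  labs(title = "Crimes per Year (toy data)", x = "Year", y = "Number of crimes") +
  theme_minimal()

trend_plot
```

With the real data, the same aggregation could be done per month instead of per year if a parsed date column is available.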
## Spatial Analysis

Using the sf package, I am mapping crime locations to visualize their spatial distribution across Chicago's neighborhoods, which helps identify patterns and areas of concern.
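
A minimal sketch of the mapping step, assuming the dataset stores coordinates in columns named `Longitude` and `Latitude` in WGS 84 (EPSG:4326). Both the column names and the toy coordinates are assumptions, not taken from the project data.

```r
library(sf)

# Hypothetical toy records with assumed coordinate column names
toy <- data.frame(
  ID = 1:3,
  Longitude = c(-87.63, -87.65, -87.70),
  Latitude = c(41.88, 41.90, 41.85)
)

# Convert the plain data frame to an sf point layer (WGS 84)
crime_points <- st_as_sf(toy, coords = c("Longitude", "Latitude"), crs = 4326)

# Quick base plot of the point geometries
plot(st_geometry(crime_points), pch = 16, axes = TRUE,
     main = "Crime locations (toy data)")
```

Rows with missing coordinates would need to be removed before calling `st_as_sf()`, since it does not accept missing values in the coordinate columns by default.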
## Cluster Analysis

I am applying clustering techniques to identify crime hotspots and to determine whether certain crime types are spatially concentrated in specific regions of the city.
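
The clustering technique is not specified here, so the sketch below uses k-means from base R purely as one possible illustration. The coordinate column names, the toy points, and the choice of two clusters are all assumptions.

```r
set.seed(42)

# Hypothetical toy points forming two well-separated groups
toy <- data.frame(
  Longitude = c(rnorm(20, -87.63, 0.005), rnorm(20, -87.75, 0.005)),
  Latitude  = c(rnorm(20, 41.88, 0.005), rnorm(20, 41.78, 0.005))
)

# k-means on the raw coordinates; nstart repeats the random initialization
km <- kmeans(toy[, c("Longitude", "Latitude")], centers = 2, nstart = 10)

# Attach the cluster label to each point and inspect cluster sizes
toy$cluster <- km$cluster
table(toy$cluster)
```

k-means treats degrees as plain Euclidean units, so for real distances the points would first need to be projected to a metric coordinate system. Density-based methods are another common choice for hotspot detection.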
## Predictive Modeling

I am using machine learning packages such as caret and randomForest to build models that predict crime occurrences from time and location, providing insights for crime prevention strategies.
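
A minimal sketch of the modeling step using randomForest only. The predictors (`Hour`, `Latitude`, `Longitude`), the outcome `Type`, and the rule that generates the toy labels are hypothetical, chosen only so that the example runs on its own.

```r
library(randomForest)

set.seed(123)
n <- 200

# Hypothetical toy data: time and location predictors
toy <- data.frame(
  Hour = sample(0:23, n, replace = TRUE),
  Latitude = runif(n, 41.65, 42.02),
  Longitude = runif(n, -87.90, -87.52)
)

# Hypothetical label that depends only on the hour of day
toy$Type <- factor(ifelse(toy$Hour >= 8 & toy$Hour < 20, "theft", "battery"))

# Simple train/test split
train_idx <- sample(seq_len(n), size = 150)
train <- toy[train_idx, ]
test <- toy[-train_idx, ]

# Fit a random forest classifier and evaluate on the held-out rows
rf_model <- randomForest(Type ~ Hour + Latitude + Longitude, data = train, ntree = 100)
pred <- predict(rf_model, newdata = test)
accuracy <- mean(pred == test$Type)
accuracy
```

On real crime data the outcome, features, and validation scheme would need more care, for example splitting by time so that the model is tested on later years than it was trained on.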
Required packages:

```{r}
# Load necessary libraries
library(dplyr)
library(tidyr)
```

```{r}
# Load the dataset
df <- read.csv("data/data.csv", stringsAsFactors = FALSE)
```

```{r}
# View the first few rows of the dataset
head(df)
```

```{r}
# Drop duplicate rows
df <- df %>% distinct()
```

```{r}
# Remove rows with any missing values
df <- df %>% drop_na()
```
```{r}
# Clean inconsistent values (example: convert character columns to lowercase)
df <- df %>% mutate(across(where(is.character), tolower))
```
```{r}
# Replace placeholder strings with NA.
# Character columns were lowercased in the previous step, so the
# placeholders must be matched in lowercase ("na", "unknown").
df <- df %>%
  mutate(across(where(is.character), ~ na_if(.x, "na"))) %>%
  mutate(across(where(is.character), ~ na_if(.x, "unknown")))
```
```{r}
# Re-check for missing values and inconsistencies
summary(df)
```
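
As a complement to `summary()`, the re-check can also be written as an explicit count of missing values per column. This is a small sketch on hypothetical toy data. The `Description` column and its values are assumptions.

```r
library(dplyr)

# Hypothetical toy data with a real NA and an "unknown" placeholder
toy <- data.frame(
  Year = c(2010, 2011, NA, 2012),
  Description = c("simple", NA, "unknown", "armed"),
  stringsAsFactors = FALSE
)

# Turn the placeholder into a proper NA
toy <- toy %>%
  mutate(across(where(is.character), ~ na_if(.x, "unknown")))

# Count missing values in every column
missing_per_column <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

missing_per_column
```

Applied to `df`, the same `summarise()` call shows which columns still contain missing values after the placeholder replacement.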
# Results
The histogram below shows the distribution of recorded crimes by year in the dataset. The aim is to visualize how the number of crimes changes from year to year and to identify which years have the highest counts.
```{r}
library(ggplot2)

# Create a histogram of the Year column
ggplot(df, aes(x = Year)) +
  geom_histogram(binwidth = 1, fill = "beige", color = "black", alpha = 0.7) +
  labs(title = "Distribution of Crimes by Year", x = "Year", y = "Frequency") +
  theme_minimal()
```
# Conclusions
The histogram of crimes by year gives a first overview of how recorded criminal activity changed over the study period, which can help law enforcement with resource allocation and community safety initiatives. It also sets the stage for further research into specific crime types, the external factors that may drive year-to-year fluctuations, and their underlying causes.

Further project work includes time-series analysis, spatial analysis, cluster analysis, and predictive modeling.
# References
1. Chicago Crime Map: https://www3.nd.edu/~skumar5/teaching/additional/spring-2022-eg/project-06-13/index.html
2. NYC Crime Map: https://data.cityofnewyork.us/Public-Safety/Crime-Map-/5jvd-shfj
3. LA Crime Analysis: https://crimegrade.org/violent-crime-los-angeles-ca/