DATA1001 Project 2 Template

Author

SID 550640781

Client Bio

Client: NYC Mayor’s Office of Criminal Justice

Recommendation

Prioritize interventions that target the spatiotemporal hotspots of theft: increase undercover operations and patrols in high-incidence areas, and combine them with theft prevention campaigns and environmental modifications. At the same time, improve the completeness of the victim race field (VIC_RACE) and the case details fields, so that cross-group comparisons and resource allocation rest on better data.

Evidence

Research Question 1: Is victim race independent of offense type?

Code
library(tidyverse)
Code
# Load the 2024 NYPD complaint data (file assumed to be in the working directory)
NYPD = read.csv("NYPD_Complaint_Data_2024.csv")

# Give the coded columns readable names (dplyr is already attached by tidyverse)
NYPD_clean = NYPD %>%
  rename(
    Event_ID       = CMPLNT_NUM,
    Start_Date     = CMPLNT_FR_DT,
    Start_time     = CMPLNT_FR_TM,
    End_date       = CMPLNT_TO_DT,
    End_time       = CMPLNT_TO_TM,
    Area_code      = ADDR_PCT_CD,
    Report_date    = RPT_DT,
    Offense_key    = KY_CD,
    Offense_desc   = OFNS_DESC,
    PD_code        = PD_CD,
    PD_desc        = PD_DESC,
    Crime_Complete = CRM_ATPT_CPTD_CD,
    Law_Level      = LAW_CAT_CD,
    Borough        = BORO_NM,
    Location_desc  = LOC_OF_OCCUR_DESC
  )
Code
table(NYPD_clean$VIC_RACE)

                        (null) AMERICAN INDIAN/ALASKAN NATIVE 
                            32                            622 
      ASIAN / PACIFIC ISLANDER                          BLACK 
                         10058                          30897 
                BLACK HISPANIC                        UNKNOWN 
                          6022                          49895 
                         WHITE                 WHITE HISPANIC 
                         16376                          22500 
Code
ggplot(NYPD_clean, aes(x = VIC_RACE, fill = VIC_RACE)) +
  geom_bar() +
  ylab("Count_vic") +
  # Rotate the x-axis labels so the long race categories do not overlap
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Code
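# Keep the 5 most frequent offense types and the 3 most frequent victim race categories
top5_offenses = names(sort(table(NYPD_clean$Offense_desc), decreasing = TRUE))[1:5]
top3_races    = names(sort(table(NYPD_clean$VIC_RACE), decreasing = TRUE))[1:3]
Top5_race_data = NYPD_clean[
  NYPD_clean$Offense_desc %in% top5_offenses & NYPD_clean$VIC_RACE %in% top3_races,
]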
Code
race_count = Top5_race_data %>%
  count(VIC_RACE)
ggplot(race_count,aes(x = "",y = n,fill = VIC_RACE)) +
  geom_bar(stat = "identity",width = 1) +
  coord_polar(theta = "y") +
  labs(title = "Victim Race Pie Chart") +
  theme_void() + 
  theme(legend.title = element_blank())

Code
table_offense_race = table(Top5_race_data$Offense_desc,Top5_race_data$VIC_RACE)
table_offense_race
                                
                                 BLACK UNKNOWN WHITE HISPANIC
  ASSAULT 3 & RELATED OFFENSES    5364     978           4345
  CRIMINAL MISCHIEF & RELATED OF  2354    2909           1422
  GRAND LARCENY                   1860    2085           1809
  HARRASSMENT 2                   7984    1216           4759
  PETIT LARCENY                   3936   14543           2704
Code
library(stringr)
ggplot(Top5_race_data,aes(x = Offense_desc,fill = VIC_RACE)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Offense type (top 5)",
       y = "Percentage",
       fill = "Victim race (top 3)",
       title = "Victim Race Share by Offense Type (Top 5)") +
  theme(axis.text.x = element_text(angle = 45,hjust = 1))

Based on the contingency table and the 100% stacked bar chart, the five most frequent offense types show distinct patterns in recorded victim race:

• Petit larceny has the most victims overall (21,183 across the three race categories), and 14,543 of them are recorded with “UNKNOWN” race, which may indicate gaps in reporting or classification.

• Assault 3 and related offenses and harassment 2 have the highest numbers of Black victims (5,364 and 7,984 respectively). Black victims are the largest group for both offenses, although the table alone does not adjust for population size.

• Criminal mischief and grand larceny have a more even distribution of victims across the three categories, with no category holding a majority.

• In the chart, the “UNKNOWN” group accounts for about 69% of petit larceny victims (14,543 of 21,183).

• Assault 3 and harassment 2 are visually dominated by the Black category (about 50% and 57% respectively), consistent with the counts.

• Criminal mischief and grand larceny show a more balanced composition, suggesting that differences between race categories are less pronounced for these two property offenses.

• Together, the numerical and graphical analyses indicate that the composition of victim race varies substantially across offense types, which suggests that victim race and offense type are not independent.
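
The shares discussed in the bullets above can be checked directly from the printed contingency table. The following is a minimal sketch that re-enters those counts by hand. The object names `observed` and `row_share` are illustrative and are not part of the original analysis.

```r
# Observed counts copied from the contingency table printed above
# (rows: top-5 offense types, columns: top-3 victim race categories).
observed <- matrix(
  c(5364,   978, 4345,
    2354,  2909, 1422,
    1860,  2085, 1809,
    7984,  1216, 4759,
    3936, 14543, 2704),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("ASSAULT 3 & RELATED OFFENSES", "CRIMINAL MISCHIEF & RELATED OF",
      "GRAND LARCENY", "HARRASSMENT 2", "PETIT LARCENY"),
    c("BLACK", "UNKNOWN", "WHITE HISPANIC")
  )
)

# Share of each race category within an offense type (each row sums to 1)
row_share <- prop.table(observed, margin = 1)

# Print the shares as percentages, rounded to one decimal place
round(100 * row_share, 1)
```

Reading across a row gives the composition of victims for that offense type, which is the same quantity the 100% stacked bar chart displays.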

A chi-squared test of independence (reported in the Appendix) indicates that this pattern is very unlikely to be due to chance alone.

The test gives X-squared = 17824 on 8 degrees of freedom, with a p-value < 2.2e-16. Because the p-value is far below the usual 0.05 significance level, the result is statistically significant. Simply put, the data suggest that the victim’s race is related to the type of crime: the distribution of victim race differs significantly across crime types. Overall, the chi-squared test supports the visual and descriptive observations, indicating that these two variables are not independent.

Limitations:

Issue: NYPD data consists of reported cases only, and reporting is influenced by factors such as culture, insurance requirements, and police-community relations. The willingness to report crimes varies considerably across areas and populations (especially for property-related crimes). Hotspots may therefore reflect areas where people are more willing to report crimes rather than areas with genuinely higher victimization risk.

Impact: Interpretations of spatial or population differences may be biased, and resource allocation may be skewed toward areas with higher reporting rates.

Ethics Statement

There is a short statement explaining how 1 of the Shared Values and 1 of the Ethical Principles has been adhered to (see ISI: https://isi-web.org/declaration-professional-ethics).

AI usage statement

I used generative AI (ChatGPT, model: GPT-5 Thinking) for the following limited support while writing and checking this report:

• Helping to summarize the exploratory analysis results I had already completed into English paragraphs (Client Bio, Recommendation, Evidence draft).

• Suggesting two external references, as required by the assignment template, and providing their citation formats.

References

• Braga, A. A., Papachristos, A. V., & Hureau, D. M. (2014). The effects of hot spots policing on crime: An updated systematic review and meta-analysis. Justice Quarterly, 31(4), 633–663.

• Cozens, P., Saville, G., & Hillier, D. (2005). Crime prevention through environmental design (CPTED): A review and modern bibliography. Property Management, 23(5), 328–356.

Appendix

Hypothesis:

• H_0: Victim race and offense type are independent.

• H_1: Victim race and offense type are dependent.

Assumptions:

Code
expect_table = chisq.test(table_offense_race,correct = FALSE)$expected
expect_table
                                
                                    BLACK  UNKNOWN WHITE HISPANIC
  ASSAULT 3 & RELATED OFFENSES   3942.973 3985.707       2758.320
  CRIMINAL MISCHIEF & RELATED OF 2466.433 2493.165       1725.402
  GRAND LARCENY                  2122.940 2145.949       1485.110
  HARRASSMENT 2                  5150.178 5205.997       3602.825
  PETIT LARCENY                  7815.476 7900.181       5467.343

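All expected cell counts are far greater than 5 (the smallest is about 1,485, for grand larceny with White Hispanic victims), so the expected-count assumption of the chi-squared test is satisfied. We also assume that each complaint record is an independent observation.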
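
Each expected count in the table above comes from the row and column totals of the observed table, under the assumption that H_0 holds. As a worked check, using the observed counts printed in the Evidence section (the symbols R_i, C_j and N are notation introduced here for the row total, column total and grand total):

```latex
\[
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
E_{\text{Assault 3},\,\text{Black}}
  = \frac{10687 \times 21498}{58268}
  \approx 3942.97
\]
```

Here 10687 is the total for the assault 3 row, 21498 is the total for the Black column, and 58268 is the total number of cases in the table. The result agrees with the first cell of the expected table.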

Observed test statistic:

Code
chisq.test(table_offense_race,correct = FALSE)

    Pearson's Chi-squared test

data:  table_offense_race
X-squared = 17824, df = 8, p-value < 2.2e-16

Conclusion: The test produced a chi-squared statistic of X-squared = 17824 on 8 degrees of freedom, with a p-value < 2.2e-16. Because the p-value is far below the commonly used 0.05 significance level, we reject H_0 and conclude that victim race and offense type are not independent.
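
The statistic reported above can also be reproduced by hand from the observed counts, which makes it clearer where the value comes from. This is a minimal sketch that re-enters the printed contingency table. The names `observed`, `expected`, `chi_sq`, `df` and `p_value` are illustrative and are not part of the original analysis.

```r
# Observed counts copied from the contingency table in the Evidence section
# (rows: top-5 offense types, columns: BLACK, UNKNOWN, WHITE HISPANIC).
observed <- matrix(
  c(5364,   978, 4345,
    2354,  2909, 1422,
    1860,  2085, 1809,
    7984,  1216, 4759,
    3936, 14543, 2704),
  nrow = 5, byrow = TRUE
)

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-squared statistic: sum over all cells of (O - E)^2 / E
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (number of rows - 1) * (number of columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-squared distribution with df degrees of freedom
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

chi_sq
df
p_value
```

The hand calculation should agree with `chisq.test(observed, correct = FALSE)`, which performs the same steps internally.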