
This function reads an Excel file containing NRCS practice data, standardizes column names, checks whether the user-supplied HUC12 watershed codes are present in the data, flags duplicated practice entries, and returns a cleaned dataset with total applied amounts per group.

Usage

clean_nrcs_data(dataset_path, huc_12_codes)

Arguments

dataset_path

A character string indicating the path to the .xlsx Excel file to read.

huc_12_codes

A character vector of HUC12 watershed codes to verify against the dataset.
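
The supplied codes are verified against the dataset. A minimal sketch of what such a check might look like, assuming the cleaned data has a huc_12 column (the column name, the helper function check_huc12, and the warning behaviour are illustrative assumptions, not the documented interface):

# Hypothetical helper illustrating the HUC12 presence check;
# the huc_12 column name and the warning are assumptions.
check_huc12 <- function(nrcs_data, huc_12_codes) {
  missing_codes <- setdiff(huc_12_codes, unique(nrcs_data$huc_12))
  if (length(missing_codes) > 0) {
    warning(
      "HUC12 codes not found in the dataset: ",
      paste(missing_codes, collapse = ", ")
    )
  }
  invisible(missing_codes)
}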

Value

A data.frame containing:

  • Cleaned and standardized variable names

  • Duplicate detection variables: duplicates_id_yr_code, duplicates_id_yr_code_amt

  • Deduplicated rows, each representing a unique land_unit_id, practice_code, and applied_year combination

  • applied_amount updated to reflect the total for grouped duplicates

Details

Duplicate records are defined in two stages:

  1. By land_unit_id, practice_code, and applied_year (duplicates_id_yr_code)

  2. By the same fields plus applied_amount (duplicates_id_yr_code_amt)

If applied_amount has different values for the same land_unit_id, practice_code, and applied_year, then applied_amount is summed.

If applied_amount has the same value repeated within a land_unit_id, practice_code, and applied_year group, then those records are considered true duplicates and are removed.

The original column names in the dataset are cleaned to lowercase snake_case using janitor::clean_names().
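
A minimal sketch of the cleaning and deduplication workflow described above, assuming readxl for import and dplyr for grouping; the function's actual internals may differ, and the file path is taken from the example below.

library(dplyr)
library(janitor)
library(readxl)

raw <- read_excel("data/nrcs_practices.xlsx") |>
  clean_names()

cleaned <- raw |>
  # Stage 1 flag: duplicates by land_unit_id, practice_code, applied_year
  group_by(land_unit_id, practice_code, applied_year) |>
  mutate(duplicates_id_yr_code = n() > 1) |>
  # Stage 2 flag: duplicates by the same fields plus applied_amount
  group_by(applied_amount, .add = TRUE) |>
  mutate(duplicates_id_yr_code_amt = n() > 1) |>
  ungroup() |>
  # True duplicates (identical applied_amount) are collapsed to one row
  distinct(land_unit_id, practice_code, applied_year, applied_amount,
           .keep_all = TRUE) |>
  # Differing applied_amount values within a group are summed, leaving
  # one row per land_unit_id, practice_code, applied_year combination
  group_by(land_unit_id, practice_code, applied_year) |>
  mutate(applied_amount = sum(applied_amount)) |>
  slice(1) |>
  ungroup()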

Examples

if (FALSE) { # \dontrun{
clean_nrcs_data(
  dataset_path = "data/nrcs_practices.xlsx",
  huc_12_codes = c("170601010101", "170601010102")
)
} # }