
This function reads an Excel file containing NRCS practice data, standardizes column names, checks whether the user-supplied HUC12 watershed codes are present in the data, flags duplicated practice entries, and returns a cleaned dataset with total applied amounts per group.

Usage

clean_nrcs_data(dataset_path, huc_12_codes)

Arguments

dataset_path

A character string indicating the path to the .xlsx Excel file to read.

huc_12_codes

A character vector of HUC12 watershed codes to verify against the dataset.
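
The supplied codes are verified against the dataset. A minimal sketch of what such a check might look like, assuming the cleaned data has a huc_12 column (the column name, the helper function check_huc12, and the warning behaviour are illustrative assumptions, not the documented interface):

# Hypothetical helper illustrating the HUC12 presence check;
# the huc_12 column name and the warning are assumptions.
check_huc12 <- function(nrcs_data, huc_12_codes) {
  missing_codes <- setdiff(huc_12_codes, unique(nrcs_data$huc_12))
  if (length(missing_codes) > 0) {
    warning(
      "HUC12 codes not found in the dataset: ",
      paste(missing_codes, collapse = ", ")
    )
  }
  invisible(missing_codes)
}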

Value

A data.frame containing:

  • Cleaned and standardized variable names

  • Duplicate detection variables: duplicates_id_yr_code, duplicates_id_yr_code_amt

  • Deduplicated rows, each representing a unique land_unit_id, practice_code, and applied_year combination

  • applied_amount updated to reflect the total for grouped duplicates

Details

Duplicate records are defined in two stages:

  1. By land_unit_id, practice_code, and applied_year (duplicates_id_yr_code)

  2. By the same fields plus applied_amount (duplicates_id_yr_code_amt)

If applied_amount has different values for the same land_unit_id, practice_code, and applied_year, then applied_amount is summed.

If applied_amount has the same value repeated within a land_unit_id, practice_code, and applied_year group, then those records are considered true duplicates and are removed.

The original column names in the dataset are cleaned to lowercase snake_case using janitor::clean_names().
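
A minimal sketch of the cleaning and deduplication workflow described above, assuming readxl for import and dplyr for grouping; the function's actual internals may differ, and the file path is taken from the example below.

library(dplyr)
library(janitor)
library(readxl)

raw <- read_excel("data/nrcs_practices.xlsx") |>
  clean_names()

cleaned <- raw |>
  # Stage 1 flag: duplicates by land_unit_id, practice_code, applied_year
  group_by(land_unit_id, practice_code, applied_year) |>
  mutate(duplicates_id_yr_code = n() > 1) |>
  # Stage 2 flag: duplicates by the same fields plus applied_amount
  group_by(applied_amount, .add = TRUE) |>
  mutate(duplicates_id_yr_code_amt = n() > 1) |>
  ungroup() |>
  # True duplicates (identical applied_amount) are collapsed to one row
  distinct(land_unit_id, practice_code, applied_year, applied_amount,
           .keep_all = TRUE) |>
  # Differing applied_amount values within a group are summed, leaving
  # one row per land_unit_id, practice_code, applied_year combination
  group_by(land_unit_id, practice_code, applied_year) |>
  mutate(applied_amount = sum(applied_amount)) |>
  slice(1) |>
  ungroup()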

Examples

if (FALSE) { # \dontrun{
clean_nrcs_data(
  dataset_path = "data/nrcs_practices.xlsx",
  huc_12_codes = c("170601010101", "170601010102")
)
} # }