This function reads an Excel file containing NRCS practice data, standardizes column names, checks whether the user-supplied HUC12 watershed codes are present, flags duplicated practice entries, and returns a cleaned dataset with total applied amounts per group.
Value
A data.frame containing:
Cleaned and standardized variable names
Duplicate detection variables: duplicates_id_yr_code, duplicates_id_yr_code_amt
Deduplicated rows, each representing a unique land_unit_id–practice_code–applied_year combination
applied_amount updated to reflect the total for grouped duplicates
Details
Duplicate records are defined in two stages:
By land_unit_id, practice_code, and applied_year (duplicates_id_yr_code)
By the same fields plus applied_amount (duplicates_id_yr_code_amt)
If applied_amount has different values for the same land_unit_id, practice_code, and applied_year, then applied_amount is summed.
If applied_amount has the same value more than once for a given land_unit_id, practice_code, and applied_year, then those records are considered true duplicates and removed.
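The two-stage rule above can be sketched in base R. This is only an illustration, not the function's actual implementation: the toy values, the helper flag_dups(), the logical type of the flag columns, and the choice to keep the first copy of each true duplicate are all assumptions.

```r
# Toy data; all values are invented for illustration.
practices <- data.frame(
  land_unit_id   = c("A1", "A1", "B2", "B2", "C3"),
  practice_code  = c(340, 340, 590, 590, 328),
  applied_year   = c(2020, 2020, 2021, 2021, 2019),
  applied_amount = c(10, 15, 40, 40, 7),
  stringsAsFactors = FALSE
)

key_cols <- c("land_unit_id", "practice_code", "applied_year")

# Hypothetical helper: TRUE for every row whose combination of `cols`
# occurs more than once in `df`.
flag_dups <- function(df, cols) {
  duplicated(df[cols]) | duplicated(df[cols], fromLast = TRUE)
}

# Stage 1: same land unit, practice code, and year.
practices$duplicates_id_yr_code <- flag_dups(practices, key_cols)

# Stage 2: same fields plus applied_amount.
practices$duplicates_id_yr_code_amt <-
  flag_dups(practices, c(key_cols, "applied_amount"))

# True duplicates (identical amount): keep the first copy (an assumption).
deduped <- practices[!duplicated(practices[c(key_cols, "applied_amount")]), ]

# Differing amounts within a group are summed into one row per group.
totals <- aggregate(
  applied_amount ~ land_unit_id + practice_code + applied_year,
  data = deduped,
  FUN = sum
)
```

In this toy data, unit A1 has two different amounts that are summed, unit B2 has the same amount twice and collapses to a single record, and unit C3 passes through unchanged.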
The original column names are cleaned to lowercase snake_case
using janitor::clean_names().
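The HUC12 presence check mentioned in the description might look roughly like the following. This is a guess at the general shape only: the column name huc12 and the use of message() are assumptions, and the real function may warn, stop, or report missing codes differently.

```r
# Toy data; the column name `huc12` is an assumption.
practices <- data.frame(
  huc12         = c("170601010101", "170601010101", "170601010103"),
  practice_code = c(340, 590, 328),
  stringsAsFactors = FALSE
)

huc_12_codes <- c("170601010101", "170601010102")

# Requested codes that never appear in the data.
missing_codes <- setdiff(huc_12_codes, unique(practices$huc12))

if (length(missing_codes) > 0) {
  message(
    "HUC12 codes not found in the data: ",
    paste(missing_codes, collapse = ", ")
  )
}
```

Keeping the codes as character strings rather than numbers preserves any leading zeros.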
Examples
if (FALSE) { # \dontrun{
clean_nrcs_data(
  dataset_path = "data/nrcs_practices.xlsx",
  huc_12_codes = c("170601010101", "170601010102")
)
} # }
