Claude Code Catalog

Data Cleaner

Data · Beginner

Automates data cleaning tasks including deduplication, missing value handling, date format normalization, and text standardization. Provides before/after comparison and reusable scripts.

Trigger: /clean
Frequency: 1-2x/week

Operations manager? Run /clean to unify different date formats across vendors in one pass

Data engineer? Auto-generate reusable scripts for repetitive preprocessing logic

Data Cleaning, ETL, Preprocessing, Automation

How It Works

1. Run /clean [file] -> data profiling
2. Phase 1: 4 cleaning tasks in parallel
  • dedup: Remove duplicates
  • null-handle: Handle missing values
  • format-fix: Standardize formats
  • outlier-fix: Handle outliers
3. Compare results + generate script
4. Output: cleaned data + reusable script
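
To make the profiling step at the start of this flow concrete, here is a minimal sketch using only the Python standard library. It is not the script the skill generates (that is described as pandas/polars code). The function name `profile` and the list-of-dicts record layout are assumptions made for illustration.

```python
from collections import Counter

def profile(rows):
    """Summarize basic quality issues in a list of dict records (hypothetical helper)."""
    # Exact duplicates: rows whose values match in every column.
    seen = Counter(tuple(sorted(r.items())) for r in rows)
    duplicates = sum(count - 1 for count in seen.values())

    columns = sorted({key for r in rows for key in r})
    missing = {c: 0 for c in columns}   # None or blank strings
    padded = {c: 0 for c in columns}    # leading/trailing whitespace
    for r in rows:
        for c in columns:
            value = r.get(c)
            if value is None or str(value).strip() == "":
                missing[c] += 1
            elif isinstance(value, str) and value != value.strip():
                padded[c] += 1
    return {"rows": len(rows), "duplicates": duplicates,
            "missing": missing, "padded": padded}

sample = [
    {"vendor": "Acme", "date": "2024-01-05"},
    {"vendor": "Acme", "date": "2024-01-05"},
    {"vendor": " Globex ", "date": ""},
]
report = profile(sample)
```

This sketch covers exact duplicates, missing values, and stray whitespace only. Fuzzy duplicates and encoding checks would need more logic than fits here.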

Skill Code

# Data Cleaning Skill

## Trigger: /clean [file]

When invoked on a data file:

1. Profile data quality issues:
   - Duplicate rows (exact + fuzzy)
   - Missing values by column
   - Inconsistent formats (dates, phones, addresses)
   - Encoding issues (UTF-8, EUC-KR)
   - Leading/trailing whitespace

2. Apply cleaning rules:
   - Remove exact duplicates
   - Standardize date formats → ISO 8601
   - Normalize phone numbers → consistent format
   - Fill missing values (strategy per column)
   - Trim whitespace, fix encoding

3. Output format:

---
## 🧹 Data Cleaning Report

### Before / After
| Metric | Before | After |
|--------|--------|-------|
| Rows | [X] | [Y] |
| Duplicates | [X] | 0 |
| Missing values | [X%] | [Y%] |

### Actions Taken
1. Removed [N] duplicate rows
2. Standardized [column] date format
3. Filled [column] nulls with [strategy]

### Generated Script
```python
# Reusable cleaning script
[pandas/polars code]
```
---

Copy and paste into your CLAUDE.md to start using immediately.
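
The skill leaves the generated script as a placeholder, so the following is only a rough, dependency-free sketch of what a few of the cleaning rules might look like: trimming whitespace, filling missing values, and removing exact duplicates. The names `clean`, `count_missing`, and `fill_values` are hypothetical, and a constant default per column is just one possible fill strategy.

```python
def clean(rows, fill_values):
    """Trim whitespace, fill blanks per column, then drop exact duplicates."""
    cleaned, seen = [], set()
    for row in rows:
        fixed = {}
        for column, value in row.items():
            if isinstance(value, str):
                value = value.strip()
            if value is None or value == "":
                # Assumed strategy: a constant default per column, if one is given.
                value = fill_values.get(column, value)
            fixed[column] = value
        key = tuple(sorted(fixed.items()))
        if key not in seen:  # keep only the first occurrence
            seen.add(key)
            cleaned.append(fixed)
    return cleaned

def count_missing(rows):
    """Count cells that are None or an empty string."""
    return sum(1 for r in rows for v in r.values() if v is None or v == "")

raw = [
    {"vendor": "Acme ", "region": "EU"},
    {"vendor": "Acme", "region": "EU"},
    {"vendor": "Globex", "region": ""},
]
result = clean(raw, fill_values={"region": "UNKNOWN"})

# Before/after pairs, mirroring the metrics in the report table.
summary = {
    "rows": (len(raw), len(result)),
    "missing": (count_missing(raw), count_missing(result)),
}
```

Trimming before deduplicating is a deliberate choice in this sketch: rows that differ only by stray whitespace collapse into one.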

How Data Cleaner Works

Data Cleaner scans your dataset for inconsistencies — duplicate rows, missing values, formatting variations, type mismatches — then generates a cleaning pipeline that standardizes, deduplicates, and validates the output.

When to Use Data Cleaner

Use Data Cleaner in any data pipeline where raw input is messy, especially when combining data from multiple sources whose formatting conventions, date formats, and naming standards need harmonization.
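
As an illustration of harmonizing date formats from several sources, the sketch below tries a list of candidate formats and emits ISO 8601. The format list and the function name `to_iso_date` are assumptions. Real vendor data would need its own list, and ambiguous strings such as 03/04/2024 resolve according to whichever matching format comes first.

```python
from datetime import datetime

# Assumed vendor formats. Order matters for ambiguous inputs:
# "%d/%m/%Y" here means slash-separated dates are read day-first.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%Y.%m.%d", "%m-%d-%Y"]

def to_iso_date(text, formats=CANDIDATE_FORMATS):
    """Return an ISO 8601 date string, or None if no candidate format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None

vendor_dates = ["2024-01-05", "05/01/2024", " 2024.01.05 ", "not a date"]
normalized = [to_iso_date(d) for d in vendor_dates]
```

Returning `None` for unparseable values, rather than guessing, keeps them visible as missing values for a later step to handle.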

Key Strengths

  • Detects duplicates, missing values, and format inconsistencies
  • Generates reproducible cleaning pipelines
  • Standardizes formats across heterogeneous data sources
  • Validates output data quality after cleaning
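
The page does not say how output validation is performed. One way to picture it is a post-cleaning check like the sketch below, where `validate`, `required`, and `date_columns` are hypothetical names and date values are assumed to be strings.

```python
from datetime import date

def validate(rows, required, date_columns):
    """Return a list of readable problems; an empty list means the data passed."""
    problems = []
    seen = set()
    for index, row in enumerate(rows):
        key = tuple(sorted(row.items()))
        if key in seen:
            problems.append(f"row {index}: duplicate")
        seen.add(key)
        for column in required:
            if row.get(column) in (None, ""):
                problems.append(f"row {index}: missing {column}")
        for column in date_columns:
            value = row.get(column)
            if value in (None, ""):
                continue  # blanks are reported by the required check, if listed
            try:
                date.fromisoformat(value)  # assumes string values
            except ValueError:
                problems.append(f"row {index}: {column} is not ISO 8601")
    return problems

rows = [
    {"vendor": "Acme", "date": "2024-01-05"},
    {"vendor": "", "date": "05/01/2024"},
    {"vendor": "Acme", "date": "2024-01-05"},
]
problems = validate(rows, required=["vendor"], date_columns=["date"])
```

Running a check like this after cleaning shows whether the "After" column of the report really reaches zero duplicates and the expected missing-value rate.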
