Data Cleaner

Data · Beginner

Automates data cleaning tasks including deduplication, missing value handling, date format normalization, and text standardization. Provides before/after comparison and reusable scripts.

Trigger: /clean

Frequency: 1-2x/week

Operations manager? Run /clean to unify different date formats across vendors in one pass

Data engineer? Auto-generate reusable scripts for repetitive preprocessing logic

Data Cleaning, ETL, Preprocessing, Automation

How It Works

1. Run /clean [file], which starts with data profiling.
2. Phase 1: four cleaning tasks run in parallel:
 - dedup: remove duplicates
 - null-handle: handle missing values
 - format-fix: standardize formats
 - outlier-fix: handle outliers
3. Compare results and generate a script.
4. Result: cleaned data plus a reusable script.
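The flow names an outlier-fix task but does not say how outliers are handled. One common choice is to clip values to the interquartile-range fences. The sketch below assumes that rule. The function name `clip_outliers` and the multiplier `k` are hypothetical and not part of the skill.

```python
import statistics

def clip_outliers(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence.

    Assumption: IQR clipping is only one possible outlier strategy;
    the skill itself does not specify which one it uses.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]

amounts = [10, 12, 11, 13, 12, 11, 500]
cleaned_amounts = clip_outliers(amounts)
print(cleaned_amounts)
```

Clipping keeps the row count stable, which makes the before/after comparison easier to read than dropping rows. Whether clipping or removal is appropriate depends on the column.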

Skill Code

# Data Cleaning Skill
## Trigger: /clean [file]

When invoked on a data file:

1. Profile data quality issues:
 - Duplicate rows (exact + fuzzy)
 - Missing values by column
 - Inconsistent formats (dates, phones, addresses)
 - Encoding issues (UTF-8, EUC-KR)
 - Leading/trailing whitespace

2. Apply cleaning rules:
 - Remove exact duplicates
 - Standardize date formats → ISO 8601
 - Normalize phone numbers → consistent format
 - Fill missing values (strategy per column)
 - Trim whitespace, fix encoding

3. Output format:
---
## 🧹 Data Cleaning Report

### Before / After
| Metric | Before | After |
|--------|--------|-------|
| Rows | [X] | [Y] |
| Duplicates | [X] | 0 |
| Missing values | [X%] | [Y%] |

### Actions Taken
1. Removed [N] duplicate rows
2. Standardized [column] date format
3. Filled [column] nulls with [strategy]

### Generated Script
```python
# Reusable cleaning script
[pandas/polars code]
```
---

Copy and paste into your CLAUDE.md to start using immediately.
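To give a feel for what the "Generated Script" section of the report might contain, here is a minimal sketch of three of the cleaning rules (trim whitespace, fill missing values per column, remove exact duplicates) together with the Before / After metrics. The skill template mentions pandas/polars. This sketch uses only the standard library to stay dependency-free, and the names `clean_rows`, `missing_pct`, and `fill_values` are hypothetical.

```python
def clean_rows(rows, fill_values):
    """Trim whitespace, fill missing values per column, drop exact duplicates.

    Assumption: rows are dicts of column -> value, and fill_values maps a
    column name to the replacement used for empty or None cells.
    """
    seen, cleaned = set(), []
    for row in rows:
        fixed = {}
        for col, value in row.items():
            if isinstance(value, str):
                value = value.strip()
            if value in ("", None):
                value = fill_values.get(col, value)
            fixed[col] = value
        # Deduplicate after trimming so " Acme " and "Acme" count as equal.
        key = tuple(fixed.items())
        if key not in seen:
            seen.add(key)
            cleaned.append(fixed)
    return cleaned

def missing_pct(rows):
    """Percentage of cells that are None or blank, rounded to one decimal."""
    cells = [v for row in rows for v in row.values()]
    missing = sum(1 for v in cells if v is None or str(v).strip() == "")
    return round(100 * missing / len(cells), 1) if cells else 0.0

raw = [
    {"vendor": " Acme ", "city": "Seoul"},
    {"vendor": "Acme", "city": "Seoul"},
    {"vendor": "Globex", "city": ""},
]
cleaned = clean_rows(raw, fill_values={"city": "UNKNOWN"})
report = {
    "rows": (len(raw), len(cleaned)),
    "missing_pct": (missing_pct(raw), missing_pct(cleaned)),
}
print(report)
```

The `report` dictionary holds (before, after) pairs that map directly onto the Rows and Missing values lines of the report table.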

How Data Cleaner Works

Data Cleaner scans your dataset for inconsistencies (duplicate rows, missing values, formatting variations, type mismatches), then generates a cleaning pipeline that standardizes, deduplicates, and validates the output.
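The scanning step can be pictured as a small profiling pass over the rows. The sketch below is an illustration under assumptions, not the skill's actual implementation: it counts exact duplicate rows, missing values by column, and cells with leading or trailing whitespace. The function name `profile` and the sample rows are hypothetical.

```python
from collections import Counter

def profile(rows):
    """Count exact duplicates, missing values per column, and padded cells.

    Assumption: rows are dicts of column -> value with hashable values.
    """
    row_counts = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(n - 1 for n in row_counts.values())
    missing, padded = Counter(), Counter()
    for row in rows:
        for col, value in row.items():
            if value is None or value == "":
                missing[col] += 1
            elif isinstance(value, str) and value != value.strip():
                padded[col] += 1
    return {
        "rows": len(rows),
        "duplicates": duplicates,
        "missing": dict(missing),
        "padded": dict(padded),
    }

sample = [
    {"id": "1", "email": "a@example.com"},
    {"id": "1", "email": "a@example.com"},
    {"id": "2", "email": None},
    {"id": "3", "email": " c@example.com "},
]
summary = profile(sample)
print(summary)
```

Fuzzy duplicates, type mismatches, and encoding problems are left out here because each needs its own heuristics.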

When to Use Data Cleaner

Use Data Cleaner in any data pipeline where raw input is messy, especially when combining data from multiple sources whose formatting conventions, date formats, and naming standards need to be harmonized.
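As an example of harmonizing date formats from several sources, the sketch below tries a list of known patterns and emits ISO 8601 dates. The pattern list `KNOWN_FORMATS` and the function `to_iso_date` are hypothetical. A real vendor list would need its own patterns and an explicit decision about day-first versus month-first input.

```python
from datetime import datetime

# Assumed per-source formats. An ambiguous value such as 03/04/2024
# is read with whichever matching pattern comes first (day-first here).
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y.%m.%d", "%Y%m%d")

def to_iso_date(text, formats=KNOWN_FORMATS):
    """Return the date as YYYY-MM-DD, or None when no pattern matches."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

vendor_dates = ["2024-03-05", "05/03/2024", "2024.03.05", "20240305", "next Tuesday"]
normalized = [to_iso_date(d) for d in vendor_dates]
print(normalized)
```

Returning `None` for unparseable values keeps them visible as missing data so they can be reviewed instead of being silently guessed.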

Key Strengths

Detects duplicates, missing values, and format inconsistencies
Generates reproducible cleaning pipelines
Standardizes formats across heterogeneous data sources
Validates output data quality after cleaning
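Post-cleaning validation can be as simple as re-running a few checks on the output. The sketch below assumes three checks (no exact duplicates, required columns filled, date columns in ISO 8601). The function `validate` and its parameters `required` and `date_columns` are hypothetical names chosen for illustration.

```python
from datetime import date

def validate(rows, required, date_columns):
    """Return a list of problems; an empty list means every check passed.

    Assumption: rows are dicts of column -> string value.
    """
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        key = tuple(sorted(row.items()))
        if key in seen:
            problems.append(f"row {i}: duplicate")
        seen.add(key)
        for col in required:
            if row.get(col) in (None, ""):
                problems.append(f"row {i}: missing {col}")
        for col in date_columns:
            value = row.get(col)
            if value in (None, ""):
                continue
            try:
                date.fromisoformat(value)
            except ValueError:
                problems.append(f"row {i}: {col} is not ISO 8601")
    return problems

output_rows = [
    {"id": "1", "order_date": "2024-03-05"},
    {"id": "2", "order_date": "05/03/2024"},
    {"id": "", "order_date": "2024-03-06"},
]
issues = validate(output_rows, required=["id"], date_columns=["order_date"])
print(issues)
```

An empty result can serve as the pass condition before the cleaned file is handed to the next pipeline stage.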
