Data Cleaning

Data, Beginner

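Automates data cleaning tasks such as deduplication, missing-value handling, date format unification, and text normalization. It delivers a before/after comparison together with the cleaning script.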

Trigger: /clean

Usage frequency: 1-2 times per week

Operations staff? Unify date formats that differ from one business partner to the next in a single pass with /clean.

Data engineer? Have repetitive preprocessing logic generated automatically as a reusable script.

Data cleaning, ETL, Preprocessing, Automation

How it works

Run /clean [file] → data profiling

↓

Phase 1: four cleaning tasks run in parallel

dedup: duplicate removal

null-handle: missing-value handling

format-fix: format standardization

outlier-fix: outlier handling

↓

Compare cleaning results + generate script

↓

✓ Cleaned data + reusable script
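The four Phase 1 tasks can be sketched roughly as follows. This is a minimal illustration, not the script the skill actually generates: it uses only the Python standard library instead of pandas or polars, runs the steps one after another rather than in parallel, and every name in it (`ROWS`, `DATE_FORMATS`, the column names, the median fill, and the 1.5 × IQR clipping rule) is an assumption chosen for the example.

```python
from datetime import datetime
from statistics import median, quantiles

# Hypothetical sample rows; the column names are illustrative only.
ROWS = [
    {"id": "1", "order_date": "2024-01-05", "amount": "100"},
    {"id": "1", "order_date": "2024-01-05", "amount": "100"},   # exact duplicate
    {"id": "2", "order_date": "2024/01/06", "amount": ""},      # missing amount
    {"id": "3", "order_date": "07.01.2024", "amount": "120"},
    {"id": "4", "order_date": "2024-01-08", "amount": "110"},
    {"id": "5", "order_date": "2024-01-09", "amount": "9000"},  # outlier
]

# Assumed set of input date formats; a real file may need others.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y")


def dedup(rows):
    """Drop exact duplicate rows, keeping the first occurrence."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def format_fix(rows, column):
    """Rewrite dates in `column` as ISO 8601 (YYYY-MM-DD)."""
    def to_iso(value):
        for fmt in DATE_FORMATS:
            try:
                return datetime.strptime(value, fmt).date().isoformat()
            except ValueError:
                continue
        return value  # leave unparseable values untouched

    return [{**row, column: to_iso(row[column])} for row in rows]


def null_handle(rows, column):
    """Convert `column` to float and fill empty values with the median."""
    present = [float(r[column]) for r in rows if r[column] != ""]
    fill = median(present)
    return [
        {**r, column: float(r[column]) if r[column] != "" else fill}
        for r in rows
    ]


def outlier_fix(rows, column):
    """Clip numeric values in `column` to the 1.5 * IQR fences."""
    values = [r[column] for r in rows]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = 1.5 * (q3 - q1)
    low, high = q1 - spread, q3 + spread
    return [{**r, column: min(max(r[column], low), high)} for r in rows]


cleaned = dedup(ROWS)
cleaned = format_fix(cleaned, "order_date")
cleaned = null_handle(cleaned, "amount")
cleaned = outlier_fix(cleaned, "amount")

for row in cleaned:
    print(row)
```

The order matters in this sketch: `null_handle` converts the column to numbers, so it has to run before `outlier_fix`. Clipping is only one possible outlier strategy; flagging or dropping the rows would be equally valid choices depending on the data.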

Skill code

# Data Cleaning Skill
## Trigger: /clean [file]

When invoked on a data file:

1. Profile data quality issues:
 - Duplicate rows (exact + fuzzy)
 - Missing values by column
 - Inconsistent formats (dates, phones, addresses)
 - Encoding issues (UTF-8, EUC-KR)
 - Leading/trailing whitespace

2. Apply cleaning rules:
 - Remove exact duplicates
 - Standardize date formats → ISO 8601
 - Normalize phone numbers → consistent format
 - Fill missing values (strategy per column)
 - Trim whitespace, fix encoding

3. Output format:
---
## 🧹 Data Cleaning Report

### Before / After
| Metric | Before | After |
|--------|--------|-------|
| Rows | [X] | [Y] |
| Duplicates | [X] | 0 |
| Missing values | [X%] | [Y%] |

### Actions Taken
1. Removed [N] duplicate rows
2. Standardized [column] date format
3. Filled [column] nulls with [strategy]

### Generated Script
```python
# Reusable cleaning script
[pandas/polars code]
```
---

Copy it and paste it into CLAUDE.md, and it is ready to use.
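To give a feel for what the profiling step and the Before / After table in the skill's report involve, here is a small sketch under stated assumptions. The helper names (`profile`, `clean`, `report`), the sample rows, and the `"unknown"` fill value are all hypothetical, and the code uses only the standard library rather than the pandas or polars code the skill is asked to emit.

```python
def profile(rows):
    """Count rows, exact duplicates, and the share of empty cells."""
    keys = [tuple(sorted(r.items())) for r in rows]
    cells = [v for r in rows for v in r.values()]
    missing = sum(1 for v in cells if v is None or str(v).strip() == "")
    pct = 100 * missing / len(cells) if cells else 0.0
    return {
        "Rows": len(rows),
        "Duplicates": len(keys) - len(set(keys)),
        "Missing values": f"{pct:.1f}%",
    }


def clean(rows):
    """Trim whitespace, drop exact duplicates, fill empty cells."""
    trimmed = [{k: v.strip() for k, v in r.items()} for r in rows]
    unique = list({tuple(r.items()): r for r in trimmed}.values())
    # "unknown" is an assumed placeholder; the skill picks a strategy per column.
    return [{k: v or "unknown" for k, v in r.items()} for r in unique]


def report(before, after):
    """Render the Before / After table in the layout the skill describes."""
    b, a = profile(before), profile(after)
    lines = ["| Metric | Before | After |", "|--------|--------|-------|"]
    lines += [f"| {name} | {b[name]} | {a[name]} |" for name in b]
    return "\n".join(lines)


# Hypothetical input rows.
before = [
    {"name": " Kim ", "phone": "010-1234-5678"},
    {"name": " Kim ", "phone": "010-1234-5678"},
    {"name": "Lee", "phone": ""},
    {"name": "Park", "phone": "01098765432"},
]
after = clean(before)

print(report(before, after))
```

Profiling the data both before and after cleaning with the same function keeps the two columns of the table directly comparable.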