Claude Code Catalog

Data Pipeline Structure

Folder Structure · Intermediate

Data pipeline projects involve multiple stages — extraction, transformation, loading, and analysis. Without a clear structure, Claude Code cannot distinguish between a data source connector and a transformation function. This pattern separates pipeline stages into distinct folders with consistent naming, making it trivial for Claude to add new data sources, transformers, or output destinations.

Tags: data, etl, pipeline, python, analytics, folder-structure

Pattern Code

data-pipeline/
├── CLAUDE.md                      # Pipeline conventions
├── config/
│   ├── sources.yaml               # Data source definitions
│   ├── destinations.yaml          # Output target configs
│   └── schedules.yaml             # Cron/scheduling config
├── src/
│   ├── extractors/                # Stage 1: Data extraction
│   │   ├── base.py                # Abstract extractor class
│   │   ├── api_extractor.py       # REST API sources
│   │   ├── db_extractor.py        # Database sources
│   │   ├── csv_extractor.py       # File-based sources
│   │   └── __init__.py
│   ├── transformers/              # Stage 2: Data transformation
│   │   ├── base.py                # Abstract transformer
│   │   ├── clean.py               # Data cleaning rules
│   │   ├── enrich.py              # Data enrichment
│   │   ├── aggregate.py           # Aggregation logic
│   │   └── __init__.py
│   ├── loaders/                   # Stage 3: Data loading
│   │   ├── base.py                # Abstract loader
│   │   ├── postgres_loader.py     # PostgreSQL output
│   │   ├── bigquery_loader.py     # BigQuery output
│   │   ├── csv_loader.py          # CSV file output
│   │   └── __init__.py
│   ├── pipelines/                 # Orchestration
│   │   ├── daily_sales.py         # Full pipeline definitions
│   │   ├── weekly_report.py
│   │   └── __init__.py
│   ├── validators/                # Data quality checks
│   │   ├── schema_validator.py
│   │   └── quality_checks.py
│   └── utils/
│       ├── logging.py
│       └── metrics.py
├── tests/
│   ├── fixtures/                  # Sample data for tests
│   │   ├── sample_sales.csv
│   │   └── sample_api_response.json
│   ├── test_extractors/
│   ├── test_transformers/
│   └── test_loaders/
├── data/                          # Local data (gitignored)
│   ├── raw/                       # Extracted raw data
│   ├── processed/                 # Transformed data
│   └── output/                    # Final output
├── requirements.txt
└── pyproject.toml

# CLAUDE.md excerpt:
# - Each stage (extract/transform/load) inherits from base.py.
# - New data source = new extractor + config in sources.yaml + test.
# - Transformers are composable — chain them in pipelines/.
# - data/ folder is gitignored. Use tests/fixtures/ for test data.
# - All extractors/loaders must handle connection retries.
# - Validators run between transform and load stages.
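The conventions above require each stage to inherit from a base class and every extractor to handle connection retries. A minimal sketch of what src/extractors/base.py and a file-based subclass might look like, assuming these class and method names (the pattern itself does not specify them):

```python
import abc
import csv
import time
from typing import Any, Iterable


class BaseExtractor(abc.ABC):
    """Abstract extractor: every source (API, DB, CSV) subclasses this.

    Hypothetical interface; the pattern only says each stage inherits
    from base.py and that extractors must handle connection retries.
    """

    def __init__(self, max_retries: int = 3, backoff_seconds: float = 1.0):
        self.max_retries = max_retries
        self.backoff_seconds = backoff_seconds

    @abc.abstractmethod
    def connect(self) -> None:
        """Open a connection to the data source."""

    @abc.abstractmethod
    def fetch(self) -> Iterable[dict[str, Any]]:
        """Yield raw records from the source."""

    def extract(self) -> list[dict[str, Any]]:
        """Connect with retries, then fetch all records."""
        for attempt in range(1, self.max_retries + 1):
            try:
                self.connect()
                return list(self.fetch())
            except ConnectionError:
                if attempt == self.max_retries:
                    raise
                time.sleep(self.backoff_seconds * attempt)  # linear backoff
        return []  # unreachable; satisfies the declared return type


class CSVExtractor(BaseExtractor):
    """File-based source (csv_extractor.py): no network handshake needed."""

    def __init__(self, path: str, **kwargs: Any):
        super().__init__(**kwargs)
        self.path = path

    def connect(self) -> None:
        pass  # local files need no connection setup

    def fetch(self) -> Iterable[dict[str, Any]]:
        with open(self.path, newline="") as f:
            yield from csv.DictReader(f)
```

Adding a new source then means writing one subclass, registering it in sources.yaml, and adding a test, exactly as the CLAUDE.md excerpt prescribes.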

Copy this structure into your project configuration to implement the pattern.
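The CLAUDE.md excerpt also says transformers are composable and should be chained in pipelines/. One plausible reading, sketched with illustrative class names and a toy cleaning and aggregation step (none of these are defined by the pattern itself):

```python
import abc
from typing import Any

Record = dict[str, Any]


class BaseTransformer(abc.ABC):
    """Abstract transformer (src/transformers/base.py); hypothetical API."""

    @abc.abstractmethod
    def transform(self, records: list[Record]) -> list[Record]:
        """Take a batch of records, return a transformed batch."""


class Clean(BaseTransformer):
    """Stand-in cleaning rule: drop records with a missing amount."""

    def transform(self, records: list[Record]) -> list[Record]:
        return [r for r in records if r.get("amount") not in (None, "")]


class Aggregate(BaseTransformer):
    """Stand-in aggregation: sum amounts per region."""

    def transform(self, records: list[Record]) -> list[Record]:
        totals: dict[str, float] = {}
        for r in records:
            totals[r["region"]] = totals.get(r["region"], 0.0) + float(r["amount"])
        return [{"region": k, "total": v} for k, v in sorted(totals.items())]


def run_pipeline(records: list[Record], steps: list[BaseTransformer]) -> list[Record]:
    """Chain transformers in order, as a pipelines/daily_sales.py might do."""
    for step in steps:
        records = step.transform(records)
    return records
```

Because each step shares the same transform() signature, a pipeline definition is just an ordered list of steps, which is what makes it easy for Claude to add a new transformer without touching the orchestration code.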


About Data Pipeline Structure

Claude Code patterns are proven architectural designs and workflow structures for tackling complex development scenarios. Data Pipeline Structure is an Intermediate-level Folder Structure pattern: a tested, repeatable approach you can adapt to your own projects for more efficient and consistent results.
