Claude Code Catalog

Data Pipeline Structure

Folder Structure · Intermediate

Data pipeline projects involve multiple stages — extraction, transformation, loading, and analysis. Without a clear structure, Claude Code cannot distinguish between a data source connector and a transformation function. This pattern separates pipeline stages into distinct folders with consistent naming, making it trivial for Claude to add new data sources, transformers, or output destinations.

Tags: data, etl, pipeline, python, analytics, folder-structure

Pattern Code

data-pipeline/
├── CLAUDE.md                      # Pipeline conventions
├── config/
│   ├── sources.yaml               # Data source definitions
│   ├── destinations.yaml          # Output target configs
│   └── schedules.yaml             # Cron/scheduling config
├── src/
│   ├── extractors/                # Stage 1: Data extraction
│   │   ├── base.py                # Abstract extractor class
│   │   ├── api_extractor.py       # REST API sources
│   │   ├── db_extractor.py        # Database sources
│   │   ├── csv_extractor.py       # File-based sources
│   │   └── __init__.py
│   ├── transformers/              # Stage 2: Data transformation
│   │   ├── base.py                # Abstract transformer
│   │   ├── clean.py               # Data cleaning rules
│   │   ├── enrich.py              # Data enrichment
│   │   ├── aggregate.py           # Aggregation logic
│   │   └── __init__.py
│   ├── loaders/                   # Stage 3: Data loading
│   │   ├── base.py                # Abstract loader
│   │   ├── postgres_loader.py     # PostgreSQL output
│   │   ├── bigquery_loader.py     # BigQuery output
│   │   ├── csv_loader.py          # CSV file output
│   │   └── __init__.py
│   ├── pipelines/                 # Orchestration
│   │   ├── daily_sales.py         # Full pipeline definitions
│   │   ├── weekly_report.py
│   │   └── __init__.py
│   ├── validators/                # Data quality checks
│   │   ├── schema_validator.py
│   │   └── quality_checks.py
│   └── utils/
│       ├── logging.py
│       └── metrics.py
├── tests/
│   ├── fixtures/                  # Sample data for tests
│   │   ├── sample_sales.csv
│   │   └── sample_api_response.json
│   ├── test_extractors/
│   ├── test_transformers/
│   └── test_loaders/
├── data/                          # Local data (gitignored)
│   ├── raw/                       # Extracted raw data
│   ├── processed/                 # Transformed data
│   └── output/                    # Final output
├── requirements.txt
└── pyproject.toml

# CLAUDE.md excerpt:
# - Each stage (extract/transform/load) inherits from base.py.
# - New data source = new extractor + config in sources.yaml + test.
# - Transformers are composable — chain them in pipelines/.
# - data/ folder is gitignored. Use tests/fixtures/ for test data.
# - All extractors/loaders must handle connection retries.
# - Validators run between transform and load stages.
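The conventions above require each stage to inherit from a base class and every extractor to handle connection retries. A minimal sketch of what src/extractors/base.py and a file-based subclass might look like, assuming these class and method names (the pattern itself does not specify them):

```python
import abc
import csv
import time
from typing import Any, Iterable


class BaseExtractor(abc.ABC):
    """Abstract extractor: every source (API, DB, CSV) subclasses this.

    Hypothetical interface; the pattern only says each stage inherits
    from base.py and that extractors must handle connection retries.
    """

    def __init__(self, max_retries: int = 3, backoff_seconds: float = 1.0):
        self.max_retries = max_retries
        self.backoff_seconds = backoff_seconds

    @abc.abstractmethod
    def connect(self) -> None:
        """Open a connection to the data source."""

    @abc.abstractmethod
    def fetch(self) -> Iterable[dict[str, Any]]:
        """Yield raw records from the source."""

    def extract(self) -> list[dict[str, Any]]:
        """Connect with retries, then fetch all records."""
        for attempt in range(1, self.max_retries + 1):
            try:
                self.connect()
                return list(self.fetch())
            except ConnectionError:
                if attempt == self.max_retries:
                    raise
                time.sleep(self.backoff_seconds * attempt)  # linear backoff
        return []  # unreachable; satisfies the declared return type


class CSVExtractor(BaseExtractor):
    """File-based source (csv_extractor.py): no network handshake needed."""

    def __init__(self, path: str, **kwargs: Any):
        super().__init__(**kwargs)
        self.path = path

    def connect(self) -> None:
        pass  # local files need no connection setup

    def fetch(self) -> Iterable[dict[str, Any]]:
        with open(self.path, newline="") as f:
            yield from csv.DictReader(f)
```

Adding a new source then means writing one subclass, registering it in sources.yaml, and adding a test, exactly as the CLAUDE.md excerpt prescribes.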

Copy this structure into your project configuration to implement the pattern.
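The CLAUDE.md excerpt also says transformers are composable and should be chained in pipelines/. One plausible reading, sketched with illustrative class names and a toy cleaning and aggregation step (none of these are defined by the pattern itself):

```python
import abc
from typing import Any

Record = dict[str, Any]


class BaseTransformer(abc.ABC):
    """Abstract transformer (src/transformers/base.py); hypothetical API."""

    @abc.abstractmethod
    def transform(self, records: list[Record]) -> list[Record]:
        """Take a batch of records, return a transformed batch."""


class Clean(BaseTransformer):
    """Stand-in cleaning rule: drop records with a missing amount."""

    def transform(self, records: list[Record]) -> list[Record]:
        return [r for r in records if r.get("amount") not in (None, "")]


class Aggregate(BaseTransformer):
    """Stand-in aggregation: sum amounts per region."""

    def transform(self, records: list[Record]) -> list[Record]:
        totals: dict[str, float] = {}
        for r in records:
            totals[r["region"]] = totals.get(r["region"], 0.0) + float(r["amount"])
        return [{"region": k, "total": v} for k, v in sorted(totals.items())]


def run_pipeline(records: list[Record], steps: list[BaseTransformer]) -> list[Record]:
    """Chain transformers in order, as a pipelines/daily_sales.py might do."""
    for step in steps:
        records = step.transform(records)
    return records
```

Because each step shares the same transform() signature, a pipeline definition is just an ordered list of steps, which is what makes it easy for Claude to add a new transformer without touching the orchestration code.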


About Data Pipeline Structure

Claude Code patterns are proven architectural designs and workflow structures for tackling complex development scenarios. Data Pipeline Structure is an Intermediate-level Folder Structure pattern: a tested, repeatable approach you can adapt to your own projects for more efficient and consistent results.
