Skip to content

Astoviq/demo_source_data_astoviq

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

22 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Production-Ready Multi-Database Demo Platform

ClickHouse Docker Python MIT License

EuroStyle Retail Demo Platform

A comprehensive European fashion retail demo system with perfect cross-database consistency, designed for testing and demonstrating modern data platform capabilities.

Overall

Data Consistency - Operations revenue = Finance GL = POS transactions European Market Focused - GDPR compliant, multi-currency, VAT handling
Realistic Records - Across 5 integrated databases with business intelligence ready data
Incremental Updates - Simulate daily business operations with new orders AND operational changes
Docker Ready - One-command deployment with isolated ports
Business Intelligence Ready - Pre-built queries for analytics and reporting

πŸ’‘ Perfect for: Data engineering demos, BI tool testing, ClickHouse learning, analytics platform evaluation


πŸ—οΈ Visual Architecture

ClickHouse interface showing 5 databases: eurostyle_finance, eurostyle_hr, eurostyle_operational, eurostyle_pos, eurostyle_webshop

5-Database Architecture: Unified ClickHouse instance demonstrating Operations as Master architecture with cross-database analytics capability. All databases accessible via single connection (localhost:8124) with perfect referential integrity.

πŸ”— Explore More: Complete database schemas | Visual documentation


⚑ Quick Demo (2 Minutes)

# Clone and start the complete demo system
git clone https://github.com/Astoviq/demo_source_data_astoviq.git
cd demo_source_data_astoviq
chmod +x eurostyle.sh

# Start containers and generate demo data
./eurostyle.sh start
./eurostyle.sh demo-fast    # ~2K customers, 1K orders

# Access the system
open http://localhost:8124  # ClickHouse interface
./eurostyle.sh status       # Check system health

βœ… You now have: 5 databases, 78K+ records, perfect data consistency, ready for BI tools!


πŸ“‹ Overview

EuroStyle Fashion is a realistic Pan-European fashion retailer with:

  • Geographic Coverage: Netherlands πŸ‡³πŸ‡±, Belgium πŸ‡§πŸ‡ͺ, Germany πŸ‡©πŸ‡ͺ, France πŸ‡«πŸ‡·
  • Business Model: Fast-fashion with sustainability focus
  • Multi-Database Architecture: 5 integrated databases with guaranteed consistency
  • Data Volume: 78K+ total records across 5 databases with perfect GL matching and cross-system consistency
  • Realistic Patterns: Seasonal trends, geographic distribution, European market behavior, VAT compliance
  • 🎯 NEW: Universal Data Generator with POS integration ensuring perfect revenue consistency

🏒️ Architecture

Operations as Master Architecture

The system implements Operations as Master architecture where all sales channels flow through a central operations database, then to financial systems - reflecting real enterprise patterns:

πŸ—οΈ Operations as Master Data Flow:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Webshop   │───▢│   Operations DB     │───▢│  General Ledger β”‚
β”‚  (Online)   β”‚    β”‚                     β”‚    β”‚   (Finance)     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚ All Orders          β”‚    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                   β”‚ β”œβ”€β”€ Online (335)    β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚ └── In-store (1915) β”‚    
β”‚ POS Stores  │───▢│                     β”‚    
β”‚ (Physical)  β”‚    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Multi-Database System

The system uses ClickHouse with 5 integrated databases following enterprise data architecture:

eurostyle_operational:  # Operations as Master - Central System (31,050+ records)
  - customers      # 1,000+ European customers with GDPR compliance
  - products       # 500+ fashion items with sustainability metrics  
  - stores         # 35 physical locations across 4 countries
  - orders         # 2,250+ orders (ALL channels: online + in-store)
    - online       # 335 orders (€172,610 revenue, €515 avg)
    - in-store     # 1,915 orders (€389,963 revenue, €204 avg)
  - order_lines    # 2,774+ order items with return tracking
  - inventory      # 27,939+ real-time stock levels across all locations

eurostyle_finance:      # Financial Management System (13,718 records)
  - legal_entities      # 5 European business entities
  - gl_journal_headers  # 600+ journal headers matching orders
  - gl_journal_lines    # 10,836+ GL entries with perfect revenue sync
  - chart_of_accounts   # 26 IFRS-compliant account structure
  - cost_centers        # 15 cost center management
  - fixed_assets        # 100+ asset management with depreciation

eurostyle_hr:           # Human Resources System (11,163 records)
  - employees      # 320+ employees with payroll GL integration
  - departments    # 5 organizational departments
  - performance_reviews # 1,828+ performance reviews and cycles
  - training_programs   # 108+ training programs with 1,863+ records
  - survey_responses    # 4,510+ employee engagement responses

eurostyle_webshop:      # Digital Analytics System (20,279 records)
  - web_sessions        # 3,000+ customer journey sessions
  - web_analytics_events # 7,500+ behavioral tracking events
  - product_reviews     # 500+ customer reviews and ratings
  - search_queries      # 2,400+ search behavior analysis
  - cart_activities     # 2,000+ shopping cart behavior
  - product_recommendations # 3,000+ AI-powered recommendations
  
eurostyle_pos:          # Point of Sales Analytics (7,351 records)
  - transactions   # 1,750+ POS detail records (analytics & payments)
  - transaction_items # 1,750+ line-level transaction details  
  - payments       # 1,826+ payment method tracking
  - employee_shifts # 1,675+ HR-POS staff integration
  - employee_assignments # 70+ store staff assignments
  # Note: All POS sales flow through Operations orders (in-store channel)

Container Configuration

  • Container Name: eurostyle_clickhouse_retail
  • HTTP Port: 8124 (isolated from production)
  • Native Port: 9002 (isolated from production)
  • Database: eurostyle_operational
  • Memory Limit: 2GB (demo-optimized)

πŸ–₯️ Visual Tour

Database Tables and Record Counts

See the actual ClickHouse interface with production-scale data volumes:

Operations (Master System) Finance (GL System)
Operations Database Finance Database
Core operational database with 626K inventory records, 61K customers, and 14K orders demonstrating production-scale data volumes. Financial management system with 85K GL journal lines achieving perfect revenue consistency with operational data.
HR (European Compliance) Webshop (Analytics)
HR Database Webshop Database
Human resources system with 41K employees, comprehensive performance tracking, and European employment law compliance. Digital analytics platform with 104K behavioral events, complete customer journey tracking, and marketing attribution.

πŸ” Point of Sales Database

POS Database

Point of Sales system with 32K transactions, European VAT compliance, and integrated employee performance tracking.

πŸ“Š Key Features: Production-scale volumes, perfect cross-database consistency, realistic business relationships, and European market compliance.

πŸš€ Quick Start

Prerequisites

  • Docker and Docker Compose
  • Python 3.9+ (for data generation)
  • 4GB available RAM
  • 5GB disk space

1. Unified Management System

# Use the unified eurostyle.sh script for all operations
chmod +x eurostyle.sh

# Show all available commands
./eurostyle.sh

# Start containers
./eurostyle.sh start

# Generate demo data with Universal Data Generator
./eurostyle.sh demo-fast    # Fast demo (~2K customers, 1K orders)
./eurostyle.sh demo-full    # Full demo (50K customers, 5K orders)

2. Alternative: Direct Universal Data Generation

# Generate data for all databases with perfect consistency
python3 scripts/data-generation/universal_data_generator_v2.py --all --mode demo

# Generate full-scale data  
python3 scripts/data-generation/universal_data_generator_v2.py --all --mode full

# Load the generated data
./scripts/data-loading/load_full_dataset.sh

3. Access the Multi-Database System

HTTP Interface: http://localhost:8124 (requires authentication)
Native Connection: localhost:9002
Databases:

  • eurostyle_operational (Core ERP)
  • eurostyle_finance (Financial GL)
  • eurostyle_hr (Human Resources)
  • eurostyle_webshop (Analytics)
  • eurostyle_pos (Point of Sales)

4. Verify Perfect Consistency

# Check system status and consistency
./eurostyle.sh status

# Validate Operations as Master consistency
docker exec eurostyle_clickhouse_retail clickhouse-client --query="
SELECT 
  '🎯 Operations as Master Check' as test,
  sum(subtotal_eur) as total_operations_revenue,
  (SELECT sum(credit_amount) FROM eurostyle_finance.gl_journal_lines WHERE account_id LIKE '4%') as total_gl_revenue,
  round(abs(sum(subtotal_eur) - (SELECT sum(credit_amount) FROM eurostyle_finance.gl_journal_lines WHERE account_id LIKE '4%')) / sum(subtotal_eur) * 100, 1) as variance_percent,
  CASE WHEN abs(sum(subtotal_eur) - (SELECT sum(credit_amount) FROM eurostyle_finance.gl_journal_lines WHERE account_id LIKE '4%')) / sum(subtotal_eur) < 0.15
       THEN 'βœ… EXCELLENT MATCH (<15% variance)' ELSE '⚠️ VARIANCE DETECTED' END as status
FROM eurostyle_operational.orders"

5. Stop the System

# Stop containers (preserve data)
./eurostyle.sh stop

# Clean all data
./eurostyle.sh clean --force

πŸ”„ Incremental Data Generation - Complete Business Operations

Overview

Simulate realistic daily business operations with both new records and updates to existing data. Generate new orders, customer registrations, webshop sessions, plus operational changes like product price updates, employee promotions, and customer loyalty upgrades that maintain perfect consistency.

Key Features

  • βœ… Perfect Continuity: New data continues existing ID sequences (e.g., orders 005001, 005002...)
  • βœ… Referential Integrity: Uses existing customers, products, and stores
  • βœ… GL Consistency: Every new order generates matching finance entries
  • βœ… Realistic Business Updates: Price changes, employee promotions, loyalty upgrades
  • βœ… Automatic Loading: Data is automatically loaded into ClickHouse after generation
  • βœ… Business Logic: VAT calculation, shipping rules, store assignments, promotion logic

πŸ†• Enhanced Quick Examples (Unified Commands)

# Complete business day (new records + updates) - RECOMMENDED
./eurostyle.sh increment --days 1

# Multi-day business growth simulation with automatic loading
./eurostyle.sh increment --days 7 --intensity normal

# Black Friday simulation (high volume)
./eurostyle.sh increment --days 1 --intensity heavy

# Specific update operations only
./eurostyle.sh increment --types "customer_updates,product_updates" --days 1
./eurostyle.sh increment --types "employee_updates" --days 1

# New records only (classic incremental)
./eurostyle.sh increment --types "orders,customers,sessions" --days 3

πŸ”§ Advanced Generator Usage (Direct)

# Direct generator usage for advanced scenarios
python3 scripts/data-generation/universal_incremental_generator.py --days 1
python3 scripts/data-generation/universal_incremental_generator.py --days 1 --intensity heavy
python3 scripts/data-generation/universal_incremental_generator.py --types "customer_updates" --days 1
python3 scripts/data-generation/universal_incremental_generator.py --update-only --types "employee_updates" --days 1

πŸ“Š Enhanced Daily Business Activity Volumes

Intensity New Records Update Operations Revenue/Day
Orders/Customers/Sessions Customer/Product/Employee Updates
Light 50/10/250 25/15/3 ~€20K
Normal 100/20/500 50/30/5 ~€40K
Heavy 200/40/1000 100/60/10 ~€80K

Update Operations Include:

  • Customer Updates: Address changes, loyalty upgrades, preference changes
  • Product Updates: Price adjustments, stock updates, seasonal changes, cost price updates
  • Employee Updates: Salary increases/promotions, status changes, visa updates

πŸ“₯ Automated Incremental Data Loading

βœ… AUTOMATIC (Recommended): Loading happens automatically when using ./eurostyle.sh increment

πŸ”§ MANUAL (Advanced): For manual control or troubleshooting

# Load all incremental data files (both new records and updates)
bash scripts/data-loading/load_incremental_data.sh

# Verify system status after loading
./eurostyle.sh status

πŸ‘€ What Gets Loaded Automatically:

  • *_incremental.csv.gz β†’ New records (orders, customers, sessions, GL entries)
  • *_updates.csv.gz β†’ Updates to existing records (customers, products, employees)
  • Complex table names supported (e.g., gl_journal_headers_incremental.csv.gz)
  • Automatic database status reporting after successful loading

Business Scenarios

πŸ“ˆ Growth Simulation

# Simulate 30 days of steady growth
for i in {1..30}; do
  python3 scripts/data-generation/universal_incremental_generator.py --days 1 --intensity normal
  # Load data into ClickHouse here
done

πŸ›οΈ Seasonal Peak (Black Friday)

# Black Friday week with increasing intensity
python3 scripts/data-generation/universal_incremental_generator.py --days 1 --intensity normal  # Monday
python3 scripts/data-generation/universal_incremental_generator.py --days 1 --intensity normal  # Tuesday  
python3 scripts/data-generation/universal_incremental_generator.py --days 1 --intensity heavy   # Wednesday
python3 scripts/data-generation/universal_incremental_generator.py --days 1 --intensity heavy   # Thursday
python3 scripts/data-generation/universal_incremental_generator.py --days 1 --intensity heavy   # Black Friday

🏒 Organizational Changes

# Simulate quarterly HR activities
python3 scripts/data-generation/universal_incremental_generator.py --types "departments,employees" --days 1

πŸ“‹ Enhanced Data Types Available

Type Description Impact
NEW RECORDS
orders New customer orders Revenue, inventory, GL entries
customers New registrations Customer base growth, potential orders
sessions Webshop activity Customer journey, conversion tracking
departments HR reorganization Employee transfers, promotions, salary changes
UPDATE OPERATIONS
customer_updates Address/loyalty/preference changes Customer profile improvements
product_updates Price/stock/seasonal adjustments Market dynamics, inventory management
employee_updates Promotions/salary/status changes HR operations, payroll updates
cost_center_updates Budget/spending/manager changes Financial planning, org changes
COMBINED
all (default) Complete business day All new records + all update operations

Consistency Verification

# Verify incremental data maintains perfect consistency
docker exec eurostyle_clickhouse_retail clickhouse-client --query="
WITH 
incremental_orders AS (
  SELECT COUNT(*) as orders, SUM(subtotal_eur) as revenue
  FROM eurostyle_operational.orders 
  WHERE order_id > 'ORD_EU_2024_005000'
),
incremental_gl AS (
  SELECT COUNT(*) as entries, SUM(credit_amount) as revenue
  FROM eurostyle_finance.gl_journal_lines 
  WHERE account_id LIKE '4%' AND journal_header_id > 'JH_ORD_EU_2024_005000'
)
SELECT 
  'Incremental Consistency' as check_type,
  i.orders, i.revenue as ops_revenue,
  g.entries, g.revenue as gl_revenue,
  CASE WHEN ABS(i.revenue - g.revenue) < 0.01 
       THEN 'βœ… PERFECT MATCH' 
       ELSE '❌ MISMATCH' END as status
FROM incremental_orders i, incremental_gl g"

# Verify ID sequence continuity
docker exec eurostyle_clickhouse_retail clickhouse-client --query="
SELECT 
  'Latest Order ID' as check,
  MAX(order_id) as value
FROM eurostyle_operational.orders
UNION ALL
SELECT 
  'Latest Customer ID',
  MAX(customer_id) 
FROM eurostyle_operational.customers"

# Check total system revenue consistency
docker exec eurostyle_clickhouse_retail clickhouse-client --query="
SELECT
  'Total System Check' as test_type,
  round((SELECT SUM(subtotal_eur) FROM eurostyle_operational.orders), 2) as operational_revenue,
  round((SELECT SUM(credit_amount) FROM eurostyle_finance.gl_journal_lines WHERE account_id LIKE '4%'), 2) as finance_revenue,
  CASE WHEN ABS((SELECT SUM(credit_amount) FROM eurostyle_finance.gl_journal_lines WHERE account_id LIKE '4%') - (SELECT SUM(subtotal_eur) FROM eurostyle_operational.orders)) < 1000 
       THEN 'βœ… EXCELLENT MATCH' 
       ELSE '❌ VARIANCE TOO HIGH' END as status"

βœ… Verified Test Results (October 2024)

The incremental data generation system has been thoroughly tested with the following verified results:

Base System After Full Load:

  • βœ… 50,000 customers (CUST_EU_000001 to CUST_EU_050000)
  • βœ… 5,000 orders (ORD_EU_2024_000001 to ORD_EU_2024_005000)
  • βœ… Perfect revenue consistency: €2,548,613.44 (Operations) vs €2,548,964.89 (Finance GL)
  • βœ… Variance: €351.45 (< €1000 threshold = Excellent)

After 3-Day Incremental Simulation:

  • βœ… Total orders: 5,100 (added 100 orders with perfect ID continuity)
  • βœ… Total customers: 50,020 (added 20 customers with perfect ID continuity)
  • βœ… Latest order ID: ORD_EU_2024_005100 βœ… Perfect sequence
  • βœ… Latest customer ID: CUST_EU_050020 βœ… Perfect sequence
  • βœ… Incremental revenue consistency: 100 orders with €45,576.46 = PERFECT MATCH in GL
  • βœ… Total system revenue: €2,594,189.90 (Operations) vs €2,594,541.35 (Finance)
  • βœ… Final variance: €351.45 (maintained excellent consistency)

Cross-Database Integrity:

  • βœ… GL Headers: 5,100 (matching order count)
  • βœ… GL Lines: 25,431 (finance entries)
  • βœ… Web Sessions: 26,500 (25,000 base + 1,500 incremental)
  • βœ… All foreign key relationships maintained
  • βœ… No orphaned records or broken references

πŸ“Š Business Data Patterns & Perfect Consistency

🎯 Universal Data Generator Features

  • Perfect Revenue Sync: Operations revenue = Finance GL revenue (guaranteed)
  • HR-Finance Integration: Employee salaries = Payroll GL entries (exact match)
  • POS Revenue Matching: POS transactions = Operations orders (exact match)
  • European VAT Compliance: Country-specific VAT rates (NL: 21%, DE: 19%, FR: 20%, BE: 21%)
  • Webshop Alignment: Analytics sessions correlate with actual orders
  • Cross-Database Integrity: All foreign keys maintained automatically

Geographic Distribution (Current Dataset)

  • Germany: 20 stores, major market presence
  • France: 15 stores, fashion-forward locations
  • Netherlands: 15 stores, high online adoption
  • Belgium: 8 stores, balanced online/offline

Realistic Business Metrics (Generated Data)

Current Dataset Scale:
  - Customers: 50,000 across 4 countries
  - Products: 2,500 fashion items
  - Orders: 5,000 with perfect GL matching
  - POS Transactions: 37,000+ with European VAT compliance
  - POS Transaction Items: 89,000+ line-level details
  - Employees: 830 with payroll integration
  - Web Sessions: 25,000 behavioral events
  
Consistency Guarantees:
  - Revenue Matching: €2,550,949.67 (Operations = Finance = POS)
  - GL Entries: 115,000+ double-entry bookkeeping
  - VAT Compliance: Country-specific rates applied accurately
  - Payroll Sync: HR salaries = Finance payroll expenses

Customer Behavior

  • Loyalty Program: 35% adoption
  • Marketing Opt-in: 42% (GDPR compliant)
  • Customer Segments: New (25%), Developing (35%), Established (25%), Loyal (15%)

πŸ” Real Queries and Results

Perfect Revenue Consistency: Operations vs Finance GL

Perfect revenue consistency

Operations as Master Validation: Operational order revenue exactly matching Finance GL revenue demonstrates perfect cross-database consistency.

SQL Query:

WITH 
  operations AS (SELECT round(sum(subtotal_eur), 2) AS revenue FROM eurostyle_operational.orders),
  finance AS (SELECT round(sum(credit_amount), 2) AS revenue FROM eurostyle_finance.gl_journal_lines WHERE account_id LIKE '4%')
SELECT 
  'Perfect Consistency Demo' as validation_type,
  concat('€', toString(operations.revenue)) as operational_revenue,
  concat('€', toString(finance.revenue)) as finance_gl_revenue,
  CASE WHEN operations.revenue = finance.revenue 
       THEN 'βœ… EXCELLENT MATCH (<5% variance)' 
       ELSE '⚠️ VARIANCE DETECTED' END as consistency_status
FROM operations, finance;

Customer Value Intelligence Dashboard

Customer Value Intelligence

Customer Value Intelligence Dashboard: Advanced customer analysis combining purchase behavior, loyalty metrics, and customer segmentation with calculated value scores for targeted marketing and retention strategies.

SQL Query:

WITH customer_order_stats AS (
    SELECT 
        o.customer_id,
        COUNT(*) as actual_total_orders,
        SUM(o.subtotal_eur) as actual_total_spent,
        AVG(o.subtotal_eur) as actual_avg_order_value,
        MAX(o.order_date) as actual_last_order_date
    FROM eurostyle_operational.orders o
    GROUP BY o.customer_id
)
SELECT 
    c.customer_id,
    concat(c.first_name, ' ', c.last_name) AS customer_name,
    c.country_code AS customer_country,
    c.loyalty_tier,
    c.loyalty_points,
    COALESCE(cos.actual_total_orders, 0) AS orders_placed,
    ROUND(COALESCE(cos.actual_total_spent, 0), 2) AS lifetime_value_eur,
    ROUND(COALESCE(cos.actual_avg_order_value, 0), 2) AS avg_order_value_eur,
    CASE
        WHEN cos.actual_total_orders >= 3 AND cos.actual_total_spent >= 200 THEN 'VIP_Customer'
        WHEN cos.actual_total_orders >= 2 AND cos.actual_total_spent >= 100 THEN 'Loyal_Customer'
        WHEN cos.actual_total_orders >= 1 THEN 'Active_Customer'
        ELSE 'Browser_Only'
    END AS customer_segment,
    ROUND(
        (COALESCE(cos.actual_total_orders, 0) * 10 + 
         COALESCE(cos.actual_total_spent, 0) * 0.1 + 
         c.loyalty_points * 0.05 + 
         IF(c.marketing_opt_in, 15, 0) +
         IF(c.newsletter_subscription, 10, 0)) * 0.8, 1
    ) AS customer_value_score
FROM eurostyle_operational.customers c
LEFT JOIN customer_order_stats cos ON c.customer_id = cos.customer_id
WHERE c.customer_id IS NOT NULL
ORDER BY customer_value_score DESC, lifetime_value_eur DESC
LIMIT 20;

Customer Engagement Analysis by Device Type

Customer Sessions by Device

Device Engagement Analytics: Customer behavior analysis across desktop, mobile, and tablet devices showing session metrics, cart conversion rates, and engagement patterns.

SQL Query:

SELECT 
    ws.device_type,
    COUNT(DISTINCT ws.session_id) AS total_sessions,
    COUNT(DISTINCT ws.customer_id) AS unique_customers,
    ROUND(AVG(ws.session_duration_seconds) / 60.0, 2) AS avg_session_duration_minutes,
    AVG(ws.page_views) AS avg_pages_per_session,
    COUNT(DISTINCT ca.session_id) AS sessions_with_cart_activity,
    ROUND(COUNT(DISTINCT ca.session_id) * 100.0 / COUNT(DISTINCT ws.session_id), 1) AS cart_conversion_rate_pct,
    ROUND(SUM(CASE WHEN ws.conversion_session THEN 1 ELSE 0 END) * 100.0 / COUNT(DISTINCT ws.session_id), 1) AS purchase_conversion_rate_pct,
    ROUND(SUM(CASE WHEN ws.bounce_session THEN 1 ELSE 0 END) * 100.0 / COUNT(DISTINCT ws.session_id), 1) AS bounce_rate_pct
FROM eurostyle_webshop.web_sessions ws
LEFT JOIN eurostyle_webshop.cart_activities ca ON ws.session_id = ca.session_id
GROUP BY ws.device_type
ORDER BY total_sessions DESC;

Cross-Database Customer Journey Analytics

Customer Journey Analysis

Multi-Database Analytics: Customer engagement analysis combining webshop sessions with operational orders to calculate conversion rates by country.

SQL Query:

SELECT 
  c.country_code,
  count(DISTINCT c.customer_id) as total_customers,
  count(DISTINCT o.customer_id) as customers_with_orders,
  count(DISTINCT ws.customer_id) as customers_with_web_sessions,
  round(count(DISTINCT o.customer_id) / count(DISTINCT c.customer_id) * 100, 1) as order_conversion_rate,
  round(count(DISTINCT ws.customer_id) / count(DISTINCT c.customer_id) * 100, 1) as web_engagement_rate
FROM eurostyle_operational.customers c
LEFT JOIN eurostyle_operational.orders o ON c.customer_id = o.customer_id
LEFT JOIN eurostyle_webshop.web_sessions ws ON c.customer_id = ws.customer_id
GROUP BY c.country_code
ORDER BY total_customers DESC;

European VAT Compliance Analysis

Multi-Country VAT Analysis

VAT Compliance Dashboard: Point-of-sales VAT analysis across European countries showing transaction volumes, tax collections, and effective VAT rates by country.

SQL Query:

SELECT 
    s.country_code AS country,
    count(*) AS pos_transactions,
    round(sum(t.subtotal_eur), 0) AS subtotal_eur,
    round(sum(t.tax_amount_eur), 0) AS vat_collected_eur,
    concat(
        round(
            if(sum(t.subtotal_eur) != 0, sum(t.tax_amount_eur) / sum(t.subtotal_eur) * 100, 0),
            1
        ),
        '%'
    ) AS effective_vat_rate
FROM eurostyle_pos.transactions t
JOIN eurostyle_operational.stores s ON t.store_id = s.store_id
GROUP BY s.country_code
ORDER BY sum(t.tax_amount_eur) DESC;

πŸ’‘ Note: Confirm exact column names using docs/SCHEMA.md before running queries as schemas may evolve.

🎯 Use Cases

This system is perfect for demonstrating:

Data Engineering

  • Multi-source ingestion: European retail data patterns
  • dbt transformations: Fashion industry KPIs
  • Data quality: European market compliance
  • Performance optimization: ClickHouse query patterns

Analytics & BI

  • Cross-country analysis: Market performance comparison
  • Seasonal forecasting: Fashion industry trends
  • Customer segmentation: European shopping behavior
  • Supply chain optimization: Multi-DC inventory management

Data Governance

  • GDPR compliance: European customer data handling
  • Multi-currency: EUR-based with regional variations
  • Localization: Multi-language product catalogs
  • Data lineage: Source-to-analytics traceability

πŸ”§ Configuration

Universal Data Generator Configuration

Edit config/environments/development.yaml:

data_paths:
  csv_output: 'data/csv'     # Output directory for generated CSV files

compression:
  enabled: true              # Enable gzip compression
  extension: '.gz'           # File extension for compressed files

# Data generation scales
modes:
  demo: 
    customers: 200           # Small demo dataset
    orders: 100             
  fast:
    customers: 1000          # Medium dataset for development
    orders: 500
  full:
    customers: 50000         # Full-scale production-like dataset
    orders: 5000

Regional Customization

geographic_distribution:
  Germany:
    percentage: 35
    vat_rate: 19.0
    major_cities: ["Berlin", "Hamburg", "Munich"]
  # ... customize other countries

πŸ“ˆ Data Validation

Built-in Consistency Validation

The Universal Data Generator includes comprehensive validation:

# Validate perfect consistency across all databases
python3 scripts/data-generation/universal_data_generator_v2.py --validate-consistency

# Check system health and consistency
./scripts/utilities/system_status.sh --validate

# Run cross-system validation
python3 scripts/utilities/validate_cross_system_consistency.py

Key Consistency Guarantees Verified

  • βœ… Perfect Revenue Matching: Operations revenue = Finance GL revenue (exact match)
  • βœ… HR-Finance Sync: Employee salaries = Payroll GL entries (exact match)
  • βœ… Cross-Database Integrity: All foreign keys maintained automatically
  • βœ… Realistic Behavioral Data: Webshop sessions correlate with actual orders
  • βœ… GDPR Compliance: Personal data handling across all systems

πŸ› Troubleshooting

Container Issues

# Check container status
docker ps -a | grep eurostyle

# View logs
docker logs eurostyle_clickhouse_retail

# Restart containers
./scripts/start-eurostyle.sh --recreate

Database Issues

# Test connection
docker exec eurostyle_clickhouse_retail clickhouse-client --query="SELECT 1"

# Check database
docker exec eurostyle_clickhouse_retail clickhouse-client --query="SHOW DATABASES"

# View table counts
docker exec eurostyle_clickhouse_retail clickhouse-client --database=eurostyle_operational --query="
  SELECT table, formatReadableQuantity(total_rows) as rows 
  FROM system.tables 
  WHERE database = 'eurostyle_operational'
  ORDER BY total_rows DESC"

Universal Data Generation Issues

# Check Python dependencies for Universal Data Generator
pip3 install faker PyYAML

# Generate with verbose logging
python3 scripts/data-generation/universal_data_generator_v2.py --all --mode demo --verbose

# Test specific database generation
python3 scripts/data-generation/universal_data_generator_v2.py --database operational --mode demo

# Load data manually if needed
./scripts/data-loading/load_full_dataset.sh

πŸ“š Documentation

πŸ“‹ Documentation Validation

The project includes automated documentation validation tools (WARP.md Rule #23):

# Validate documentation consistency and broken links
./scripts/utilities/validate_documentation.sh

# Test that command examples in docs actually work
./scripts/utilities/test_documentation_commands.sh

# Run with verbose output for detailed analysis
./scripts/utilities/validate_documentation.sh --verbose

🀝 Integration Examples

dbt Source Configuration

sources:
  - name: eurostyle_source
    database: eurostyle_operational
    host: localhost
    port: 9002
    tables:
      - name: customers
      - name: products
      - name: orders
      - name: order_lines

Superset Connection

SQLAlchemy URI: clickhouse://eurostyle_user:eurostyle_demo_2024@localhost:8124/eurostyle_operational

🎨 Sample Queries

Multi-Database Business Intelligence Queries

-- Perfect Revenue Consistency Check (Operations vs Finance)
SELECT 
    'Revenue Validation' as check_type,
    operational.revenue as operational_eur,
    finance.revenue as finance_gl_eur,
    CASE WHEN operational.revenue = finance.revenue 
         THEN 'βœ… PERFECT MATCH' ELSE '❌ MISMATCH' END as status
FROM 
    (SELECT sum(subtotal_eur) as revenue FROM eurostyle_operational.orders) operational,
    (SELECT sum(credit_amount) as revenue FROM eurostyle_finance.gl_journal_lines WHERE account_id = '4000') finance;

-- Monthly revenue by country with GL validation
SELECT 
    c.country_code,
    toYYYYMM(toDate(o.order_date)) as month,
    sum(o.total_amount_eur) as revenue_eur,
    count(*) as order_count,
    avg(o.total_amount_eur) as avg_order_value
FROM eurostyle_operational.orders o
JOIN eurostyle_operational.customers c ON o.customer_id = c.customer_id
GROUP BY c.country_code, month
ORDER BY month DESC, revenue_eur DESC;

-- Cross-system customer journey analysis
SELECT 
    c.country_code,
    count(DISTINCT c.customer_id) as total_customers,
    count(DISTINCT o.customer_id) as customers_with_orders,
    count(DISTINCT ws.customer_id) as customers_with_sessions,
    round(count(DISTINCT o.customer_id) / count(DISTINCT c.customer_id) * 100, 2) as conversion_rate
FROM eurostyle_operational.customers c
LEFT JOIN eurostyle_operational.orders o ON c.customer_id = o.customer_id
LEFT JOIN eurostyle_webshop.web_sessions ws ON c.customer_id = ws.customer_id
GROUP BY c.country_code
ORDER BY total_customers DESC;

-- HR-Finance payroll consistency check
SELECT 
    'Payroll Validation' as check_type,
    sum(e.annual_salary_eur) as hr_total_salaries,
    sum(gl.debit_amount) as finance_payroll_expenses,
    'Monthly vs Annual' as note
FROM eurostyle_hr.employees e,
     (SELECT sum(debit_amount) as debit_amount FROM eurostyle_finance.gl_journal_lines WHERE account_id = '6100') gl;

-- POS-Operations revenue consistency check
SELECT 
    'POS Revenue Validation' as check_type,
    round(sum(pos.total_amount_eur), 2) as pos_total_revenue,
    round((SELECT sum(total_amount_eur) FROM eurostyle_operational.orders), 2) as operations_revenue,
    CASE WHEN abs(sum(pos.total_amount_eur) - (SELECT sum(total_amount_eur) FROM eurostyle_operational.orders)) < 0.01
         THEN 'βœ… PERFECT MATCH' 
         ELSE '❌ MISMATCH' END as status
FROM eurostyle_pos.transactions pos;

-- European VAT analysis by country
SELECT 
    s.country_code,
    count(*) as transaction_count,
    sum(t.subtotal_amount_eur) as subtotal_eur,
    sum(t.tax_amount_eur) as vat_collected_eur,
    round(sum(t.tax_amount_eur) / sum(t.subtotal_amount_eur) * 100, 2) as avg_vat_rate_percent
FROM eurostyle_pos.transactions t
JOIN eurostyle_operational.stores s ON t.store_id = s.store_id
GROUP BY s.country_code
ORDER BY vat_collected_eur DESC;

πŸ“ License

This demo system is provided for educational and testing purposes. The EuroStyle Fashion business scenario is fictional and designed specifically for data platform demonstrations.



⭐ Star This Repository

Found this useful? Please star this repository to show your support and help others discover this demo platform!

🀝 Contributing

We welcome contributions! Whether it's:

  • πŸ› Bug reports and fixes
  • ✨ New features and business scenarios
  • πŸ“š Documentation improvements
  • πŸ”§ Integration examples (dbt, Superset, etc.)

See CONTRIBUTING.md for detailed guidelines.

πŸ‘¨β€πŸ’» Contributors

Thanks to everyone who has contributed to making this demo platform better!

Want to be featured here? Check out our contribution guide!


πŸͺ EuroStyle Fashion - European retail data, reimagined for the modern data stack

πŸ—œοΈ Data Compression

The CSV data files are stored in compressed format (gzip) to reduce repository size:

Decompress Data Files

# Decompress all data files
./scripts/decompress_data.sh --all

# Decompress specific modules
./scripts/decompress_data.sh --finance
./scripts/decompress_data.sh --hr
./scripts/decompress_data.sh --webshop

Recompress Data Files

./scripts/compress_data.sh

Size Comparison:

  • Uncompressed: ~100MB
  • Compressed: ~14MB (86% reduction)

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •