A comprehensive European fashion retail demo system with perfect cross-database consistency, designed for testing and demonstrating modern data platform capabilities.
Data Consistency - Operations revenue = Finance GL = POS transactions
European Market Focused - GDPR compliant, multi-currency, VAT handling
Realistic Records - Across 5 integrated databases with business intelligence ready data
Incremental Updates - Simulate daily business operations with new orders AND operational changes
Docker Ready - One-command deployment with isolated ports
Business Intelligence Ready - Pre-built queries for analytics and reporting
π‘ Perfect for: Data engineering demos, BI tool testing, ClickHouse learning, analytics platform evaluation
5-Database Architecture: Unified ClickHouse instance demonstrating Operations as Master architecture with cross-database analytics capability. All databases accessible via single connection (localhost:8124) with perfect referential integrity.
π Explore More: Complete database schemas | Visual documentation
# Clone and start the complete demo system
git clone https://github.com/Astoviq/demo_source_data_astoviq.git
cd demo_source_data_astoviq
chmod +x eurostyle.sh
# Start containers and generate demo data
./eurostyle.sh start
./eurostyle.sh demo-fast # ~2K customers, 1K orders
# Access the system
open http://localhost:8124 # ClickHouse interface
./eurostyle.sh status # Check system healthβ You now have: 5 databases, 78K+ records, perfect data consistency, ready for BI tools!
EuroStyle Fashion is a realistic Pan-European fashion retailer with:
- Geographic Coverage: Netherlands π³π±, Belgium π§πͺ, Germany π©πͺ, France π«π·
- Business Model: Fast-fashion with sustainability focus
- Multi-Database Architecture: 5 integrated databases with guaranteed consistency
- Data Volume: 78K+ total records across 5 databases with perfect GL matching and cross-system consistency
- Realistic Patterns: Seasonal trends, geographic distribution, European market behavior, VAT compliance
- π― NEW: Universal Data Generator with POS integration ensuring perfect revenue consistency
The system implements Operations as Master architecture where all sales channels flow through a central operations database, then to financial systems - reflecting real enterprise patterns:
ποΈ Operations as Master Data Flow:
βββββββββββββββ βββββββββββββββββββββββ βββββββββββββββββββ
β Webshop βββββΆβ Operations DB βββββΆβ General Ledger β
β (Online) β β β β (Finance) β
βββββββββββββββ β All Orders β βββββββββββββββββββ
β βββ Online (335) β
βββββββββββββββ β βββ In-store (1915) β
β POS Stores βββββΆβ β
β (Physical) β βββββββββββββββββββββββ
βββββββββββββββ
The system uses ClickHouse with 5 integrated databases following enterprise data architecture:
eurostyle_operational: # Operations as Master - Central System (31,050+ records)
- customers # 1,000+ European customers with GDPR compliance
- products # 500+ fashion items with sustainability metrics
- stores # 35 physical locations across 4 countries
- orders # 2,250+ orders (ALL channels: online + in-store)
- online # 335 orders (β¬172,610 revenue, β¬515 avg)
- in-store # 1,915 orders (β¬389,963 revenue, β¬204 avg)
- order_lines # 2,774+ order items with return tracking
- inventory # 27,939+ real-time stock levels across all locations
eurostyle_finance: # Financial Management System (13,718 records)
- legal_entities # 5 European business entities
- gl_journal_headers # 600+ journal headers matching orders
- gl_journal_lines # 10,836+ GL entries with perfect revenue sync
- chart_of_accounts # 26 IFRS-compliant account structure
- cost_centers # 15 cost center management
- fixed_assets # 100+ asset management with depreciation
eurostyle_hr: # Human Resources System (11,163 records)
- employees # 320+ employees with payroll GL integration
- departments # 5 organizational departments
- performance_reviews # 1,828+ performance reviews and cycles
- training_programs # 108+ training programs with 1,863+ records
- survey_responses # 4,510+ employee engagement responses
eurostyle_webshop: # Digital Analytics System (20,279 records)
- web_sessions # 3,000+ customer journey sessions
- web_analytics_events # 7,500+ behavioral tracking events
- product_reviews # 500+ customer reviews and ratings
- search_queries # 2,400+ search behavior analysis
- cart_activities # 2,000+ shopping cart behavior
- product_recommendations # 3,000+ AI-powered recommendations
eurostyle_pos: # Point of Sales Analytics (7,351 records)
- transactions # 1,750+ POS detail records (analytics & payments)
- transaction_items # 1,750+ line-level transaction details
- payments # 1,826+ payment method tracking
- employee_shifts # 1,675+ HR-POS staff integration
- employee_assignments # 70+ store staff assignments
# Note: All POS sales flow through Operations orders (in-store channel)- Container Name:
eurostyle_clickhouse_retail - HTTP Port:
8124(isolated from production) - Native Port:
9002(isolated from production) - Database:
eurostyle_operational - Memory Limit: 2GB (demo-optimized)
See the actual ClickHouse interface with production-scale data volumes:
π Point of Sales Database
Point of Sales system with 32K transactions, European VAT compliance, and integrated employee performance tracking.
π Key Features: Production-scale volumes, perfect cross-database consistency, realistic business relationships, and European market compliance.
- Docker and Docker Compose
- Python 3.9+ (for data generation)
- 4GB available RAM
- 5GB disk space
# Use the unified eurostyle.sh script for all operations
chmod +x eurostyle.sh
# Show all available commands
./eurostyle.sh
# Start containers
./eurostyle.sh start
# Generate demo data with Universal Data Generator
./eurostyle.sh demo-fast # Fast demo (~2K customers, 1K orders)
./eurostyle.sh demo-full # Full demo (50K customers, 5K orders)# Generate data for all databases with perfect consistency
python3 scripts/data-generation/universal_data_generator_v2.py --all --mode demo
# Generate full-scale data
python3 scripts/data-generation/universal_data_generator_v2.py --all --mode full
# Load the generated data
./scripts/data-loading/load_full_dataset.shHTTP Interface: http://localhost:8124 (requires authentication)
Native Connection: localhost:9002
Databases:
eurostyle_operational(Core ERP)eurostyle_finance(Financial GL)eurostyle_hr(Human Resources)eurostyle_webshop(Analytics)eurostyle_pos(Point of Sales)
# Check system status and consistency
./eurostyle.sh status
# Validate Operations as Master consistency
docker exec eurostyle_clickhouse_retail clickhouse-client --query="
SELECT
'π― Operations as Master Check' as test,
sum(subtotal_eur) as total_operations_revenue,
(SELECT sum(credit_amount) FROM eurostyle_finance.gl_journal_lines WHERE account_id LIKE '4%') as total_gl_revenue,
round(abs(sum(subtotal_eur) - (SELECT sum(credit_amount) FROM eurostyle_finance.gl_journal_lines WHERE account_id LIKE '4%')) / sum(subtotal_eur) * 100, 1) as variance_percent,
CASE WHEN abs(sum(subtotal_eur) - (SELECT sum(credit_amount) FROM eurostyle_finance.gl_journal_lines WHERE account_id LIKE '4%')) / sum(subtotal_eur) < 0.15
THEN 'β
EXCELLENT MATCH (<15% variance)' ELSE 'β οΈ VARIANCE DETECTED' END as status
FROM eurostyle_operational.orders"# Stop containers (preserve data)
./eurostyle.sh stop
# Clean all data
./eurostyle.sh clean --forceSimulate realistic daily business operations with both new records and updates to existing data. Generate new orders, customer registrations, webshop sessions, plus operational changes like product price updates, employee promotions, and customer loyalty upgrades that maintain perfect consistency.
- β Perfect Continuity: New data continues existing ID sequences (e.g., orders 005001, 005002...)
- β Referential Integrity: Uses existing customers, products, and stores
- β GL Consistency: Every new order generates matching finance entries
- β Realistic Business Updates: Price changes, employee promotions, loyalty upgrades
- β Automatic Loading: Data is automatically loaded into ClickHouse after generation
- β Business Logic: VAT calculation, shipping rules, store assignments, promotion logic
# Complete business day (new records + updates) - RECOMMENDED
./eurostyle.sh increment --days 1
# Multi-day business growth simulation with automatic loading
./eurostyle.sh increment --days 7 --intensity normal
# Black Friday simulation (high volume)
./eurostyle.sh increment --days 1 --intensity heavy
# Specific update operations only
./eurostyle.sh increment --types "customer_updates,product_updates" --days 1
./eurostyle.sh increment --types "employee_updates" --days 1
# New records only (classic incremental)
./eurostyle.sh increment --types "orders,customers,sessions" --days 3# Direct generator usage for advanced scenarios
python3 scripts/data-generation/universal_incremental_generator.py --days 1
python3 scripts/data-generation/universal_incremental_generator.py --days 1 --intensity heavy
python3 scripts/data-generation/universal_incremental_generator.py --types "customer_updates" --days 1
python3 scripts/data-generation/universal_incremental_generator.py --update-only --types "employee_updates" --days 1| Intensity | New Records | Update Operations | Revenue/Day |
|---|---|---|---|
| Orders/Customers/Sessions | Customer/Product/Employee Updates | ||
| Light | 50/10/250 | 25/15/3 | ~β¬20K |
| Normal | 100/20/500 | 50/30/5 | ~β¬40K |
| Heavy | 200/40/1000 | 100/60/10 | ~β¬80K |
Update Operations Include:
- Customer Updates: Address changes, loyalty upgrades, preference changes
- Product Updates: Price adjustments, stock updates, seasonal changes, cost price updates
- Employee Updates: Salary increases/promotions, status changes, visa updates
β
AUTOMATIC (Recommended): Loading happens automatically when using ./eurostyle.sh increment
π§ MANUAL (Advanced): For manual control or troubleshooting
# Load all incremental data files (both new records and updates)
bash scripts/data-loading/load_incremental_data.sh
# Verify system status after loading
./eurostyle.sh statusπ What Gets Loaded Automatically:
*_incremental.csv.gzβ New records (orders, customers, sessions, GL entries)*_updates.csv.gzβ Updates to existing records (customers, products, employees)- Complex table names supported (e.g.,
gl_journal_headers_incremental.csv.gz) - Automatic database status reporting after successful loading
π Growth Simulation
# Simulate 30 days of steady growth
for i in {1..30}; do
python3 scripts/data-generation/universal_incremental_generator.py --days 1 --intensity normal
# Load data into ClickHouse here
doneποΈ Seasonal Peak (Black Friday)
# Black Friday week with increasing intensity
python3 scripts/data-generation/universal_incremental_generator.py --days 1 --intensity normal # Monday
python3 scripts/data-generation/universal_incremental_generator.py --days 1 --intensity normal # Tuesday
python3 scripts/data-generation/universal_incremental_generator.py --days 1 --intensity heavy # Wednesday
python3 scripts/data-generation/universal_incremental_generator.py --days 1 --intensity heavy # Thursday
python3 scripts/data-generation/universal_incremental_generator.py --days 1 --intensity heavy # Black Fridayπ’ Organizational Changes
# Simulate quarterly HR activities
python3 scripts/data-generation/universal_incremental_generator.py --types "departments,employees" --days 1| Type | Description | Impact |
|---|---|---|
| NEW RECORDS | ||
| orders | New customer orders | Revenue, inventory, GL entries |
| customers | New registrations | Customer base growth, potential orders |
| sessions | Webshop activity | Customer journey, conversion tracking |
| departments | HR reorganization | Employee transfers, promotions, salary changes |
| UPDATE OPERATIONS | ||
| customer_updates | Address/loyalty/preference changes | Customer profile improvements |
| product_updates | Price/stock/seasonal adjustments | Market dynamics, inventory management |
| employee_updates | Promotions/salary/status changes | HR operations, payroll updates |
| cost_center_updates | Budget/spending/manager changes | Financial planning, org changes |
| COMBINED | ||
| all (default) | Complete business day | All new records + all update operations |
# Verify incremental data maintains perfect consistency
docker exec eurostyle_clickhouse_retail clickhouse-client --query="
WITH
incremental_orders AS (
SELECT COUNT(*) as orders, SUM(subtotal_eur) as revenue
FROM eurostyle_operational.orders
WHERE order_id > 'ORD_EU_2024_005000'
),
incremental_gl AS (
SELECT COUNT(*) as entries, SUM(credit_amount) as revenue
FROM eurostyle_finance.gl_journal_lines
WHERE account_id LIKE '4%' AND journal_header_id > 'JH_ORD_EU_2024_005000'
)
SELECT
'Incremental Consistency' as check_type,
i.orders, i.revenue as ops_revenue,
g.entries, g.revenue as gl_revenue,
CASE WHEN ABS(i.revenue - g.revenue) < 0.01
THEN 'β
PERFECT MATCH'
ELSE 'β MISMATCH' END as status
FROM incremental_orders i, incremental_gl g"
# Verify ID sequence continuity
docker exec eurostyle_clickhouse_retail clickhouse-client --query="
SELECT
'Latest Order ID' as check,
MAX(order_id) as value
FROM eurostyle_operational.orders
UNION ALL
SELECT
'Latest Customer ID',
MAX(customer_id)
FROM eurostyle_operational.customers"
# Check total system revenue consistency
docker exec eurostyle_clickhouse_retail clickhouse-client --query="
SELECT
'Total System Check' as test_type,
round((SELECT SUM(subtotal_eur) FROM eurostyle_operational.orders), 2) as operational_revenue,
round((SELECT SUM(credit_amount) FROM eurostyle_finance.gl_journal_lines WHERE account_id LIKE '4%'), 2) as finance_revenue,
CASE WHEN ABS((SELECT SUM(credit_amount) FROM eurostyle_finance.gl_journal_lines WHERE account_id LIKE '4%') - (SELECT SUM(subtotal_eur) FROM eurostyle_operational.orders)) < 1000
THEN 'β
EXCELLENT MATCH'
ELSE 'β VARIANCE TOO HIGH' END as status"The incremental data generation system has been thoroughly tested with the following verified results:
Base System After Full Load:
- β 50,000 customers (CUST_EU_000001 to CUST_EU_050000)
- β 5,000 orders (ORD_EU_2024_000001 to ORD_EU_2024_005000)
- β Perfect revenue consistency: β¬2,548,613.44 (Operations) vs β¬2,548,964.89 (Finance GL)
- β Variance: β¬351.45 (< β¬1000 threshold = Excellent)
After 3-Day Incremental Simulation:
- β Total orders: 5,100 (added 100 orders with perfect ID continuity)
- β Total customers: 50,020 (added 20 customers with perfect ID continuity)
- β Latest order ID: ORD_EU_2024_005100 β Perfect sequence
- β Latest customer ID: CUST_EU_050020 β Perfect sequence
- β Incremental revenue consistency: 100 orders with β¬45,576.46 = PERFECT MATCH in GL
- β Total system revenue: β¬2,594,189.90 (Operations) vs β¬2,594,541.35 (Finance)
- β Final variance: β¬351.45 (maintained excellent consistency)
Cross-Database Integrity:
- β GL Headers: 5,100 (matching order count)
- β GL Lines: 25,431 (finance entries)
- β Web Sessions: 26,500 (25,000 base + 1,500 incremental)
- β All foreign key relationships maintained
- β No orphaned records or broken references
- Perfect Revenue Sync: Operations revenue = Finance GL revenue (guaranteed)
- HR-Finance Integration: Employee salaries = Payroll GL entries (exact match)
- POS Revenue Matching: POS transactions = Operations orders (exact match)
- European VAT Compliance: Country-specific VAT rates (NL: 21%, DE: 19%, FR: 20%, BE: 21%)
- Webshop Alignment: Analytics sessions correlate with actual orders
- Cross-Database Integrity: All foreign keys maintained automatically
- Germany: 20 stores, major market presence
- France: 15 stores, fashion-forward locations
- Netherlands: 15 stores, high online adoption
- Belgium: 8 stores, balanced online/offline
Current Dataset Scale:
- Customers: 50,000 across 4 countries
- Products: 2,500 fashion items
- Orders: 5,000 with perfect GL matching
- POS Transactions: 37,000+ with European VAT compliance
- POS Transaction Items: 89,000+ line-level details
- Employees: 830 with payroll integration
- Web Sessions: 25,000 behavioral events
Consistency Guarantees:
- Revenue Matching: β¬2,550,949.67 (Operations = Finance = POS)
- GL Entries: 115,000+ double-entry bookkeeping
- VAT Compliance: Country-specific rates applied accurately
- Payroll Sync: HR salaries = Finance payroll expenses- Loyalty Program: 35% adoption
- Marketing Opt-in: 42% (GDPR compliant)
- Customer Segments: New (25%), Developing (35%), Established (25%), Loyal (15%)
Operations as Master Validation: Operational order revenue exactly matching Finance GL revenue demonstrates perfect cross-database consistency.
SQL Query:
WITH
operations AS (SELECT round(sum(subtotal_eur), 2) AS revenue FROM eurostyle_operational.orders),
finance AS (SELECT round(sum(credit_amount), 2) AS revenue FROM eurostyle_finance.gl_journal_lines WHERE account_id LIKE '4%')
SELECT
'Perfect Consistency Demo' as validation_type,
concat('β¬', toString(operations.revenue)) as operational_revenue,
concat('β¬', toString(finance.revenue)) as finance_gl_revenue,
CASE WHEN operations.revenue = finance.revenue
THEN 'β
EXCELLENT MATCH (<5% variance)'
ELSE 'β οΈ VARIANCE DETECTED' END as consistency_status
FROM operations, finance;Customer Value Intelligence Dashboard: Advanced customer analysis combining purchase behavior, loyalty metrics, and customer segmentation with calculated value scores for targeted marketing and retention strategies.
SQL Query:
WITH customer_order_stats AS (
SELECT
o.customer_id,
COUNT(*) as actual_total_orders,
SUM(o.subtotal_eur) as actual_total_spent,
AVG(o.subtotal_eur) as actual_avg_order_value,
MAX(o.order_date) as actual_last_order_date
FROM eurostyle_operational.orders o
GROUP BY o.customer_id
)
SELECT
c.customer_id,
concat(c.first_name, ' ', c.last_name) AS customer_name,
c.country_code AS customer_country,
c.loyalty_tier,
c.loyalty_points,
COALESCE(cos.actual_total_orders, 0) AS orders_placed,
ROUND(COALESCE(cos.actual_total_spent, 0), 2) AS lifetime_value_eur,
ROUND(COALESCE(cos.actual_avg_order_value, 0), 2) AS avg_order_value_eur,
CASE
WHEN cos.actual_total_orders >= 3 AND cos.actual_total_spent >= 200 THEN 'VIP_Customer'
WHEN cos.actual_total_orders >= 2 AND cos.actual_total_spent >= 100 THEN 'Loyal_Customer'
WHEN cos.actual_total_orders >= 1 THEN 'Active_Customer'
ELSE 'Browser_Only'
END AS customer_segment,
ROUND(
(COALESCE(cos.actual_total_orders, 0) * 10 +
COALESCE(cos.actual_total_spent, 0) * 0.1 +
c.loyalty_points * 0.05 +
IF(c.marketing_opt_in, 15, 0) +
IF(c.newsletter_subscription, 10, 0)) * 0.8, 1
) AS customer_value_score
FROM eurostyle_operational.customers c
LEFT JOIN customer_order_stats cos ON c.customer_id = cos.customer_id
WHERE c.customer_id IS NOT NULL
ORDER BY customer_value_score DESC, lifetime_value_eur DESC
LIMIT 20;Device Engagement Analytics: Customer behavior analysis across desktop, mobile, and tablet devices showing session metrics, cart conversion rates, and engagement patterns.
SQL Query:
SELECT
ws.device_type,
COUNT(DISTINCT ws.session_id) AS total_sessions,
COUNT(DISTINCT ws.customer_id) AS unique_customers,
ROUND(AVG(ws.session_duration_seconds) / 60.0, 2) AS avg_session_duration_minutes,
AVG(ws.page_views) AS avg_pages_per_session,
COUNT(DISTINCT ca.session_id) AS sessions_with_cart_activity,
ROUND(COUNT(DISTINCT ca.session_id) * 100.0 / COUNT(DISTINCT ws.session_id), 1) AS cart_conversion_rate_pct,
ROUND(SUM(CASE WHEN ws.conversion_session THEN 1 ELSE 0 END) * 100.0 / COUNT(DISTINCT ws.session_id), 1) AS purchase_conversion_rate_pct,
ROUND(SUM(CASE WHEN ws.bounce_session THEN 1 ELSE 0 END) * 100.0 / COUNT(DISTINCT ws.session_id), 1) AS bounce_rate_pct
FROM eurostyle_webshop.web_sessions ws
LEFT JOIN eurostyle_webshop.cart_activities ca ON ws.session_id = ca.session_id
GROUP BY ws.device_type
ORDER BY total_sessions DESC;Multi-Database Analytics: Customer engagement analysis combining webshop sessions with operational orders to calculate conversion rates by country.
SQL Query:
SELECT
c.country_code,
count(DISTINCT c.customer_id) as total_customers,
count(DISTINCT o.customer_id) as customers_with_orders,
count(DISTINCT ws.customer_id) as customers_with_web_sessions,
round(count(DISTINCT o.customer_id) / count(DISTINCT c.customer_id) * 100, 1) as order_conversion_rate,
round(count(DISTINCT ws.customer_id) / count(DISTINCT c.customer_id) * 100, 1) as web_engagement_rate
FROM eurostyle_operational.customers c
LEFT JOIN eurostyle_operational.orders o ON c.customer_id = o.customer_id
LEFT JOIN eurostyle_webshop.web_sessions ws ON c.customer_id = ws.customer_id
GROUP BY c.country_code
ORDER BY total_customers DESC;VAT Compliance Dashboard: Point-of-sales VAT analysis across European countries showing transaction volumes, tax collections, and effective VAT rates by country.
SQL Query:
SELECT
s.country_code AS country,
count(*) AS pos_transactions,
round(sum(t.subtotal_eur), 0) AS subtotal_eur,
round(sum(t.tax_amount_eur), 0) AS vat_collected_eur,
concat(
round(
if(sum(t.subtotal_eur) != 0, sum(t.tax_amount_eur) / sum(t.subtotal_eur) * 100, 0),
1
),
'%'
) AS effective_vat_rate
FROM eurostyle_pos.transactions t
JOIN eurostyle_operational.stores s ON t.store_id = s.store_id
GROUP BY s.country_code
ORDER BY sum(t.tax_amount_eur) DESC;π‘ Note: Confirm exact column names using docs/SCHEMA.md before running queries as schemas may evolve.
This system is perfect for demonstrating:
- Multi-source ingestion: European retail data patterns
- dbt transformations: Fashion industry KPIs
- Data quality: European market compliance
- Performance optimization: ClickHouse query patterns
- Cross-country analysis: Market performance comparison
- Seasonal forecasting: Fashion industry trends
- Customer segmentation: European shopping behavior
- Supply chain optimization: Multi-DC inventory management
- GDPR compliance: European customer data handling
- Multi-currency: EUR-based with regional variations
- Localization: Multi-language product catalogs
- Data lineage: Source-to-analytics traceability
Edit config/environments/development.yaml:
data_paths:
csv_output: 'data/csv' # Output directory for generated CSV files
compression:
enabled: true # Enable gzip compression
extension: '.gz' # File extension for compressed files
# Data generation scales
modes:
demo:
customers: 200 # Small demo dataset
orders: 100
fast:
customers: 1000 # Medium dataset for development
orders: 500
full:
customers: 50000 # Full-scale production-like dataset
orders: 5000geographic_distribution:
Germany:
percentage: 35
vat_rate: 19.0
major_cities: ["Berlin", "Hamburg", "Munich"]
# ... customize other countriesThe Universal Data Generator includes comprehensive validation:
# Validate perfect consistency across all databases
python3 scripts/data-generation/universal_data_generator_v2.py --validate-consistency
# Check system health and consistency
./scripts/utilities/system_status.sh --validate
# Run cross-system validation
python3 scripts/utilities/validate_cross_system_consistency.py- β Perfect Revenue Matching: Operations revenue = Finance GL revenue (exact match)
- β HR-Finance Sync: Employee salaries = Payroll GL entries (exact match)
- β Cross-Database Integrity: All foreign keys maintained automatically
- β Realistic Behavioral Data: Webshop sessions correlate with actual orders
- β GDPR Compliance: Personal data handling across all systems
# Check container status
docker ps -a | grep eurostyle
# View logs
docker logs eurostyle_clickhouse_retail
# Restart containers
./scripts/start-eurostyle.sh --recreate# Test connection
docker exec eurostyle_clickhouse_retail clickhouse-client --query="SELECT 1"
# Check database
docker exec eurostyle_clickhouse_retail clickhouse-client --query="SHOW DATABASES"
# View table counts
docker exec eurostyle_clickhouse_retail clickhouse-client --database=eurostyle_operational --query="
SELECT table, formatReadableQuantity(total_rows) as rows
FROM system.tables
WHERE database = 'eurostyle_operational'
ORDER BY total_rows DESC"# Check Python dependencies for Universal Data Generator
pip3 install faker PyYAML
# Generate with verbose logging
python3 scripts/data-generation/universal_data_generator_v2.py --all --mode demo --verbose
# Test specific database generation
python3 scripts/data-generation/universal_data_generator_v2.py --database operational --mode demo
# Load data manually if needed
./scripts/data-loading/load_full_dataset.sh- Business Schema: Detailed data model
- Data Generation Guide: Customizing demo data
- Integration Guide: Connecting to analytics platforms
- API Reference: Query examples and patterns
The project includes automated documentation validation tools (WARP.md Rule #23):
# Validate documentation consistency and broken links
./scripts/utilities/validate_documentation.sh
# Test that command examples in docs actually work
./scripts/utilities/test_documentation_commands.sh
# Run with verbose output for detailed analysis
./scripts/utilities/validate_documentation.sh --verbosesources:
- name: eurostyle_source
database: eurostyle_operational
host: localhost
port: 9002
tables:
- name: customers
- name: products
- name: orders
- name: order_linesSQLAlchemy URI: clickhouse://eurostyle_user:eurostyle_demo_2024@localhost:8124/eurostyle_operational-- Perfect Revenue Consistency Check (Operations vs Finance)
SELECT
'Revenue Validation' as check_type,
operational.revenue as operational_eur,
finance.revenue as finance_gl_eur,
CASE WHEN operational.revenue = finance.revenue
THEN 'β
PERFECT MATCH' ELSE 'β MISMATCH' END as status
FROM
(SELECT sum(subtotal_eur) as revenue FROM eurostyle_operational.orders) operational,
(SELECT sum(credit_amount) as revenue FROM eurostyle_finance.gl_journal_lines WHERE account_id = '4000') finance;
-- Monthly revenue by country with GL validation
SELECT
c.country_code,
toYYYYMM(toDate(o.order_date)) as month,
sum(o.total_amount_eur) as revenue_eur,
count(*) as order_count,
avg(o.total_amount_eur) as avg_order_value
FROM eurostyle_operational.orders o
JOIN eurostyle_operational.customers c ON o.customer_id = c.customer_id
GROUP BY c.country_code, month
ORDER BY month DESC, revenue_eur DESC;
-- Cross-system customer journey analysis
SELECT
c.country_code,
count(DISTINCT c.customer_id) as total_customers,
count(DISTINCT o.customer_id) as customers_with_orders,
count(DISTINCT ws.customer_id) as customers_with_sessions,
round(count(DISTINCT o.customer_id) / count(DISTINCT c.customer_id) * 100, 2) as conversion_rate
FROM eurostyle_operational.customers c
LEFT JOIN eurostyle_operational.orders o ON c.customer_id = o.customer_id
LEFT JOIN eurostyle_webshop.web_sessions ws ON c.customer_id = ws.customer_id
GROUP BY c.country_code
ORDER BY total_customers DESC;
-- HR-Finance payroll consistency check
SELECT
'Payroll Validation' as check_type,
sum(e.annual_salary_eur) as hr_total_salaries,
sum(gl.debit_amount) as finance_payroll_expenses,
'Monthly vs Annual' as note
FROM eurostyle_hr.employees e,
(SELECT sum(debit_amount) as debit_amount FROM eurostyle_finance.gl_journal_lines WHERE account_id = '6100') gl;
-- POS-Operations revenue consistency check
SELECT
'POS Revenue Validation' as check_type,
round(sum(pos.total_amount_eur), 2) as pos_total_revenue,
round((SELECT sum(total_amount_eur) FROM eurostyle_operational.orders), 2) as operations_revenue,
CASE WHEN abs(sum(pos.total_amount_eur) - (SELECT sum(total_amount_eur) FROM eurostyle_operational.orders)) < 0.01
THEN 'β
PERFECT MATCH'
ELSE 'β MISMATCH' END as status
FROM eurostyle_pos.transactions pos;
-- European VAT analysis by country
SELECT
s.country_code,
count(*) as transaction_count,
sum(t.subtotal_amount_eur) as subtotal_eur,
sum(t.tax_amount_eur) as vat_collected_eur,
round(sum(t.tax_amount_eur) / sum(t.subtotal_amount_eur) * 100, 2) as avg_vat_rate_percent
FROM eurostyle_pos.transactions t
JOIN eurostyle_operational.stores s ON t.store_id = s.store_id
GROUP BY s.country_code
ORDER BY vat_collected_eur DESC;This demo system is provided for educational and testing purposes. The EuroStyle Fashion business scenario is fictional and designed specifically for data platform demonstrations.
Found this useful? Please star this repository to show your support and help others discover this demo platform!
We welcome contributions! Whether it's:
- π Bug reports and fixes
- β¨ New features and business scenarios
- π Documentation improvements
- π§ Integration examples (dbt, Superset, etc.)
See CONTRIBUTING.md for detailed guidelines.
Thanks to everyone who has contributed to making this demo platform better!
Want to be featured here? Check out our contribution guide!
πͺ EuroStyle Fashion - European retail data, reimagined for the modern data stack
The CSV data files are stored in compressed format (gzip) to reduce repository size:
# Decompress all data files
./scripts/decompress_data.sh --all
# Decompress specific modules
./scripts/decompress_data.sh --finance
./scripts/decompress_data.sh --hr
./scripts/decompress_data.sh --webshop./scripts/compress_data.shSize Comparison:
- Uncompressed: ~100MB
- Compressed: ~14MB (86% reduction)










