
naeem-gitonga/analytics-tracker


Multi-Tenant Analytics Tracker

A standalone, serverless analytics tracking system for AWS that supports multiple applications and fine-grained IAM controls.

Architecture

Client/App → API Gateway → Lambda → S3 (bucket per app) → Athena → Visualization

Uses CDK for infrastructure.

Key Features:

  • ✅ Multi-tenant support (one tracker, multiple apps/buckets)
  • ✅ Fine-grained IAM controls
  • ✅ Bucket name passed via request (no hard-coding)
  • ✅ Privacy-focused (hashed IPs, no PII)
  • ✅ Serverless and scalable
  • ✅ SQL-queryable via Athena


Configuration

1. Edit cdk/bin/app.ts

Configure the tracker for your use case:

import * as cdk from 'aws-cdk-lib';
import { AnalyticsTrackerStack } from '../lib/analytics-stack';

const app = new cdk.App();

new AnalyticsTrackerStack(
  app,
  'MyAnalyticsTracker',
  {
    // List of allowed S3 buckets (supports wildcards)
    allowedBuckets: [
      'app1-analytics-prod',
      'app2-analytics-staging',
      '*-analytics',  // All buckets ending with -analytics
    ],

    // CORS origin (use '*' for all, or specify domains)
    corsOrigin: '*',

    // Function and API naming
    functionPrefix: 'mycompany',
    apiName: 'mycompany-analytics-api',

    // Optional: Additional configuration
    enableMetrics: true,
    lambdaTimeout: 10,
  },
  {
    env: {
      account: process.env.CDK_DEFAULT_ACCOUNT,
      region: process.env.CDK_DEFAULT_REGION,
    },
  }
);

Local Dev

npm run offline

Deployment

# Build TypeScript
npm run build

# Preview changes
npm run synth

# Deploy to AWS
npm run deploy

After deployment, note the API endpoint URL:

Outputs:
MyAnalyticsTracker.ApiEndpoint = https://abc123.execute-api.us-east-1.amazonaws.com/prod

Usage

HTTP API Request

Send analytics events via POST request:

curl -X POST https://your-api-url.execute-api.us-east-1.amazonaws.com/prod/track \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "app1-analytics-prod",
    "eventType": "page_view",
    "timestamp": "2025-12-17T12:00:00.000Z",
    "page": "/articles/my-post",
    "userAgent": "Mozilla/5.0...",
    "viewport": { "width": 1920, "height": 1080 },
    "sessionId": "abc123",
    "referrer": "https://google.com",
    "fromWebsite": "LinkedIn"
  }'
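The same request can be sent from application code. The sketch below is hypothetical (the helper names `pageView` and `sendEvent` are not part of the tracker); it uses the field names from the request schema and accepts any `fetch`-compatible function, so the global `fetch` in browsers or Node 18+ works, as does a stub in tests.

```typescript
// Field names follow the request schema; this helper itself is hypothetical.
interface TrackEvent {
  bucket: string;
  eventType: string;
  timestamp: string;
  page?: string;
  sessionId?: string;
  referrer?: string;
  fromWebsite?: string;
  metadata?: Record<string, unknown>;
}

// Minimal structural type so the global fetch or a stub can be passed in.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Build a page_view event stamped with the current time in ISO 8601.
// Required fields are written last so `extras` cannot overwrite them.
function pageView(bucket: string, page: string, extras: Partial<TrackEvent> = {}, now = new Date()): TrackEvent {
  return { ...extras, bucket, eventType: 'page_view', timestamp: now.toISOString(), page };
}

// POST the event as JSON to the /track endpoint; non-2xx responses become errors.
async function sendEvent(fetchImpl: FetchLike, trackUrl: string, event: TrackEvent): Promise<unknown> {
  const res = await fetchImpl(trackUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(event),
  });
  if (!res.ok) throw new Error(`Tracker returned HTTP ${res.status}`);
  return res.json();
}
```

Usage: `await sendEvent(fetch, 'https://your-api-url.execute-api.us-east-1.amazonaws.com/prod/track', pageView('app1-analytics-prod', '/articles/my-post', { fromWebsite: 'LinkedIn' }))`.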

Request Schema

Required Fields:

  • bucket (string): S3 bucket name to write to (must match an entry in allowedBuckets)
  • eventType (string): Type of event (e.g., "page_view", "scroll_complete")
  • timestamp (string): ISO 8601 timestamp

Optional Fields:

  • page (string): Page path
  • userAgent (string): Browser user agent
  • viewport (object): { width: number, height: number }
  • sessionId (string): Session identifier
  • referrer (string): Referrer URL
  • metadata (object): Any additional custom data
  • fromWebsite (string): Where the visit came from, such as a promo code or source site, used to track promotions or other data relevant to where the interest originated
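The required-field rules above could be checked roughly as follows. This is a sketch, not the Lambda's actual code: `validateTrackRequest` is a hypothetical name, and it deliberately accepts only the UTC form produced by `Date.prototype.toISOString()`, which is stricter than full ISO 8601.

```typescript
// Hypothetical validator mirroring the "Required Fields" list above.
interface ValidationError {
  statusCode: 400;
  body: { error: string; message: string };
}

const REQUIRED_FIELDS = ['bucket', 'eventType', 'timestamp'] as const;

// UTC timestamps such as 2025-12-17T12:00:00.000Z (milliseconds optional).
const ISO_8601_UTC = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{1,3})?Z$/;

function validateTrackRequest(body: Record<string, unknown>): ValidationError | null {
  for (const field of REQUIRED_FIELDS) {
    const value = body[field];
    if (typeof value !== 'string' || value.length === 0) {
      return {
        statusCode: 400,
        body: { error: `Missing required field: ${field}`, message: `You must specify "${field}" in the request body` },
      };
    }
  }
  const ts = body.timestamp as string;
  // The regex checks the shape; Date.parse rejects impossible dates like month 13.
  if (!ISO_8601_UTC.test(ts) || Number.isNaN(Date.parse(ts))) {
    return {
      statusCode: 400,
      body: { error: 'Invalid field: timestamp', message: 'timestamp must be an ISO 8601 UTC string' },
    };
  }
  return null; // valid
}
```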

Response

Success (200):

{
  "status": "ok",
  "eventId": "1702742400-a1b2c3d4",
  "bucket": "app1-analytics-prod",
  "key": "analytics/year=2025/month=12/day=17/1702742400-a1b2c3d4.json"
}
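The `key` in the success response uses Hive-style `year=/month=/day=` folders, which is what lets Athena treat them as partitions. A sketch of how such a key and ID might be produced, assuming (from the sample response only) that `eventId` is Unix seconds plus 8 random hex characters and that dates are taken in UTC; both function names are hypothetical.

```typescript
import { randomBytes } from 'node:crypto';

// Assumed eventId format: "<unix seconds>-<8 hex chars>", inferred from the sample response.
function makeEventId(now: Date = new Date()): string {
  const seconds = Math.floor(now.getTime() / 1000);
  return `${seconds}-${randomBytes(4).toString('hex')}`;
}

// Partition path matching PARTITIONED BY (year, month, day); UTC parts, zero-padded.
function objectKey(eventId: string, now: Date = new Date()): string {
  const year = now.getUTCFullYear();
  const month = String(now.getUTCMonth() + 1).padStart(2, '0');
  const day = String(now.getUTCDate()).padStart(2, '0');
  return `analytics/year=${year}/month=${month}/day=${day}/${eventId}.json`;
}
```

Zero-padding keeps the partition values sortable as strings, which matters because the Athena table declares them as `STRING`.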

Error (400):

{
  "error": "Missing required field: bucket",
  "message": "You must specify the S3 bucket name in the request body"
}

Forbidden (403):

{
  "error": "Forbidden",
  "message": "The specified bucket is not authorized for this analytics service"
}

IAM Permissions

Lambda Function Permissions

The Lambda function is granted only s3:PutObject, scoped to the analytics/ prefix of each allowed bucket:

{
  "Effect": "Allow",
  "Action": [
    "s3:PutObject"
  ],
  "Resource": [
    "arn:aws:s3:::app1-analytics-prod/analytics/*",
    "arn:aws:s3:::app2-analytics-prod/analytics/*"
  ]
}
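A policy like this can be derived from `allowedBuckets` rather than written by hand. The sketch below is hypothetical (not the stack's actual code) and assumes every bucket writes under `analytics/`. IAM resource ARNs accept `*` wildcards, so a pattern such as `*-analytics` carries over directly. In CDK, the same list could be passed as the `resources` of an `iam.PolicyStatement`.

```typescript
// Hypothetical helper: one PutObject resource per allowed bucket, limited to analytics/.
function putObjectResources(allowedBuckets: string[]): string[] {
  return allowedBuckets.map((bucket) => `arn:aws:s3:::${bucket}/analytics/*`);
}

// Plain-object form of the statement shown above.
function trackerPolicyStatement(allowedBuckets: string[]) {
  return {
    Effect: 'Allow',
    Action: ['s3:PutObject'],
    Resource: putObjectResources(allowedBuckets),
  };
}
```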

Bucket Validation

The Lambda validates incoming bucket names against the ALLOWED_BUCKETS environment variable:

// Exact match
allowedBuckets: ['app1-analytics', 'app2-analytics']

// Wildcard patterns
allowedBuckets: ['*-analytics']  // Matches: my-app-analytics, other-app-analytics
allowedBuckets: ['analytics-*']  // Matches: analytics-prod, analytics-staging
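Wildcard matching of this kind can be sketched as below. The function names are hypothetical, and so is the assumption that `ALLOWED_BUCKETS` holds a comma-separated list; the real encoding may differ. Escaping matters here because bucket names may contain dots, which would otherwise match any character in a regular expression.

```typescript
// Assumption: ALLOWED_BUCKETS is comma-separated, e.g. "app1-analytics,*-analytics".
function parseAllowedBuckets(envValue: string | undefined): string[] {
  return (envValue ?? '')
    .split(',')
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Escape regex metacharacters, then turn each "*" into ".*"; anchor the whole name.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`^${escaped.replace(/\*/g, '.*')}$`);
}

// A bucket is allowed if it matches any exact name or wildcard pattern.
function isBucketAllowed(bucket: string, allowed: string[]): boolean {
  return allowed.some((pattern) => patternToRegExp(pattern).test(bucket));
}
```

Usage: `isBucketAllowed(body.bucket, parseAllowedBuckets(process.env.ALLOWED_BUCKETS))`, returning 403 when it is false.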

Querying Data

1. Set Up Athena Table

  1. Open sql/athena-setup.sql
  2. Replace YOUR-BUCKET-NAME with your bucket name
  3. Replace TABLE_NAME with a unique name (e.g., app1_events)
  4. Run in AWS Athena console
CREATE EXTERNAL TABLE analytics_db.app1_events (
  -- schema...
)
PARTITIONED BY (year STRING, month STRING, day STRING)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://app1-analytics-prod/analytics/';

-- Load partitions
MSCK REPAIR TABLE analytics_db.app1_events;

2. Run Queries

See sql/example-queries.sql for common queries:

-- Daily page views
SELECT year, month, day, COUNT(*) as views
FROM analytics_db.app1_events
WHERE eventType = 'page_view'
GROUP BY year, month, day
ORDER BY year DESC, month DESC, day DESC;

-- Device breakdown
SELECT device.device_type, device.browser, COUNT(*) as count
FROM analytics_db.app1_events
WHERE eventType = 'page_view'
GROUP BY device.device_type, device.browser;

3. Visualization

Option A: AWS QuickSight

  • Connect to Athena
  • Create dashboards
  • ~$9/month per author

Option B: Export to CSV

  • Run queries in Athena
  • Download results
  • Import to Excel/Sheets/Tableau

Option C: Programmatic Access

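import { AthenaClient, StartQueryExecutionCommand } from '@aws-sdk/client-athena';

// Top-level await requires an ES module (e.g. "type": "module" in package.json)
const athena = new AthenaClient({ region: 'us-east-1' });

// StartQueryExecution only starts the query; it returns an execution ID, not result rows
const { QueryExecutionId } = await athena.send(new StartQueryExecutionCommand({
  QueryString: 'SELECT * FROM analytics_db.app1_events LIMIT 100',
  ResultConfiguration: { OutputLocation: 's3://query-results-bucket/' },
}));
console.log('Started Athena query:', QueryExecutionId);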
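Starting a query returns only a `QueryExecutionId`. To read rows, one would poll `GetQueryExecutionCommand` until `QueryExecution.Status.State` is `SUCCEEDED`, then call `GetQueryResultsCommand`. The sketch below covers only the last step, turning the returned `ResultSet` into plain objects. The types are a simplified local mirror of the SDK output so the example runs without the SDK installed, and `rowsToObjects` is a hypothetical name.

```typescript
// Simplified mirror of the ResultSet returned by GetQueryResultsCommand.
interface Datum { VarCharValue?: string }
interface Row { Data?: Datum[] }
interface ResultSet { Rows?: Row[] }

// For SELECT queries, the first row of the first results page holds the column names.
// Apply this to the first page only; later pages contain data rows alone.
function rowsToObjects(resultSet: ResultSet): Record<string, string | null>[] {
  const rows = resultSet.Rows ?? [];
  if (rows.length === 0) return [];
  const header = (rows[0].Data ?? []).map((d, i) => d.VarCharValue ?? `col${i}`);
  return rows.slice(1).map((row) => {
    const record: Record<string, string | null> = {};
    header.forEach((name, i) => {
      // Athena omits VarCharValue for NULL cells; map that to null.
      record[name] = row.Data?.[i]?.VarCharValue ?? null;
    });
    return record;
  });
}
```

All values arrive as strings, so numeric columns such as `views` still need `Number(...)` on the caller's side.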

Multi-Tenant Setup

Scenario: Multiple Apps, One Tracker

Setup:

// Deploy ONE tracker
new AnalyticsTrackerStack(app, 'SharedTracker', {
  allowedBuckets: [
    'website-analytics',
    'mobile-app-analytics',
    'api-analytics',
  ],
});

Usage:

# Website sends to website-analytics bucket
curl -X POST https://tracker.../track -d '{"bucket": "website-analytics", ...}'

# Mobile app sends to mobile-app-analytics bucket
curl -X POST https://tracker.../track -d '{"bucket": "mobile-app-analytics", ...}'

Athena Tables:

-- One table per app
CREATE EXTERNAL TABLE analytics_db.website_events (...) LOCATION 's3://website-analytics/analytics/';
CREATE EXTERNAL TABLE analytics_db.mobile_events (...) LOCATION 's3://mobile-app-analytics/analytics/';
CREATE EXTERNAL TABLE analytics_db.api_events (...) LOCATION 's3://api-analytics/analytics/';

Query All Apps:

SELECT 'Website' as app, COUNT(*) as views FROM analytics_db.website_events WHERE eventType = 'page_view'
UNION ALL
SELECT 'Mobile' as app, COUNT(*) as views FROM analytics_db.mobile_events WHERE eventType = 'page_view'
UNION ALL
SELECT 'API' as app, COUNT(*) as views FROM analytics_db.api_events WHERE eventType = 'page_view';
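With many apps, writing this query by hand gets repetitive. A hypothetical generator is sketched below; it rejects table and database names that are not plain lowercase identifiers and escapes single quotes in labels, since these values are interpolated directly into SQL.

```typescript
// Hypothetical: one entry per app table created above.
interface AppTable { label: string; table: string }

const IDENTIFIER = /^[a-z_][a-z0-9_]*$/;

function unionPageViewsQuery(apps: AppTable[], database = 'analytics_db'): string {
  return apps
    .map(({ label, table }) => {
      if (!IDENTIFIER.test(table) || !IDENTIFIER.test(database)) {
        throw new Error(`Unsafe identifier: ${database}.${table}`);
      }
      const safeLabel = label.replace(/'/g, "''"); // SQL string-literal escaping
      return `SELECT '${safeLabel}' as app, COUNT(*) as views FROM ${database}.${table} WHERE eventType = 'page_view'`;
    })
    .join('\nUNION ALL\n');
}
```

The resulting string can be passed as `QueryString` to `StartQueryExecutionCommand`.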

Security

IP Privacy

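  • IPs are hashed using SHA-256
  • Only the first 16 characters of the hash are stored
  • The original IP is not stored; note, however, that an unsalted hash of an IPv4 address can be recovered by brute force, since there are only about 4.3 billion possible addresses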
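One way to make the stored value hard to brute-force is to key the hash with a secret. The sketch below swaps plain SHA-256 for HMAC-SHA-256 (a deliberate substitution, not necessarily what the tracker does) and keeps the 16-character truncation. The `secret` parameter is an assumption, for example a value held in the Lambda's environment or a secrets store.

```typescript
import { createHmac } from 'node:crypto';

// HMAC-SHA-256 keyed with a secret, truncated to 16 hex characters (64 bits).
// Without the secret, enumerating all IPv4 addresses no longer reveals the original IP.
function hashIp(ip: string, secret: string): string {
  return createHmac('sha256', secret).update(ip).digest('hex').slice(0, 16);
}
```

The same IP and secret always give the same value, so unique-visitor counts still work; rotating the secret breaks linkage across rotation periods.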

Bucket Authorization

  • Requests with unauthorized buckets return 403
  • Lambda validates bucket against whitelist
  • Supports wildcard patterns for flexibility

CORS

  • Configurable per deployment via corsOrigin
  • Use '*' to accept events from any website
  • Specify your own domains to restrict which sites can send events from the browser
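`Access-Control-Allow-Origin` accepts a single origin or `*`, so supporting several domains means echoing back the request's `Origin` when it is on the list. A sketch, assuming (hypothetically) that `corsOrigin` is either `'*'` or a comma-separated list of origins:

```typescript
// Assumption: corsOrigin is '*' or a comma-separated list such as "https://a.com,https://b.com".
function allowOriginHeader(corsOrigin: string, requestOrigin: string | undefined): Record<string, string> {
  if (corsOrigin.trim() === '*') {
    return { 'Access-Control-Allow-Origin': '*' };
  }
  const allowed = corsOrigin.split(',').map((o) => o.trim()).filter(Boolean);
  if (requestOrigin && allowed.includes(requestOrigin)) {
    // Echo the matching origin; Vary tells caches the response depends on Origin.
    return { 'Access-Control-Allow-Origin': requestOrigin, Vary: 'Origin' };
  }
  return {}; // no CORS header, so the browser blocks the response
}
```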

Data Encryption

  • S3 server-side encryption (SSE-S3) recommended
  • Support for KMS via additionalPolicies
  • HTTPS enforced via API Gateway

Data Schema

Events are stored as JSON with this structure:

// Shape of each stored event (the interface name is illustrative)
interface AnalyticsEvent {
  eventId: string;           // Unique event ID
  eventType: string;         // Event type
  timestamp: string;         // Client timestamp (ISO 8601)
  serverTimestamp: string;   // Server timestamp (ISO 8601)
  page: string;              // Page path
  sessionId: string;         // Session ID
  ip: string;                // Hashed IP (16 chars)
  location: {
    country: string;         // Country code
    city: string;            // City name
    region: string;          // Region/state
  };
  device: {
    device_type: string;     // 'mobile' | 'tablet' | 'desktop'
    browser: string;         // Browser name
  };
  viewport: {
    width: number;           // Viewport width
    height: number;          // Viewport height
  };
  referrer: string;          // Referrer URL
  userAgent: string;         // User agent string
  fromWebsite: string;       // Promo code or traffic source sent in the request
  // ... any custom metadata
}
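The `device.device_type` field implies some classification of the user agent. The rough heuristic below only illustrates the idea and is not the tracker's actual parser (`deviceTypeFromUserAgent` is a hypothetical name). Tablets are checked first because iPad user agents also contain "Mobile". One known gap: iPadOS 13+ Safari sends a desktop macOS user agent by default, so those visits come out as `desktop`.

```typescript
type DeviceType = 'mobile' | 'tablet' | 'desktop';

// Rough User-Agent heuristic for illustration only.
function deviceTypeFromUserAgent(ua: string): DeviceType {
  // Android tablets typically omit "Mobile"; Android phones include it.
  if (/iPad|Tablet|PlayBook|Silk/i.test(ua) || (/Android/i.test(ua) && !/Mobile/i.test(ua))) {
    return 'tablet';
  }
  if (/Mobi|iPhone|iPod|Android/i.test(ua)) {
    return 'mobile';
  }
  return 'desktop';
}
```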

Troubleshooting

"Bucket not authorized" Error

Check Lambda environment variable ALLOWED_BUCKETS:

aws lambda get-function-configuration --function-name mycompany-analytics-tracker \
  --query 'Environment.Variables.ALLOWED_BUCKETS'

No Data in Athena

  1. Check S3 bucket for files:

    aws s3 ls s3://your-bucket/analytics/ --recursive
  2. Run partition repair:

    MSCK REPAIR TABLE analytics_db.your_table;
  3. Verify that the table's LOCATION matches the bucket and its analytics/ prefix

High Costs

  • Enable S3 Intelligent Tiering (transitions after 90 days)
  • Use partition projection to avoid MSCK REPAIR queries
  • Limit Athena query scope with WHERE clauses on partitions

Contributing

This is a standalone package. Contributions welcome!

License

MIT
