diff --git a/cspell.yml b/cspell.yml
index fa53d1eb2d..84449fb143 100644
--- a/cspell.yml
+++ b/cspell.yml
@@ -39,6 +39,11 @@ overrides:
       - Graphile
       - precompiled
       - debuggable
+      - opentelemetry
+      - OTLP
+      - otlp
+      - Millis
+      - Kubernetes
     ignoreRegExpList:
       - u\{[0-9a-f]{1,8}\}
diff --git a/website/pages/docs/_meta.ts b/website/pages/docs/_meta.ts
index 97c5bc2b2e..4f1673d6a2 100644
--- a/website/pages/docs/_meta.ts
+++ b/website/pages/docs/_meta.ts
@@ -42,6 +42,7 @@ const meta = {
     title: 'FAQ',
   },
   'going-to-production': '',
+  'production-monitoring': '',
   'scaling-graphql': '',
 };
diff --git a/website/pages/docs/production-monitoring.mdx b/website/pages/docs/production-monitoring.mdx
new file mode 100644
index 0000000000..4436ff9d3d
--- /dev/null
+++ b/website/pages/docs/production-monitoring.mdx
@@ -0,0 +1,554 @@
---
title: Monitor GraphQL applications in production
description: Implement structured logging, metrics collection, distributed tracing, and error tracking to maintain visibility into your GraphQL.js application's health and performance.
---

Monitoring and observability give you visibility into how your GraphQL application behaves in production. They help you detect issues before users report them, diagnose problems when they occur, and understand usage patterns.

This guide shows you how to add logging, metrics, tracing, and error tracking to your GraphQL.js application. You'll learn what data to collect at each layer of your GraphQL execution, how to structure that data for analysis, and how to use it to maintain reliable service. The patterns work across different monitoring tools and platforms, so you can adapt them to your infrastructure.

## Add structured logging

Structured logging captures events in a consistent, machine-readable format that monitoring systems can parse and analyze. Instead of plain text messages, you output JSON objects with predictable fields. This makes it easier to filter logs, aggregate metrics, and trace requests across services.

For GraphQL applications, you want to log three types of events: incoming operations, resolver execution, and errors. Each type provides different insights into your application's behavior.

### Log GraphQL operations

Capture details about each GraphQL request your server receives. This creates an audit trail and helps you understand usage patterns.

```javascript
import { graphql } from 'graphql';
import { logger } from './logger.js';

export async function executeGraphQLRequest(schema, source, contextValue) {
  const startTime = Date.now();

  const result = await graphql({
    schema,
    source,
    contextValue
  });

  const duration = Date.now() - startTime;

  logger.info('graphql_operation', {
    operationName: contextValue.operationName,
    duration,
    hasErrors: !!result.errors,
    timestamp: new Date().toISOString()
  });

  return result;
}
```

This example wraps the GraphQL execution and logs basic operation details after each request completes. The logger captures the operation name if provided, how long execution took, and whether errors occurred. Note that the result returned by `graphql()` doesn't report the operation type; if you want to log it, derive it from your own request handling or from the parsed document.

To adapt this pattern, replace `logger` with your chosen logging library. Add fields relevant to your application like user IDs, client versions, or geographic regions. Attach this logging to your GraphQL endpoint handler so every operation gets recorded.
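
The examples in this guide import a `logger` module without defining it. A minimal sketch of such a logger, writing one JSON object per line to stdout, might look like the following; in practice you would swap in a library such as Pino or Winston:

```javascript
// logger.js: a minimal structured logger sketch. The event name becomes the
// `type` field and the remaining fields are spread into the JSON entry.
function write(level, type, fields = {}) {
  const entry = {
    level,
    type,
    ...fields,
    timestamp: fields.timestamp ?? new Date().toISOString()
  };

  // One JSON object per line so log collectors can parse each entry.
  console.log(JSON.stringify(entry));
}

export const logger = {
  info: (type, fields) => write('info', type, fields),
  debug: (type, fields) => write('debug', type, fields),
  error: (type, fields) => write('error', type, fields)
};
```

The call signature matches the `logger.info('graphql_operation', { ... })` usage in the examples, so you can replace the internals without touching the instrumentation code.
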

### Log resolver performance

Track how long individual resolvers take to execute. This helps identify slow data fetches or bottlenecks.

```javascript
export function instrumentResolver(resolverFn, fieldName) {
  return async function(parent, args, context, info) {
    const startTime = Date.now();

    try {
      const result = await resolverFn(parent, args, context, info);

      logger.debug('resolver_execution', {
        fieldName,
        parentType: info.parentType.name,
        duration: Date.now() - startTime,
        traceId: context.traceId
      });

      return result;
    } catch (error) {
      logger.error('resolver_error', {
        fieldName,
        parentType: info.parentType.name,
        error: error.message,
        traceId: context.traceId
      });
      throw error;
    }
  };
}
```

The example wrapper measures resolver execution time and logs it on success, or logs error details if the resolver throws.

Apply this wrapper to the resolvers you want to monitor. For high-traffic applications, use sampling to log only a percentage of resolver executions and reduce log volume. Include a `traceId` from your context to correlate resolver logs with operation logs.

### Structure logs for analysis

Use consistent field names and data types across all log entries. This makes it easier to query and aggregate logs in your monitoring system.

```json
{
  "level": "info",
  "type": "graphql_operation",
  "operationName": "GetUser",
  "operationType": "query",
  "duration": 145,
  "hasErrors": false,
  "traceId": "abc123",
  "timestamp": "2025-10-31T10:30:00.000Z"
}

{
  "level": "debug",
  "type": "resolver_execution",
  "fieldName": "user",
  "parentType": "Query",
  "duration": 23,
  "traceId": "abc123",
  "timestamp": "2025-10-31T10:30:00.050Z"
}
```

These example entries provide consistent fields for querying across your monitoring system.

When implementing this structure, standardize on ISO timestamps for all time values. Use millisecond durations for consistency. Use boolean flags rather than strings for true/false values. Keep frequently queried fields at the top level rather than nested in objects.

### Correlate logs across services

When your GraphQL server calls other services, propagate a trace ID so you can follow a request through your entire system.

```javascript
import { randomUUID } from 'crypto';

export function createContext(req) {
  const traceId = req.headers['x-trace-id'] || randomUUID();

  return {
    traceId,
    fetch: (url, options = {}) => {
      return fetch(url, {
        ...options,
        headers: {
          ...options.headers,
          'x-trace-id': traceId
        }
      });
    }
  };
}
```

This example checks for an incoming trace ID in request headers, generates a new one if none exists, and provides a fetch wrapper that automatically propagates the trace ID to downstream services.

To integrate this approach, include the trace ID in every log entry you create. Configure downstream services to extract and use the same trace ID. Use a consistent header name across all your services. This creates a connected chain of logs you can search to see how a request moved through your infrastructure.

### Control log verbosity

Balance the detail you capture with the performance impact and storage costs. Not every application needs resolver-level logging in production.

Consider these log levels for different scenarios:

- **Error**: Always log errors with full context for debugging
- **Info**: Log all GraphQL operations for visibility into usage
- **Debug**: Log resolver execution only in development or when troubleshooting specific issues

Set log levels through environment variables so you can adjust verbosity without code changes. Use sampling for high-volume debug logs by logging every Nth request instead of everything when debug logging is enabled, as sketched below.
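
A minimal sketch of that setup, assuming a `LOG_LEVEL` environment variable and a hypothetical `DEBUG_LOG_EVERY_N` setting, might look like this:

```javascript
// Read verbosity settings from the environment so they can change per deployment.
const LOG_LEVEL = process.env.LOG_LEVEL || 'info';
const DEBUG_LOG_EVERY_N = Number(process.env.DEBUG_LOG_EVERY_N || '100');

const levels = { error: 0, info: 1, debug: 2 };
let debugCounter = 0;

export function shouldLog(level) {
  const threshold = levels[LOG_LEVEL] ?? levels.info;

  if (levels[level] > threshold) {
    return false;
  }

  // Keep only every Nth debug entry to limit volume when debug logging is on.
  if (level === 'debug') {
    debugCounter += 1;
    return debugCounter % DEBUG_LOG_EVERY_N === 0;
  }

  return true;
}
```

Guard debug-level calls such as the resolver logging above with `shouldLog('debug')`; error and info entries pass through unchanged.
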

### Avoid logging sensitive data

Never log passwords, API keys, tokens, or personally identifiable information. Sanitize variables and context before logging.

```javascript
function sanitizeVariables(variables) {
  const sensitiveFields = ['password', 'token', 'apiKey', 'ssn'];
  const sanitized = { ...variables };

  for (const field of sensitiveFields) {
    if (field in sanitized) {
      sanitized[field] = '[REDACTED]';
    }
  }

  return sanitized;
}

logger.info('graphql_operation', {
  operationName: contextValue.operationName,
  variables: sanitizeVariables(contextValue.variables)
});
```

The example function creates a copy of the variables object and replaces sensitive field values with a redaction marker.

To adapt this for your schema, customize the `sensitiveFields` list to match your sensitive data. Note that the copy is shallow, so nested input objects need their own handling. Consider using allowlists instead of denylists for higher security, logging only fields you explicitly mark as safe.

## Collect metrics

Metrics give you quantitative data about your GraphQL server's behavior over time. Unlike logs that capture individual events, metrics aggregate data into counts, rates, and distributions. This helps you spot trends, set alerts, and measure performance against targets.

You need metrics at multiple levels. Track operations to understand how many queries run. Track resolvers to see where time is spent. Track schema usage to know which fields get used. Collecting these metrics requires instrumenting your GraphQL execution pipeline.

### Track operation metrics

Measure the volume, latency, and success rate of GraphQL operations. These top-level metrics indicate overall service health.

```javascript
import { graphql } from 'graphql';

const operationMetrics = {
  count: 0,
  errors: 0,
  durations: []
};

export async function executeGraphQLRequest(schema, source, contextValue) {
  const startTime = Date.now();
  operationMetrics.count++;

  const result = await graphql({
    schema,
    source,
    contextValue
  });

  const duration = Date.now() - startTime;
  operationMetrics.durations.push(duration);

  if (result.errors) {
    operationMetrics.errors++;
  }

  return result;
}

export function getOperationMetrics() {
  return {
    totalOperations: operationMetrics.count,
    errorRate: operationMetrics.errors / operationMetrics.count,
    p95Latency: calculatePercentile(operationMetrics.durations, 0.95),
    p99Latency: calculatePercentile(operationMetrics.durations, 0.99)
  };
}
```

This example tracks basic counters and timing data in memory, then calculates metrics like error rate and latency percentiles.

To implement this in production, replace the in-memory storage with your metrics library's counters and histograms; the unbounded `durations` array here is only suitable for a demo. Export these metrics through an HTTP endpoint that your monitoring system can scrape. Track metrics separately by operation name and type to identify which operations cause issues.
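
The snippet above calls a `calculatePercentile` helper that it never defines. A minimal sketch is shown below; a metrics library's histogram would normally do this for you:

```javascript
// Returns the value at the given percentile (0-1) of a list of durations.
function calculatePercentile(values, percentile) {
  if (values.length === 0) {
    return 0;
  }

  // Sort a copy so the original recording order is preserved.
  const sorted = [...values].sort((a, b) => a - b);
  const index = Math.min(sorted.length - 1, Math.ceil(percentile * sorted.length) - 1);

  return sorted[Math.max(0, index)];
}
```
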

### Instrument resolver execution

Resolver metrics reveal which parts of your schema are slow or problematic. This granular data helps you optimize specific fields rather than entire operations.

```javascript
export function createInstrumentedResolver(resolverFn, typeName, fieldName) {
  const metricKey = `${typeName}.${fieldName}`;

  return async function(parent, args, context, info) {
    const startTime = Date.now();

    try {
      const result = await resolverFn(parent, args, context, info);
      const duration = Date.now() - startTime;

      context.metrics.recordResolverDuration(metricKey, duration);

      return result;
    } catch (error) {
      context.metrics.incrementResolverErrors(metricKey);
      throw error;
    }
  };
}

const resolvers = {
  Query: {
    user: createInstrumentedResolver(userResolver, 'Query', 'user'),
    posts: createInstrumentedResolver(postsResolver, 'Query', 'posts')
  }
};
```

This example wrapper measures how long the resolver takes to execute and records it using a metric key that combines the type and field name. If the resolver throws an error, it increments an error counter before re-throwing.

When integrating this pattern, add the `metrics` object to your GraphQL context with methods that call your metrics library. For large schemas, use automated wrapping to instrument all resolvers without manual work. Be cautious with cardinality: if you have thousands of fields, consider sampling or instrumenting only high-value resolvers.

### Monitor schema field usage

Track which fields clients actually query. This data informs schema evolution decisions. You'll know which fields are safe to deprecate and which need optimization.

```javascript
import { execute, visit, visitWithTypeInfo, TypeInfo } from 'graphql';

export async function executeWithFieldTracking(args) {
  const typeInfo = new TypeInfo(args.schema);
  const fieldUsage = new Map();

  // Walk the parsed operation and count every requested field as `Type.field`.
  visit(args.document, visitWithTypeInfo(typeInfo, {
    Field() {
      const parentType = typeInfo.getParentType();
      const fieldDef = typeInfo.getFieldDef();

      if (parentType && fieldDef) {
        const fieldPath = `${parentType.name}.${fieldDef.name}`;
        fieldUsage.set(fieldPath, (fieldUsage.get(fieldPath) || 0) + 1);
      }
    }
  }));

  for (const [field, count] of fieldUsage) {
    args.contextValue.metrics.recordFieldUsage(field, count);
  }

  return execute(args);
}
```

This example walks the parsed document with a `TypeInfo` visitor and counts every requested field by its parent type and field name, then reports the counts to your metrics system before executing the operation. Counting fields from the document, rather than intercepting resolvers at runtime, also captures fields served by the default resolver.

To use this effectively, adapt this pattern to your metrics library. Aggregate field usage over time windows to track trends. Combine this with operation names to understand which clients use which fields.

### Expose metrics for collection

Make your metrics available to monitoring systems. The approach depends on whether you use push-based or pull-based collection.

Pull-based systems like Prometheus scrape metrics from an HTTP endpoint you expose:

```javascript
import express from 'express';
import { register } from 'prom-client';

const app = express();

app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  const metrics = await register.metrics();
  res.send(metrics);
});
```

This example uses the Prometheus client library to expose metrics via an HTTP endpoint. Your monitoring tool periodically requests the `/metrics` endpoint to collect current values.
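
The `context.metrics` methods used in the resolver examples have to be backed by real instruments. A sketch of what that could look like with `prom-client`, using assumed metric names, follows; anything recorded through these instruments lands on the default registry that the `/metrics` endpoint above serves:

```javascript
import { Counter, Histogram } from 'prom-client';

// Assumed metric names; rename to match your conventions.
const resolverDuration = new Histogram({
  name: 'graphql_resolver_duration_ms',
  help: 'Resolver execution time in milliseconds',
  labelNames: ['field'],
  buckets: [1, 5, 10, 25, 50, 100, 250, 500, 1000]
});

const resolverErrors = new Counter({
  name: 'graphql_resolver_errors_total',
  help: 'Number of resolver errors',
  labelNames: ['field']
});

// Attach this object to your GraphQL context as `metrics`.
export const metrics = {
  recordResolverDuration: (field, duration) => resolverDuration.observe({ field }, duration),
  incrementResolverErrors: (field) => resolverErrors.inc({ field })
};
```

With an object like this on the context, `createInstrumentedResolver` records into a Prometheus histogram, and the endpoint above exposes the results.
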

Push-based systems require you to send metrics to a collector at regular intervals:

```javascript
import { MeterProvider, PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';

const exporter = new OTLPMetricExporter({
  url: 'http://your-collector:4318/v1/metrics'
});

const meterProvider = new MeterProvider({
  readers: [
    new PeriodicExportingMetricReader({
      exporter,
      exportIntervalMillis: 60000
    })
  ]
});

const meter = meterProvider.getMeter('graphql-server');
const operationCounter = meter.createCounter('graphql.operations');
```

This example configures OpenTelemetry to push metrics every 60 seconds to a collector endpoint. Record values by calling the instruments from your execution pipeline, for example `operationCounter.add(1)` per operation.

When choosing an approach, consider that pull-based collection works well for Kubernetes environments with Prometheus, while push-based collection integrates better with cloud-native monitoring services. Configure export intervals to balance freshness with network overhead. Replace the collector URL with your actual endpoint.

### Calculate query complexity metrics

Track the complexity of operations to identify expensive queries. Complexity scores help you set rate limits and optimize schema design.

```javascript
import { execute, visit, visitWithTypeInfo, TypeInfo, getNullableType, isListType } from 'graphql';

function calculateComplexity(document, schema) {
  let complexity = 0;
  const typeInfo = new TypeInfo(schema);

  visit(document, visitWithTypeInfo(typeInfo, {
    Field() {
      complexity++;

      // List fields add extra weight because they typically return many records.
      const fieldDef = typeInfo.getFieldDef();
      if (fieldDef && isListType(getNullableType(fieldDef.type))) {
        complexity += 5;
      }
    }
  }));

  return complexity;
}

export async function executeWithComplexityTracking(schema, document, contextValue) {
  const complexity = calculateComplexity(document, schema);
  contextValue.metrics.recordComplexity(complexity);

  return execute({ schema, document, contextValue });
}
```

Each field adds 1 to the complexity score. List fields add an additional 5 points since they typically require more resources. The execution wrapper calculates complexity before running the query and records it as a metric.

To customize this example for your needs, adjust the complexity calculation for your schema. Assign different weights to expensive fields. Record complexity as a histogram to track distribution over time, not just averages.

### Sample high-volume metrics

For high-traffic applications, recording every resolver execution creates too much data. Use sampling to capture a representative subset.

```javascript
export function createSampledResolver(resolverFn, typeName, fieldName, sampleRate = 0.1) {
  const metricKey = `${typeName}.${fieldName}`;

  return async function(parent, args, context, info) {
    const shouldSample = Math.random() < sampleRate;

    if (!shouldSample) {
      return resolverFn(parent, args, context, info);
    }

    const startTime = Date.now();
    const result = await resolverFn(parent, args, context, info);
    const duration = Date.now() - startTime;

    context.metrics.recordResolverDuration(metricKey, duration, 1 / sampleRate);

    return result;
  };
}
```

The function randomly decides whether to sample each resolver execution based on the sample rate. When sampled, it records the duration with a weight equal to the inverse of the sample rate so aggregates stay accurate.

When implementing sampling, set sample rates based on traffic volume. Adjust recorded metric values to account for sampling. This gives you accurate aggregates while reducing overhead.
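
For example, you might sample a hot field aggressively while recording every execution of a rarely used one (the resolver functions and field names here are hypothetical):

```javascript
const resolvers = {
  Query: {
    // Record roughly 5% of executions for a high-traffic field.
    feed: createSampledResolver(feedResolver, 'Query', 'feed', 0.05),
    // Record every execution for a low-traffic field.
    accountSettings: createSampledResolver(accountSettingsResolver, 'Query', 'accountSettings', 1)
  }
};
```
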

### Monitor resource utilization

Track the system resources your GraphQL server consumes. Memory leaks, CPU spikes, and connection pool exhaustion all impact performance.

```javascript
import { register, collectDefaultMetrics } from 'prom-client';

collectDefaultMetrics({ register });

export function recordResourceMetrics(context) {
  const usage = process.memoryUsage();

  context.metrics.recordGauge('nodejs.memory.heap.used', usage.heapUsed);
  context.metrics.recordGauge('nodejs.memory.heap.total', usage.heapTotal);
  context.metrics.recordGauge('nodejs.memory.external', usage.external);

  const cpuUsage = process.cpuUsage();
  context.metrics.recordGauge('nodejs.cpu.user', cpuUsage.user);
  context.metrics.recordGauge('nodejs.cpu.system', cpuUsage.system);
}
```

The `collectDefaultMetrics` call enables automatic collection of standard Node.js metrics like event loop lag and garbage collection statistics. The function adds custom metrics for memory and CPU usage.

When implementing this pattern, collect these metrics periodically rather than per-request. Add database connection pool metrics if you use connection pooling. Monitor event loop lag to detect when Node.js can't keep up with incoming requests.

## Additional monitoring considerations

Several other aspects are important for comprehensive production monitoring:

- **Distributed tracing**: Propagate trace context through GraphQL operations and instrument resolvers to visualize request flow across services (see the sketch after this list)
- **Error tracking**: Categorize and capture GraphQL errors with context for debugging, and set up aggregation and alerting patterns
- **Monitoring dashboards**: Create dashboards that display request metrics, error rates, query complexity, and schema usage for different stakeholders
- **Service level objectives**: Establish SLIs and SLOs for critical GraphQL operations, including latency targets and error budgets
- **Testing your setup**: Verify that logging, metrics, tracing, and alerting work as expected before production deployment
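
As a starting point for distributed tracing, the sketch below wraps execution in an OpenTelemetry span. It assumes you have already configured the OpenTelemetry Node SDK with a trace exporter elsewhere, and the span and attribute names are illustrative:

```javascript
import { graphql } from 'graphql';
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('graphql-server');

export async function executeWithTracing(schema, source, contextValue, operationName) {
  return tracer.startActiveSpan('graphql.execute', async (span) => {
    span.setAttribute('graphql.operation.name', operationName ?? 'anonymous');

    try {
      const result = await graphql({ schema, source, contextValue, operationName });

      if (result.errors) {
        // Mark the span as failed so traces with GraphQL errors stand out.
        span.setStatus({ code: SpanStatusCode.ERROR });
      }

      return result;
    } finally {
      span.end();
    }
  });
}
```

Because the span is active while resolvers run, any downstream HTTP or database calls instrumented by OpenTelemetry appear as child spans, which gives you the cross-service view described above.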