Commit eede9c0

feat(llms): Add build-time LLM-friendly Markdown generation
Implements static Markdown generation during Hugo build.

**Key Features:**
- Two-phase generation: HTML→MD (memory-bounded), MD→sections (fast)
- Automatic redirect detection via file size check (skips Hugo aliases)
- Product detection using compiled TypeScript product-mappings module
- Token estimation for LLM context planning (4 chars/token heuristic)
- YAML serialization with description sanitization

**Performance:**
- ~105 seconds for 5,000 pages + 500 sections
- ~300MB peak memory (safe for 2GB CircleCI environment)
- 23 files/sec conversion rate with controlled concurrency

**Configuration Parameters:**
- MIN_HTML_SIZE_BYTES (default: 1024) - Skip files below threshold
- CHARS_PER_TOKEN (default: 4) - Token estimation ratio
- Concurrency: 10 workers (CI), 20 workers (local)

**Output:**
- Single pages: public/*/index.md (with frontmatter + content)
- Section bundles: public/*/index.section.md (aggregated child pages)

**Files Changed:**
- scripts/build-llm-markdown.js (new) - Main build script
- scripts/lib/markdown-converter.cjs (renamed from .js) - Core conversion
- scripts/html-to-markdown.js - Updated import path
- package.json - Updated exports for .cjs module

Related: Replaces Lambda@Edge on-demand generation (5s response time) with build-time static generation for production deployment.

feat(deploy): Add staging deployment workflow and update CI

Integrates LLM markdown generation into deployment workflows with a complete staging deployment solution.

**CircleCI Updates:**
- Switch from legacy html-to-markdown.js to optimized build:md
- 2x performance improvement (105s vs 200s+ for 5000 pages)
- Better memory management (300MB vs variable)
- Enables section bundle generation (index.section.md files)

**Staging Deployment:**
- New scripts/deploy-staging.sh for local staging deploys
- Complete workflow: Hugo build → markdown gen → S3 upload
- Environment variable driven configuration
- Optional step skipping for faster iteration
- CloudFront cache invalidation support

**NPM Scripts:**
- Added deploy:staging command for convenience
- Wraps deploy-staging.sh script

**Documentation:**
- Updated DOCS-DEPLOYING.md with comprehensive guide
- Merged staging/production workflows with Lambda@Edge docs
- Build-time generation now primary, Lambda@Edge fallback
- Troubleshooting section with common issues
- Environment variable reference
- Performance metrics and optimization tips

**Benefits:**
- Manual staging validation before production
- Consistent markdown generation across environments
- Faster CI builds with optimized script
- Better error handling and progress reporting
- Section aggregation for improved LLM context

**Usage:**
```bash
export STAGING_BUCKET="test2.docs.influxdata.com"
export AWS_REGION="us-east-1"
export STAGING_CF_DISTRIBUTION_ID="E1XXXXXXXXXX"
yarn deploy:staging
```

Related: Completes build-time markdown generation implementation

refactor: Remove Lambda@Edge implementation

Build-time markdown generation has replaced Lambda@Edge on-demand generation as the primary method. Removed Lambda code and updated documentation to focus on build-time generation and testing.

Removed:
- deploy/llm-markdown/ directory (Lambda@Edge code)
- Lambda@Edge section from DOCS-DEPLOYING.md

Added:
- Testing and Validation section in DOCS-DEPLOYING.md
- Focus on build-time generation workflow
1 parent b0bd560 commit eede9c0

16 files changed: 981 additions and 2,685 deletions.

16 files changed

+981
-2685
lines changed

.circleci/config.yml

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ jobs:
 command: yarn hugo --environment production --logLevel info --gc --destination workspace/public
 - run:
 name: Generate LLM-friendly Markdown
-command: node scripts/html-to-markdown.js
+command: yarn build:md
 - persist_to_workspace:
 root: workspace
 paths:

DOCS-DEPLOYING.md

Lines changed: 293 additions & 48 deletions
@@ -1,85 +1,330 @@
-# Deploying InfluxData documentation
+# Deploying InfluxData Documentation
 
-## Lambda\@Edge Markdown Generator
+This guide covers deploying the docs-v2 site to staging and production environments, as well as LLM markdown generation.
 
-docs.influxdata.com uses a Lambda\@Edge function for on-demand markdown generation from HTML documentation. The generated markdown files are optimized for LLMs, coding assistants, and agents.
+## Table of Contents
 
-### Architecture
+- [Staging Deployment](#staging-deployment)
+- [Production Deployment](#production-deployment)
+- [LLM Markdown Generation](#llm-markdown-generation)
+- [Testing and Validation](#testing-and-validation)
+- [Troubleshooting](#troubleshooting)
 
+## Staging Deployment
+
+Staging deployments are manual and run locally with your AWS credentials.
+
+### Prerequisites
+
+1. **AWS Credentials** - Configure AWS CLI with appropriate permissions:
+```bash
+aws configure
+```
+
+2. **s3deploy** - Install the s3deploy binary:
+```bash
+./deploy/ci-install-s3deploy.sh
+```
+
+3. **Environment Variables** - Set required variables:
+```bash
+export STAGING_BUCKET="test2.docs.influxdata.com"
+export AWS_REGION="us-east-1"
+export STAGING_CF_DISTRIBUTION_ID="E1XXXXXXXXXX" # Optional
+```
+
+### Deploy to Staging
+
+Use the staging deployment script:
+
+```bash
+yarn deploy:staging
+```
+
+Or run the script directly:
+
+```bash
+./scripts/deploy-staging.sh
+```
+
+### What the Script Does
+
+1. **Builds Hugo site** with staging configuration (`config/staging/hugo.yml`)
+2. **Generates LLM-friendly Markdown** (`yarn build:md`)
+3. **Uploads to S3** using s3deploy
+4. **Invalidates CloudFront cache** (if `STAGING_CF_DISTRIBUTION_ID` is set)
+
+### Optional Environment Variables
+
+Skip specific steps for faster iteration:
+
+```bash
+# Skip Hugo build (use existing public/)
+export SKIP_BUILD=true
+
+# Skip markdown generation
+export SKIP_MARKDOWN=true
+
+# Build only (no S3 upload)
+export SKIP_DEPLOY=true
+```
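To illustrate how the skip flags could gate the four steps listed under "What the Script Does", here is a minimal JavaScript sketch. The real `scripts/deploy-staging.sh` is a shell script whose source is not shown in this diff, so `planStagingSteps` and the step names are hypothetical. Treating only the literal string `true` as "skip", and skipping invalidation whenever the upload is skipped, are both assumptions.

```javascript
// Hypothetical sketch: which staging steps run for a given environment.
// Only the variable names (SKIP_BUILD, SKIP_MARKDOWN, SKIP_DEPLOY,
// STAGING_CF_DISTRIBUTION_ID) come from the guide; the rest is assumed.
function planStagingSteps(env) {
  const skip = (name) => env[name] === 'true'; // assumption: only "true" skips
  const steps = [];
  if (!skip('SKIP_BUILD')) steps.push('hugo-build');
  if (!skip('SKIP_MARKDOWN')) steps.push('markdown');
  if (!skip('SKIP_DEPLOY')) {
    steps.push('s3-upload');
    // Invalidation is optional: it only applies when a distribution ID is set.
    if (env.STAGING_CF_DISTRIBUTION_ID) steps.push('cloudfront-invalidate');
  }
  return steps;
}

// Build and generate Markdown, but do not upload.
console.log(planStagingSteps({ SKIP_DEPLOY: 'true' }));
```

With `SKIP_DEPLOY=true` the plan contains only the build and Markdown steps, which matches the "Build only (no S3 upload)" comment in the guide.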
+
+### Example: Test Markdown Generation Only
+
+```bash
+SKIP_DEPLOY=true ./scripts/deploy-staging.sh
 ```
-User Request (*.md)
-
-CloudFront Distribution
-
-Lambda@Edge (Origin Request)
-
-Fetch HTML from S3
-
-Convert to Markdown (using shared library)
-
-Return to CloudFront (cached 1hr)
-
-User receives Markdown
+
+## Production Deployment
+
+Production deployments are **automatic** via CircleCI when merging to `master`.
+
+### Workflow
+
+1. **Build Job** (`.circleci/config.yml`):
+- Installs dependencies
+- Builds Hugo site with production config
+- Generates LLM-friendly Markdown (`yarn build:md`)
+- Persists workspace for deploy job
+
+2. **Deploy Job**:
+- Attaches workspace
+- Uploads to S3 using s3deploy
+- Invalidates CloudFront cache
+- Posts success notification to Slack
+
+### Environment Variables (CircleCI)
+
+Production deployment requires the following environment variables set in CircleCI:
+
+- `BUCKET` - Production S3 bucket name
+- `REGION` - AWS region
+- `CF_DISTRIBUTION_ID` - CloudFront distribution ID
+- `SLACK_WEBHOOK_URL` - Slack notification webhook
+
+### Trigger Production Deploy
+
+```bash
+git push origin master
 ```
 
-### Repository Structure
+CircleCI will automatically build and deploy.
+
+## LLM Markdown Generation
+
+Both staging and production deployments generate LLM-friendly Markdown files at build time.
 
-All markdown generation code is in this repository:
+### Output Files
 
+The build generates two types of markdown files in `public/`:
+
+1. **Single-page markdown** (`index.md`)
+- Individual page content with frontmatter
+- Contains: title, description, URL, product, version, token estimate
+
+2. **Section bundles** (`index.section.md`)
+- Aggregated section with all child pages
+- Includes child page list in frontmatter
+- Optimized for LLM context windows
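As a rough picture of what the single-page frontmatter described above might look like, here is a minimal JavaScript sketch. The diff does not show the frontmatter schema, so the key names (`estimated_tokens` in particular), the `buildFrontmatter` helper, and the whitespace-collapsing sanitization are all assumptions, not the converter's actual behavior.

```javascript
// Hypothetical sketch of single-page frontmatter; key names are assumptions.
function buildFrontmatter(page) {
  // Collapse whitespace so a multi-line description stays on one YAML line.
  const description = String(page.description || '').replace(/\s+/g, ' ').trim();
  const fields = {
    title: page.title,
    description,
    url: page.url,
    product: page.product,
    version: page.version,
    estimated_tokens: page.estimatedTokens,
  };
  const lines = Object.entries(fields)
    .filter(([, value]) => value !== undefined)
    // JSON.stringify emits a double-quoted scalar, which YAML 1.2 also accepts.
    .map(([key, value]) => `${key}: ${JSON.stringify(value)}`);
  return ['---', ...lines, '---', ''].join('\n');
}

console.log(buildFrontmatter({
  title: 'Get started',
  description: 'Line one\n  line two',
  url: '/influxdb3/core/get-started/',
  product: 'InfluxDB 3 Core',
  version: 'core',
  estimatedTokens: 1200,
}));
```

Quoting every string through `JSON.stringify` is a shortcut that avoids a YAML dependency in the sketch; a real build script would more likely use a YAML library.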
+
+### Generation Script
+
+```bash
+# Generate all markdown
+yarn build:md
+
+# Generate for specific path
+node scripts/build-llm-markdown.js --path influxdb3/core/get-started
+
+# Limit number of files (for testing)
+node scripts/build-llm-markdown.js --limit 100
 ```
-docs-v2/
-├── scripts/
-│ ├── lib/markdown-converter.js # Shared conversion library
-│ └── html-to-markdown.js # Local CLI for testing
-├── deploy/
-│ └── llm-markdown/
-│ ├── README.md # Lambda deployment guide
-│ └── lambda-edge/
-│ └── markdown-generator/
-│ ├── index.js # Lambda handler
-│ ├── lib/s3-utils.js # S3 operations
-│ ├── deploy.sh # Deployment script
-│ └── package.json # Lambda dependencies
-└── cypress/e2e/content/
-└── markdown-content-validation.cy.js # Validation tests
+
+### Configuration
+
+Edit `scripts/build-llm-markdown.js` to adjust:
+
+```javascript
+// Skip files smaller than this (Hugo alias redirects)
+const MIN_HTML_SIZE_BYTES = 1024;
+
+// Token estimation ratio
+const CHARS_PER_TOKEN = 4;
+
+// Concurrency (workers)
+const CONCURRENCY = process.env.CI ? 10 : 20;
 ```
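The two numeric constants above could plausibly be used as follows. This is a sketch only: the body of `scripts/build-llm-markdown.js` is not part of this diff, so the helper names `isLikelyRedirect` and `estimateTokens`, and the choice to round the estimate up, are assumptions.

```javascript
// Constants as documented in the Configuration section.
const MIN_HTML_SIZE_BYTES = 1024;
const CHARS_PER_TOKEN = 4;

// Hypothetical helper: treat very small HTML files as Hugo alias redirects.
function isLikelyRedirect(htmlSizeBytes) {
  return htmlSizeBytes < MIN_HTML_SIZE_BYTES;
}

// Hypothetical helper: rough token estimate from character count.
// Rounding up is an assumption; the heuristic itself is 4 chars per token.
function estimateTokens(markdown) {
  return Math.ceil(markdown.length / CHARS_PER_TOKEN);
}

console.log(isLikelyRedirect(300));           // true: below the 1024-byte threshold
console.log(estimateTokens('a'.repeat(10)));  // 3: 10 / 4 rounded up
```

A size check is cheap because it needs only file metadata, not a parse of the HTML, but a legitimately tiny page under the threshold would also be skipped.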
 
-### Local Development
+### Performance
+
+- **Speed**: \~105 seconds for 5,000 pages + 500 sections
+- **Memory**: \~300MB peak (safe for 2GB CircleCI)
+- **Rate**: \~23 files/second with memory-bounded parallelism
 
-Generate markdown files locally for testing:
+## Testing and Validation
+
+### Local Testing
+
+Test markdown generation locally before deploying:
 
 ```bash
 # Prerequisites
 yarn install
 yarn build:ts
 npx hugo --quiet
 
+# Generate markdown for testing
+yarn build:md
+
 # Generate markdown for specific path
-node scripts/html-to-markdown.js --path influxdb3/core/get-started --limit 10
+node scripts/build-llm-markdown.js --path influxdb3/core/get-started --limit 10
 
 # Run validation tests
 node cypress/support/run-e2e-specs.js \
 --spec "cypress/e2e/content/markdown-content-validation.cy.js"
 ```
 
+### Validation Checks
+
+The Cypress tests validate:
+
+- ✅ No raw Hugo shortcodes (`{{< >}}` or `{{% %}}`)
+- ✅ No HTML comments
+- ✅ Proper YAML frontmatter with required fields
+- ✅ UI elements removed (feedback forms, navigation)
+- ✅ GitHub-style callouts (Note, Warning, etc.)
+- ✅ Properly formatted tables, lists, and code blocks
+- ✅ Product context metadata
+- ✅ Clean link formatting
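Three of the checks listed above are simple enough to sketch as plain string tests. This is not the Cypress spec, whose source is not shown in this diff. The patterns and the `findMarkdownProblems` helper are hypothetical, and the sketch does not exempt shortcode-like text inside code blocks.

```javascript
// Hypothetical checks mirroring three of the validations listed above.
const SHORTCODE_PATTERN = /\{\{[<%][\s\S]*?[>%]\}\}/; // {{< ... >}} or {{% ... %}}
const HTML_COMMENT_PATTERN = /<!--[\s\S]*?-->/;

function findMarkdownProblems(markdown) {
  const problems = [];
  if (SHORTCODE_PATTERN.test(markdown)) problems.push('raw Hugo shortcode');
  if (HTML_COMMENT_PATTERN.test(markdown)) problems.push('HTML comment');
  // Frontmatter is assumed to start on the first line with a "---" delimiter.
  if (!markdown.startsWith('---\n')) problems.push('missing YAML frontmatter');
  return problems;
}

console.log(findMarkdownProblems('---\ntitle: "Get started"\n---\n\nText'));
console.log(findMarkdownProblems('{{< note >}}Careful{{< /note >}}'));
```

The first call reports no problems; the second reports the raw shortcode and the missing frontmatter.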
+
 See [DOCS-TESTING.md](DOCS-TESTING.md) for comprehensive testing documentation.
 
-### Lambda Deployment
+## Troubleshooting
+
+### s3deploy Not Found
+
+Install the s3deploy binary:
+
+```bash
+./deploy/ci-install-s3deploy.sh
+```
+
+Verify installation:
+
+```bash
+s3deploy -version
+```
+
+### Missing Environment Variables
+
+Check required variables are set:
+
+```bash
+echo $STAGING_BUCKET
+echo $AWS_REGION
+```
+
+Set them if missing:
+
+```bash
+export STAGING_BUCKET="test2.docs.influxdata.com"
+export AWS_REGION="us-east-1"
+```
+
+### AWS Permission Errors
+
+Ensure your AWS credentials have the required permissions:
 
-Deploy the Lambda\@Edge function to AWS:
+- `s3:PutObject` - Upload files to S3
+- `s3:DeleteObject` - Delete old files from S3
+- `cloudfront:CreateInvalidation` - Invalidate cache
+
+Check your AWS profile:
 
 ```bash
-# Navigate to Lambda directory
-cd deploy/llm-markdown/lambda-edge/markdown-generator
+aws sts get-caller-identity
+```
+
+### Hugo Build Fails
+
+Check for:
 
-# Install dependencies
-npm install
+- Missing dependencies (`yarn install`)
+- TypeScript compilation errors (`yarn build:ts`)
+- Invalid Hugo configuration
 
-# Deploy to staging
-./deploy.sh staging
+Build Hugo separately to isolate the issue:
 
-# Deploy to production
-./deploy.sh production
+```bash
+yarn hugo --environment staging
 ```
 
-See [deploy/llm-markdown/README.md](deploy/llm-markdown/README.md) for detailed deployment instructions.
+### Markdown Generation Fails
+
+Check for:
+
+- Hugo build completed successfully
+- TypeScript compiled (`yarn build:ts`)
+- Sufficient memory available
+
+Test markdown generation separately:
+
+```bash
+yarn build:md --limit 10
+```
+
+### CloudFront Cache Not Invalidating
+
+If you see stale content after deployment:
+
+1. Check `STAGING_CF_DISTRIBUTION_ID` is set correctly
+2. Verify AWS credentials have `cloudfront:CreateInvalidation` permission
+3. Manual invalidation:
+```bash
+aws cloudfront create-invalidation \
+--distribution-id E1XXXXXXXXXX \
+--paths "/*"
+```
+
+### Deployment Timing Out
+
+For large deployments:
+
+1. **Skip markdown generation** if unchanged:
+```bash
+SKIP_MARKDOWN=true ./scripts/deploy-staging.sh
+```
+
+2. **Use s3deploy's incremental upload**:
+- s3deploy only uploads changed files
+- First deploy is slower, subsequent deploys are faster
+
+3. **Check network speed**:
+- Large uploads require good bandwidth
+- Consider deploying from an AWS region closer to the S3 bucket
+
+## Deployment Checklist
+
+### Before Deploying to Staging
+
+- [ ] Run tests locally (`yarn lint`)
+- [ ] Build Hugo successfully (`yarn hugo --environment staging`)
+- [ ] Generate markdown successfully (`yarn build:md`)
+- [ ] Set staging environment variables
+- [ ] Have AWS credentials configured
+
+### Before Merging to Master (Production)
+
+- [ ] Test on staging first
+- [ ] Verify LLM markdown quality
+- [ ] Check for broken links (`yarn test:links`)
+- [ ] Run code block tests (`yarn test:codeblocks:all`)
+- [ ] Review CircleCI configuration changes
+- [ ] Ensure all tests pass
+
+## Related Documentation
+
+- [Contributing Guide](DOCS-CONTRIBUTING.md)
+- [Testing Guide](DOCS-TESTING.md)
+- [CircleCI Configuration](.circleci/config.yml)
+- [S3 Deploy Configuration](.s3deploy.yml)
