Commit b0bd560

sanderson authored and jstirnaman committed
feat(llms): LLM-friendly Markdown, ChatGPT and Claude links.
This enables LLM-friendly documentation for entire sections, allowing users to copy complete documentation sections with a single click.

Lambda@Edge now generates .md files on-demand with:
- Evaluated Hugo shortcodes
- Proper YAML frontmatter with product metadata
- Clean markdown without UI elements
- Section aggregation (parent + children in single file)

The llms.txt files are now generated automatically during build from content structure and product metadata in data/products.yml, eliminating the need for hardcoded files and ensuring maintainability.

**Testing**:
- Automated markdown generation in test setup via cy.exec()
- Implement dynamic content validation that extracts HTML content and verifies it appears in markdown version

**Documentation**:
Documents LLM-friendly markdown generation

**Details**:
Add gzip decompression for S3 HTML files in Lambda markdown generator

HTML files stored in S3 are gzip-compressed but the Lambda was attempting to parse compressed data as UTF-8, causing JSDOM to fail to find article elements. This resulted in 404 errors for .md and .section.md requests.

- Add zlib gunzip decompression in s3-utils.js fetchHtmlFromS3()
- Detect gzip via ContentEncoding header or magic bytes (0x1f 0x8b)
- Add configurable DEBUG constant for verbose logging
- Add debug logging for buffer sizes and decompression in both files

The decompression adds ~1-5ms per request but is necessary to parse HTML correctly. CloudFront caching minimizes Lambda invocations.

Await async markdown conversion functions

The convertToMarkdown and convertSectionToMarkdown functions are async but weren't being awaited, causing the Lambda to return a Promise object instead of a string. This resulted in CloudFront validation errors: "The body is not a string, is not an object, or exceeds the maximum size"

**Troubleshooting**:
- Set DEBUG for troubleshooting in lambda
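The gzip handling described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in s3-utils.js: the helper names `isGzipped` and `decodeHtml` are hypothetical, and the sketch assumes the S3 object body has already been read into a Node.js `Buffer`.

```javascript
const zlib = require('node:zlib');

// Hypothetical helper: decide whether a buffer holds gzip data.
// Checks the content-encoding value first, then falls back to the
// gzip magic bytes 0x1f 0x8b at the start of the buffer.
function isGzipped(buffer, contentEncoding) {
  if (
    typeof contentEncoding === 'string' &&
    contentEncoding.toLowerCase().includes('gzip')
  ) {
    return true;
  }
  return buffer.length >= 2 && buffer[0] === 0x1f && buffer[1] === 0x8b;
}

// Hypothetical helper: return the HTML as a UTF-8 string,
// decompressing first when the buffer is gzip-compressed.
// gunzipSync throws if the data is not valid gzip.
function decodeHtml(buffer, contentEncoding) {
  const raw = isGzipped(buffer, contentEncoding)
    ? zlib.gunzipSync(buffer)
    : buffer;
  return raw.toString('utf8');
}
```

Checking the magic bytes as a fallback covers objects that were uploaded compressed but lack the encoding metadata, which is one plausible reason for detecting gzip in two ways.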
1 parent 98e2112 commit b0bd560

40 files changed, with 7317 additions and 108 deletions.

.circleci/config.yml

Lines changed: 3 additions & 0 deletions
@@ -42,6 +42,9 @@ jobs:
       - run:
           name: Hugo Build
           command: yarn hugo --environment production --logLevel info --gc --destination workspace/public
+      - run:
+          name: Generate LLM-friendly Markdown
+          command: node scripts/html-to-markdown.js
       - persist_to_workspace:
           root: workspace
           paths:

.claude/settings.json

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+{
+  "permissions": {
+    "allow": [
+    ],
+    "deny": [
+      "Read(./.env)",
+      "Read(./.env.*)",
+      "Read(./secrets/**)",
+      "Read(./config/credentials.json)",
+      "Read(./build)"
+    ],
+    "ask": [
+      "Bash(git push:*)"
+    ]
+  }
+}

.gitignore

Lines changed: 15 additions & 0 deletions
@@ -38,10 +38,25 @@ tmp
 
 # TypeScript build output
 **/dist/
+**/dist-lambda/
 
 # User context files for AI assistant tools
 .context/*
 !.context/README.md
 
 # External repos
 .ext/*
+
+# Lambda deployment artifacts
+deploy/llm-markdown/lambda-edge/markdown-generator/*.zip
+deploy/llm-markdown/lambda-edge/markdown-generator/package-lock.json
+deploy/llm-markdown/lambda-edge/markdown-generator/.package-tmp/
+deploy/llm-markdown/lambda-edge/markdown-generator/yarn.lock
+deploy/llm-markdown/lambda-edge/markdown-generator/config.json
+
+# JavaScript/TypeScript build artifacts
+*.tsbuildinfo
+*.d.ts
+*.d.ts.map
+*.js.map
+.eslintcache

.s3deploy.yml

Lines changed: 1 addition & 1 deletion
@@ -4,5 +4,5 @@ routes:
     headers:
       Cache-Control: "max-age=630720000, no-transform, public"
     gzip: true
-  - route: "^.+\\.(html|xml|json|js)$"
+  - route: "^.+\\.(html|xml|json|js|md)$"
     gzip: true
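To see which object keys the updated route would cover, the pattern can be tried against a few sample paths. This is only an illustration: the sample paths are made up, and the pattern is rewritten as a JavaScript regex literal on the assumption that this simple expression behaves the same way in the deploy tool's regex engine.

```javascript
// The route pattern after this change, as a JavaScript regex literal.
// In the YAML file the backslash is doubled because it sits inside a
// double-quoted string.
const gzipRoute = /^.+\.(html|xml|json|js|md)$/;

// Hypothetical sample paths, for illustration only.
const samplePaths = [
  'influxdb3/core/get-started/index.html',
  'influxdb3/core/get-started/index.md',
  'influxdb3/core/get-started/index.section.md',
  'img/logo.png',
];

for (const path of samplePaths) {
  console.log(`${path}: ${gzipRoute.test(path) ? 'gzip' : 'no match'}`);
}
```

Because the pattern only anchors on the final extension, both `.md` and `.section.md` files fall under the same gzip rule.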

DOCS-DEPLOYING.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+# Deploying InfluxData documentation
+
+## Lambda\@Edge Markdown Generator
+
+docs.influxdata.com uses a Lambda\@Edge function for on-demand markdown generation from HTML documentation. The generated markdown files are optimized for LLMs, coding assistants, and agents.
+
+### Architecture
+
+```
+User Request (*.md)
+
+CloudFront Distribution
+
+Lambda@Edge (Origin Request)
+
+Fetch HTML from S3
+
+Convert to Markdown (using shared library)
+
+Return to CloudFront (cached 1hr)
+
+User receives Markdown
+```
+
+### Repository Structure
+
+All markdown generation code is in this repository:
+
+```
+docs-v2/
+├── scripts/
+│   ├── lib/markdown-converter.js          # Shared conversion library
+│   └── html-to-markdown.js                # Local CLI for testing
+├── deploy/
+│   └── llm-markdown/
+│       ├── README.md                      # Lambda deployment guide
+│       └── lambda-edge/
+│           └── markdown-generator/
+│               ├── index.js               # Lambda handler
+│               ├── lib/s3-utils.js        # S3 operations
+│               ├── deploy.sh              # Deployment script
+│               └── package.json           # Lambda dependencies
+└── cypress/e2e/content/
+    └── markdown-content-validation.cy.js  # Validation tests
+```
+
+### Local Development
+
+Generate markdown files locally for testing:
+
+```bash
+# Prerequisites
+yarn install
+yarn build:ts
+npx hugo --quiet
+
+# Generate markdown for specific path
+node scripts/html-to-markdown.js --path influxdb3/core/get-started --limit 10
+
+# Run validation tests
+node cypress/support/run-e2e-specs.js \
+  --spec "cypress/e2e/content/markdown-content-validation.cy.js"
+```
+
+See [DOCS-TESTING.md](DOCS-TESTING.md) for comprehensive testing documentation.
+
+### Lambda Deployment
+
+Deploy the Lambda\@Edge function to AWS:
+
+```bash
+# Navigate to Lambda directory
+cd deploy/llm-markdown/lambda-edge/markdown-generator
+
+# Install dependencies
+npm install
+
+# Deploy to staging
+./deploy.sh staging
+
+# Deploy to production
+./deploy.sh production
+```
+
+See [deploy/llm-markdown/README.md](deploy/llm-markdown/README.md) for detailed deployment instructions.
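The request flow in the Architecture section of DOCS-DEPLOYING.md, together with the missing-`await` bug described in the commit message, can be illustrated with a small sketch. It is not the handler in index.js: `handleMarkdownRequest` and the injected `deps.fetchHtml` are hypothetical, the signature assumed for `convertToMarkdown` is a guess, and expressing the one-hour cache as a `Cache-Control` header is an assumption (it could equally be set in CloudFront itself).

```javascript
// Hypothetical sketch of an origin-request flow that returns generated
// markdown. `deps.fetchHtml` and `deps.convertToMarkdown` are injected
// stand-ins for the real S3 fetch and the shared converter.
async function handleMarkdownRequest(uri, deps) {
  // Assumed contract: resolves to an HTML string, or null when missing.
  const html = await deps.fetchHtml(uri);
  if (html === null) {
    return { status: '404', statusDescription: 'Not Found', body: 'Not found' };
  }

  // The conversion is async. Without `await`, `body` would be a Promise
  // rather than a string, and CloudFront would reject the response.
  const body = await deps.convertToMarkdown(html);

  return {
    status: '200',
    statusDescription: 'OK',
    headers: {
      'content-type': [
        { key: 'Content-Type', value: 'text/markdown; charset=utf-8' },
      ],
      // Assumption: one hour, matching "cached 1hr" in the diagram.
      'cache-control': [{ key: 'Cache-Control', value: 'public, max-age=3600' }],
    },
    body,
  };
}
```

Passing the fetch and conversion functions in as arguments keeps the sketch runnable without AWS access, and makes the awaited call easy to exercise with stubs.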
