- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 88
feat: Built-in s3 key sanitizer (when creating files) #337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
optionally enabled, should have no effect on valid filenames
| I will reformat the title to use the proper commit message syntax. | 
| 🚀 Thanks for opening this pull request! | 
| 📝 WalkthroughWalkthroughAdded an S3 key sanitization helper and a new cleanKey option to S3Adapter. createFile now derives a finalKey via optional generateKey, optionally sanitizes it, and uses it for PutObject. Tests verify cleaning behavior with/without generateKey and when cleanKey is disabled. Changes
 Sequence Diagram(s)sequenceDiagram
  participant Caller
  participant S3Adapter
  participant S3Client as AWS S3Client
  Caller->>S3Adapter: createFile(filename, data, contentType, options)
  alt generateKey provided
    S3Adapter->>S3Adapter: finalKey = generateKey(...)
  else
    S3Adapter->>S3Adapter: finalKey = filename
  end
  opt cleanKey enabled
    S3Adapter->>S3Adapter: finalKey = cleanS3Key(finalKey)
  end
  S3Adapter->>S3Client: PutObject(Key = bucketPrefix + finalKey, Body, ContentType, ACL, ...)
  S3Client-->>S3Adapter: PutObjectResult
  S3Adapter-->>Caller: result
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit: 
 SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type  Other keywords and placeholders
 CodeRabbit Configuration File ( | 
| 🎉 Snyk checks have passed. No issues have been found so far.✅ security/snyk check is complete. No issues have been found. (View Details) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (3)
index.js (1)
209-256: Confirm downstream usage stores/returns the sanitized keyCleaning occurs only at write time; delete/get still take the caller-provided filename. Ensure the caller (Parse Server) persists and passes the sanitized finalKey thereafter, or you’ll get 404s on read/delete. You already noted the dependency on parse-server#9557; please document the minimum Parse Server version in the README and consider feature-flagging behavior based on an adapter capability handshake if needed.
spec/test.spec.js (2)
871-892: Good coverage for filename cleaning with cleanKey=trueThis verifies the PutObject Key is sanitized as expected. Consider also adding a case with a bucketPrefix to ensure the Key is prefix + cleaned name.
Apply this additional test:
it('should include bucketPrefix when cleaning is enabled', async () => { const options = { bucket: 'bucket-1', bucketPrefix: 'pref/', cleanKey: true }; const s3 = new S3Adapter(options); s3ClientMock.send.and.returnValue(Promise.resolve({})); s3._s3Client = s3ClientMock; const problematicFilename = ' bad name?.txt '; await s3.createFile(problematicFilename, 'hello world', 'text/utf8', {}); const commandArg = s3ClientMock.send.calls.all()[1].args[0]; expect(commandArg).toBeInstanceOf(PutObjectCommand); expect(commandArg.input.Key.startsWith('pref/')).toBeTrue(); });
894-915: Cleaning applies to generated keys — add async generateKey varianceGreat to assert cleaning runs after generateKey. Add a variant with async generateKey to lock in behavior if we adopt awaiting generateKey in index.js.
it('should await async generateKey and then clean', async () => { const options = { bucket: 'bucket-1', cleanKey: true, generateKey: async (filename) => `generated/${filename.normalize('NFKD')}` }; const s3 = new S3Adapter(options); s3ClientMock.send.and.returnValue(Promise.resolve({})); s3._s3Client = s3ClientMock; await s3.createFile('café.txt', 'hello world', 'text/utf8', {}); const cmd = s3ClientMock.send.calls.all()[1].args[0]; expect(cmd).toBeInstanceOf(PutObjectCommand); expect(cmd.input.Key).toMatch(/^generated\//); });
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- index.js(3 hunks)
- spec/test.spec.js(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
spec/test.spec.js (2)
spec/mocks/s3adapter.js (2)
s3(11-15)
S3Adapter(2-2)spec/support/server.js (1)
S3Adapter(8-8)
🔇 Additional comments (2)
index.js (2)
142-142: Constructor wiring for cleanKey looks goodStoring the option as a boolean flag is clear and keeps the feature opt-in.
221-224: Guard against undefined bucketPrefix when building the KeyIf bucketPrefix is not provided and optionsFromArguments doesn’t default it, you’ll end up with "undefined" on the wire. Safe-guard by falling back to ''.
- Key: this._bucketPrefix + finalKey, + Key: (this._bucketPrefix || '') + finalKey,
| function cleanS3Key(key) { | ||
| // Single-pass character replacement using a lookup map for better performance | ||
| // See: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html#object-key-guidelines | ||
|  | ||
| const charMap = { | ||
| '\\': '-', | ||
| '{': '-', '}': '-', | ||
| '^': '-', '`': '-', '[': '-', ']': '-', '"': '-', '<': '-', '>': '-', | ||
| '~': '-', '#': '-', '|': '-', '%': '-', | ||
| '&': '-and-', | ||
| '$': '-dollar-', | ||
| '@': '-at-', | ||
| '=': '-equals-', | ||
| ';': '-', ':': '-', '+': '-plus-', ',': '-', '?': '-' | ||
| }; | ||
|  | ||
| let result = ''; | ||
| let lastWasHyphen = false; | ||
|  | ||
| for (let i = 0; i < key.length; i++) { | ||
| const char = key[i]; | ||
| const charCode = char.charCodeAt(0); | ||
|  | ||
| // Skip non-printable ASCII and extended ASCII | ||
| if ((charCode >= 0 && charCode <= 31) || charCode === 127 || charCode >= 128) { | ||
| continue; | ||
| } | ||
|  | ||
| // Handle whitespace - convert to single hyphen | ||
| if (/\s/.test(char)) { | ||
| if (!lastWasHyphen) { | ||
| result += '-'; | ||
| lastWasHyphen = true; | ||
| } | ||
| continue; | ||
| } | ||
|  | ||
| // Replace problematic characters | ||
| if (charMap[char]) { | ||
| const replacement = charMap[char]; | ||
| if (replacement === '-') { | ||
| if (!lastWasHyphen) { | ||
| result += '-'; | ||
| lastWasHyphen = true; | ||
| } | ||
| } else { | ||
| result += replacement; | ||
| lastWasHyphen = false; | ||
| } | ||
| continue; | ||
| } | ||
|  | ||
| // Keep safe characters | ||
| result += char; | ||
| lastWasHyphen = false; | ||
| } | ||
|  | ||
| // Remove leading/trailing hyphens and periods | ||
| result = result.replace(/^[-\.]+|[-\.]+$/g, ''); | ||
|  | ||
| // Ensure we don't end up with an empty string | ||
| return result || `file-${Date.now()}-${Math.random().toString(36).substring(2, 11)}`; | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
cleanS3Key: solid baseline, but consider preserving Unicode and tightening post-processing
The implementation is efficient and readable. A few improvements will make it safer and more broadly compatible with S3:
- Dropping all non-ASCII (Line 74) is aggressive; S3 supports UTF‑8 object keys. Consider percent-encoding (or configurable transliteration) instead of dropping.
- Expansions like "-and-" can introduce double hyphens between adjacent replacements. Collapse consecutive hyphens in a final pass.
- Guard against the 1024-byte S3 key length limit.
Suggested diffs:
- Preserve non-ASCII via encoding instead of dropping:
-    // Skip non-printable ASCII and extended ASCII
-    if ((charCode >= 0 && charCode <= 31) || charCode === 127 || charCode >= 128) {
-      continue;
-    }
+    // Skip non-printable ASCII; preserve Unicode by percent-encoding
+    if ((charCode >= 0 && charCode <= 31) || charCode === 127) {
+      continue;
+    }
+    if (charCode >= 128) {
+      result += encodeURIComponent(char);
+      lastWasHyphen = false;
+      continue;
+    }- Collapse multiple hyphens and enforce S3 key length (1024 bytes) after trimming:
   // Remove leading/trailing hyphens and periods
   result = result.replace(/^[-\.]+|[-\.]+$/g, '');
 
+  // Collapse multiple hyphens
+  result = result.replace(/-{2,}/g, '-');
+
+  // Enforce S3 key length (bytes)
+  if (Buffer.byteLength(result, 'utf8') > 1024) {
+    // Truncate conservatively at the byte boundary
+    let trimmed = '';
+    for (let i = 0, bytes = 0; i < result.length; i++) {
+      const b = Buffer.byteLength(result[i], 'utf8');
+      if (bytes + b > 1024) break;
+      trimmed += result[i];
+      bytes += b;
+    }
+    result = trimmed;
+  }
+
   // Ensure we don't end up with an empty string
   return result || `file-${Date.now()}-${Math.random().toString(36).substring(2, 11)}`;- Minor nit: avoid re-creating the charMap per invocation by hoisting it to module scope.
Example (outside the function):
const CLEAN_S3_KEY_CHARMAP = {
  '\\': '-',
  '{': '-', '}': '-',
  '^': '-', '`': '-', '[': '-', ']': '-', '"': '-', '<': '-', '>': '-',
  '~': '-', '#': '-', '|': '-', '%': '-',
  '&': '-and-',
  '$': '-dollar-',
  '@': '-at-',
  '=': '-equals-',
  ';': '-', ':': '-', '+': '-plus-', ',': '-', '?': '-'
};and inside the function:
-  const charMap = {
-    ...
-  };
+  const charMap = CLEAN_S3_KEY_CHARMAP;🤖 Prompt for AI Agents
In index.js around lines 50 to 112, the cleanS3Key function currently drops
non-ASCII characters, can produce consecutive hyphens from multi-character
replacements, recreates the charMap on every call, and doesn't enforce S3's
1024-byte key limit; fix by hoisting the charMap to module scope as a constant,
change the non-ASCII handling to percent-encode (or URL-encode) UTF-8 bytes
instead of skipping, after building the result collapse multiple consecutive
hyphens into a single hyphen and trim leading/trailing hyphens/periods, then
enforce the 1024-byte limit by truncating the UTF-8 encoded bytes (not
characters) and re-trimming if truncation leaves trailing hyphens/periods, and
ensure lastWasHyphen remains consistent when inserting multi-character
replacements.
| let finalKey = filename; | ||
|  | ||
| // Apply generateKey if provided | ||
| if (this._generateKey instanceof Function) { | ||
| finalKey = this._generateKey(filename, contentType, options); | ||
| } | ||
|  | ||
| // Apply key cleaning if enabled (after generateKey) | ||
| if (this._cleanKey) { | ||
| finalKey = cleanS3Key(finalKey); | ||
| } | ||
|  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Support async generateKey to avoid surprising breakage
If a consumer provides an async generateKey, the current code will pass a Promise to cleanS3Key and to S3 as the Key. Since createFile is already async, await generateKey to support both sync and async implementations.
-    if (this._generateKey instanceof Function) {
-      finalKey = this._generateKey(filename, contentType, options);
-    }
+    if (this._generateKey instanceof Function) {
+      finalKey = await this._generateKey(filename, contentType, options);
+    }🏁 Script executed:
#!/bin/bash
# Locate createFile definition and check if it's async
rg -n "createFile" -C3 index.js | sed -n '1,30p'
# Specifically search for 'async createFile' in index.js
rg -n "async createFile" -n index.js
# Find where cleanS3Key is defined
rg -R "cleanS3Key" -n .Length of output: 879
Await async generateKey in createFile
Since createFile is already async, we should await any user-provided generateKey to avoid passing a Promise into cleanS3Key or to S3.
Locations to update:
- File: index.js
- Method: async createFile(...)
- Around lines 211–214
Apply this diff:
-    if (this._generateKey instanceof Function) {
-      finalKey = this._generateKey(filename, contentType, options);
-    }
+    if (this._generateKey instanceof Function) {
+      finalKey = await this._generateKey(filename, contentType, options);
+    }This ensures both sync and async generateKey implementations are handled correctly.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let finalKey = filename; | |
| // Apply generateKey if provided | |
| if (this._generateKey instanceof Function) { | |
| finalKey = this._generateKey(filename, contentType, options); | |
| } | |
| // Apply key cleaning if enabled (after generateKey) | |
| if (this._cleanKey) { | |
| finalKey = cleanS3Key(finalKey); | |
| } | |
| let finalKey = filename; | |
| // Apply generateKey if provided | |
| if (this._generateKey instanceof Function) { | |
| finalKey = await this._generateKey(filename, contentType, options); | |
| } | |
| // Apply key cleaning if enabled (after generateKey) | |
| if (this._cleanKey) { | |
| finalKey = cleanS3Key(finalKey); | |
| } | 
🤖 Prompt for AI Agents
In index.js around lines 209 to 220, the code calls this._generateKey
synchronously and may pass a Promise into cleanS3Key or S3; change the block
that applies generateKey so that if this._generateKey is a function you await
its result (e.g. finalKey = await this._generateKey(filename, contentType,
options)), then continue to apply cleanS3Key and use finalKey; ensure createFile
remains async and handle the awaited value as the string key.
| it('should not clean keys when cleanKey option is disabled', async () => { | ||
| const options = { | ||
| bucket: 'bucket-1', | ||
| cleanKey: false // explicitly disabled | ||
| }; | ||
| const s3 = new S3Adapter(options); | ||
| s3ClientMock.send.and.returnValue(Promise.resolve({})); | ||
| s3._s3Client = s3ClientMock; | ||
|  | ||
| const problematicFilename = 'test file{with}[bad]chars.txt'; | ||
| await s3.createFile(problematicFilename, 'hello world', 'text/utf8', {}); | ||
|  | ||
| expect(s3ClientMock.send).toHaveBeenCalledTimes(2); | ||
| const commands = s3ClientMock.send.calls.all(); | ||
| const commandArg = commands[1].args[0]; | ||
| expect(commandArg).toBeInstanceOf(PutObjectCommand); | ||
|  | ||
| // Should preserve original filename | ||
| expect(commandArg.input.Key).toBe(problematicFilename); | ||
| }); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Non-cleaning path validated; please add Unicode and empty-result edge cases
Two additional edge-cases worth locking down:
- Unicode characters (e.g., 'café/Δ.txt') don’t get dropped unexpectedly (or do, if that’s desired), and behavior is documented.
- An all-removed input results in the fallback name.
I can contribute tests for these scenarios if helpful.
🤖 Prompt for AI Agents
In spec/test.spec.js around lines 917 to 936 add two small tests extending the
existing "cleanKey disabled" spec: (1) add a test that uses a filename
containing Unicode characters (e.g., 'café/Δ.txt') and asserts that the
PutObjectCommand Key equals the original Unicode string when cleanKey is false;
(2) add a test for the all-removed-input edge case where the original filename
would be sanitized to an empty string (simulate by using a filename composed
only of characters your cleaner would remove) and assert that the adapter falls
back to the configured fallback name (or documented default) as the
PutObjectCommand Key. Ensure both tests set cleanKey: false for the Unicode case
and the relevant cleanKey behavior for the fallback case, reuse s3ClientMock
setup and assertions style (call counts and instance checks) consistent with
neighboring tests.
| Actually this can't be done without #242 | 
Adds an automatic cleaning of S3 object keys to ensure they are safe and compliant with S3 guidelines. With optional addition of a
cleanKeyflag, which, when enabled, sanitizes filenames (or generated keys) before storing files in S3.(Requires parse-community/parse-server#9557 due to requirement that the revised s3 filepath is returned and saved)
Key changes:
S3 Key Cleaning Feature:
cleanS3Keyfunction that sanitizes S3 object keys by replacing or removing problematic characters, whitespace, and non-printable characters, ensuring compliance with S3 key guidelines.S3Adapterclass to accept acleanKeyoption and apply the cleaning logic to filenames or generated keys before uploading files to S3. The cleaning is applied after any custom key generation. [1] [2]Testing Enhancements:
cleanKeyoption is enabled, including cases for both direct filenames and generated keys. Also added a test to ensure that keys are not cleaned when the option is disabled.Summary by CodeRabbit