Commit 7f595b7

feat: improve Sanitizer README in EN and PT-BR
- Revised and enhanced descriptions in both English and Portuguese README.
- Expanded usage examples for better clarity.
- Added more details about sanitizer configuration options.
- Updated integration and registry explanation sections.
1 parent 7c081ea commit 7f595b7

2 files changed: +319 -290 lines changed

README.md

Lines changed: 135 additions & 121 deletions
@@ -1,9 +1,5 @@
11
# KaririCode Framework: Sanitizer Component
22

3-
[![en](https://img.shields.io/badge/lang-en-red.svg)](README.md) [![pt-br](https://img.shields.io/badge/lang-pt--br-green.svg)](README.pt-br.md)
4-
5-
![PHP](https://img.shields.io/badge/PHP-777BB4?style=for-the-badge&logo=php&logoColor=white) ![Docker](https://img.shields.io/badge/Docker-2496ED?style=for-the-badge&logo=docker&logoColor=white) ![PHPUnit](https://img.shields.io/badge/PHPUnit-3776AB?style=for-the-badge&logo=php&logoColor=white)
6-
73
A robust and flexible data sanitization component for PHP, part of the KaririCode Framework. It utilizes configurable processors and native functions to ensure data integrity and security in your applications.
84

95
## Table of Contents
@@ -14,6 +10,9 @@ A robust and flexible data sanitization component for PHP, part of the KaririCod
1410
- [Basic Usage](#basic-usage)
1511
- [Advanced Usage: Blog Post Sanitization](#advanced-usage-blog-post-sanitization)
1612
- [Available Sanitizers](#available-sanitizers)
13+
- [Input Sanitizers](#input-sanitizers)
14+
- [Domain Sanitizers](#domain-sanitizers)
15+
- [Security Sanitizers](#security-sanitizers)
1716
- [Configuration](#configuration)
1817
- [Integration with Other KaririCode Components](#integration-with-other-kariricode-components)
1918
- [Development and Testing](#development-and-testing)
@@ -30,6 +29,9 @@ A robust and flexible data sanitization component for PHP, part of the KaririCod
3029
- Support for fallback values in case of sanitization failures
3130
- Extensible architecture allowing custom sanitizers
3231
- Robust error handling and reporting
32+
- Chainable sanitization pipelines for complex data transformations
33+
- Built-in support for multiple character encodings
34+
- Protection against XSS and SQL injection attacks
3335

3436
## Installation
3537

@@ -43,6 +45,7 @@ composer require kariricode/sanitizer
4345

4446
- PHP 8.3 or higher
4547
- Composer
48+
- Extensions: `ext-mbstring`, `ext-dom`, `ext-libxml`
4649

4750
## Usage
4851

@@ -58,7 +61,7 @@ class UserProfile
5861
#[Sanitize(processors: ['trim', 'html_special_chars'])]
5962
private string $name = '';
6063

61-
#[Sanitize(processors: ['trim', 'normalize_line_breaks'])]
64+
#[Sanitize(processors: ['trim', 'email_sanitizer'])]
6265
private string $email = '';
6366

6467
// Getters and setters...
@@ -72,51 +75,31 @@ use KaririCode\ProcessorPipeline\ProcessorRegistry;
7275
use KaririCode\Sanitizer\Sanitizer;
7376
use KaririCode\Sanitizer\Processor\Input\TrimSanitizer;
7477
use KaririCode\Sanitizer\Processor\Input\HtmlSpecialCharsSanitizer;
75-
use KaririCode\Sanitizer\Processor\Input\NormalizeLineBreaksSanitizer;
78+
use KaririCode\Sanitizer\Processor\Input\EmailSanitizer;
7679

7780
$registry = new ProcessorRegistry();
7881
$registry->register('sanitizer', 'trim', new TrimSanitizer());
7982
$registry->register('sanitizer', 'html_special_chars', new HtmlSpecialCharsSanitizer());
80-
$registry->register('sanitizer', 'normalize_line_breaks', new NormalizeLineBreaksSanitizer());
83+
$registry->register('sanitizer', 'email_sanitizer', new EmailSanitizer());
8184

8285
$sanitizer = new Sanitizer($registry);
8386

8487
$userProfile = new UserProfile();
85-
$userProfile->setName(" John Doe ");
86-
$userProfile->setEmail("john.doe@example.com\r\n");
88+
$userProfile->setName(" Walmir Silva <script>alert('xss')</script> ");
89+
$userProfile->setEmail(" walmir.silva@gmail.con ");
8790

8891
$result = $sanitizer->sanitize($userProfile);
8992

90-
echo $userProfile->getName(); // Output: "John Doe"
91-
echo $userProfile->getEmail(); // Output: "john.doe@example.com\n"
92-
93-
// Access sanitization results
94-
print_r($result['sanitizedValues']);
95-
print_r($result['messages']);
96-
print_r($result['errors']);
93+
echo $userProfile->getName(); // Output: "Walmir Silva" followed by the escaped <script> markup (html_special_chars escapes tags; it does not remove them)
94+
echo $userProfile->getEmail(); // Output: "walmir.silva@gmail.com"
9795
```
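
As a rough sanity check of what the `trim` and `html_special_chars` chain does to the name above, the sketch below reproduces it with PHP's native `trim()` and `htmlspecialchars()`. It assumes the two processors wrap those functions with `ENT_QUOTES | ENT_HTML5` and UTF-8; `sketchSanitizeName` is a hypothetical helper and not part of the component. Under those assumptions the `<script>` markup is escaped rather than removed.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: mimics a trim -> html_special_chars chain
// using only native PHP functions (an assumption about the processors).
function sketchSanitizeName(string $raw): string
{
    // Step 1: strip leading and trailing whitespace.
    $trimmed = trim($raw);

    // Step 2: escape HTML-significant characters; nothing is deleted.
    return htmlspecialchars($trimmed, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

$name = sketchSanitizeName(" Walmir Silva <script>alert('xss')</script> ");

echo $name, PHP_EOL;
```

If the tags must disappear entirely, a processor such as `strip_tags` or `xss_sanitizer` would need to run before the escaping step.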
9896

9997
### Advanced Usage: Blog Post Sanitization
10098

101-
Here's a more comprehensive example demonstrating how to use the KaririCode Sanitizer in a real-world scenario, such as sanitizing blog post content:
99+
Here's an example of how to use the KaririCode Sanitizer in a real-world scenario, such as sanitizing blog post content:
102100

103101
```php
104-
<?php
105-
106-
declare(strict_types=1);
107-
108-
require_once __DIR__ . '/../vendor/autoload.php';
109-
110-
use KaririCode\ProcessorPipeline\ProcessorRegistry;
111102
use KaririCode\Sanitizer\Attribute\Sanitize;
112-
use KaririCode\Sanitizer\Processor\Domain\HtmlPurifierSanitizer;
113-
use KaririCode\Sanitizer\Processor\Domain\MarkdownSanitizer;
114-
use KaririCode\Sanitizer\Processor\Input\HtmlSpecialCharsSanitizer;
115-
use KaririCode\Sanitizer\Processor\Input\NormalizeLineBreaksSanitizer;
116-
use KaririCode\Sanitizer\Processor\Input\StripTagsSanitizer;
117-
use KaririCode\Sanitizer\Processor\Input\TrimSanitizer;
118-
use KaririCode\Sanitizer\Processor\Security\XssSanitizer;
119-
use KaririCode\Sanitizer\Sanitizer;
120103

121104
class BlogPost
122105
{
@@ -130,15 +113,6 @@ class BlogPost
130113
)]
131114
private string $title = '';
132115

133-
#[Sanitize(
134-
processors: ['trim', 'normalize_line_breaks'],
135-
messages: [
136-
'trim' => 'Slug was trimmed',
137-
'normalize_line_breaks' => 'Line breaks in slug were normalized',
138-
]
139-
)]
140-
private string $slug = '';
141-
142116
#[Sanitize(
143117
processors: ['trim', 'markdown', 'html_purifier'],
144118
messages: [
@@ -149,99 +123,128 @@ class BlogPost
149123
)]
150124
private string $content = '';
151125

152-
#[Sanitize(
153-
processors: ['trim', 'strip_tags', 'html_special_chars'],
154-
messages: [
155-
'trim' => 'Author name was trimmed',
156-
'strip_tags' => 'HTML tags were removed from author name',
157-
'html_special_chars' => 'Special characters in author name were escaped',
158-
]
159-
)]
160-
private string $authorName = '';
161-
162126
// Getters and setters...
163127
}
164128

165-
// Set up the sanitizer
166-
$registry = new ProcessorRegistry();
167-
$registry->register('sanitizer', 'trim', new TrimSanitizer());
168-
$registry->register('sanitizer', 'html_special_chars', new HtmlSpecialCharsSanitizer());
169-
$registry->register('sanitizer', 'normalize_line_breaks', new NormalizeLineBreaksSanitizer());
170-
$registry->register('sanitizer', 'strip_tags', new StripTagsSanitizer());
171-
$registry->register('sanitizer', 'markdown', new MarkdownSanitizer());
172-
$registry->register('sanitizer', 'xss_sanitizer', new XssSanitizer());
129+
// Usage example
130+
$blogPost = new BlogPost();
131+
$blogPost->setTitle(" Exploring KaririCode: A Modern PHP Framework <script>alert('xss')</script> ");
132+
$blogPost->setContent("# Introduction\nKaririCode is a **powerful** and _flexible_ PHP framework designed for modern web development.");
173133

174-
// Configure HTML Purifier with specific settings for blog content
175-
$htmlPurifier = new HtmlPurifierSanitizer();
176-
$htmlPurifier->configure([
177-
'allowedTags' => ['p', 'br', 'strong', 'em', 'u', 'ol', 'ul', 'li', 'a', 'img', 'h2', 'h3', 'blockquote'],
178-
'allowedAttributes' => ['href' => ['a'], 'src' => ['img'], 'alt' => ['img']],
179-
]);
180-
$registry->register('sanitizer', 'html_purifier', $htmlPurifier);
134+
$result = $sanitizer->sanitize($blogPost);
181135

182-
$sanitizer = new Sanitizer($registry);
136+
// Access sanitized data
137+
echo $blogPost->getTitle(); // Sanitized title
138+
echo $blogPost->getContent(); // Sanitized content
139+
```
183140

184-
// Simulating form submission with potentially unsafe data
185-
$blogPost = new BlogPost();
186-
$blogPost->setTitle(" Exploring KaririCode: A Modern PHP Framework <script>alert('xss')</script> ");
187-
$blogPost->setSlug(" exploring-kariricode-a-modern-php-framework \r\n");
188-
$blogPost->setContent("
189-
# Introduction
141+
## Available Sanitizers
190142

191-
KaririCode is a **powerful** and _flexible_ PHP framework designed for modern web development.
143+
### Input Sanitizers
192144

193-
<script>alert('malicious code');</script>
145+
- **TrimSanitizer**: Removes whitespace from the beginning and end of a string.
194146

195-
## Key Features
147+
- **Configuration Options**:
148+
- `characterMask`: Specifies which characters to trim. Default is whitespace.
149+
- `trimLeft`: Boolean to trim from the left side. Default is `true`.
150+
- `trimRight`: Boolean to trim from the right side. Default is `true`.
196151

197-
1. Robust sanitization
198-
2. Efficient routing
199-
3. Powerful ORM
152+
- **HtmlSpecialCharsSanitizer**: Converts special characters to HTML entities to prevent XSS attacks.
200153

201-
Check out our [official website](https://kariricode.org) for more information!
154+
- **Configuration Options**:
155+
- `flags`: Configurable flags like `ENT_QUOTES | ENT_HTML5`.
156+
- `encoding`: Character encoding, e.g., 'UTF-8'.
157+
- `doubleEncode`: Boolean controlling whether existing HTML entities are encoded again; set to `false` to avoid double encoding. Default is `true`.
202158

203-
<img src=\"harmful.jpg\" onerror=\"alert('xss')\" />
204-
");
205-
$blogPost->setAuthorName("<b>John Doe</b> <script>alert('xss')</script>");
159+
- **NormalizeLineBreaksSanitizer**: Standardizes line breaks across different operating systems.
206160

207-
$result = $sanitizer->sanitize($blogPost);
161+
- **Configuration Options**:
162+
- `lineEnding`: Specifies line ending style. Options: 'unix', 'windows', 'mac'.
208163

209-
// Access sanitized data
210-
echo $blogPost->getTitle(); // Sanitized title
211-
echo $blogPost->getContent(); // Sanitized content
164+
- **EmailSanitizer**: Validates and corrects common email typos, normalizes email format, and handles whitespace.
212165

213-
// Access sanitization details
214-
print_r($result['sanitizedValues']);
215-
print_r($result['messages']);
216-
print_r($result['errors']);
217-
```
166+
- **Configuration Options**:
167+
- `removeMailtoPrefix`: Boolean to remove 'mailto:' prefix. Default is `false`.
168+
- `typoReplacements`: Associative array of common typo replacements.
169+
- `domainReplacements`: Corrects commonly misspelled domain names.
218170

219-
This example demonstrates how to use the KaririCode Sanitizer to clean and secure blog post data, including handling of Markdown content, HTML purification, and protection against XSS attacks.
171+
- **PhoneSanitizer**: Formats and validates phone numbers, including international support and custom formatting options.
220172

221-
## Available Sanitizers
173+
- **Configuration Options**:
174+
- `applyFormat`: Boolean to apply formatting. Default is `false`.
175+
- `format`: Custom format pattern for phone numbers.
176+
- `placeholder`: Placeholder character used in formatting.
222177

223-
The Sanitizer component provides various built-in sanitizers:
178+
- **AlphanumericSanitizer**: Removes non-alphanumeric characters, with configurable options to allow certain special characters.
224179

225-
### Input Sanitizers
180+
- **Configuration Options**:
181+
- `allowSpace`, `allowUnderscore`, `allowDash`, `allowDot`: Boolean options to allow specific characters.
182+
- `preserveCase`: Boolean to maintain case sensitivity.
183+
184+
- **UrlSanitizer**: Validates and normalizes URLs, ensuring proper protocol and structure.
185+
186+
- **Configuration Options**:
187+
- `enforceProtocol`: Enforces a specific protocol, e.g., 'https://'.
188+
- `defaultProtocol`: The protocol to apply if none is present.
189+
- `removeTrailingSlash`: Boolean to remove trailing slash.
226190

227-
- TrimSanitizer: Removes whitespace from the beginning and end of a string
228-
- HtmlSpecialCharsSanitizer: Converts special characters to HTML entities
229-
- NormalizeLineBreaksSanitizer: Standardizes line breaks across different operating systems
230-
- StripTagsSanitizer: Removes HTML and PHP tags from a string
191+
- **NumericSanitizer**: Ensures that the input is a numeric value, with options for decimal and negative numbers.
192+
193+
- **Configuration Options**:
194+
- `allowDecimal`, `allowNegative`: Boolean options to allow decimals and negative values.
195+
- `decimalSeparator`: Specifies the character used for decimals.
196+
197+
- **StripTagsSanitizer**: Removes HTML and PHP tags from input, with configurable options for allowed tags.
198+
- **Configuration Options**:
199+
- `allowedTags`: List of HTML tags to keep.
200+
- `keepSafeAttributes`: Boolean to keep certain safe attributes.
201+
- `safeAttributes`: Array of attributes to preserve.
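
To make a few of the input options above more concrete, here is a minimal sketch of how `lineEnding`, `trimLeft`, `trimRight`, and `characterMask` could map onto native PHP functions. The helpers `sketchNormalizeLineBreaks` and `sketchTrim` are hypothetical and only illustrate the described behavior; the component's own classes may work differently.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a `lineEnding` option: 'unix', 'windows' or 'mac'.
function sketchNormalizeLineBreaks(string $text, string $lineEnding = 'unix'): string
{
    $eol = match ($lineEnding) {
        'windows' => "\r\n",
        'mac'     => "\r",
        default   => "\n",
    };

    // Collapse every break style to "\n" first, then expand to the target.
    $unix = str_replace(["\r\n", "\r"], "\n", $text);

    return $eol === "\n" ? $unix : str_replace("\n", $eol, $unix);
}

// Hypothetical stand-in for `trimLeft`, `trimRight` and `characterMask`.
function sketchTrim(
    string $text,
    bool $trimLeft = true,
    bool $trimRight = true,
    string $characterMask = " \n\r\t\v\x00"
): string {
    if ($trimLeft) {
        $text = ltrim($text, $characterMask);
    }
    if ($trimRight) {
        $text = rtrim($text, $characterMask);
    }

    return $text;
}

$normalized = sketchNormalizeLineBreaks("a\r\nb\rc\nd");
$leftOnly   = sketchTrim('  x  ', true, false);
```

Normalizing to a single style before expanding avoids turning an existing `\r\n` into a doubled break.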
231202

232203
### Domain Sanitizers
233204

234-
- HtmlPurifierSanitizer: Sanitizes HTML content using the HTML Purifier library
235-
- JsonSanitizer: Validates and prettifies JSON strings
236-
- MarkdownSanitizer: Sanitizes Markdown content
205+
- **HtmlPurifierSanitizer**: Sanitizes HTML content by removing unsafe tags and attributes, ensuring safe HTML rendering.
206+
207+
- **Configuration Options**:
208+
- `allowedTags`: Specifies which tags are allowed.
209+
- `allowedAttributes`: Defines allowed attributes for each tag.
210+
- `removeEmptyTags`, `removeComments`: Boolean to remove empty tags or HTML comments.
211+
- `htmlEntities`: Convert characters to HTML entities. Default is `true`.
212+
213+
- **JsonSanitizer**: Validates and prettifies JSON strings, removes invalid characters, and ensures proper JSON structure.
214+
215+
- **Configuration Options**:
216+
- `prettyPrint`: Boolean to format JSON for readability.
217+
- `removeInvalidCharacters`: Boolean to remove invalid characters from JSON.
218+
- `validateUnicode`: Boolean to validate Unicode characters.
219+
220+
- **MarkdownSanitizer**: Processes and sanitizes Markdown content, escaping special characters and preserving the Markdown structure.
221+
- **Configuration Options**:
222+
- `allowedElements`: Specifies allowed Markdown elements (e.g., 'p', 'h1', 'a').
223+
- `escapeSpecialCharacters`: Boolean to escape special characters like '\*', '\_', etc.
224+
- `preserveStructure`: Boolean to maintain Markdown formatting.
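
The `prettyPrint` behavior described for JsonSanitizer can be approximated with PHP's native JSON functions. This is a sketch under stated assumptions: `sketchSanitizeJson` is a hypothetical helper, and it rejects malformed input with an exception, whereas the real sanitizer may instead repair the input or fall back to a default value.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: validate a JSON string and optionally pretty-print it.
function sketchSanitizeJson(string $json, bool $prettyPrint = true): string
{
    // JSON_THROW_ON_ERROR turns malformed input into a JsonException.
    // Decoding to objects (not arrays) keeps "{}" distinct from "[]".
    $decoded = json_decode($json, false, 512, JSON_THROW_ON_ERROR);

    $flags = JSON_THROW_ON_ERROR | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES;
    if ($prettyPrint) {
        $flags |= JSON_PRETTY_PRINT;
    }

    return json_encode($decoded, $flags);
}

$compact = sketchSanitizeJson('{ "a" : 1, "b" : {} }', false);
$pretty  = sketchSanitizeJson('{"a":1,"b":[1,2]}');
```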
237225

238226
### Security Sanitizers
239227

240-
- FilenameSanitizer: Ensures filenames are safe for use in file systems
241-
- SqlInjectionSanitizer: Protects against SQL injection attacks
242-
- XssSanitizer: Prevents Cross-Site Scripting (XSS) attacks
228+
- **FilenameSanitizer**: Ensures filenames are safe for use in file systems by removing unsafe characters and validating extensions.
229+
230+
- **Configuration Options**:
231+
- `replacement`: Character used to replace unsafe characters. Default is `'-'`.
232+
- `preserveExtension`: Boolean to keep the file extension.
233+
- `blockDangerousExtensions`: Boolean to block extensions like '.exe', '.js'.
234+
- `allowedExtensions`: Array of allowed extensions.
235+
236+
- **SqlInjectionSanitizer**: Protects against SQL injection attacks by escaping special characters and removing potentially harmful content.
243237

244-
For detailed information on each sanitizer, including configuration options and usage examples, please refer to the [documentation](https://kariricode.org/docs/sanitizer).
238+
- **Configuration Options**:
239+
- `escapeMap`: Array of characters to escape.
240+
- `removeComments`: Boolean to strip SQL comments.
241+
- `escapeQuotes`: Boolean to escape quotes in SQL queries.
242+
243+
- **XssSanitizer**: Prevents Cross-Site Scripting (XSS) attacks by removing malicious scripts, attributes, and ensuring safe HTML output.
244+
- **Configuration Options**:
245+
- `removeScripts`: Boolean to remove `<script>` tags.
246+
- `removeEventHandlers`: Boolean to remove 'on\*' event handlers.
247+
- `encodeHtmlEntities`: Boolean to encode unsafe characters.
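
As an illustration of the FilenameSanitizer options listed above (`replacement`, `preserveExtension`, `blockDangerousExtensions`), the following sketch uses native PHP functions only. The helper name `sketchSanitizeFilename` and the blocked-extension list are assumptions made for this example, not the component's actual defaults.

```php
<?php

declare(strict_types=1);

// Hypothetical helper; the blocked list is illustrative only.
function sketchSanitizeFilename(
    string $filename,
    string $replacement = '-',
    array $blockedExtensions = ['exe', 'js', 'php', 'sh']
): string {
    // basename() drops any directory part, which guards against "../" traversal.
    $base = basename(str_replace('\\', '/', $filename));
    $ext  = strtolower(pathinfo($base, PATHINFO_EXTENSION));
    $stem = pathinfo($base, PATHINFO_FILENAME);

    if (in_array($ext, $blockedExtensions, true)) {
        throw new InvalidArgumentException("Blocked extension: .$ext");
    }

    // Replace each run of characters outside [A-Za-z0-9_-] with the replacement,
    // then drop replacement characters left at either end.
    $stem = preg_replace('/[^A-Za-z0-9_-]+/', $replacement, $stem);
    $stem = trim($stem, $replacement);

    // Keep the (lowercased) extension, limited to letters and digits.
    $ext = preg_replace('/[^a-z0-9]+/', '', $ext);

    return $ext === '' ? $stem : "$stem.$ext";
}

$safe = sketchSanitizeFilename('../../etc/my report (final).PDF');
```

An allow-list of extensions (the `allowedExtensions` option) is generally a stricter choice than a block-list, since unknown extensions are rejected by default.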
245248

246249
## Configuration
247250

@@ -269,26 +272,36 @@ The Sanitizer component is designed to work seamlessly with other KaririCode com
269272
- **KaririCode\ProcessorPipeline**: Utilized for building and executing sanitization pipelines.
270273
- **KaririCode\PropertyInspector**: Used for analyzing and processing object properties with sanitization attributes.
271274

272-
Example of integration:
275+
## Registry Explanation
273276

274-
```php
275-
use KaririCode\ProcessorPipeline\ProcessorRegistry;
276-
use KaririCode\ProcessorPipeline\ProcessorBuilder;
277-
use KaririCode\PropertyInspector\AttributeAnalyzer;
278-
use KaririCode\PropertyInspector\AttributeHandler;
279-
use KaririCode\PropertyInspector\Utility\PropertyInspector;
280-
use KaririCode\Sanitizer\Sanitizer;
277+
The registry is a core part of how sanitizers are managed within the KaririCode Framework. It acts as a centralized location to register and configure all sanitizers you plan to use in your application.
281278

282-
$registry = new ProcessorRegistry();
283-
// Register sanitizers...
279+
Here's how you can create and configure the registry:
284280

285-
$builder = new ProcessorBuilder($registry);
286-
$attributeHandler = new AttributeHandler('sanitizer', $builder);
287-
$propertyInspector = new PropertyInspector(new AttributeAnalyzer(Sanitize::class));
281+
```php
282+
// Create and configure the registry
283+
$registry = new ProcessorRegistry();
288284

289-
$sanitizer = new Sanitizer($registry);
285+
// Register all required processors
286+
$registry->register('sanitizer', 'trim', new TrimSanitizer());
287+
$registry->register('sanitizer', 'html_special_chars', new HtmlSpecialCharsSanitizer());
288+
$registry->register('sanitizer', 'normalize_line_breaks', new NormalizeLineBreaksSanitizer());
289+
$registry->register('sanitizer', 'html_purifier', new HtmlPurifierSanitizer());
290+
$registry->register('sanitizer', 'markdown', new MarkdownSanitizer());
291+
$registry->register('sanitizer', 'numeric_sanitizer', new NumericSanitizer());
292+
$registry->register('sanitizer', 'email_sanitizer', new EmailSanitizer());
293+
$registry->register('sanitizer', 'phone_sanitizer', new PhoneSanitizer());
294+
$registry->register('sanitizer', 'url_sanitizer', new UrlSanitizer());
295+
$registry->register('sanitizer', 'alphanumeric_sanitizer', new AlphanumericSanitizer());
296+
$registry->register('sanitizer', 'filename_sanitizer', new FilenameSanitizer());
297+
$registry->register('sanitizer', 'json_sanitizer', new JsonSanitizer());
298+
$registry->register('sanitizer', 'xss_sanitizer', new XssSanitizer());
299+
$registry->register('sanitizer', 'sql_injection', new SqlInjectionSanitizer());
300+
$registry->register('sanitizer', 'strip_tags', new StripTagsSanitizer());
290301
```
291302

303+
This code demonstrates how to register various sanitizers with the registry, allowing you to easily manage which sanitizers are available throughout your application. Each sanitizer is given a unique identifier, which can then be referenced in attributes to apply specific sanitization rules.
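
The idea of registering processors under a context and a unique name, then resolving them by name, can be sketched without the framework. `SketchRegistry` and `sketchApply` below are hypothetical stand-ins written for this illustration; they are not the `ProcessorRegistry` API, and they use plain closures where the component uses sanitizer objects.

```php
<?php

declare(strict_types=1);

// Hypothetical, simplified stand-in for a context-keyed processor registry.
final class SketchRegistry
{
    /** @var array<string, array<string, callable>> */
    private array $processors = [];

    public function register(string $context, string $name, callable $processor): void
    {
        $this->processors[$context][$name] = $processor;
    }

    public function get(string $context, string $name): callable
    {
        return $this->processors[$context][$name]
            ?? throw new OutOfBoundsException("No '$name' processor in context '$context'");
    }
}

// Resolve each name in order and feed the output of one processor into the next.
function sketchApply(SketchRegistry $registry, array $names, string $value): string
{
    foreach ($names as $name) {
        $value = $registry->get('sanitizer', $name)($value);
    }

    return $value;
}

$registry = new SketchRegistry();
$registry->register('sanitizer', 'trim', fn (string $v): string => trim($v));
$registry->register(
    'sanitizer',
    'html_special_chars',
    fn (string $v): string => htmlspecialchars($v, ENT_QUOTES | ENT_HTML5, 'UTF-8')
);

$clean = sketchApply($registry, ['trim', 'html_special_chars'], '  <b>Hi</b> ');
```

Referencing a name that was never registered fails fast here, which is one reason to register every processor an attribute mentions before calling the sanitizer.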
304+
292305
## Development and Testing
293306

294307
For development and testing purposes, this package uses Docker and Docker Compose to ensure consistency across different environments. A Makefile is provided for convenience.
@@ -321,6 +334,7 @@ For development and testing purposes, this package uses Docker and Docker Compos
321334
```
322335

323336
4. Install dependencies:
337+
324338
```bash
325339
make composer-install
326340
```
