# Sanitizer

> Input sanitization middleware for security and data cleaning.

## Overview

The `sanitizer` middleware cleans request data before it reaches your handlers, which helps mitigate XSS, SQL injection, and other injection attacks.

Use it when you need:

* XSS prevention
* Input cleaning
* Data normalization

## Installation

```go
import "github.com/go-mizu/mizu/middlewares/sanitizer"
```

## Quick Start

```go
app := mizu.New()

// Sanitize all string inputs
app.Use(sanitizer.New())
```

## Configuration

### Options

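| Option         | Type                               | Default    | Description                                                |
| -------------- | ---------------------------------- | ---------- | ---------------------------------------------------------- |
| `StripHTML`    | `bool`                             | `true`     | Remove HTML tags from values                               |
| `TrimSpace`    | `bool`                             | `true`     | Trim leading and trailing whitespace                       |
| `StripScripts` | `bool`                             | `true`     | Remove `<script>` tags and their contents                  |
| `EscapeHTML`   | `bool`                             | `false`    | Escape HTML special characters as entities                 |
| `Fields`       | `[]string`                         | All fields | Limit sanitization to the named fields                     |
| `Custom`       | `func(field, value string) string` | None       | Custom function that receives each field name and value    |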

## Examples

### Default Sanitization

```go
app.Use(sanitizer.New())
```

### Strip HTML

```go
app.Use(sanitizer.WithOptions(sanitizer.Options{
    StripHTML: true,
}))
```

### Escape Instead of Strip

```go
app.Use(sanitizer.WithOptions(sanitizer.Options{
    StripHTML:  false,
    EscapeHTML: true,
}))
```

### Specific Fields

```go
app.Use(sanitizer.WithOptions(sanitizer.Options{
    Fields: []string{"name", "email", "comment"},
}))
```

### Custom Sanitizer

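```go
// Requires the standard library "strings" import.
app.Use(sanitizer.WithOptions(sanitizer.Options{
    // Custom receives the field name and its value and returns the cleaned value.
    Custom: func(field string, value string) string {
        // Custom sanitization logic
        return strings.TrimSpace(value)
    },
}))
```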

## API Reference

### Functions

```go
// New creates the sanitizer middleware with default options.
func New() mizu.Middleware

// WithOptions creates the sanitizer middleware with the given options.
func WithOptions(opts Options) mizu.Middleware

// SanitizeString sanitizes a single string using the given options.
func SanitizeString(s string, opts Options) string
```

## What Gets Sanitized

* Query parameters
* Form data
* JSON body fields
* Path parameters

## Technical Details

The sanitizer middleware operates by intercepting HTTP requests and applying configurable sanitization rules to input data before it reaches your handlers.

### Implementation Overview

The middleware processes request data in the following order:

1. **Query Parameters**: Sanitizes all URL query parameters
2. **Form Data**: For POST/PUT/PATCH requests, sanitizes both `r.Form` and `r.PostForm` fields
3. **Field Filtering**: In both steps, only fields selected by the whitelist (`Fields`) and not listed in the blacklist (`Exclude`) are sanitized
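
The processing order above can be sketched with `net/http` alone. This is an illustration under stated assumptions, not the middleware's source: `sanitizeValues`, `sanitizeRequest`, and `demo` are hypothetical names, the `selected` callback stands in for the `Fields`/`Exclude` check, and `clean` stands in for the sanitization pipeline. The sketch parses the form before rewriting the query string so that no value is cleaned twice, which matters for non-idempotent steps such as HTML escaping.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// sanitizeValues applies clean, in place, to every value of each selected field.
func sanitizeValues(vals url.Values, selected func(string) bool, clean func(string) string) {
	for field, list := range vals {
		if !selected(field) {
			continue
		}
		for i, v := range list {
			list[i] = clean(v)
		}
	}
}

// sanitizeRequest cleans query parameters and, for POST/PUT/PATCH, form data.
// Hypothetical helper: the real middleware may structure this differently.
func sanitizeRequest(r *http.Request, selected func(string) bool, clean func(string) string) error {
	// Snapshot the raw query values before anything is rewritten.
	q := r.URL.Query()

	switch r.Method {
	case http.MethodPost, http.MethodPut, http.MethodPatch:
		// ParseForm fills r.PostForm (body) and r.Form (body + query).
		if err := r.ParseForm(); err != nil {
			return err
		}
		sanitizeValues(r.Form, selected, clean)
		sanitizeValues(r.PostForm, selected, clean)
	}

	// Write the cleaned query back so later r.URL.Query() calls see it.
	sanitizeValues(q, selected, clean)
	r.URL.RawQuery = q.Encode()
	return nil
}

// demo builds a form POST and returns three values after sanitization.
func demo() (string, string, string, error) {
	body := strings.NewReader("comment=++hello++&token=++raw++")
	r, err := http.NewRequest(http.MethodPost, "http://example.test/submit?name=++Ada++", body)
	if err != nil {
		return "", "", "", err
	}
	r.Header.Set("Content-Type", "application/x-www-form-urlencoded")

	// Sanitize every field except "token" (mimics an exclude list).
	notToken := func(field string) bool { return field != "token" }
	if err := sanitizeRequest(r, notToken, strings.TrimSpace); err != nil {
		return "", "", "", err
	}
	return r.URL.Query().Get("name"), r.PostForm.Get("comment"), r.PostForm.Get("token"), nil
}

func main() {
	name, comment, token, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q %q %q\n", name, comment, token)
}
```

Here the padded `name` and `comment` values are trimmed, while the excluded `token` field keeps its original padding.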

### Sanitization Pipeline

Each value passes through a configurable pipeline of operations:

1. **Trim Spaces** (`TrimSpaces`): Removes leading and trailing whitespace using `strings.TrimSpace`
2. **Strip Non-Printable** (`StripNonPrintable`): Removes non-printable characters while preserving newlines, carriage returns, and tabs
3. **Strip Tags** (`StripTags`): Removes HTML tags using regex-based matching:
   * First removes `<script>` and `<style>` tags with their contents
   * Then removes all remaining HTML tags
4. **HTML Escape** (`HTMLEscape`): Converts special characters to HTML entities using `html.EscapeString`
5. **Max Length** (`MaxLength`): Truncates values exceeding the specified length
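
The pipeline above can be approximated with the standard library. The following is a minimal sketch, not the middleware's actual implementation: `pipelineOptions`, the two regular expressions, the rune-based truncation, and treating `MaxLength: 0` as "no limit" are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
	"unicode"
)

// pipelineOptions is a hypothetical stand-in for the middleware options;
// its field names mirror the pipeline steps described above.
type pipelineOptions struct {
	TrimSpaces        bool
	StripNonPrintable bool
	StripTags         bool
	HTMLEscape        bool
	MaxLength         int // assumption: 0 disables truncation
}

var (
	// <script>...</script> and <style>...</style>, including their contents.
	scriptStyleRe = regexp.MustCompile(`(?is)<(script|style)\b[^>]*>.*?</(script|style)\s*>`)
	// Any remaining tag.
	tagRe = regexp.MustCompile(`<[^>]*>`)
)

// stripNonPrintable drops non-printable runes but keeps \n, \r, and \t.
func stripNonPrintable(s string) string {
	return strings.Map(func(r rune) rune {
		if r == '\n' || r == '\r' || r == '\t' || unicode.IsPrint(r) {
			return r
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, s)
}

// stripTags removes script/style blocks first, then all other tags.
func stripTags(s string) string {
	s = scriptStyleRe.ReplaceAllString(s, "")
	return tagRe.ReplaceAllString(s, "")
}

// sanitizeValue applies the enabled steps in the documented order.
func sanitizeValue(s string, o pipelineOptions) string {
	if o.TrimSpaces {
		s = strings.TrimSpace(s)
	}
	if o.StripNonPrintable {
		s = stripNonPrintable(s)
	}
	if o.StripTags {
		s = stripTags(s)
	}
	if o.HTMLEscape {
		s = html.EscapeString(s)
	}
	if o.MaxLength > 0 {
		// Truncate by runes so multi-byte characters are not split.
		if r := []rune(s); len(r) > o.MaxLength {
			s = string(r[:o.MaxLength])
		}
	}
	return s
}

func main() {
	opts := pipelineOptions{TrimSpaces: true, StripTags: true, HTMLEscape: true}
	fmt.Println(sanitizeValue("  <p>Tom & Jerry</p><script>bad()</script>  ", opts))
	// prints: Tom &amp; Jerry
}
```

Because truncation runs after escaping in this order, a short `MaxLength` can cut an HTML entity such as `&amp;` in half; callers that combine both options may want to account for that.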

### Key Functions

* `shouldSanitize()`: Determines if a field should be sanitized based on the `Fields` and `Exclude` lists
* `sanitizeValue()`: Applies the sanitization pipeline to a single value
* `stripNonPrintable()`: Uses `unicode.IsPrint()` to filter characters
* `stripTags()`: Uses compiled regex patterns for efficient HTML tag removal

### Performance Considerations

* Field and exclude maps are pre-built at middleware initialization for O(1) lookups
* Regex patterns for tag stripping are compiled once and reused
* The middleware modifies the request in place rather than building a sanitized copy
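
To illustrate the pre-built lookup maps, here is a small sketch of a field filter. The `fieldFilter` type and `newFieldFilter` constructor are hypothetical, and two behaviors are assumptions of this example rather than documented facts: an empty `Fields` list selects every field, and `Exclude` takes precedence when a name appears in both lists.

```go
package main

import "fmt"

// fieldFilter holds set-style maps built once, at construction time.
type fieldFilter struct {
	fields  map[string]struct{}
	exclude map[string]struct{}
}

// newFieldFilter converts the slices into maps for O(1) membership checks.
func newFieldFilter(fields, exclude []string) *fieldFilter {
	f := &fieldFilter{
		fields:  make(map[string]struct{}, len(fields)),
		exclude: make(map[string]struct{}, len(exclude)),
	}
	for _, name := range fields {
		f.fields[name] = struct{}{}
	}
	for _, name := range exclude {
		f.exclude[name] = struct{}{}
	}
	return f
}

// shouldSanitize reports whether a field is selected for sanitization.
func (f *fieldFilter) shouldSanitize(field string) bool {
	if _, skip := f.exclude[field]; skip {
		return false // assumption: Exclude wins over Fields
	}
	if len(f.fields) == 0 {
		return true // assumption: no whitelist means all fields
	}
	_, ok := f.fields[field]
	return ok
}

func main() {
	f := newFieldFilter([]string{"name", "comment"}, []string{"comment"})
	for _, field := range []string{"name", "comment", "email"} {
		fmt.Println(field, f.shouldSanitize(field))
	}
}
```

Building the maps once keeps the per-request cost to a single map lookup per field, regardless of how long the lists are.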

## Best Practices

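* Treat sanitization as one layer of defense in depth
* Don't rely on sanitization as your only safeguard
* Use parameterized queries for SQL instead of building statements from sanitized strings
* Encode output for the context in which it is rendered (HTML body, attribute, JavaScript, URL)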
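
As an example of output encoding that works independently of input sanitization, the standard library's `html/template` escapes values according to the context where they are rendered. The sketch below is illustrative only; the `page` template and `render` helper are hypothetical names and are not part of the sanitizer package.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// page renders a user-supplied comment inside a paragraph element.
var page = template.Must(template.New("comment").Parse(`<p>{{.}}</p>`))

// render executes the template; html/template escapes the value for
// the HTML text context even if it was never sanitized on input.
func render(comment string) (string, error) {
	var b strings.Builder
	if err := page.Execute(&b, comment); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := render(`<script>alert(1)</script>`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	// prints: <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
}
```

Encoding at render time protects the page even when a value bypassed the middleware, for example data loaded from another system.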

## Testing

### Test Coverage

The sanitizer middleware's test suite covers the following cases:

| Test Case                           | Description                                       | Expected Behavior                                                                                              |
| ----------------------------------- | ------------------------------------------------- | -------------------------------------------------------------------------------------------------------------- |
| `TestNew`                           | Default middleware with XSS input                 | HTML entities escaped (e.g., `<script>` becomes `&lt;script&gt;`)                                              |
| `TestWithOptions_TrimSpaces`        | TrimSpaces option with padded input               | Leading and trailing spaces removed (`"  John  "` becomes `"John"`)                                            |
| `TestWithOptions_StripTags`         | StripTags option with HTML content                | All HTML tags removed including script tags (`"<p>Hello</p><script>bad</script>World"` becomes `"HelloWorld"`) |
| `TestWithOptions_MaxLength`         | MaxLength option with long input                  | Value truncated to specified length (`"VeryLongName"` becomes `"VeryL"` with MaxLength=5)                      |
| `TestWithOptions_Fields`            | Fields whitelist with multiple parameters         | Only specified fields sanitized, others passed through unchanged                                               |
| `TestWithOptions_Exclude`           | Exclude blacklist with multiple parameters        | Excluded fields bypass sanitization, others are sanitized                                                      |
| `TestXSS`                           | XSS prevention preset                             | Script tags escaped to prevent XSS attacks                                                                     |
| `TestStripHTML`                     | HTML stripping preset                             | All HTML tags removed from input                                                                               |
| `TestTrim`                          | Trim whitespace preset                            | Leading and trailing whitespace removed                                                                        |
| `TestSanitize`                      | Direct sanitization function with various options | Correct sanitization applied: HTML escape, trim, strip tags, max length                                        |
| `TestSanitizeHTML`                  | HTML sanitization helper                          | HTML escaped and trimmed                                                                                       |
| `TestStripTagsString`               | Tag stripping helper                              | HTML tags removed from string                                                                                  |
| `TestTrimString`                    | Trim helper function                              | Whitespace trimmed from string                                                                                 |
| `TestClean`                         | All-in-one cleaning function                      | All sanitization operations applied (HTML escape, trim, strip tags, strip non-printable)                       |
| `TestWithOptions_StripNonPrintable` | Non-printable character removal                   | Control characters removed (`"hello\x00world\x1f"` becomes `"helloworld"`)                                     |

## Related Middlewares

* [validator](/middlewares/validator) - Input validation
* [bodylimit](/middlewares/bodylimit) - Size limits
* [helmet](/middlewares/helmet) - Security headers
