Best Practices

Anonymizer replaces personally identifiable information (PII) with fake values while preserving data consistency and referential integrity. The result is a working copy of your database that you can use for experimentation and testing without violating data-protection laws.

Before anonymizing:

Back up your database - Anonymization is irreversible.
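Test on a copy - Validate your configuration and confirm that it points Anonymizer at a non-production database.
Review columns - Ensure that every column containing PII is included in your configuration; a column you miss keeps its real values.
Check foreign keys - Understand the CASCADE relationships between your tables, since updating a referenced key can propagate changes to dependent rows.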
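Some of these checks can be scripted. The sketch below is not part of Anonymizer: `PRODUCTION_DATABASES`, `PII_NAME_PATTERN`, `assert_not_production` and `unconfigured_pii_columns` are hypothetical names, and the name-based PII match is only a heuristic. It assumes you list candidate columns yourself (for example from `table_name` and `column_name` in PostgreSQL's `information_schema.columns`) and that you run the catalog query for CASCADE foreign keys with your own client.

```python
import re
from typing import Iterable, List, Set, Tuple

# Hypothetical database names that must never be anonymized in place.
PRODUCTION_DATABASES = {"prod", "production"}

# Hypothetical column-name patterns that often indicate PII; tune for your schema.
PII_NAME_PATTERN = re.compile(
    r"email|phone|name|address|ssn|birth|passport|iban", re.IGNORECASE
)

# Foreign keys whose ON UPDATE or ON DELETE action is CASCADE ('c' in pg_constraint).
CASCADE_FK_QUERY = """
SELECT conname, conrelid::regclass AS child_table, confrelid::regclass AS parent_table,
       confupdtype, confdeltype
FROM pg_constraint
WHERE contype = 'f' AND (confupdtype = 'c' OR confdeltype = 'c')
"""


def assert_not_production(dbname: str) -> None:
    """Refuse to continue if the target database name looks like production."""
    if dbname.lower() in PRODUCTION_DATABASES:
        raise RuntimeError(f"refusing to anonymize production database {dbname!r}")


def unconfigured_pii_columns(
    columns: Iterable[Tuple[str, str]], configured: Set[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Return (table, column) pairs that look like PII but are missing from the config."""
    return sorted(
        (table, column)
        for table, column in columns
        if PII_NAME_PATTERN.search(column) and (table, column) not in configured
    )
```

For example, with `users.email` configured, `unconfigured_pii_columns([("users", "email"), ("orders", "shipping_address")], {("users", "email")})` flags `("orders", "shipping_address")` for review. Treat the output as a prompt to inspect the schema, not as a complete PII inventory.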

To maintain a secure environment while using Anonymizer, you should:

run Anonymizer with minimal required privileges.
use SSL for production database connections.
secure your configuration file (the file contains connection details).
consider running Anonymizer on the database server to avoid network transfer of sensitive data.
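Two of these points can be enforced in a wrapper script. The following is a minimal sketch, assuming a POSIX file system and a libpq-based client such as psycopg2; `assert_config_is_private` and `connection_kwargs` are hypothetical helpers, not Anonymizer options. `sslmode="verify-full"` is a standard libpq setting that encrypts the connection and verifies the server certificate and host name.

```python
import os
import stat


def assert_config_is_private(path: str) -> None:
    """Raise if the config file is readable or writable by group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        raise PermissionError(f"{path} has mode {oct(mode)}; restrict it with chmod 600")


def connection_kwargs(host: str, dbname: str, user: str, production: bool) -> dict:
    """Build libpq keyword arguments, requiring verified SSL for production."""
    kwargs = {"host": host, "dbname": dbname, "user": user}
    if production:
        # verify-full encrypts traffic and checks the server certificate
        # and host name against a trusted root CA.
        kwargs["sslmode"] = "verify-full"
    return kwargs
```

The result can be passed as `psycopg2.connect(**connection_kwargs(...))`. Leaving the password out of the dictionary lets libpq read it from a `~/.pgpass` file, which keeps credentials out of the script itself.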

When anonymizing large databases, Anonymizer improves performance by using:

Server-side cursors: Rows are fetched in configurable batches (default 10,000).
Batch updates: Multiple rows are updated in a single statement by unnesting arrays of CTIDs and new values.
Tiered caching: Value dictionaries are kept in an in-memory LRU cache that spills over to SQLite when it fills.
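The batching and caching techniques above can be illustrated with a short sketch. It is not Anonymizer's implementation: `TieredCache`, `batch_update_sql` and the `spill` table are hypothetical, and the cache assumes a value dictionary maps each original value to its fake replacement so the same input always gets the same output. The UPDATE relies on PostgreSQL's multi-argument `unnest()` to turn parallel `tid[]` and `text[]` arrays into rows that are joined back to the table on `ctid`.

```python
import sqlite3
from collections import OrderedDict
from typing import Optional


class TieredCache:
    """LRU cache of original -> fake values that spills evicted entries to SQLite."""

    def __init__(self, max_memory_items: int, db_path: str = ":memory:") -> None:
        self.max_memory_items = max_memory_items
        self._memory: "OrderedDict[str, str]" = OrderedDict()
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS spill (original TEXT PRIMARY KEY, fake TEXT NOT NULL)"
        )

    def get(self, original: str) -> Optional[str]:
        if original in self._memory:
            self._memory.move_to_end(original)  # mark as most recently used
            return self._memory[original]
        row = self._db.execute(
            "SELECT fake FROM spill WHERE original = ?", (original,)
        ).fetchone()
        if row is None:
            return None  # never seen: the caller generates a new fake value
        self.put(original, row[0])  # promote the entry back into memory
        return row[0]

    def put(self, original: str, fake: str) -> None:
        self._memory[original] = fake
        self._memory.move_to_end(original)
        while len(self._memory) > self.max_memory_items:
            # Evict the least recently used entry to SQLite instead of dropping it.
            old_original, old_fake = self._memory.popitem(last=False)
            self._db.execute(
                "INSERT OR REPLACE INTO spill (original, fake) VALUES (?, ?)",
                (old_original, old_fake),
            )


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def batch_update_sql(table: str, column: str) -> str:
    """One UPDATE for many rows: parallel ctid[] and text[] arrays are unnested
    into (row_ctid, new_value) pairs and joined back to the table on ctid."""
    return (
        f"UPDATE {quote_ident(table)} AS t "
        f"SET {quote_ident(column)} = v.new_value "
        "FROM unnest(%s::tid[], %s::text[]) AS v(row_ctid, new_value) "
        "WHERE t.ctid = v.row_ctid"
    )
```

With psycopg2 the statement would be run as `cur.execute(batch_update_sql("customers", "email"), (ctids, new_values))`, where `ctids` are the text values (such as `"(0,1)"`) read with `SELECT ctid, ...`. Because a row's CTID changes when it is updated, the CTIDs should be read and used in the same transaction with no concurrent writers on that table.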

To ensure you're getting the best performance, you should:

run anonymization during low-traffic periods.
ensure that you have adequate disk space for transaction logs.
consider using table-level locks for very large updates.
monitor the PostgreSQL logs for any issues.
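The disk-space and locking advice can be turned into pre-flight helpers. This is a sketch with hypothetical names (`lock_table_sql`, `has_free_space`); the required byte count is your own estimate, and the path should be on the file system that holds the WAL, for example the data directory reported by `SHOW data_directory` when you run on the database server. The lock modes listed are PostgreSQL's table-level modes; `EXCLUSIVE` blocks concurrent writers but still lets plain `SELECT` queries read the table.

```python
import shutil

# PostgreSQL table-level lock modes, from weakest to strongest.
LOCK_MODES = (
    "ACCESS SHARE", "ROW SHARE", "ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE",
    "SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE",
)


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def lock_table_sql(table: str, mode: str = "EXCLUSIVE") -> str:
    """Build a LOCK TABLE statement after validating the lock mode."""
    mode = mode.upper()
    if mode not in LOCK_MODES:
        raise ValueError(f"unknown lock mode: {mode}")
    return f"LOCK TABLE {quote_ident(table)} IN {mode} MODE"


def has_free_space(path: str, required_bytes: int) -> bool:
    """True if the file system holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes
```

`LOCK TABLE` only works inside a transaction and holds the lock until `COMMIT` or `ROLLBACK`, so it should be the first statement of the transaction that performs the large update.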