Privacy Enhancing Technologies – Introduction
Every time you browse a website, click an ad, make a purchase, or train an ML model, data flows through systems. Companies need this data — for analytics, measurement, personalization, and product improvement. But they also have legal, ethical, and business obligations to protect privacy.
This creates a fundamental tension:
How do we extract value from data while minimizing who can see what, when, and for what purpose?
Privacy Enhancing Technologies (PETs) are the technical toolkit that resolves this tension. They’re neither a magic bullet nor a single solution — they’re a layered system of techniques, each addressing a different phase of the data lifecycle.
The Data Lifecycle: Where Privacy Attacks Happen
Data moves through distinct phases, and each phase has different privacy risks. No single technology solves all phases.
| Phase | Risk | PETs That Help |
|---|---|---|
| At collection | Collecting more than needed | Data Minimization, Purpose Limitation |
| At rest | Identifiers exposed in storage | Storage Anonymization, Encryption (AES) |
| In transit | Data intercepted on network | TLS, Diffie-Hellman |
| In use / compute | Data exposed during processing | TEE / CVM (trust the chip maker), MPC (trust the math), Data Clean Rooms |
| At output | Query results leak individuals | Query Anonymization, Differential Privacy |
| In measurement | Attribution reveals behavior | Sales Lift (in DCR), Entropy Balancing |
| In ML | Models memorize training data | PATE |
| Across partners | Identity linked across systems | Crosswalks, Private Set Intersection (PSI) for privacy-safe mapping |
The art of privacy engineering is layering the right techniques for each phase of your data’s journey.
* One interesting twist: in some cases, differential privacy can also be applied at the source (on the input). Apple and Google insert noise on-device, especially when collecting telemetry — a variant known as local differential privacy.
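To make on-device noise concrete, here is a minimal sketch of randomized response, the classic local-DP mechanism. This is an illustration of the idea, not Apple's or Google's actual telemetry algorithms (which use more elaborate schemes built on the same principle): each device flips its true bit with a probability calibrated to epsilon, and the server can still recover an unbiased population estimate.

```python
import math
import random

def randomized_response(true_bit: int, epsilon: float) -> int:
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return true_bit if random.random() < p_truth else 1 - true_bit

def estimate_rate(reports: list[int], epsilon: float) -> float:
    """Unbiased estimate of the true fraction of 1s from the noisy reports."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    observed = sum(reports) / len(reports)
    return (observed + p - 1) / (2 * p - 1)

# Simulate 100k devices, 30% of which have the sensitive attribute.
random.seed(0)
truth = [1 if random.random() < 0.3 else 0 for _ in range(100_000)]
reports = [randomized_response(b, epsilon=1.0) for b in truth]
print(round(estimate_rate(reports, 1.0), 3))  # close to 0.3
```

No individual report is trustworthy (each could be a flip), yet the aggregate converges on the true rate — exactly the trade-off local DP offers.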
The Three Layers of Privacy Protection
This guide is organized into three parts, each covering a distinct layer.
Part 1: Data Protection Fundamentals
What happens to your data inside a single organization.
These are the foundational techniques that every data system should implement:
| Technique | What It Does | Phase |
|---|---|---|
| Data Minimization | Collect only what you need, delete when done | Collection |
| Storage Anonymization | Replace identifiers with pseudonyms | At rest |
| Query Anonymization | Enforce cohort thresholds (minimum users per aggregated row) on outputs | Output |
| Differential Privacy | Add calibrated noise for mathematical privacy guarantees | Output |
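To see how these fundamentals compose, here is a minimal sketch chaining three of them: keyed pseudonymization at rest, a cohort threshold on outputs, and Laplace noise for differential privacy. All names, the key, and the threshold value are hypothetical, and the sensitivity-1 assumption holds only because each user contributes one event here.

```python
import hashlib
import hmac
import math
import random
from collections import Counter

SECRET_KEY = b"rotate-me-regularly"  # hypothetical key; manage via a KMS in practice
K_THRESHOLD = 5                      # minimum users per reported cohort

def pseudonymize(user_id: str) -> str:
    """Storage anonymization: replace a raw ID with a keyed pseudonym."""
    return hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()[:16]

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) via inverse CDF; scale = sensitivity / epsilon."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_cohort_counts(events, epsilon=1.0):
    """Query anonymization + DP on outputs: suppress cohorts below the
    threshold, then add Laplace noise (sensitivity 1: one event per user)."""
    counts = Counter(cohort for _pseudonym, cohort in events)
    out = {}
    for cohort, n in counts.items():
        if n < K_THRESHOLD:
            continue  # cohort threshold: drop rows with too few users
        out[cohort] = max(0, round(n + laplace_noise(1.0 / epsilon)))
    return out

events = [(pseudonymize(f"user{i}"), "NYC" if i < 50 else "tiny-town")
          for i in range(53)]
print(noisy_cohort_counts(events))  # the 3-user "tiny-town" cohort is suppressed
```

Note the layering: even if the noisy output were somehow reversed, the attacker would still face pseudonyms, and small cohorts never leave the system at all.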
Part 1 answers: “How do I protect data within my own systems?”
Read Part 1: Data Protection Fundamentals →
Part 2: Secure Collaboration and Infrastructure
How multiple organizations work together without sharing raw data.
Modern business requires collaboration — advertisers measuring campaigns with retailers, healthcare providers conducting joint research. These techniques enable collaboration while preserving privacy:
| Technique | What It Does | Phase |
|---|---|---|
| Identity Mapping / Crosswalks / Private Set Intersection | Connect users across systems without sharing raw IDs | Across partners |
| Data Clean Rooms | Compute joint insights in a governed environment | In use / compute |
| Purpose Limitation | Bind data access to declared intent | Collection + Use |
| TEE / CVM | Hardware-isolated computation; even admins can't see inside | In use / compute |
| Diffie-Hellman + AES | Secure key exchange and encryption in transit | In transit |
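To give one of these a shape, here is a toy sketch of Private Set Intersection via commutative "double blinding" (Diffie-Hellman-style exponentiation) — the same idea behind the double-hashed crosswalks mentioned later. Everything here is illustrative: real deployments use elliptic-curve groups and a vetted PSI library, not a hand-rolled modulus.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime, used as a toy group modulus

def rand_key() -> int:
    """Pick a blinding exponent invertible mod P-1 so blinding is injective."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k

def hash_to_group(item: str) -> int:
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P

def blind(values, key):
    return {pow(v, key, P) for v in values}

a_key, b_key = rand_key(), rand_key()
alice = {hash_to_group(x) for x in ["u1", "u2", "u3"]}
bob   = {hash_to_group(x) for x in ["u2", "u3", "u4"]}

# Round 1: each side sends its singly blinded set.
# Round 2: each side blinds the other's set with its own key.
alice_double = blind(blind(alice, a_key), b_key)  # ends up at Bob
bob_double   = blind(blind(bob, b_key), a_key)    # ends up at Alice

# Exponentiation commutes: (h^a)^b == (h^b)^a, so shared IDs match.
print(len(alice_double & bob_double))  # → 2 (u2 and u3)
```

Neither party ever sees the other's raw IDs — only blinded values — yet both learn the overlap size (or, with bookkeeping, which of their own IDs matched).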
Part 2 answers: “How do I collaborate with partners without exposing raw data?”
Read Part 2: Secure Collaboration and Infrastructure →
Part 3: Privacy-Preserving Computation and Measurement
How to compute, train ML, and measure without revealing inputs.
The most advanced layer: performing computation across parties where no single party sees the combined data, training ML models on sensitive data, and measuring business outcomes within privacy constraints:
| Technique | What It Does | Phase |
|---|---|---|
| Multi-Party Computation (MPC) | Joint computation without revealing inputs | In use / compute |
| PATE | Train ML models with differential privacy | In ML |
| Sales Lift / Incrementality | Measure causal ad impact (operates within PET infrastructure) | In measurement |
| Entropy Balancing | Correct imperfect experiments (privacy-compatible) | In measurement |
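The core MPC trick — computing on data no single party can see — can be sketched with additive secret sharing, the simplest MPC building block. This is a minimal illustration of the principle, not a full protocol (no malicious-party defenses, no multiplication gates):

```python
import secrets

Q = 2**61 - 1  # public modulus; all arithmetic is done mod Q

def share(secret: int, n_parties: int = 3):
    """Additive secret sharing: n random shares that sum to the secret mod Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    return sum(shares) % Q

# Each input is split across 3 parties; any single share is uniformly random.
a_shares = share(120)
b_shares = share(80)

# Addition is local: party i adds its share of a to its share of b.
sum_shares = [(x + y) % Q for x, y in zip(a_shares, b_shares)]
print(reconstruct(sum_shares))  # → 200
```

Any one share (or any incomplete subset) is statistically indistinguishable from random noise, so nothing leaks until all parties agree to reconstruct — and they only ever reconstruct the aggregate, never the inputs.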
Part 3 answers: “How do I compute, train, and measure when even the computation itself must be private?”
Read Part 3: Privacy-Preserving Computation and Measurement →
How the Pieces Fit Together
A real-world privacy-preserving system might use all of these together. Consider an advertiser measuring campaign effectiveness with a retailer:
- Data Minimization: Retailer collects only purchase amount + timestamp (not full basket). Advertiser collects only ad exposure.
- Storage Anonymization: Both parties pseudonymize user IDs before any analysis.
- Identity Mapping: Establish a crosswalk via an identity provider (hashed mappings only), or use Private Set Intersection with double hashing.
- Data Clean Room: Both parties upload pseudonymized data to clean room. Clean room may run inside Confidential VM (TEE protection) for stronger privacy.
- Purpose Limitation: Query declares purpose: “ads measurement”. System verifies both parties’ data allows this purpose.
- Query Execution with Privacy: The approved aggregate query runs inside the TEE. Differential privacy noise is added to results, and query anonymization enforces cohort thresholds, suppressing any aggregated row with fewer users than the threshold.
- Measurement: Sales lift computed (exposed vs. control). Entropy balancing corrects for any group imbalances.
- Output: Only noisy aggregate result leaves clean room: “Campaign drove +12% incremental sales lift.” Neither party saw the other’s raw data.
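The final measurement step can be sketched as follows — a simplified lift calculation with Laplace noise on the buyer counts before anything leaves the clean room. The numbers are made up, and a real system would also split its epsilon budget across queries and apply cohort thresholds:

```python
import math
import random

def laplace(scale: float) -> float:
    """Sample Laplace(0, scale) noise via inverse CDF."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_lift(exposed_buyers, exposed_total, control_buyers, control_total,
               epsilon=1.0):
    """Incrementality: compare exposed vs. control conversion rates, with
    Laplace noise on each buyer count (sensitivity 1 per user)."""
    scale = 1.0 / epsilon
    exp_rate = max(0.0, exposed_buyers + laplace(scale)) / exposed_total
    ctl_rate = max(1e-9, control_buyers + laplace(scale)) / control_total
    return exp_rate / ctl_rate - 1.0

random.seed(7)
lift = noisy_lift(exposed_buyers=1120, exposed_total=50_000,
                  control_buyers=1000, control_total=50_000)
print(f"{lift:+.1%}")  # roughly +12%, plus noise
```

Because the noise scale is fixed while real cohorts number in the thousands, the business signal survives the privacy protection — the distortion on a "+12% lift" readout is a fraction of a percentage point.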
Each layer catches what the previous layer missed. Together, they enable insights that would be impossible — or irresponsible — without privacy protection.
Quick Reference: Choosing the Right Technique
| If you need to… | Use… | Covered in… |
|---|---|---|
| Reduce data collection footprint | Data Minimization | Part 1 |
| Protect identifiers in storage | Storage Anonymization | Part 1 |
| Prevent queries from exposing individuals | Query Anonymization | Part 1 |
| Add mathematical privacy guarantees | Differential Privacy | Part 1 |
| Connect users across partner systems | Crosswalks / PSI for ID Mapping | Part 2 |
| Compute joint insights with partners | Data Clean Rooms | Part 2 |
| Bind data to declared purpose | Purpose Limitation | Part 2 |
| Protect data even from cloud admins | TEE / CVM | Part 2 |
| Compute without any party seeing inputs | MPC | Part 3 |
| Train ML on sensitive data privately | PATE | Part 3 |
| Measure causal ad impact | Sales Lift (in DCR) | Part 3 |
| Correct imperfect experiment groups | Entropy Balancing | Part 3 |
The Core Principle
Every technique in this guide exists to answer one question:
How do we extract value from data while minimizing who can see what, when, and for what purpose?
The answer is never a single technology. It’s a layered defense:
- Minimize what you collect
- Anonymize what you store
- Protect what you compute
- Add noise to what you output
- Audit what you access
No layer is perfect. Each has trade-offs — flexibility, accuracy, speed, cost. The art is choosing the right combination for your use case, understanding what each layer protects, and being honest about residual risks.
Privacy engineering isn’t about perfection. It’s about thoughtful layering — building systems that extract value while genuinely protecting the individuals behind the data.
Start Reading
Part 1: Data Protection Fundamentals →
Data Minimization, Storage Anonymization, Query Anonymization, Differential Privacy
Part 2: Secure Collaboration and Infrastructure →
Identity Mapping, Data Clean Rooms, Purpose Limitation, TEE/CVM, Encryption
Part 3: Privacy-Preserving Computation and Measurement →
Multi-Party Computation, PATE, Sales Lift, Entropy Balancing