Building AI Intuition

Connecting the dots...

Privacy Tech

Privacy Enhancing Technologies – Introduction

By Archit Sharma
4 Min Read
Updated on February 28, 2026

Every time you browse a website, click an ad, make a purchase, or train an ML model, data flows through systems. Companies need this data — for analytics, measurement, personalization, and product improvement. But they also have legal, ethical, and business obligations to protect privacy.

This creates a fundamental tension:

How do we extract value from data while minimizing who can see what, when, and for what purpose?

Privacy Enhancing Technologies (PETs) are the technical toolkit that resolves this tension. They’re neither a magic bullet nor a single solution — they’re a layered system of techniques, each addressing a different phase of the data lifecycle.


The Data Lifecycle: Where Privacy Attacks Happen

Data moves through distinct phases, and each phase has different privacy risks. No single technology solves all phases.

Phase | Risk | PETs That Help
At collection | Collecting more than needed | Data Minimization, Purpose Limitation
At rest | Identifiers exposed in storage | Storage Anonymization, Encryption (AES)
In transit | Data intercepted on network | TLS, Diffie-Hellman
In use / compute | Data exposed during processing | TEE / CVM (trust the chip maker), MPC (trust the math), Data Clean Rooms
At output | Query results leak individuals | Query Anonymization, Differential Privacy
In measurement | Attribution reveals behavior | Sales Lift (in DCR), Entropy Balancing
In ML | Models memorize training data | PATE
Across partners | Identity linked across systems | Crosswalks, Private Set Intersection (PSI) for privacy-safe mapping

The art of privacy engineering is layering the right techniques for each phase of your data’s journey.

* One interesting twist: in some cases, differential privacy can also be applied at the source (input). Apple and Google insert noise on-device, especially when collecting telemetry, a variant known as local differential privacy.
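As a sketch of how that on-device noise works, here is the classic randomized-response mechanism, a simple form of local differential privacy. The 75% truth probability and the simulated population rate are illustrative choices, not Apple's or Google's actual parameters:

```python
import random

def randomized_response(true_value: bool, p_truth: float = 0.75) -> bool:
    """Report the true bit with probability p_truth; otherwise flip a coin.

    The noise is injected on-device, before the value ever leaves it, so
    the collector never learns any individual's true answer with certainty.
    """
    if random.random() < p_truth:
        return true_value
    return random.random() < 0.5

def estimate_true_rate(reports: list[bool], p_truth: float = 0.75) -> float:
    """Invert the known noise distribution to recover the population rate."""
    observed = sum(reports) / len(reports)
    # observed = p_truth * true_rate + (1 - p_truth) * 0.5  =>  solve for true_rate
    return (observed - (1 - p_truth) * 0.5) / p_truth

random.seed(0)
true_rate = 0.30
reports = [randomized_response(random.random() < true_rate) for _ in range(100_000)]
print(round(estimate_true_rate(reports), 2))  # close to 0.30
```

Each individual report is deniable, yet the aggregate estimate stays accurate because the noise distribution is known and can be inverted.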


The Three Layers of Privacy Protection

This guide is organized into three parts, each covering a distinct layer.

Part 1: Data Protection Fundamentals

What happens to your data inside a single organization.

These are the foundational techniques that every data system should implement:

Technique | What It Does | Phase
Data Minimization | Collect only what you need, delete when done | Collection
Storage Anonymization | Replace identifiers with pseudonyms | At rest
Query Anonymization | Enforce cohort thresholds (minimum users aggregated per row) on outputs | Output
Differential Privacy | Add noise with mathematical privacy guarantees | Output
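A minimal sketch of the two output-stage techniques above: cohort thresholds plus Laplace noise. The threshold of 50, the epsilon of 1.0, and the `safe_release` helper are illustrative choices under simple assumptions, not a reference implementation:

```python
import math
import random

MIN_COHORT_SIZE = 50   # query anonymization: suppress rows below this size
EPSILON = 1.0          # differential-privacy budget; sensitivity of a count is 1

def laplace_noise(scale: float) -> float:
    """Inverse-CDF sampling of Laplace(0, scale)."""
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1 - 2 * abs(u))

def safe_release(cohort_counts: dict[str, int]) -> dict[str, int]:
    """Apply both output-stage protections before any row leaves the system."""
    released = {}
    for cohort, count in cohort_counts.items():
        if count < MIN_COHORT_SIZE:
            continue  # query anonymization: suppress the small cohort entirely
        noisy = count + laplace_noise(1 / EPSILON)  # DP: add calibrated noise
        released[cohort] = round(noisy)
    return released

random.seed(1)
counts = {"age_25_34": 1_240, "age_35_44": 980, "age_85_plus": 3}
print(safe_release(counts))  # small cohort suppressed; other counts slightly noisy
```

The two techniques compose: the threshold stops a query from isolating a tiny group, and the noise prevents differencing attacks across queries that each pass the threshold.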

Part 1 answers: “How do I protect data within my own systems?”

Read Part 1: Data Protection Fundamentals →


Part 2: Secure Collaboration and Infrastructure

How multiple organizations work together without sharing raw data.

Modern business requires collaboration — advertisers measuring campaigns with retailers, healthcare providers conducting joint research. These techniques enable collaboration while preserving privacy:

Technique | What It Does | Phase
Identity Mapping / Crosswalks / Private Set Intersection | Connect users across systems without sharing raw IDs | Across partners
Data Clean Rooms | Compute joint insights in a governed environment | In use / compute
Purpose Limitation | Bind data access to declared intent | Collection + Use
TEE / CVM | Hardware-isolated computation; even admins can't see inside | In use / compute
Diffie-Hellman + AES | Secure key exchange and encryption in transit | In transit
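To make identity mapping concrete, here is a deliberately simplified intersection of keyed hashes. Real PSI protocols avoid the pre-shared salt used here (which would let either party dictionary-attack the other's set) by using oblivious PRFs or Diffie-Hellman-style blinding; all identifiers and the `SHARED_SALT` value are hypothetical:

```python
import hashlib
import hmac

SHARED_SALT = b"agreed-out-of-band"  # hypothetical pre-shared secret

def blind(ids: set[str], salt: bytes) -> set[str]:
    """Replace raw identifiers with keyed hashes (HMAC-SHA256)."""
    return {hmac.new(salt, i.encode(), hashlib.sha256).hexdigest() for i in ids}

# Each party blinds its own IDs locally; only hashes are exchanged.
advertiser = blind({"alice@x.com", "bob@y.com", "carol@z.com"}, SHARED_SALT)
retailer = blind({"bob@y.com", "carol@z.com", "dave@w.com"}, SHARED_SALT)

overlap = advertiser & retailer
print(len(overlap))  # 2 users appear in both sets
```

The key property, even in this toy version: the intersection is computed over pseudonyms, so non-overlapping raw IDs are never exchanged.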

Part 2 answers: “How do I collaborate with partners without exposing raw data?”

Read Part 2: Secure Collaboration and Infrastructure →


Part 3: Privacy-Preserving Computation and Measurement

How to compute, train ML, and measure without revealing inputs.

The most advanced layer: performing computation across parties where no single party sees the combined data, training ML models on sensitive data, and measuring business outcomes within privacy constraints:

Technique | What It Does | Phase
Multi-Party Computation (MPC) | Joint computation without revealing inputs | In use / compute
PATE | Train ML models with differential privacy | In ML
Sales Lift / Incrementality | Measure causal ad impact (operates within PET infrastructure) | In measurement
Entropy Balancing | Correct imperfect experiments (privacy-compatible) | In measurement
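As a sketch of the MPC row above, additive secret sharing shows how parties can compute a joint sum without any single server seeing an individual input. The three-hospital scenario and all values are hypothetical, and production MPC adds much more (malicious-security checks, multiplication protocols):

```python
import random

PRIME = 2**61 - 1  # all arithmetic is modulo a large prime

def share(secret: int, n_parties: int = 3) -> list[int]:
    """Split a value into n random-looking shares that sum to it mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % PRIME

# Three hospitals each secret-share a patient count across three servers.
inputs = [120, 340, 95]
all_shares = [share(v) for v in inputs]

# Each server sums the i-th share from every hospital; each partial sum
# is statistically meaningless on its own.
server_sums = [sum(col) % PRIME for col in zip(*all_shares)]
total = reconstruct(server_sums)
print(total)  # 555
```

Because addition distributes over the shares, combining the three server-side partial sums yields the true total while no server ever held a complete input.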

Part 3 answers: “How do I compute, train, and measure when even the computation itself must be private?”

Read Part 3: Privacy-Preserving Computation and Measurement →


How the Pieces Fit Together

A real-world privacy-preserving system might use all of these together. Consider an advertiser measuring campaign effectiveness with a retailer:

  1. Data Minimization: Retailer collects only purchase amount + timestamp (not full basket). Advertiser collects only ad exposure.
  2. Storage Anonymization: Both parties pseudonymize user IDs before any analysis.
  3. Identity Mapping: Establish crosswalk via identity provider — hashed mappings only, or use Private Set Intersection with double hashing.
  4. Data Clean Room: Both parties upload pseudonymized data to clean room. Clean room may run inside Confidential VM (TEE protection) for stronger privacy.
  5. Purpose Limitation: Query declares purpose: “ads measurement”. System verifies both parties’ data allows this purpose.
  6. Query Execution with Privacy: The approved aggregate query runs inside the TEE. Differential-privacy noise is added to the results, and query anonymization enforces cohort thresholds, suppressing any aggregated row with fewer users than the threshold.
  7. Measurement: Sales lift computed (exposed vs. control). Entropy balancing corrects for any group imbalances.
  8. Output: Only noisy aggregate result leaves clean room: “Campaign drove +12% incremental sales lift.” Neither party saw the other’s raw data.
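Steps 6 through 8 can be sketched end to end: compute the lift on exposed-vs-control spend, then release only a noise-protected aggregate. The spend distributions, noise scale, and seed are all illustrative assumptions:

```python
import math
import random

def sales_lift(exposed: list[float], control: list[float]) -> float:
    """Incremental lift: relative difference in mean spend per user."""
    m_exp = sum(exposed) / len(exposed)
    m_ctl = sum(control) / len(control)
    return (m_exp - m_ctl) / m_ctl

def laplace_noise(scale: float) -> float:
    """Inverse-CDF sampling of Laplace(0, scale)."""
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1 - 2 * abs(u))

random.seed(7)
# Simulated per-user spend: exposed group spends ~12% more on average.
exposed = [random.gauss(11.2, 3.0) for _ in range(5_000)]
control = [random.gauss(10.0, 3.0) for _ in range(5_000)]

lift = sales_lift(exposed, control)
noisy_lift = lift + laplace_noise(0.005)  # only this aggregate leaves the clean room
print(f"Campaign drove {noisy_lift:+.0%} incremental sales lift")
```

Inside a real clean room the per-user rows would never be visible to either party; only the final noisy aggregate crosses the boundary.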

Each layer catches what the previous layer missed. Together, they enable insights that would be impossible — or irresponsible — without privacy protection.


Quick Reference: Choosing the Right Technique

If you need to… | Use… | Covered in…
Reduce data collection footprint | Data Minimization | Part 1
Protect identifiers in storage | Storage Anonymization | Part 1
Prevent queries from exposing individuals | Query Anonymization | Part 1
Add mathematical privacy guarantees | Differential Privacy | Part 1
Connect users across partner systems | Crosswalks / PSI for ID Mapping | Part 2
Compute joint insights with partners | Data Clean Rooms | Part 2
Bind data to declared purpose | Purpose Limitation | Part 2
Protect data even from cloud admins | TEE / CVM | Part 2
Compute without any party seeing inputs | MPC | Part 3
Train ML on sensitive data privately | PATE | Part 3
Measure causal ad impact | Sales Lift (in DCR) | Part 3
Correct imperfect experiment groups | Entropy Balancing | Part 3

The Core Principle

Every technique in this guide exists to answer one question:

How do we extract value from data while minimizing who can see what, when, and for what purpose?

The answer is never a single technology. It’s a layered defense:

  • Minimize what you collect
  • Anonymize what you store
  • Protect what you compute
  • Add noise to what you output
  • Audit what you access

No layer is perfect. Each has trade-offs — flexibility, accuracy, speed, cost. The art is choosing the right combination for your use case, understanding what each layer protects, and being honest about residual risks.

Privacy engineering isn’t about perfection. It’s about thoughtful layering — building systems that extract value while genuinely protecting the individuals behind the data.


Start Reading

Part 1: Data Protection Fundamentals →

Data Minimization, Storage Anonymization, Query Anonymization, Differential Privacy

Part 2: Secure Collaboration and Infrastructure →

Identity Mapping, Data Clean Rooms, Purpose Limitation, TEE/CVM, Encryption

Part 3: Privacy-Preserving Computation and Measurement →

Multi-Party Computation, PATE, Sales Lift, Entropy Balancing

Copyright 2026 — Building AI Intuition. All rights reserved.