Building AI Intuition

Connecting the dots...

Privacy Tech

Privacy Enhancing Technologies – Introduction

By Archit Sharma
4 Min Read
Updated on February 28, 2026

Every time you browse a website, click an ad, make a purchase, or train an ML model, data flows through systems. Companies need this data — for analytics, measurement, personalization, and product improvement. But they also have legal, ethical, and business obligations to protect privacy.

This creates a fundamental tension:

How do we extract value from data while minimizing who can see what, when, and for what purpose?

Privacy Enhancing Technologies (PETs) are the technical toolkit that resolves this tension. They’re neither a magic bullet nor a single solution — they’re a layered system of techniques, each addressing a different phase of the data lifecycle.


The Data Lifecycle: Where Privacy Attacks Happen

Data moves through distinct phases, and each phase has different privacy risks. No single technology solves all phases.

Phase | Risk | PETs That Help
At collection | Collecting more than needed | Data Minimization, Purpose Limitation
At rest | Identifiers exposed in storage | Storage Anonymization, Encryption (AES)
In transit | Data intercepted on network | TLS, Diffie-Hellman
In use / compute | Data exposed during processing | TEE / CVM (trust the chip maker), MPC (trust the math), Data Clean Rooms
At output | Query results leak individuals | Query Anonymization, Differential Privacy
In measurement | Attribution reveals behavior | Sales Lift (in DCR), Entropy Balancing
In ML | Models memorize training data | PATE
Across partners | Identity linked across systems | Crosswalks, Private Set Intersection (PSI) for privacy-safe mapping

The art of privacy engineering is layering the right techniques for each phase of your data’s journey.

* One interesting twist: in some cases, differential privacy can also be applied at the source (input). Apple and Google insert noise on-device, especially when collecting telemetry, a variant known as local differential privacy.
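As a sketch of how that on-device noise works, here is the classic randomized-response mechanism, a simple form of local differential privacy. The 75% truth probability and the simulated population rate are illustrative choices, not Apple's or Google's actual parameters:

```python
import random

def randomized_response(true_value: bool, p_truth: float = 0.75) -> bool:
    """Report the true bit with probability p_truth; otherwise flip a coin.

    The noise is injected on-device, before the value ever leaves it, so
    the collector never learns any individual's true answer with certainty.
    """
    if random.random() < p_truth:
        return true_value
    return random.random() < 0.5

def estimate_true_rate(reports: list[bool], p_truth: float = 0.75) -> float:
    """Invert the known noise distribution to recover the population rate."""
    observed = sum(reports) / len(reports)
    # observed = p_truth * true_rate + (1 - p_truth) * 0.5  =>  solve for true_rate
    return (observed - (1 - p_truth) * 0.5) / p_truth

random.seed(0)
true_rate = 0.30
reports = [randomized_response(random.random() < true_rate) for _ in range(100_000)]
print(round(estimate_true_rate(reports), 2))  # close to 0.30
```

Each individual report is deniable, yet the aggregate estimate stays accurate because the noise distribution is known and can be inverted.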


The Three Layers of Privacy Protection

This guide is organized into three parts, each covering a distinct layer.

Part 1: Data Protection Fundamentals

What happens to your data inside a single organization.

These are the foundational techniques that every data system should implement:

Technique | What It Does | Phase
Data Minimization | Collect only what you need, delete when done | Collection
Storage Anonymization | Replace identifiers with pseudonyms | At rest
Query Anonymization | Enforce cohort thresholds (minimum users aggregated per row) on outputs | Output
Differential Privacy | Add noise with mathematical privacy guarantees | Output
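A minimal sketch of the two output-stage techniques above: cohort thresholds plus Laplace noise. The threshold of 50, the epsilon of 1.0, and the `safe_release` helper are illustrative choices under simple assumptions, not a reference implementation:

```python
import math
import random

MIN_COHORT_SIZE = 50   # query anonymization: suppress rows below this size
EPSILON = 1.0          # differential-privacy budget; sensitivity of a count is 1

def laplace_noise(scale: float) -> float:
    """Inverse-CDF sampling of Laplace(0, scale)."""
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1 - 2 * abs(u))

def safe_release(cohort_counts: dict[str, int]) -> dict[str, int]:
    """Apply both output-stage protections before any row leaves the system."""
    released = {}
    for cohort, count in cohort_counts.items():
        if count < MIN_COHORT_SIZE:
            continue  # query anonymization: suppress the small cohort entirely
        noisy = count + laplace_noise(1 / EPSILON)  # DP: add calibrated noise
        released[cohort] = round(noisy)
    return released

random.seed(1)
counts = {"age_25_34": 1_240, "age_35_44": 980, "age_85_plus": 3}
print(safe_release(counts))  # small cohort suppressed; other counts slightly noisy
```

The two techniques compose: the threshold stops a query from isolating a tiny group, and the noise prevents differencing attacks across queries that each pass the threshold.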

Part 1 answers: “How do I protect data within my own systems?”

Read Part 1: Data Protection Fundamentals →


Part 2: Secure Collaboration and Infrastructure

How multiple organizations work together without sharing raw data.

Modern business requires collaboration — advertisers measuring campaigns with retailers, healthcare providers conducting joint research. These techniques enable collaboration while preserving privacy:

Technique | What It Does | Phase
Identity Mapping / Crosswalks / Private Set Intersection | Connect users across systems without sharing raw IDs | Across partners
Data Clean Rooms | Compute joint insights in a governed environment | In use / compute
Purpose Limitation | Bind data access to declared intent | Collection + Use
TEE / CVM | Hardware-isolated computation; even admins can't see inside | In use / compute
Diffie-Hellman + AES | Secure key exchange and encryption in transit | In transit
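To make identity mapping concrete, here is a deliberately simplified intersection of keyed hashes. Real PSI protocols avoid the pre-shared salt used here (which would let either party dictionary-attack the other's set) by using oblivious PRFs or Diffie-Hellman-style blinding; all identifiers and the `SHARED_SALT` value are hypothetical:

```python
import hashlib
import hmac

SHARED_SALT = b"agreed-out-of-band"  # hypothetical pre-shared secret

def blind(ids: set[str], salt: bytes) -> set[str]:
    """Replace raw identifiers with keyed hashes (HMAC-SHA256)."""
    return {hmac.new(salt, i.encode(), hashlib.sha256).hexdigest() for i in ids}

# Each party blinds its own IDs locally; only hashes are exchanged.
advertiser = blind({"alice@x.com", "bob@y.com", "carol@z.com"}, SHARED_SALT)
retailer = blind({"bob@y.com", "carol@z.com", "dave@w.com"}, SHARED_SALT)

overlap = advertiser & retailer
print(len(overlap))  # 2 users appear in both sets
```

The key property, even in this toy version: the intersection is computed over pseudonyms, so non-overlapping raw IDs are never exchanged.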

Part 2 answers: “How do I collaborate with partners without exposing raw data?”

Read Part 2: Secure Collaboration and Infrastructure →


Part 3: Privacy-Preserving Computation and Measurement

How to compute, train ML, and measure without revealing inputs.

The most advanced layer: performing computation across parties where no single party sees the combined data, training ML models on sensitive data, and measuring business outcomes within privacy constraints:

Technique | What It Does | Phase
Multi-Party Computation (MPC) | Joint computation without revealing inputs | In use / compute
PATE | Train ML models with differential privacy | In ML
Sales Lift / Incrementality | Measure causal ad impact (operates within PET infrastructure) | In measurement
Entropy Balancing | Correct imperfect experiments (privacy-compatible) | In measurement
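As a sketch of the MPC row above, additive secret sharing shows how parties can compute a joint sum without any single server seeing an individual input. The three-hospital scenario and all values are hypothetical, and production MPC adds much more (malicious-security checks, multiplication protocols):

```python
import random

PRIME = 2**61 - 1  # all arithmetic is modulo a large prime

def share(secret: int, n_parties: int = 3) -> list[int]:
    """Split a value into n random-looking shares that sum to it mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % PRIME

# Three hospitals each secret-share a patient count across three servers.
inputs = [120, 340, 95]
all_shares = [share(v) for v in inputs]

# Each server sums the i-th share from every hospital; each partial sum
# is statistically meaningless on its own.
server_sums = [sum(col) % PRIME for col in zip(*all_shares)]
total = reconstruct(server_sums)
print(total)  # 555
```

Because addition distributes over the shares, combining the three server-side partial sums yields the true total while no server ever held a complete input.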

Part 3 answers: “How do I compute, train, and measure when even the computation itself must be private?”

Read Part 3: Privacy-Preserving Computation and Measurement →


How the Pieces Fit Together

A real-world privacy-preserving system might use all of these together. Consider an advertiser measuring campaign effectiveness with a retailer:

  1. Data Minimization: Retailer collects only purchase amount + timestamp (not full basket). Advertiser collects only ad exposure.
  2. Storage Anonymization: Both parties pseudonymize user IDs before any analysis.
  3. Identity Mapping: Establish crosswalk via identity provider — hashed mappings only, or use Private Set Intersection with double hashing.
  4. Data Clean Room: Both parties upload pseudonymized data to clean room. Clean room may run inside Confidential VM (TEE protection) for stronger privacy.
  5. Purpose Limitation: Query declares purpose: “ads measurement”. System verifies both parties’ data allows this purpose.
  6. Query Execution with Privacy: The approved aggregate query runs inside the TEE. Differential-privacy noise is added to the results, and query anonymization enforces cohort thresholds, suppressing any aggregated row with fewer users than the threshold.
  7. Measurement: Sales lift computed (exposed vs. control). Entropy balancing corrects for any group imbalances.
  8. Output: Only noisy aggregate result leaves clean room: “Campaign drove +12% incremental sales lift.” Neither party saw the other’s raw data.
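Steps 6 through 8 can be sketched end to end: compute the lift on exposed-vs-control spend, then release only a noise-protected aggregate. The spend distributions, noise scale, and seed are all illustrative assumptions:

```python
import math
import random

def sales_lift(exposed: list[float], control: list[float]) -> float:
    """Incremental lift: relative difference in mean spend per user."""
    m_exp = sum(exposed) / len(exposed)
    m_ctl = sum(control) / len(control)
    return (m_exp - m_ctl) / m_ctl

def laplace_noise(scale: float) -> float:
    """Inverse-CDF sampling of Laplace(0, scale)."""
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1 - 2 * abs(u))

random.seed(7)
# Simulated per-user spend: exposed group spends ~12% more on average.
exposed = [random.gauss(11.2, 3.0) for _ in range(5_000)]
control = [random.gauss(10.0, 3.0) for _ in range(5_000)]

lift = sales_lift(exposed, control)
noisy_lift = lift + laplace_noise(0.005)  # only this aggregate leaves the clean room
print(f"Campaign drove {noisy_lift:+.0%} incremental sales lift")
```

Inside a real clean room the per-user rows would never be visible to either party; only the final noisy aggregate crosses the boundary.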

Each layer catches what the previous layer missed. Together, they enable insights that would be impossible — or irresponsible — without privacy protection.


Quick Reference: Choosing the Right Technique

If you need to… | Use… | Covered in…
Reduce data collection footprint | Data Minimization | Part 1
Protect identifiers in storage | Storage Anonymization | Part 1
Prevent queries from exposing individuals | Query Anonymization | Part 1
Add mathematical privacy guarantees | Differential Privacy | Part 1
Connect users across partner systems | Crosswalks / PSI for ID Mapping | Part 2
Compute joint insights with partners | Data Clean Rooms | Part 2
Bind data to declared purpose | Purpose Limitation | Part 2
Protect data even from cloud admins | TEE / CVM | Part 2
Compute without any party seeing inputs | MPC | Part 3
Train ML on sensitive data privately | PATE | Part 3
Measure causal ad impact | Sales Lift (in DCR) | Part 3
Correct imperfect experiment groups | Entropy Balancing | Part 3

The Core Principle

Every technique in this guide exists to answer one question:

How do we extract value from data while minimizing who can see what, when, and for what purpose?

The answer is never a single technology. It’s a layered defense:

  • Minimize what you collect
  • Anonymize what you store
  • Protect what you compute
  • Add noise to what you output
  • Audit what you access

No layer is perfect. Each has trade-offs — flexibility, accuracy, speed, cost. The art is choosing the right combination for your use case, understanding what each layer protects, and being honest about residual risks.

Privacy engineering isn’t about perfection. It’s about thoughtful layering — building systems that extract value while genuinely protecting the individuals behind the data.


Start Reading

Part 1: Data Protection Fundamentals →

Data Minimization, Storage Anonymization, Query Anonymization, Differential Privacy

Part 2: Secure Collaboration and Infrastructure →

Identity Mapping, Data Clean Rooms, Purpose Limitation, TEE/CVM, Encryption

Part 3: Privacy-Preserving Computation and Measurement →

Multi-Party Computation, PATE, Sales Lift, Entropy Balancing

Copyright 2026 — Building AI Intuition. All rights reserved.