Building AI Intuition

Connecting the dots...
Privacy Tech

Privacy Enhancing Technologies (PETs) — Part 2

By Archit Sharma
7 Min Read
Updated on February 28, 2026

Secure Collaboration Without Sharing Raw Data

In Part 1, we covered how individual organizations protect data internally — minimization, anonymization, query controls, and differential privacy.

But modern business often requires multiple parties to collaborate on data: advertisers measuring campaign effectiveness, retailers sharing purchase signals, healthcare providers conducting joint research.

The challenge: how do you extract insights from combined data without any party seeing the other’s raw information?

Part 2 covers the collaboration layer: the technologies that enable secure multi-party data work.


The Collaboration Problem

Consider this scenario: A retailer wants to know if an advertiser’s campaign drove in-store purchases.

  • The Retailer has purchase data.
  • The Advertiser has ad exposure data.
  • Neither wants to share their raw data with the other — it’s competitively sensitive and privacy-regulated.

They need a way to answer: “Did people who saw the ad buy more products?” without either party seeing the other’s user-level data. This is where Data Clean Rooms, identity mapping, purpose controls, and secure hardware come in.


1. Identity Mapping (Crosswalks, Private Set Intersection)

Connecting Users Across Systems: Before any collaboration can happen, you need to know that “User A” in System 1 is the same person as “User B” in System 2. This is the identity mapping problem.

Mental Model: The Diplomatic Translator

Imagine two countries that speak different languages and want to negotiate a treaty. Neither wants to learn the other’s language (that would give away intelligence).

Instead, they use a trusted translator who can convert messages between languages without either side understanding the other’s native tongue.

How Identity Mapping Works:

  • Each party has their own user identifiers (cookies, emails, device IDs).
  • An identity provider creates hashed mappings that connect these IDs.
  • Each party only sees their own mapping — not the full identity graph.
  • The connection happens through cryptographic matching, not raw data sharing.
| Retailer’s View | Identity Provider (The “Crosswalk”) | Advertiser’s View |
|---|---|---|
| customer_123 | Matches | cookie_abc |
| customer_456 | Matches | cookie_def |
| Sees Hash X | ↔ | Sees Hash X |

Private Set Intersection (PSI) with commutative encryption is another identity-mapping method, usable when a deterministic, verified identifier such as an email address or mobile number is available. Each party encrypts its identifiers with its own secret key and shares the ciphertexts with the other party. Each party then re-encrypts the ciphertexts it received with its own key. Because the encryption is commutative, an identifier held by both parties ends up with the same doubly-encrypted value regardless of encryption order, so the two sets can be matched without revealing the underlying data.
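A minimal sketch of the commutative-encryption PSI flow, using modular exponentiation as the commutative cipher (the prime, keys, and emails below are illustrative only; production PSI uses elliptic-curve groups and hardened hash-to-group functions):

```python
import hashlib

# Toy commutative "encryption": E_k(x) = H(x)^k mod p, which satisfies
# E_a(E_b(x)) == E_b(E_a(x)). The 127-bit Mersenne prime is for demo only.
P = 2**127 - 1

def hash_to_group(identifier: str) -> int:
    """Map an identifier (e.g. a normalized email) to a group element."""
    digest = hashlib.sha256(identifier.encode()).digest()
    return int.from_bytes(digest, "big") % P

def encrypt(value: int, key: int) -> int:
    return pow(value, key, P)

# Each party holds its own secret exponent (its "salt"/key).
retailer_key, advertiser_key = 0x1234567, 0x89ABCDF

retailer_ids = {"alice@example.com", "bob@example.com"}
advertiser_ids = {"bob@example.com", "carol@example.com"}

# Round 1: each party encrypts its own set and shares the ciphertexts.
retailer_once = {encrypt(hash_to_group(i), retailer_key) for i in retailer_ids}
advertiser_once = {encrypt(hash_to_group(i), advertiser_key) for i in advertiser_ids}

# Round 2: each party re-encrypts the OTHER party's ciphertexts with its key.
retailer_twice = {encrypt(c, advertiser_key) for c in retailer_once}
advertiser_twice = {encrypt(c, retailer_key) for c in advertiser_once}

# Commutativity makes shared identifiers collide in doubly-encrypted form.
overlap = retailer_twice & advertiser_twice
print(len(overlap))  # 1 -- only bob@example.com is in both sets
```

Note that neither party ever sees the other's raw identifiers, only ciphertexts under a key it does not hold.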

Key Point: Identity mapping usually happens before data enters a clean room. The crosswalk is established first, then both parties bring their pre-mapped data to the collaboration environment.

Where It’s Used: Advertising measurement, retail media networks, cross-platform attribution, identity resolution services (LiveRamp, Experian, etc.).

Trade-offs

  • High Privacy Risk: If the full graph is centralized anywhere, it becomes a target.
  • Consent Management: Did the user agree to this linking?
  • Decay: Identity graphs decay over time as cookies expire and devices change.
  • Latency: Adds operational complexity and delay.

2. Data Clean Rooms (DCR)

The Secure Collaboration Space: A Data Clean Room (DCR) is a controlled environment where multiple parties can compute joint insights without sharing raw data.

Mental Model: The Sealed Arbitration Chamber

Imagine two companies in a legal dispute. Neither wants to show their confidential documents to the other.

Instead, they submit documents to a sealed arbitration chamber. A neutral arbitrator inside the chamber can read both sets of documents and issue a ruling — but the documents never leave the chamber, and neither party sees the other’s submissions.

How a Data Clean Room Works:

  1. Each party uploads their data (with hashed IDs from the crosswalk).
  2. Only pre-approved aggregate queries are allowed.
  3. The clean room enforces minimum thresholds (no results for small cohorts).
  4. No row-level data export — only aggregates come out.
  5. Often combined with differential privacy for additional protection.


+------------------Data Clean Room------------------+
|                                                   |
|  Retailer uploads:     Advertiser uploads:        |
|  [purchases by         [ad exposures by           |
|   hashed user]          hashed user]              |
|                                                   |
|        ↓                      ↓                   |
|     +---------------------------+                 |
|     | Approved aggregate query: |                 |
|     | "Conversion rate for      |                 |
|     |  exposed vs. unexposed"   |                 |
|     +---------------------------+                 |
|                  ↓                                |
|         [Aggregate result only]                   |
|            "Lift: +12%"                           |
+---------------------------------------------------+
                   ↓
    Neither party sees the other's raw data





Where It’s Used: Attribution measurement, sales lift studies, audience overlap analysis, measurement across walled gardens (Google, Meta, Amazon).

Trade-offs

  • Slow Iteration: Can’t explore data freely.
  • Operationally Heavy: Requires setup, governance, and approvals.
  • Expensive: Clean room services aren’t cheap.
  • Limited Flexibility: Only pre-approved queries work.
  • Governance Trust: You still must trust the clean room operator.

3. Purpose Limitation

Binding Data to Declared Intent: Even within a single organization, data collected for one purpose shouldn’t automatically be available for all purposes.

Mental Model: The Library with Restricted Sections

Imagine a university library with different sections: general stacks (anyone can access), research archives (faculty only), and medical records (IRB-approved researchers only).

Your library card grants access based on your role and declared purpose. You can’t wander into medical records just because you have a card.

How Purpose Limitation Works:

  • Data is tagged with purpose at collection time (ads, safety, research).
  • Every query must declare its purpose.
  • Access is granted only if the query’s purpose matches the data’s allowed purposes.
  • Encryption gates enforce this at decryption time — data literally cannot be read without matching purpose.
| Data Tagged For | Query Declares | Result |
|---|---|---|
| [ads, measurement] | "ads" | ✅ Access Granted |
| [ads, measurement] | "research" | ❌ Access Denied |
| [safety] | "ads" | ❌ Access Denied |
| [safety] | "safety" | ✅ Access Granted |
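The matching rule above takes only a few lines. A toy purpose-gate (the dataset names and tag sets are invented for illustration; real systems attach this check to key release or the query planner):

```python
# Purpose tags attached to each dataset at collection time.
DATA_PURPOSES = {
    "campaign_logs": {"ads", "measurement"},
    "abuse_reports": {"safety"},
}

def check_access(dataset: str, declared_purpose: str) -> bool:
    """Grant access only if the declared purpose is among the data's tags."""
    return declared_purpose in DATA_PURPOSES.get(dataset, set())

print(check_access("campaign_logs", "ads"))       # True
print(check_access("campaign_logs", "research"))  # False
print(check_access("abuse_reports", "safety"))    # True
```

The gate is only as strong as its enforcement point: binding it to decryption (as described above) prevents bypass, whereas a check in application code merely deters.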

Trade-offs

  • Honesty: Relies on honest declaration (someone could lie about their purpose).
  • Auditing: Needs strong auditing to catch misuse.
  • Taxonomy Drift: New use cases might not fit old categories.
  • Deterrence vs Prevention: Doesn’t stop malicious insiders instantly; it primarily detects and deters.

4. Trusted Execution Environments (TEE) & Confidential VMs

The Hardware Vault: All previous technologies assume you trust the system running the computation. But what if you don’t? What if even the cloud provider shouldn’t see the data?

Trusted Execution Environments (TEEs) provide hardware-level isolation — a secure enclave where even the operating system and cloud admin cannot see what’s happening inside.

Mental Model: The Sealed Black Box

Imagine a voting machine designed so that even the election officials can’t tamper with it. Votes go in, tallies come out, but no one — not the manufacturer, not the officials, not hackers — can see or modify the individual votes inside. The machine is physically sealed and cryptographically verified.

How a TEE Works:

  1. Data enters the enclave encrypted.
  2. Inside the enclave, data is decrypted and processed.
  3. Only the approved code can run (verified by “attestation”).
  4. The operating system, hypervisor, and cloud admin cannot see inside.
  5. Only approved aggregate outputs leave the enclave.


+------------Cloud Server------------+
|                                    |
|   Operating System (untrusted)     |
|   +---------TEE Enclave---------+  |
|   | Encrypted memory            |  |
|   | Only attested code runs     |  |
|   | Data decrypted only inside  |  |
|   | Admin cannot inspect        |  |
|   +-----------------------------+  |
|                                    |
+------------------------------------+
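Attestation (step 3) is the load-bearing idea: the enclave's code is hashed into a "measurement", and data or keys are released only if it matches what the data owner approved. A deliberately simplified software sketch of that check (real TEEs compute and sign the measurement in hardware; the code strings here are toys):

```python
import hashlib

# The data owner approves one exact program and records its measurement.
approved_code = b"def aggregate(rows): return len(rows)"
EXPECTED_MEASUREMENT = hashlib.sha256(approved_code).hexdigest()

def attest_and_run(code: bytes, data):
    """Run code against data only if its measurement matches the approval."""
    measurement = hashlib.sha256(code).hexdigest()
    if measurement != EXPECTED_MEASUREMENT:
        raise PermissionError("attestation failed: unapproved code")
    env = {}
    exec(code.decode(), env)  # execute only the attested program
    return env["aggregate"](data)

print(attest_and_run(approved_code, [1, 2, 3]))  # 3

# A single changed byte -- e.g. code that exfiltrates raw rows -- changes the
# measurement, so the gate refuses to run it:
tampered = b"def aggregate(rows): return rows"
# attest_and_run(tampered, [1, 2, 3])  -> raises PermissionError
```

This also illustrates the "Code Trust" trade-off below: attestation proves *which* code ran, not that the code is safe.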





Confidential Virtual Machines (CVMs): A CVM is simply a TEE applied to an entire cloud virtual machine.

  • TEE = The technology (e.g., Intel SGX, AMD SEV).
  • CVM = The cloud deployment of that technology.

Where It’s Used: Healthcare and genomics (HIPAA data), financial services (trading algorithms), secure ML training, private attribution.

Trade-offs

  • Performance Overhead: Encryption/decryption costs time.
  • Complex Debugging: You can’t inspect what’s inside to fix bugs.
  • Cost: Cloud cost premium for confidential computing.
  • Code Trust: If the code inside the enclave is buggy or malicious, the TEE won’t save you.
  • Output Leakage: TEE protects processing, not results. You still need Differential Privacy on the output.

How These Technologies Work Together

A complete privacy-preserving collaboration might use all of these steps:

  1. Identity Mapping: Retailer and advertiser establish crosswalk via identity provider.
  2. Data Clean Room: Both parties upload data (keyed by pseudonyms) to a clean room.
  3. Confidential VM: The clean room runs inside a TEE/CVM for hardware protection.
  4. Purpose Limitation: Query declares “ads measurement”; system verifies permission.
  5. Query Execution: Approved query runs; Differential Privacy noise is added.
  6. Output: Only noisy aggregate leaves the room. Neither party sees raw data.
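Step 5's noise addition can be sketched with the standard library alone. A minimal Laplace-mechanism example (the epsilon, sensitivity, and count values are made up for illustration):

```python
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count with epsilon-DP Laplace noise.

    `sensitivity` is the most one user can change the count (1 for a count).
    """
    return true_count + laplace_noise(sensitivity / epsilon)

# Step 6 of the pipeline: only this noisy aggregate leaves the clean room.
print(dp_count(true_count=4217, epsilon=1.0))
```

With epsilon = 1 the noise is typically within a few units, negligible for a count in the thousands but enough to hide any single user's presence.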

The Encryption Foundation: Diffie-Hellman and AES

You might wonder: how does data stay encrypted while moving between systems?

  • Diffie-Hellman Key Exchange: Allows two parties to generate the same secret key over a public network without ever transmitting the key itself. (Like mixing paint colors to match a secret shade without showing the original colors).
  • AES Symmetric Encryption: Once the shared key is established, AES uses it to encrypt the actual data. It is the standard for secure communication (HTTPS, databases, enclaves).
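The paint-mixing analogy maps directly onto modular exponentiation. A toy key exchange (the 127-bit prime is far too small for real use, which relies on RFC 3526 groups or elliptic curves, and a proper KDF such as HKDF rather than bare SHA-256):

```python
import hashlib
import secrets

P = 2**127 - 1   # public prime modulus (illustrative size only)
G = 5            # public generator

# Each side picks a private exponent and publishes only g^x mod p.
a = secrets.randbelow(P - 2) + 1
b = secrets.randbelow(P - 2) + 1
A, B = pow(G, a, P), pow(G, b, P)   # these travel over the public network

# Both sides arrive at the same secret without ever transmitting it:
shared_retailer = pow(B, a, P)    # (g^b)^a mod p
shared_advertiser = pow(A, b, P)  # (g^a)^b mod p
assert shared_retailer == shared_advertiser

# Derive a 256-bit symmetric key from the shared secret for AES to use.
aes_key = hashlib.sha256(str(shared_retailer).encode()).digest()
print(len(aes_key))  # 32 bytes -- ready for AES-256
```

An eavesdropper sees P, G, A, and B, but recovering a or b from them is the discrete logarithm problem, which is what makes the exchange safe at real key sizes.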

Choosing the Right Tools

| Scenario | Technologies to Consider |
|---|---|
| Two companies measuring ad effectiveness | Identity mapping + Data Clean Room + DP |
| Healthcare joint research | Clean Room inside CVM + Strict Purpose Controls |
| Internal analytics on sensitive data | Purpose Limitation + Query Anonymization + DP |
| ML training on user data | DP-SGD + TEE for model training |
| Cross-platform identity resolution | Decentralized crosswalks + No centralized graph |


Common Misconceptions

  • ❌ “Clean rooms mean no one sees the data.”
    • The clean room operator (or the code running inside) does see the data during computation. TEEs reduce this trust requirement, but you still trust the code. True “no one sees” requires advanced cryptography like Fully Homomorphic Encryption (FHE), which is currently too slow for most uses.
  • ❌ “TEEs solve all privacy problems.”
    • TEEs protect data during processing, but outputs can still leak information. A TEE that returns exact user counts is not private. You need DP or query anonymization on top.
  • ❌ “Identity mapping is anonymous.”
    • Hashed IDs are pseudonymous, not anonymous. If an attacker obtains the hash function and your email, they can compute your hash and re-identify you.
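The re-identification attack in that last point is one line of code. A sketch, assuming an unsalted SHA-256 pseudonymization scheme (keyed hashing, e.g. HMAC with a secret key, raises the bar but the result is still pseudonymous, not anonymous):

```python
import hashlib

def pseudonymize(email: str) -> str:
    """Unsalted hashing: deterministic, so anyone can recompute it."""
    return hashlib.sha256(email.lower().encode()).hexdigest()

# A leaked table keyed by hashed IDs (values invented for illustration).
leaked_table = {pseudonymize("bob@example.com"): {"purchases": 7}}

# An attacker who already knows Bob's email simply recomputes his pseudonym:
attacker_guess = pseudonymize("bob@example.com")
print(attacker_guess in leaked_table)  # True -- Bob is re-identified
```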

Final Thought

Privacy-preserving collaboration is a layered system.

  1. Identity Mapping: Connect users across systems without sharing raw IDs.
  2. Data Clean Rooms: Compute joint insights without exposing raw data.
  3. Purpose Limitation: Bind data usage to declared intent.
  4. TEEs and CVMs: Protect data even from the infrastructure operator.

The art of privacy engineering is layering these appropriately — understanding what each tool protects, what it doesn’t, and where the residual risks remain. No single technology is a silver bullet. But combined thoughtfully, they enable collaboration that would otherwise be impossible — extracting value from data while genuinely protecting the individuals behind it.


Tags:

ai, artificial-intelligence, confidential-virtual-machines, crosswalk, cvm, cybersecurity, data-clean-rooms, dcr, encryption, purpose-limitation, security, technology, tee, trusted-execution-environment
Author

Archit Sharma
