Oblivious Transfer
What is Oblivious Transfer (OT)? A deep dive into the fundamental cryptography protocol for private choice. Learn how OT Extension powers modern privacy apps.
Oblivious Transfer is a lesser known but a fundamental building block that powers some of the most advanced privacy technologies. At its core, Oblivious Transfer (OT) is a two-party protocol that solves a simple problem of "private choice." It allows a Receiver to choose one out of a Sender's $N$ messages and receive it, with the absolute guarantee that the Sender learns nothing about the choice, and the Receiver learns nothing about the other $N-1$ messages.

1-out-of-2 OT
The most common and illustrative variant is the 1-out-of-2 Oblivious Transfer which is shown in the diagram above. Here, the Sender (Alice) has two messages, $m_0$ and $m_1$. The Receiver (Bob) has a choice bit, $c$, and wants to learn message $m_c$. While OT can be constructed from various public-key cryptosystems, the classic implementation is built upon the well-known RSA algorithm, whose security relies on the "trapdoor" property: it's easy to perform the public-key operation (encryption) but computationally impossible to reverse it (decrypt) without the secret private key.
An RSA-Based Implementation
The following protocol flow demonstrates how the mathematical properties of RSA can be leveraged to achieve the goals of OT.
Setup:
- Alice (Sender): Has two messages, $m_0$ and $m_1$, and the RSA public/private key pair $(N, e)$ and $d$.
- Bob (Receiver): Has a choice bit $b \in \{0, 1\}$.
Protocol Steps:
-
Alice's Initial Message: Alice generates two large, random numbers, $x_0$ and $x_1$. She sends both of these, along with her public key $(N, e)$, to Bob.
-
Bob's Blinding and Commitment:
- Bob generates his own random number, $k$ (often called a "blinding factor").
- He receives $x_0$ and $x_1$ from Alice. He selects the one corresponding to his choice bit, $x_b$.
- He "blinds" Alice's $x_b$ by encrypting a value derived from it and the blinding factor $k$. Specifically, he computes:
$$v = (x_b + k^e) \pmod N$$ - Bob sends this single value, $v$, back to Alice. Because $k$ is random and unknown to Alice, she cannot tell if Bob started with $x_0$ or $x_1$. This preserves Bob's privacy.
-
Alice's Response:
- Alice receives $v$. She uses her private key, $d$, to "unblind" this value relative to both of her original random numbers:
$$k_0 = (v - x_0)^d \pmod N$$ $$k_1 = (v - x_1)^d \pmod N$$ - Here's the trick: one of these computations will reveal Bob's blinding factor $k$. The other will just be a meaningless number. For example, if Bob chose $b=0$, then $k_0=((x_0+k^e)-x_0)^d=(k^e)^d=k$.
- Alice now masks her secret messages with these two computed values:
$$m'_0 = m_0 + k_0$$ $$m'_1 = m_1 + k_1$$ - She sends both masked messages, $m'_0$ and $m'_1$, back to Bob.
- Alice receives $v$. She uses her private key, $d$, to "unblind" this value relative to both of her original random numbers:
-
Bob's Decryption:
- Bob receives $m'_0$ and $m'_1$.
- He knows his original blinding factor $k$. He simply subtracts $k$ from the message corresponding to his choice, $m'_b$:
$$m_b = m'_b - k$$ - For the message he didn't choose (say, $m'_{1-b}$), the value $k_{1-b}$ that Alice used is unknown to him, so subtracting his own key $k$ will just result in garbage. The derivation below shows how the correct message is recovered for a choice of $b=1$:
From 1-out-of-2 to 1-out-of-N OT
Here, Alice has $N$ messages, and Bob wants to retrieve the $c$-th message, $m_c$.
-
Direct Generalization (Inefficient): We can directly extend the Alice-initiated RSA protocol. While conceptually simple, this method highlights the performance challenges with large datasets.
Protocol Flow:
- Alice's Initial Message: Alice generates $N$ large, random numbers, $(x_0, x_1, \dots, x_{N-1})$. She sends all $N$ of them to Bob.
- Bob's Blinding and Commitment: Bob receives the list of $N$ random numbers. He has his choice index $c$. He generates a single random blinding factor, $k$. He selects his chosen random number, $x_c$, from the list and computes the blinded value $v$:
$$v = (x_c + k^e) \pmod N$$
He sends this single value $v$ back to Alice. Since $k$ is secret, Alice cannot determine which of the $N$ original $x_i$ values was used. - Alice's Encrypted Response: Alice receives $v$. She computes a potential key for every possibility by using her private key. For each index $i$ from $0$ to $N-1$, she computes:
$$k_i = (v - x_i)^d \pmod N$$
Only one of these, $k_c$, will equal Bob's secret blinding factor $k$. All others will be meaningless to Bob. She then masks each of her $N$ messages with the corresponding potential key:
$$m'_i = m_i + k_i$$
She sends all $N$ masked messages $(m'_0, m'_1, \dots, m'_{N-1})$ to Bob. - Bob's Final Decryption: Bob receives the list of $N$ masked messages. He knows his choice $c$ and his secret random number $k$. He unmasks the one message he wants:
$$m_c = m'_c - k$$
This approach is secure, but it requires Alice to perform $N$ expensive public-key decryptions, making its computational cost linear, $O(N)$. For any significant value of $N$, this becomes prohibitively slow.
-
Reduction to 1-out-of-2 OT (Efficient): A far better and more practical method is to avoid the linear cost by building the 1-out-of-N protocol from just $L = \lceil \log_2 N \rceil$ instances of a 1-out-of-2 OT protocol, such as the one we just detailed. This reduction treats the 1-out-of-2 OT as a fundamental "black box" building block.
- Bob represents his choice $c$ as an $L$-bit binary number. For example, if $N=8$ and Bob wants the 6th item ($m_5$), his choice is $c=5$, which is
101
in binary. - Alice creates $L$ pairs of random secret keys: $(K_{0,0}, K_{0,1}), (K_{1,0}, K_{1,1}), \dots$.
- For each of the $L$ bits in Bob's choice, they perform one complete 1-out-of-2 OT. In our example, Bob would perform 3 OTs, choosing key $K_{0,1}$ in the first, $K_{1,0}$ in the second, and $K_{2,1}$ in the third. He now holds ${K_{0,1}, K_{1,0}, K_{2,1}}$.
- Alice encrypts each of her $N$ messages by combining the keys corresponding to the message's index bits (e.g., via XOR, $\oplus$). For $m_5$ (index
101
), the encryption key would be $K_{0,1} \oplus K_{1,0} \oplus K_{2,1}$. - Bob can form the final decryption key for his chosen message, $m_c$, because he has precisely the right set of keys. He cannot form the key for any other message. To decrypt $m_3$ (index
011
), for instance, he would need ${K_{0,1}, K_{1,1}, K_{2,0}}$, but he is missing two of these keys. This reduces the number of expensive public-key operations to $O(\log N)$, making it exponentially faster than the direct generalization.
- Bob represents his choice $c$ as an $L$-bit binary number. For example, if $N=8$ and Bob wants the 6th item ($m_5$), his choice is $c=5$, which is
Formal Description: Efficient 1-out-of-N OT
This protocol allows a Receiver (Bob) to learn the $c$-th message from a Sender's (Alice) database of $N$ messages, without Alice learning $c$, and without Bob learning anything about the other $N-1$ messages.
Parties, Inputs, and Primitives:
- Sender (Alice): Holds an ordered list of $N$ messages, $M = (m_0, m_1, \ldots, m_{N-1})$. Assume each message is a string of length $\ell$.
- Receiver (Bob): Holds a choice index $c$, where $0 \leq c < N$.
- Security Parameter ($\kappa$): A parameter that defines the bit-length of cryptographic keys (e.g., $\kappa=128$).
- Cryptographic Hash Function ($H$): A function $H: \{0,1\}^* \to \{0,1\}^\ell$ that maps arbitrary-length strings to $\ell$-bit strings, modeled as a Random Oracle.
- Subroutine ($\text{OT}^{1}_{2}$): A secure 1-out-of-2 Oblivious Transfer protocol. We treat this as a black box that can be called upon by the parties.
Protocol Steps
The protocol proceeds in three main phases: Setup, Key Transfer, and Message Transfer.
Phase 1: Setup
- Parameter Calculation: Let $L = \lceil \log_2 N \rceil$. This is the number of bits required to represent any index from $0$ to $N-1$.
- Bob's Input Encoding: Bob represents his choice index $c$ as an $L$-bit binary string:
$$c = (b_{L-1} b_{L-2} \ldots b_0)$$
where $b_j \in {0, 1}$ is the $j$-th bit of the binary representation of $c$.
Phase 2: Base OT for Key Transfer
-
Alice's Key Generation: For each bit position $j \in \{0, \ldots, L-1\}$, Alice generates a pair of random secret keys, each of length $\kappa$:
$$(k_{j,0}, k_{j,1})$$
This results in a total of $2L$ secret keys held by Alice. -
Parallel OT Execution: Alice and Bob execute the $\text{OT}^{1}_{2}$ protocol $L$ times in parallel (or sequentially). For each $j \in {0, \ldots, L-1}$:
- Alice acts as the sender in the $\text{OT}^{1}_{2}$ subroutine, with her input being the pair of keys $(k_{j,0}, k_{j,1})$.
- Bob acts as the receiver in the $\text{OT}^{1}_{2}$ subroutine, with his input being his choice bit for that position, $b_j$.
- Result: After the $j$-th execution, Bob learns the key $k_{j,b_j}$ and nothing about $k_{j, 1-b_j}$.
-
Outcome of Phase 2: At the conclusion of this phase, Bob holds a set of $L$ keys, one for each bit position, corresponding to his choice index $c$:
$$\{k_{0,b_0}, k_{1,b_1}, \ldots, k_{L-1,b_{L-1}}\}$$
Alice has no information about which of the $2L$ keys Bob has obtained.
Phase 3: Message Encryption and Transfer
-
Alice's Message Encryption: For each message $m_i$ in her database (where $i \in \{0, \ldots, N-1\}$), Alice performs the following steps:
a. Represent the index $i$ as an $L$-bit binary string: $i = (i_{L-1} i_{L-2} \ldots i_0)$.
b. Construct the final encryption key $K_i$ for message $m_i$ by applying the hash function $H$ to the concatenation of the secret keys corresponding to the bits of $i$:
$$K_i = H(k_{0,i_0} ,|, k_{1,i_1} ,|, \ldots ,|, k_{L-1,i_{L-1}})$$
c. Encrypt the message $m_i$ using $K_i$ as a one-time pad (XOR encryption):
$$E_i = m_i \oplus K_i$$ -
Alice's Message Transmission: Alice sends the complete set of $N$ encrypted ciphertexts, $(E_0, E_1, \ldots, E_{N-1})$, to Bob.
Phase 4: Decryption
-
Bob's Key Reconstruction: Bob has received the list of all $N$ ciphertexts but only needs to decrypt $E_c$. He reconstructs the necessary decryption key, $K_c$, using the $L$ keys he obtained in Phase 2:
$$K_c = H(k_{0,b_0} ,|, k_{1,b_1} ,|, \ldots ,|, k_{L-1,b_{L-1}})$$ -
Bob's Decryption: Bob decrypts his chosen ciphertext, $E_c$, by applying the XOR operation with his reconstructed key:
$$m_c = E_c \oplus K_c$$
Bob successfully recovers $m_c$. For any other message $m_i$ where $i \neq c$, the binary representation of $i$ must differ from $c$ in at least one bit position, say $j$. To compute the key $K_i$, Bob would need the key $k_{j,i_j}$, but he only holds $k_{j,b_j}$. Since Bob is missing at least one of the component keys needed to construct $K_i$, he cannot learn anything about any message $m_i$ for $i \neq c$.
Formal Description: k-out-of-N OT
This protocol allows a Receiver (Bob) to learn a specific subset of $k$ messages from a Sender's (Alice) database of $N$ messages. Alice does not learn which $k$ messages Bob chose, and Bob learns nothing about the $N-k$ messages he did not choose.
Parties, Inputs, and Primitives:
- Sender (Alice): Holds an ordered list of $N$ messages, $M = (m_0, m_1, \ldots, m_{N-1})$.
- Receiver (Bob): Holds a choice set $C = {c_1, c_2, \ldots, c_k}$, which is a subset of ${0, 1, \ldots, N-1}$ of size $k$.
- Security Parameter ($\kappa$): A parameter that defines the bit-length of cryptographic keys (e.g., $\kappa=128$).
- Symmetric Encryption Scheme ($(E, D)$): A pair of secure encryption and decryption algorithms (e.g., AES).
- Subroutine ($\text{OT}^{1}_{2}$): A secure 1-out-of-2 Oblivious Transfer protocol, treated as a black box.
Protocol Steps
The protocol proceeds in three distinct phases. The core idea is to first reduce the problem of transferring large messages to the problem of securely transferring small secret keys.
Phase 1: Message Encryption and Transfer
- Alice's Key Generation: Alice generates $N$ random symmetric keys, one for each of her messages: $\{key_0, key_1, \ldots, key_{N-1}\}$.
- Alice's Encryption: For each message $m_i$ in her database, Alice computes its corresponding ciphertext using the encryption algorithm $E$:
$$E_i = E(key_i, m_i)$$ - Alice's Transmission: Alice sends the complete set of $N$ ciphertexts, $(E_0, E_1, \ldots, E_{N-1})$, to Bob. Bob cannot decrypt any of these yet as he does not possess any of the keys.
Phase 2: Base OT for Key Transfer
This phase securely transfers the $k$ keys that Bob needs to decrypt his chosen ciphertexts.
- Parallel OT Execution: Alice and Bob execute the $\text{OT}^{1}_{2}$ protocol $N$ times in parallel (or sequentially). For each index $i \in \{0, \ldots, N-1\}$:
a. Bob's Choice Bit Generation: Bob determines his choice bit, $b_i$, for the $i$-th OT based on whether the index $i$ is in his choice set $C$:
b. Alice's OT Inputs: Alice generates a random dummy key, $d_i$, of length $\kappa$. Her input pair for the $i$-th OT is the dummy key and the real key: $(d_i, key_i)$.
c. OT Subroutine:
* Alice acts as the sender in the $\text{OT}^{1}_{2}$ subroutine with her input pair $(d_i, key_i)$.
* Bob acts as the receiver in the $\text{OT}^{1}_{2}$ subroutine with his choice bit $b_i$.
d. Result: After the $i$-th execution, Bob receives either the real key $key_i$ (if his choice bit was 1) or the useless dummy key $d_i$ (if his choice bit was 0).
- Outcome of Phase 2: At the conclusion of this phase, Bob holds a set of $N$ keys. This set contains the $k$ real keys corresponding to his choice set $C$, and $N-k$ useless dummy keys for all other indices.
Phase 3: Decryption
- Bob's Decryption: For each index $c_j$ in his original choice set $C$, Bob performs the following:
a. He retrieves the key he received for index $c_j$ during Phase 2 (which is the correct key, $key_{c_j}$).
b. He retrieves the corresponding ciphertext $E_{c_j}$ from the set he received in Phase 1.
c. He uses the decryption algorithm $D$ to recover the original message:
$$m_{c_j} = D(key_{c_j}, E_{c_j})$$
Bob successfully recovers all $k$ messages he chose. For any other message $m_i$ where $i \notin C$, he only holds the dummy key $d_i$. Attempting to decrypt $E_i$ with $d_i$ (i.e., computing $D(d_i, E_i)$) will produce only random-looking garbage, thus preserving Alice's privacy for the messages he did not choose.
OT Extension
The core motivation for OT Extension is to overcome the severe performance bottleneck of public-key cryptography. While secure, protocols built on systems like RSA are thousands of times slower than those using symmetric-key primitives like AES or hashing. For modern applications needing millions of OTs, relying on public-key "base OTs" is computationally infeasible.
OT Extension solves this by introducing a powerful idea: amortization through cryptographic seeding. The goal is to perform a very small, fixed number of expensive base OTs to establish a "seed" of correlated secret information, and then use fast, symmetric-key operations to "extend" this seed into an arbitrarily large number of new OTs.
The theory can be broken down into three conceptual phases.
1. Phase I: Establishing Correlated Randomness (The Seed)
The first phase uses a small number, $\kappa$ (e.g., 128), of base OTs to create a specific kind of shared secret between Alice and Bob. This is the only slow part of the protocol.
The outcome of this phase is not the transfer of actual messages, but the creation of correlated secrets. At the end of this phase:
- Alice holds a single, secret, random $\kappa$-bit vector, $T = (t_0, t_1, \ldots, t_{\kappa-1})$.
- Bob holds two sets of $\kappa$ secret random vectors, let's call them $S_0 = \{s_{0,0}, s_{0,1}, \ldots, s_{0,\kappa-1}\}$ and $S_1 = \{s_{1,0}, s_{1,1}, \ldots, s_{1,\kappa-1}\}$.
The crucial correlation established by the base OTs is that Alice's secret vector $T$ is identical to one of Bob's secret vectors from each pair, based on her choice bits. That is, $T$ is equivalent to the vector $(s_{t_0,0}, s_{t_1,1}, \ldots, s_{t_{\kappa-1},\kappa-1})$. Alice knows her choice vector $T$, but she does not know the other half of Bob's secrets (the $S_{1-t_j, j}$ vectors). Bob knows all his $S$ vectors, but he has no idea which ones Alice chose (i.e., he doesn't know $T$).
This small set of correlated random strings is the "cryptographic seed."
2. Phase II: Extending the Correlation (The Growth)
The second phase is the "magic" of OT Extension. It takes the short, $\kappa$-bit correlation from the seed and stretches it into a massive correlation involving $N$ secrets, where $N$ can be millions or billions. This is done entirely with cheap, symmetric-key operations (specifically, cryptographic hash functions that behave like random oracles).
Conceptually, you can think of this phase as a Pseudorandom Correlation Generator. Just as a Pseudorandom Number Generator (PRG) takes a short random seed and stretches it into a long, random-looking string, this process takes the short correlated seed and stretches it into two long, correlated sets of one-time pads.
At the end of this phase:
- For each of the $N$ new OTs, Alice is able to compute a pair of one-time pads: $(p_{i,0}, p_{i,1})$.
- For each of the $N$ new OTs, Bob, based on his choice bit $c_i$, is able to compute only one of those pads: $p_{i, c_i}$.
The security of the hash function guarantees that Bob cannot learn anything about the other pad, $p_{i, 1-c_i}$, and Alice learns nothing about Bob's choice $c_i$. The initial correlation from the base OTs has been successfully "extended" to this new, much larger set of correlated pads.
3. Phase III: Transferring the Messages (The Harvest)
This final phase is now trivially easy and extremely fast.
-
Alice takes her two messages for the $i$-th OT, $(m_{i,0}, m_{i,1})$, and masks them with the two pads she just generated:
$$E_{i,0} = m_{i,0} \oplus p_{i,0}$$ $$E_{i,1} = m_{i,1} \oplus p_{i,1}$$
She sends both encrypted messages to Bob. -
Bob takes the single pad he was able to generate, $p_{i, c_i}$, and uses it to unmask the corresponding ciphertext:
$$m_{i, c_i} = E_{i, c_i} \oplus p_{i, c_i}$$
This completes one of the $N$ oblivious transfers. Since all operations in Phase II and III are symmetric, the high cost of the initial $\kappa$ base OTs is successfully amortized, making the average cost per OT extremely low and making large-scale secure computation practical.
Conclusion
Our exploration of Oblivious Transfer has taken us on a journey from a simple, elegant idea to a powerful, scalable tool that underpins much of modern privacy-preserving technology. We began with the foundational 1-out-of-2 OT, a protocol that seems almost like a magic trick in its ability to guarantee a private choice. We saw how its principles could be realized through the trapdoor functions of public-key cryptography and how it could be generalized to handle more complex queries, from 1-out-of-$N$ to $k$-out-of-$N$ selections.
Yet, as we discovered, the true story of Oblivious Transfer is one of transformation. For decades, OT and the secure computation protocols it enabled were largely confined to the realm of theory, their reliance on slow public-key operations rendering them impractical for real-world problems. The advent of OT Extension was the watershed moment that changed everything. By providing a method to amortize the cost of expensive base OTs, it unlocked the potential of the primitive, turning it from a theoretical curiosity into a high-performance workhorse capable of executing millions of private choices per second.
Today, OT is more relevant than ever. In an era defined by big data and artificial intelligence, it offers a concrete answer to one of the most pressing questions of our time: how can we collaborate and benefit from data without sacrificing our privacy? Oblivious Transfer provides a crucial mechanism to break the false dichotomy between functionality and confidentiality. From enabling private medical research and secure machine learning to protecting our passwords and personal contacts, OT is the indispensable engine driving a new generation of technologies built on a foundation of cryptographic trust. It stands as a testament to the power of a simple idea to reshape our digital world for the better.
What's Next?
We've seen that Oblivious Transfer is a powerful building block, but what do we actually build with it? While we touched on several applications, one of the most direct and widely deployed is Private Set Intersection. We will talk about it in the following post.