Edinburgh Research Explorer Combining Private Set-Intersection with Secure Two-Party Computation

Private Set-Intersection (PSI) is one of the most popular and practically relevant secure two-party computation (2PC) tasks. Therefore, designing special-purpose PSI protocols (which are more efficient than generic 2PC solutions) is a very active line of research. In particular, a recent line of work has proposed PSI protocols based on oblivious transfer (OT) which, thanks to recent advances in OT-extension techniques, is nowadays a very cheap cryptographic building block. Unfortunately, these protocols cannot be plugged into larger 2PC applications since in these protocols one party (by design) learns the output of the intersection. Therefore, it is not possible to perform secure post-processing of the output of the PSI protocol. In this paper we propose a novel and efficient OT-based PSI protocol that produces an “encrypted” output that can therefore be later used as an input to other 2PC protocols. In particular, the protocol can be used in combination with all common approaches to 2PC including garbled circuits, secret sharing and homomorphic encryption. Thus, our protocol can be combined with the right 2PC techniques to achieve more efficient protocols for computations of the form z = f(X ∩ Y) for arbitrary functions f.


Introduction
Private Set-Intersection (PSI) is one of the most practically relevant secure two-party computation (2PC) tasks. In PSI two parties hold two sets of strings X and Y, respectively. At the end of the protocol one (or both) of the parties should learn the intersection of the two sets Z = X ∩ Y and nothing else about the input of the other party.
There are many real-world applications in which PSI is required. As an example, when mobile users install messaging apps, they need to discover who among their contacts (from their address book) is also using the app, in order to be able to start communicating seamlessly with them. Doing so requires users to learn the intersection of their contact list with the list of registered users of the service, which is stored at the server side. This is typically done by having users send their contact list to the server, which can then compute the intersection and return the result to the user. Unfortunately this solution is very problematic not only for the privacy of the user, but for the privacy of the user's contacts as well! In particular, some of the people in the contact list might not want their phone number to be transferred to and potentially stored by the server, but they have no control over this. Note that this is not just a theoretically interesting problem: Signal (one of the most popular end-to-end encrypted messaging apps) has recently recognized this as being a real problem and offered partial solutions to it. PSI has many other applications, including computing intersections of suspect lists, private matchmaking (comparing interests), testing human genomes [BBC+11], privacy-preserving ride-sharing [HOS17], botnet detection [NMH+10], advertisement conversion rates [IKN+17] and many more.
From a feasibility point of view, PSI is just a special case of 2PC and therefore any generic 2PC protocol (such as [Yao82,GMW87]) could be used to securely evaluate PSI instances as well. However, since PSI is a natural functionality that can be applied in numerous real-world applications, many efficient protocols for this specific functionality have been proposed, with early results dating back to the 80s [Sha80,Mea86]. The problem was formally defined in [FNP04], and follow-up work increased the efficiency of PSI protocols to have complexity only linear in the inputs of the parties [JL10,CT10]. A very recent work shows how to obtain a protocol where communication complexity is linear in the size of the smaller set and logarithmic in the larger set [CLR17].
However, these protocols still require performing expensive public-key operations (e.g., exponentiations) for every element in the input sets. As public-key operations are orders of magnitude more expensive than symmetric key operations, these protocols are not practically efficient for large input sets. In the meantime, generic techniques for 2PC have improved by several orders of magnitude, and the question of whether special-purpose protocols or generic protocols are most efficient has been debated in [HEK12,CT12]. Due to its practical relevance, PSI protocols in the server-aided model have been proposed as well [KMRS14]. Independent and concurrent works [PSWW18,FNO18] (which were not publicly available at the time we first posted our paper on ePrint) consider the problem of using a PSI protocol to construct more complex functionalities in an efficient way. More specifically, [PSWW18] provides a way to securely compute many variants of the set intersection functionality using a clever combination of Cuckoo hashing and garbled circuits. The work of Falk et al. [FNO18] focuses on obtaining a PSI protocol that is efficient in terms of communication. In addition, the authors of [FNO18] propose a PSI protocol where the output can be secret shared that has communication complexity of O(mλ log log m), where λ is the bit-length of the elements and m is the set size.
The techniques used in our paper differ significantly from the techniques used in [PSWW18,FNO18]. Our solution avoids the use of garbled circuits and relies on the security and the efficiency of OT and symmetric key encryption schemes.

OT-based PSI
The most efficient PSI protocols today are those following the approach of PSZ [PSSZ15,PSZ14]. These protocols make extensive use of a cryptographic primitive known as oblivious transfer (OT). While OT provably requires expensive public-key operations, OT can be "extended" as shown by [IKNP03,ALSZ13,KK13], i.e., the few necessary expensive public-key operations can be amortized over a very large number of OT instances, and the marginal cost of OT is only a few (faster) symmetric key operations instead. In particular, improvements in OT-extension techniques directly imply improvements to PSI protocols, as shown by e.g. [KKRT16,OOS17].
In a nutshell, the PSZ protocol introduced two important novel ideas to the state of the art of PSI. First, they give an efficient instantiation of the private set membership protocol (PSM) introduced in [FIPR05] based on OT. Second, they show how to efficiently implement PSI from PSM using hashing techniques. (An overview of their techniques is given below.)

Our contribution
The main contribution of this paper is to give an efficient instantiation of PSM that provides output in encrypted format and can therefore be combined with further 2PC protocols. Our PSM protocol can be naturally combined with the hashing approach of PSZ to give a PSI protocol with encrypted output achieving the same favourable complexity in the input sizes. This makes it possible to combine the efficiency of modern PSI techniques with the potential of general 2PC. Combining our protocols with the right 2PC post-processing allows more efficient evaluation of functionalities of the form Z = f(X ∩ Y) for any function f. Like in PSZ, we only focus on semi-honest security. Using the protocol together with an actively secure OT-extension protocol such as [ALSZ15,KOS15] would result in a protocol with privacy but not correctness (i.e., the view of the protocol without the output can be efficiently simulated), which is a meaningful notion of security in some settings. PSI protocols with security against malicious adversaries have been proposed in e.g. [HL08,RR17a,RR17b]. It is an interesting open problem to design efficient protocols which are both secure against active (or covert) adversaries and produce encrypted output. Also, like in PSZ, we only focus on the two-party setting. The recent result of [HV17] has shown that multiparty set-intersection can be computed efficiently. Extending our result to the multiparty case is an interesting future research direction.
We also compare the computational complexity of our scheme for PSM with all the circuit-based PSI approaches (which can be combined with further post-processing) proposed in [PSZ16]. More precisely, in Table 1 we compare our protocol with the protocols of [PSZ16] in terms of the number of symmetric key operations and the number of bits exchanged between the parties. The result of this comparison is that our protocol has better performance, in terms of computational complexity, than all the circuit-based PSI approaches considered for our comparison (see footnote 3). We refer the reader to App. A for more details about this comparison.

Improving the efficiency of smart contract protocols
Most cryptocurrency systems are built on top of blockchain technologies, where miners run a distributed consensus protocol whose security is ensured as long as the adversary controls only a minority of the miners. Some cryptocurrency systems allow complex programs and decentralized applications to run on the blockchain. In Ethereum those programs are called smart contracts. Roughly speaking, the aim of a smart contract is to run a protocol and start a transaction to pay a user of the cryptocurrency system according to the output of the protocol execution. Unfortunately, this interesting feature of smart contracts does not come for free. Indeed, in order to execute a smart contract, it is required to pay a gas fee that depends on the number of instructions of the protocol to be executed. So, the higher the complexity of the protocol, the higher the price to pay. In this context a cryptographic protocol that outputs intermediate values in a secret-shared way is particularly useful. Suppose that two parties want to securely compute f(X ∩ Y) for arbitrary functions f, and reward another party depending on the output of this computation. Instead of writing on a smart contract the entire protocol to compute f(X ∩ Y), the two parties could run a sub-protocol Π to obtain a secret sharing of χ = X ∩ Y without using a smart contract, and then run another sub-protocol Π′ to compute f(χ), this time using a smart contract to enforce the reward policy. Following this approach it is possible to move part of the computation off-chain, thus increasing the performance and, at the same time, decreasing the costs required to execute the smart contract. Moreover, we observe that χ can be reused to compute different functions f′. The scenario described above is particularly interesting if one of the parties can be fully malicious, but in this work we focus on semi-honest security, leaving the above as an open question.

[Footnote 3] The complexity of the protocols proposed in [PSZ16] depends upon parameters that are also related to the hash function used. In order to make our comparison fair, we have set these parameters as shown in the first column of Table 10 of [PSZ16]. More precisely, the authors of [PSZ16] show in that table which parameters are adopted for their empirical efficiency comparison for the case where one set is much greater than the other set (which is exactly the case of PSM).

Why PSZ and 2PC do not mix
We start with a quick overview of the PSM protocol in PSZ [PSSZ15,PSZ14], to explain why their protocol inherently reveals the intersection to one of the parties. From a high-level point of view, the protocol is conceptually similar to the PSM protocol from oblivious pseudorandom function (OPRF) of [FIPR05], except that the OPRF is replaced with a similar functionality efficiently implemented using OT. For simplicity, here we will use the OPRF abstraction. The goal of a PSM protocol is the following: the receiver R has input x, and the sender S has input a set Y; at the end of the protocol the receiver learns whether x ∈ Y or not, while the sender learns nothing. The protocol starts by using the OPRF subprotocol, so that R learns x* = F_k(x) (where k is known to S), whereas S learns nothing about x. Now S evaluates the PRF on her own set and sends the set Y* = {y* = F_k(y) | y ∈ Y} to R, who checks if x* ∈ Y* and concludes that x ∈ Y if this is the case. In other words, we map all inputs into pseudorandom strings and then let one of the parties test for membership "in the clear". Since the party performing the test doesn't have access to the mapping (e.g., the PRF key), this party can only check for the membership of x and no other points (i.e., all elements in Y* \ {x*} are indistinguishable from random in R's view).
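The data flow of this PSM can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: HMAC-SHA256 stands in for the PRF F, the function names are hypothetical, and the OPRF step is replaced by a direct PRF evaluation, so it offers no privacy for x. It only shows why R ends up learning the membership bit in the clear.

```python
import hashlib
import hmac
import secrets


def prf(key: bytes, x: bytes) -> bytes:
    """Stand-in for F_k(x). HMAC-SHA256 is an assumption of this sketch."""
    return hmac.new(key, x, hashlib.sha256).digest()


def psm_in_the_clear(x: bytes, Y: set) -> bool:
    k = secrets.token_bytes(32)        # PRF key, known only to S
    # In the real protocol R obtains x* through an OPRF, so S never sees x.
    x_star = prf(k, x)
    Y_star = {prf(k, y) for y in Y}    # S sends Y* = {F_k(y) : y in Y} to R
    return x_star in Y_star            # R tests membership "in the clear"


member = psm_in_the_clear(b"alice", {b"alice", b"bob"})
non_member = psm_in_the_clear(b"carol", {b"alice", b"bob"})
```

The return value is a plain Boolean held by R, which is exactly what prevents secure post-processing.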
From the above description, it should be clear that the PSZ PSM inherently reveals the output to one of the parties. Turning this into a protocol which provides encrypted output is a challenging task. Here is an attempt at a "strawman" solution: we change the protocol such that R still learns the pseudorandom string x* = F_k(x) corresponding to x, but now S sends a value for every element in the universe. Namely, for each i (in the domain of Y) S sends a value c_i that encrypts whether i ∈ Y, "masked" using F_k(i). Using x*, R can then recover from c_x an encrypted version of whether x ∈ Y, which can then be used as input to the next protocol.
While this protocol produces the correct result, its complexity is exponential in the bit-length of x, which is clearly not acceptable.
Intuitively, we know that only a polynomial number of the c_i's will contain encryptions of 1, and therefore we need to find a way to "compress" all the c_i corresponding to i ∉ Y into a single one, to bring the complexity of the protocol back to O(|Y|). In the following, after defining some useful notation, we give an intuitive explanation of how to do that.

Our protocol
We introduce some useful (and informal) notation in order to make it easier to understand the ideas behind our construction. We let Y = {y_1, ..., y_M} be the input set of the sender S, and we assume w.l.o.g. that |Y| = M = 2^m. All strings have the same length, e.g., |x| = |y_i| = λ. We will use a special symbol ⊥ such that x ≠ ⊥ for all x. We use a function Prefix(x, i) that outputs the i most significant bits of x (Prefix(x, i) ≠ Prefix(x, j) when i ≠ j, independently of the value of x), and for simplicity we define Prefix(Y, i) to be the set constructed by taking the i most significant bits of every element in Y.
The protocol uses a symmetric key encryption scheme Sym = (Gen, Enc, Dec) with the additional property that given a key k ← Gen(1^s) it is possible to efficiently verify if a given ciphertext is in the range of k (see Sec. 3 for a formal definition).
Finally, the output of the protocol will be one of two strings γ_0, γ_1 chosen by S, respectively denoting x ∉ Y and x ∈ Y. The exact format of the two strings depends on the protocol used for post-processing. For instance, if the post-processing protocol is based on: 1) garbled circuits, then γ_0, γ_1 will be the labels corresponding to some input wire; 2) homomorphic encryption, then γ_b = Enc(pk, b) for some homomorphic encryption scheme Enc; 3) secret sharing, then γ_b = s_2 ⊕ b where s_2 is a uniformly random share chosen by S, so that if R defines its own share as s_1 = γ_b then it holds that s_1 ⊕ s_2 = b.
In order to "compress" the elements of Y we start by considering an undirected graph with a level structure of λ + 1 levels. The vertices in the last level of this graph will correspond to the elements of Y. More precisely, we associate the secret key k_{b_λ b_{λ−1} ... b_1} of a symmetric key encryption scheme Sym to each element y = b_λ b_{λ−1} ... b_1 ∈ Y. The main idea is to allow the receiver to obliviously navigate this graph in order to get the key k_x if x ∈ Y, or a special key k⋆ otherwise. Moreover, we allow the receiver to navigate the graph efficiently, that is, every level of the graph is visited only once.
Once a key k is obtained by the receiver, the sender sends O(|Y|) ciphertexts in such a way that the key obtained by the receiver can decrypt only one ciphertext. Moreover, the plaintext of this ciphertext will correspond to γ_0 or γ_1 depending on whether x ∈ Y or not.
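As a small illustration of the secret-sharing output format γ_b = s_2 ⊕ b, the following sketch (with hypothetical function names, and with the PSM itself abstracted away as a direct lookup) checks that R's share and S's share always recombine to the membership bit b.

```python
import secrets


def sender_outputs():
    """S picks a uniformly random share s2 and sets gamma_b = s2 XOR b."""
    s2 = secrets.randbits(1)
    gamma = {0: s2 ^ 0, 1: s2 ^ 1}
    return s2, gamma


def reconstruct(s1: int, s2: int) -> int:
    return s1 ^ s2


results = []
for b in (0, 1):            # b = 1 means x is in Y, b = 0 means it is not
    s2, gamma = sender_outputs()
    s1 = gamma[b]           # the string delivered to R by the PSM is R's share
    results.append(reconstruct(s1, s2))
```

Since s_2 is uniform, γ_b on its own is a uniform bit and tells R nothing about b.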

First step: construct the graph G
Each graph level i ∈ {0, ..., λ} has size at most |Prefix(Y, i)| + 1. More precisely, for every t ∈ Prefix(Y, i) there is a node in level i that contains a key k_t. In addition, in level i there is a special node, called the sink node, that contains a key k⋆_i (which we refer to as the sink key). The aim of k⋆_i is to represent all the values that do not belong to Prefix(Y, i). Let us now describe how the graph G is constructed. First, for i = 1, ..., λ the key (for a symmetric key encryption scheme) k⋆_i is generated using the generation algorithm Gen(·). As discussed earlier, the aim of these keys is to represent the elements that do not belong to Y. More precisely, the sink key k⋆_i, with i ∈ {1, ..., λ}, represents all the values that do not belong to Prefix(Y, i), and the key k⋆_λ (the last sink key) will be used to encrypt the output γ_0 corresponding to non-membership in the last step of our protocol. Note that if Prefix(x, i) ∉ Prefix(Y, i) then Prefix(x, j) ∉ Prefix(Y, j) for all j > i. Therefore, once a sink node is entered, the sink path is never abandoned and thus the final sink key k⋆_λ will be retrieved (which allows recovery of γ_0). Let us now give a more formal idea of how G is constructed.
- The root of G is empty, and in the second level there are two vertices k_0 and k_1 where, for b = 0, 1, k_b ← Gen(1^s) if b ∈ Prefix(Y, 1) and k_b = k⋆_1 otherwise.
- For each vertex k_t in level i ∈ {1, ..., λ − 1} and for b = 0, 1, create the node k_{t||b} as follows (if it does not exist) and connect k_t to it: k_{t||b} ← Gen(1^s) if t||b ∈ Prefix(Y, i + 1) and k_{t||b} = k⋆_{i+1} otherwise.
We observe that a new node k_{t||b} is generated only when t||b ∈ Prefix(Y, i + 1). In the other cases the sink node k⋆_{i+1} is used.
In Fig. 1 we show an example of what the graph G looks like for the set Y = {010, 011, 110}. In this example it is possible to see how, in the 2nd level, all the elements that do not belong to Prefix(Y, 2) are represented by the sink node k⋆_2. Using this technique we have that in the last level of G one node (k⋆_3 in this example) is sufficient to represent all the elements that do not belong to Y. Therefore, the last level of G contains at most |Y| + 1 elements. We also observe that no level of G can contain more than |Y| + 1 nodes.
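The level structure for the example set Y = {010, 011, 110} can be reproduced with a short sketch. It tracks only node labels (prefixes, plus a "sink" label per level) rather than keys. Strings are written most significant bit first, and the helper names are illustrative, not taken from the paper.

```python
def prefix(x: str, i: int) -> str:
    """The i most significant bits of x (strings are written MSB first)."""
    return x[:i]


def prefix_set(Y, i):
    return {prefix(y, i) for y in Y}


def build_levels(Y, lam):
    """Node labels of levels 1..lam: one node per prefix plus the sink."""
    return [sorted(prefix_set(Y, i)) + ["sink"] for i in range(1, lam + 1)]


def child(Y, t, b):
    """Node reached from node t when following the edge labelled with bit b."""
    if t == "sink":
        return "sink"                      # the sink path is never abandoned
    cand = t + b
    return cand if cand in prefix_set(Y, len(cand)) else "sink"


Y = {"010", "011", "110"}
levels = build_levels(Y, 3)
```

Here `levels[1]` is `['01', '11', 'sink']`, matching the description of the 2nd level, and every level has at most |Y| + 1 nodes.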

Second step: oblivious navigation of G
Let x = x_λ x_{λ−1} ... x_1 be the receiver's (R's) private input and Y be the sender's (S's) private input.
After S constructs the graph G, we need a way to allow R to obtain k_{x_λ x_{λ−1} ... x_1} if x ∈ Y and the sink key k⋆_λ otherwise. All the computation has to be done in such a way that no other information about the set Y is leaked to the receiver, and that no information about x is leaked to the sender. In order to do so we use λ executions of 1-out-of-2 OT. The main idea is to allow the receiver to select which branch to explore in G depending on the bits of x. More precisely, in the first execution of OT, R will receive the key k_{x_λ} if there exists an element in Y with the most significant bit equal to x_λ, and the sink key k⋆_1 otherwise. In the second execution of OT, R uses x_{λ−1} as input and S uses (c_0, c_1), where c_0 is computed as follows:
- For each key in the second level of G that has the form k_{t||0}, the key k_{t||0} is encrypted using the key k_t.
- For every node v in the first level that is connected to the sink node k⋆_2 in the second level, compute an encryption of k⋆_2 using the key contained in v.
- Pad the input with random ciphertexts up to the upper bound for the size of this layer (more details about this step are provided later).
The procedure to compute the input c_1 is essentially the same (the only difference is that in this case we consider every key of the form k_{t||1} and encrypt it using k_t).
Roughly speaking, in this step every key contained in a vertex u of the second level is encrypted using the key contained in the vertex v of the previous level that is connected to u. For example, following the graph provided in Fig. 1, c_0 contains encryptions of k⋆_2 under k_0 and under k_1, whereas c_1 contains an encryption of k_01 under k_0 and an encryption of k_11 under k_1. Thus, after the second execution of OT, R receives c_{x_{λ−1}}, which contains the ciphertexts described above, where only one of these can be decrypted using the key k obtained in the first execution of OT. The obtained plaintext corresponds to the key k_{x_λ x_{λ−1}} if Prefix(x, 2) ∈ Prefix(Y, 2), and to the sink key k⋆_2 otherwise. The same process is iterated for all the levels of G. More generally, if Prefix(x, j) ∈ Prefix(Y, j) then after the j-th execution of OT R can compute the key k_{x_λ x_{λ−1} ... x_{λ−j+1}} using the key obtained in the previous phase. Conversely, if Prefix(x, j) ∉ Prefix(Y, j) then the sink key k⋆_j is obtained by R. We observe that after every execution of OT, R does not know which ciphertext can be decrypted using the key obtained in the previous phase; therefore he will try to decrypt all the ciphertexts until the decryption procedure is successful. To avoid adding yet more indexes to the (already heavy) notation of our protocol, we deal with this using a private-key encryption scheme with efficiently verifiable range. We note that this is not necessary and that when implementing the protocol one can instead use the point-and-permute technique [BMR90]. This, and other optimisations and extensions of our protocol, are described in Section 5.

Third step: obtain the correct share
In this step S encrypts the output string γ_0 using the key k⋆_λ and uses all the other keys in the last level of G to encrypt the output string γ_1. At this point the receiver can only decrypt either the ciphertext that contains γ_0 if x ∉ Y, or one (and only one) of the ciphertexts that contain γ_1 if x ∈ Y. In the protocol that we have described so far, R does not know which ciphertext can be decrypted using the key that he has obtained. Also in this case we can use a point-and-permute technique to allow R to identify the only ciphertext that can be decrypted using his key.

On the need for padding
As described earlier, we might need to add some padding to the OT sender's inputs. To see why we need this, we make the following observation. We recall that in the i-th OT execution the sender computes an encryption of the keys in level i of the artificial graph G using the keys of the previous level (i − 1). As a result of this computation the sender obtains the pair (c^i_0, c^i_1), which will be used as input of the i-th OT execution, where c^i_0 (as well as c^i_1) contains a number of encryptions that depends upon the number of vertices on level (i − 1) of G. We observe that this leaks information about the structure of G to the receiver, and therefore leaks information about the elements that belong to Y. Considering the example in Fig. 1, if we allow the receiver to learn that the 2nd level only contains 3 nodes, then the receiver would learn that all the elements of Y have the two most significant bits equal to either t or t′ for some t, t′ ∈ {0, 1}^2 (in Fig. 1, for example, we have t = 01 and t′ = 11; note however that the receiver would not learn the actual values of t and t′).
We note that the technique described in this section can be seen as a special (and simpler) example of securely evaluating a branching program. Secure evaluation of branching programs has previously been considered in [IP07,MN12]; unfortunately these protocols cannot be instantiated using OT-extension and therefore will not lead to practically efficient protocols (the security of these protocols is based on strong OT which, in a nutshell, requires the extra property that when executing several OTs in parallel, the receiver should not be able to correlate the answers with the queries beyond correlations which follow from the output).
Finally, we note that the work of Chor et al. [CGN98] uses a data structure similar to the one described here to achieve private information retrieval (PIR) based on keywords. The main difference between keyword-based PIR and PSM is that in PSM the receiver should not learn any other information about the data stored by the sender, so their techniques cannot be directly applied to our setting.
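The three steps can be put together in a toy end-to-end simulation. This is a minimal sketch under loudly stated assumptions, not the authors' implementation: the OT is replaced by an insecure in-the-clear selection, the encryption is a hypothetical HMAC-based scheme with appended zero redundancy (so that a wrong key is detected), padding uses random strings up to |Y| + 1 entries per layer, and bit strings are written most significant bit first. All function names are illustrative.

```python
import hashlib
import hmac
import secrets

N = 16                                   # key / plaintext length in bytes
rng = secrets.SystemRandom()


def gen():
    return secrets.token_bytes(N)


def enc(k, m):
    assert len(m) == N
    r = secrets.token_bytes(N)
    pad = hmac.new(k, r, hashlib.sha256).digest()        # 2N bytes
    return r + bytes(a ^ b for a, b in zip(m + bytes(N), pad))


def dec(k, c):
    pad = hmac.new(k, c[:N], hashlib.sha256).digest()
    pt = bytes(a ^ b for a, b in zip(c[N:], pad))
    return pt[:N] if pt[N:] == bytes(N) else None        # None plays the role of ⊥


def ot(pair, bit):
    """Ideal 1-out-of-2 OT, simulated in the clear (no security at all)."""
    return pair[bit]


def sender_inputs(Y, lam, gamma0, gamma1):
    P = {i: {y[:i] for y in Y} for i in range(1, lam + 1)}
    key = {t: gen() for i in P for t in P[i]}            # one key per prefix
    sink = {i: gen() for i in range(1, lam + 1)}         # one sink key per level
    first = tuple(key.get(b, sink[1]) for b in "01")
    layers = []
    for i in range(2, lam + 1):
        pair = []
        for b in "01":
            cts = [enc(key[t], key.get(t + b, sink[i])) for t in P[i - 1]]
            cts.append(enc(sink[i - 1], sink[i]))        # sink stays sink
            cts += [secrets.token_bytes(3 * N) for _ in range(len(Y) + 1 - len(cts))]
            rng.shuffle(cts)
            pair.append(cts)
        layers.append(tuple(pair))
    final = [enc(sink[lam], gamma0)] + [enc(key[y], gamma1) for y in Y]
    rng.shuffle(final)
    return first, layers, final


def receiver(x, first, layers, final):
    kappa = ot(first, int(x[0]))                         # most significant bit
    for i, pair in enumerate(layers, start=1):
        cts = ot(pair, int(x[i]))
        kappa = next(p for p in (dec(kappa, c) for c in cts) if p is not None)
    return next(p for p in (dec(kappa, c) for c in final) if p is not None)


Y = {"010", "011", "110"}
G0, G1 = bytes(N), bytes([1]) * N                        # gamma_0 and gamma_1


def psm(x):
    return receiver(x, *sender_inputs(Y, 3, G0, G1))
```

For instance, `psm("011")` returns `G1`, while `psm("000")` follows the sink path from the second level on and returns `G0`. A real instantiation would replace `ot` with an actual (extended) OT protocol.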

Definitions and tools
We denote the security parameter by s and use "||" as the concatenation operator (i.e., if a and b are two strings then by a||b we denote the concatenation of a and b). For a finite set Q, x ← Q denotes a sampling of x from Q with uniform distribution. We use the abbreviation ppt, which stands for probabilistic polynomial time. We use poly(·) to indicate a generic polynomial function. We assume the reader to be familiar with standard notions such as computational indistinguishability and the real world/ideal world security definition for secure two-party computation (see Appendix C for the actual definitions).

Special private-key encryption
In our construction we use a private-key encryption scheme with two additional properties. The first is that given the key k, it is possible to efficiently verify if a given ciphertext is in the range of k. With the second property we require that an encryption under one key will fall in the range of an encryption under another key with negligible probability. As discussed in [LP09], it is easy to obtain a private-key encryption scheme with the properties that we require. Following [LP09, Definition 2] we give the following definition.
Definition 1. Let Sym = (Gen, Enc, Dec) be a private-key encryption scheme and denote the range of a key in the scheme by Range_s(k) = {Enc(k, x)}_{x ∈ {0,1}^s}. Then
1. We say that Sym has an efficiently verifiable range if there exists a ppt algorithm M such that M(1^s, k, c) = 1 if and only if c ∈ Range_s(k). By convention, for every c ∉ Range_s(k), we have that Dec(k, c) = ⊥.
2. We say that Sym has an elusive range if for every probabilistic polynomial-time machine A, there exists a negligible function ν(·) such that Prob_{k←Gen(1^s)}[A(1^s) ∈ Range_s(k)] < ν(s).
Most of the well-known techniques used to construct a private-key encryption scheme (e.g., using a PRF) can be used to obtain a special private-key encryption scheme as well. The major difference is that a special encryption scheme has (in general) longer ciphertexts than a standard encryption scheme.
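One way to picture such a scheme is a PRF-based encryption where the plaintext is extended with a block of zeros before masking, so that the range can be checked with the key. The sketch below is an illustration under assumptions, not necessarily the instantiation the authors have in mind: HMAC-SHA256 is used as a stand-in PRF, lengths are in bytes, and `in_range` plays the role of the algorithm M from Definition 1.

```python
import hashlib
import hmac
import secrets

S = 16  # key, nonce and plaintext length in bytes (stand-in for the parameter s)


def gen() -> bytes:
    return secrets.token_bytes(S)


def _pad(k: bytes, r: bytes) -> bytes:
    """PRF output of 2*S bytes; HMAC-SHA256 is an assumption of this sketch."""
    return hmac.new(k, r, hashlib.sha256).digest()[: 2 * S]


def enc(k: bytes, m: bytes) -> bytes:
    assert len(m) == S
    r = secrets.token_bytes(S)
    # Mask the plaintext followed by S zero bytes of redundancy.
    return r + bytes(a ^ b for a, b in zip(m + bytes(S), _pad(k, r)))


def in_range(k: bytes, c: bytes) -> bool:
    """Range check: the redundancy block must unmask to all zeros."""
    if len(c) != 3 * S:
        return False
    r, body = c[:S], c[S:]
    tail = bytes(a ^ b for a, b in zip(body[S:], _pad(k, r)[S:]))
    return tail == bytes(S)


def dec(k: bytes, c: bytes):
    if not in_range(k, c):
        return None                      # None plays the role of ⊥
    r, body = c[:S], c[S:]
    return bytes(a ^ b for a, b in zip(body[:S], _pad(k, r)[:S]))


k1, k2 = gen(), gen()
msg = bytes(range(S))
ct = enc(k1, msg)
```

The ciphertext is three blocks long for a one-block plaintext, which illustrates the remark that special schemes have longer ciphertexts; decrypting `ct` under `k2` is rejected except with negligible probability.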

Our protocol Π ∈
In this section we provide the formal description of our protocol Π∈ = (S, R) for the set-membership functionality F∈, which on input (Y, γ_0, γ_1) from S and x from R gives no output to S and outputs to R the string γ_1 if x ∈ Y and γ_0 otherwise, where γ_0 and γ_1 are arbitrary strings and are part of the sender's input. Therefore our scheme protects both Y and γ_{1−b} when γ_b is received by R.
For the formal description of Π∈, we collapse the first and the second step shown in the informal description of Section 2 into a single one. That is, instead of constructing the graph G, the sender only computes the keys at level i in order to feed the i-th OT execution with the correct inputs. The keys are computed in the same way as the vertices of G; we just do not need to physically construct G to allow S to efficiently compute the keys. In our construction we make use of the following tools.
1. A protocol Π_OT = (S_OT, R_OT) that securely (according to Definition 3) computes the 1-out-of-2 OT functionality F_OT, which on input (c_0, c_1) from the sender and a bit b from the receiver gives no output to the sender and outputs c_b to the receiver.
2. A symmetric key encryption scheme Sym = (Gen, Enc, Dec) with efficiently verifiable and elusive range.
3. The function δ : i ↦ min{|Y|, 2^i}. This function computes the maximum number of vertices that can appear in level i of the graph G. As discussed before, the structure of G leaks information about Y. In order to avoid this information leakage about Y, it is sufficient to add some padding to the OT sender's input so that the input size becomes |Y|. Indeed, as observed above, every level contains at most |Y| vertices. Actually, it is easy to see that min{|Y|, 2^i} represents a better upper bound on the number of vertices that the i-th level can contain. Therefore, in order to compute the size of the padding for the sender's input we use the function δ.
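The effect of the bound min{|Y|, 2^i} on the amount of padding can be checked on the running example Y = {010, 011, 110}. The helper `padding_needed` is a hypothetical name used only for this illustration, and the sketch counts only the non-sink vertices of each level.

```python
def delta(i: int, size_Y: int) -> int:
    """Upper bound min{|Y|, 2^i} on the number of vertices in level i."""
    return min(size_Y, 2 ** i)


def padding_needed(Y, i: int) -> int:
    """Dummy entries needed so that level i always shows delta(i) entries."""
    real = len({y[:i] for y in Y})       # distinct i-bit prefixes in Y
    return delta(i, len(Y)) - real


Y = {"010", "011", "110"}
bounds = [delta(i, len(Y)) for i in (1, 2, 3)]
pads = [padding_needed(Y, i) for i in (1, 2, 3)]
```

Here `bounds` is `[2, 3, 3]` and `pads` is `[0, 1, 0]`: only the second level, which has just two distinct prefixes, needs one dummy entry to hide its true size.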

S computes
3. S and R execute Π OT , where S acts as the sender S OT using (c 1 0 , c 1 1 ) as input and R acts as the receiver R OT using x λ as input.When the execution of Π OT ends R obtains κ 1 := c 1 x λ .
2. S and R execute Π OT , where S acts as the sender S OT using (c i 0 , c i 1 ) as input and R acts as the receiver R OT using x λ−i+1 as input.When the execution of Π OT ends, R obtains c i x λ−i+1 .

Third stage
1. S executes the following steps.
2. R, upon receiving l acts as follows.
For every element t in the list c

2.2.
For all e ∈ l compute out ← Dec(κ_λ, e) and output out if and only if out ≠ ⊥.

Theorem 1. Suppose Π_OT securely computes the 1-out-of-2 OT functionality F_OT and Sym is a symmetric key encryption scheme with efficiently verifiable range and elusive range. Then Π∈ securely computes the functionality F∈.
We refer the reader to App.B for the formal proof of this theorem.

Round complexity: parallelizability of our scheme
In the description of our protocol in Sec 4.1 we have the sender and the receiver engaging in λ sequential OT executions. We now show that this is not necessary since the OT executions can be easily parallelized, given that each execution is independent of the others. That is, the output of an earlier OT execution is not used in a later execution. For simplicity, we assume that Π_OT consists of just two rounds, where the first round goes from the receiver to the sender, and the second goes in the opposite direction. We modify the description of the protocol of Sec 4.1 as follows.
- Step 3 of the first stage and step 2 of the second stage are moved to the beginning of the third stage.
- When S sends the last round of Π_OT, he also performs step 1 of the third stage. Therefore the list l is sent together with the last rounds of the λ executions of Π_OT.
Roughly speaking, in this new protocol S first computes all the inputs (k_0, k_1, c^1_0, c^1_1, ..., c^λ_0, c^λ_1) for the OTs. Then, upon receiving the λ first-round messages of Π_OT computed by R using the bits of x as input, S sends the λ second-round messages of Π_OT together with the list l. We observe that in this case S's inputs to the λ executions of Π_OT can be pre-computed before any interaction with R begins.

Point and permute
In our protocol the receiver must decrypt every ciphertext at every layer to identify the correct one. This is suboptimal both because of the number of decryptions and because encryptions that have efficiently verifiable range necessarily have longer ciphertexts. This overhead can be removed using the standard point-and-permute technique [BMR90], which was introduced in the context of garbled circuits. Using this technique we can add to each key in each layer a pointer to the ciphertext in the next layer which can be decrypted using this key. This has no impact on security.

One-time pad
It is possible to reduce the communication complexity of our protocol by using one-time pad encryption in the last log s layers of the graph, in the setting where the output values γ_0, γ_1 are such that |γ_b| < s. For instance, if the output values are bits (in case we combine our PSM with a GMW-style protocol), then the keys (and therefore the ciphertexts) used in the last layer of the graph only need to be 1 bit long. Unfortunately, since the keys in the second-to-last layer are used to mask up to two keys in the last layer, the keys in the second-to-last layer must be of length 2, and so on, which is why this optimisation only gives benefits in the last log s layers of the graph.
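The doubling argument can be written out as a short calculation. Assuming, as in the paragraph above, that a key at distance j from the last layer must mask two keys of the layer below it, the one-time-pad key length ℓ_j at distance j satisfies the following (the symbol ℓ_j is introduced here only for illustration):

```latex
\ell_0 = |\gamma_b|, \qquad \ell_{j} = 2\,\ell_{j-1}
\;\Longrightarrow\; \ell_j = 2^{j}\,|\gamma_b|,
\qquad
\ell_j < s \iff j < \log_2\!\frac{s}{|\gamma_b|}.
```

For single-bit outputs (|γ_b| = 1) and s = 128, the pad is shorter than a standard key only for j = 0, ..., 6, that is, in the last log s = 7 layers.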

PSM with secret shared input
Our PSM protocol produces an output which can be post-processed using other 2PC protocols. It is natural to ask whether it is possible to design efficient PSM protocols that also work on encrypted or secret-shared inputs. We note here that our protocol can also be used in the setting in which the input string x is bit-wise secret-shared between the sender and the receiver, i.e., the receiver knows a share r and the sender knows a share s such that r ⊕ s = x. The protocol does not change for the receiver, who now inputs the bits of r = r_λ, ..., r_1 to the λ one-out-of-two OTs (instead of the bits of x as in the original protocol). The sender, at each layer i, will follow the protocol as described above if s_i = 0 and instead swap the inputs to the OT if s_i = 1. It can be easily verified that the protocol still produces the correct result and does not leak any extra information.
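The correctness of the swap can be checked exhaustively for a single layer. In the sketch below the OT is again simulated in the clear and the function names are hypothetical; the point is only that selecting with r_i from the (possibly swapped) pair yields the same message as selecting with x_i from the original pair.

```python
def ot(pair, bit):
    """Ideal 1-out-of-2 OT, simulated in the clear for illustration only."""
    return pair[bit]


def sender_pair_for_share(pair, s_i):
    """S swaps its two OT inputs exactly when its share bit s_i is 1."""
    return (pair[1], pair[0]) if s_i else pair


checks = []
for x_i in (0, 1):
    for s_i in (0, 1):
        r_i = x_i ^ s_i                   # R's share of the bit x_i
        pair = ("c0", "c1")               # S's original inputs for this layer
        got = ot(sender_pair_for_share(pair, s_i), r_i)
        checks.append(got == pair[x_i])
```

All four combinations of (x_i, s_i) give the message indexed by x_i, even though neither party uses x_i directly.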

Keyword search
Our PSM protocol outputs an encryption of a bit indicating whether $x \in Y$ or not. The protocol can be easily modified to output a value dependent on $x$ itself and therefore implement "encrypted keyword search". That is, instead of having only two output strings $\gamma_1, \gamma_0$ representing membership and non-membership respectively, we can have $|Y| + 1$ different output strings (one for each element $y \in Y$ and one for non-membership). This can be used for instance in the context where $Y$ is a database containing id's $y$ and corresponding values $v(y)$, and the output of the protocol should be an encryption of the value $v(x)$ if $x \in Y$ or a standard value $v(\bot)$ if $x \notin Y$. The modification is straightforward: instead of using all the keys in the last layer of the graph to encrypt the same value $\gamma_1$, use each key $k_y$ to encrypt the corresponding value $v(y)$ and the sink key (which is used to encrypt $\gamma_0$ in our protocol) to encrypt the value $v(\bot)$.

PSI from PSM
We can follow the same approach as PSZ [PSSZ15, PSZ14] to turn our PSM protocol into a protocol for PSI. Given a receiver with input $X$ and a sender with input $Y$, the trivial way to construct PSI from PSM is to run $|X|$ copies of PSM, where in each execution the receiver inputs a different $x$ from $X$ and the sender always inputs her entire set $Y$. As described above, the complexity of our protocol (like that of the PSM protocol of PSZ) is proportional to $|Y|$, so this naïve approach leads to quadratic complexity $O(|X| \cdot |Y|)$. PSZ deals with this using hashing, i.e., by letting the sender and the receiver locally preprocess their inputs $X, Y$ before engaging in the PSM protocols. The different hashing techniques are explained and analysed in [PSZ16, Section 3]. We present the intuitive idea and refer to their paper for details: in PSZ the receiver uses Cuckoo hashing to map $X$ into a vector $X'$ of size $\ell = O(|X|)$ such that all elements of $X$ are present in $X'$ and every $x'_i \in X'$ is either an element of $X$ or a special $\bot$ symbol. The sender instead maps her set $Y$ into $\ell = |X'|$ small buckets $Y'_1, \ldots, Y'_\ell$ such that every element $y \in Y$ is mapped into the "right bucket", i.e., the hashing has the property that if $y = x'_i$ for some $i$ then $y$ will end up in bucket $Y'_i$ (and potentially in a few other buckets). Now PSZ uses the underlying PSM protocol to check whether $x'_i$ is a member of $Y'_i$ (for all $i$), thus producing the desired result. The overall protocol complexity is now $O(\sum_{i=1}^{\ell} |Y'_i|)$, which (by careful choice of the hashing parameters) can be made sub-quadratic. In particular, if one is willing to accept a small (but not negligible) failure probability, the overall complexity becomes only linear in the input size. Since this technique is agnostic of the underlying PSM protocol, we can apply the same technique to our PSM protocol to achieve a PSI protocol that produces encrypted output.
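The effect of hashing on the total work can be illustrated with a plaintext sketch. It is a deliberate simplification and not the PSZ construction: both parties use a single hash function (no Cuckoo hashing, no padding of buckets to a fixed size), and `psm` is a stand-in that tests membership in the clear and reports a cost equal to the size of the sender's set. All function names are hypothetical.

```python
import hashlib


def bucket_of(item: str, num_buckets: int) -> int:
    """Deterministic bucket index derived from SHA-256 (illustrative choice)."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def psm(x, sender_set):
    """Plaintext stand-in for one PSM execution; cost is |sender_set|."""
    return x in sender_set, len(sender_set)


def psi_naive(X, Y):
    """Run one PSM per receiver element against the whole set Y."""
    result, work = set(), 0
    for x in X:
        member, cost = psm(x, Y)
        work += cost
        if member:
            result.add(x)
    return result, work


def psi_bucketed(X, Y, num_buckets):
    """Run each PSM only against the bucket that x hashes to."""
    buckets = [set() for _ in range(num_buckets)]
    for y in Y:
        buckets[bucket_of(y, num_buckets)].add(y)
    result, work = set(), 0
    for x in X:
        member, cost = psm(x, buckets[bucket_of(x, num_buckets)])
        work += cost
        if member:
            result.add(x)
    return result, work
```

Both functions return the same intersection, but the naïve variant always pays $|X| \cdot |Y|$ while the bucketed variant pays only the sum of the sizes of the buckets actually probed.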

Applications
The major advantage provided by $\Pi^\in$ is that the output of the receiver can be an arbitrary value chosen by the sender as a function of $x$ for each value $x \in Y \cup \{\bot\}$. This is in contrast with most of the approaches for set membership, where the value obtained by the receiver is a fixed value (e.g., 0) when $x \in Y$, or some random value otherwise. We now provide two examples of how our protocol can be used to implement more complex secure set operations. The examples show some guiding principles that can be used to design other applications based on our protocol.
Without loss of generality, in the following applications only the receiver will learn the output of the computation. Moreover, we assume that $X$ and $Y$ have the same size $M$. Also, for simplicity we describe our applications using the naïve PSI from PSM construction with quadratic complexity, but using the PSZ approach, as described in Sec. 5, it is possible to achieve linear complexity using hashing techniques. Finally, in both our applications we exploit the fact that additions can be performed locally (and for free) using secret-sharing based 2PC. In applications in which round complexity is critical, the protocols can be redesigned using garbled circuits computing the same functionality, since the garbled circuit can be sent from the sender to the receiver together with the messages of the protocol. However, in this case additions have to be performed inside the garbled circuit.

Computing statistics of the private intersection
Here we want to construct a protocol where the receiver and the sender have as input two sets, $X$ and $Y$ respectively, and want to compute some statistics on the intersection of their sets. For instance, the receiver has a list of id's $X$ and the sender has a list of id's $Y$ and some corresponding values $v(Y)$ (thus we use the variant of our protocol for keyword search described in Section 5). At the end of the protocol the receiver should learn the average of $v(X \cap Y)$ (and not $|X \cap Y|$).
The main idea is the following: the sender and the receiver run $M$ executions of our protocol where the receiver inputs a different $x_i$ from $X$ in each execution. The sender always inputs the same set $Y$, and chooses the $|Y|+1$ outputs $\gamma^y_i$ for all $y \in Y \cup \{\bot\}$ and all $i = 1, \ldots, M$ in the following way: $\gamma^y_i$ contains two parts, namely an arithmetic secret sharing of the bit indicating whether $x_i \in Y$ and an arithmetic secret sharing of the value $v(y)$. The arithmetic secret sharing is performed using a modulus $N$ large enough that $N > M$ and $N > M \cdot V$, where $V$ is some upper bound on $v(y)$, so as to be sure that no modular reduction happens when adding the resulting shares. Concretely, the sender picks uniformly random shares $u^2_i, v^2_i \in \mathbb{Z}_N$ and sets $\gamma^y_i = (1 - u^2_i \bmod N,\; v(y) - v^2_i \bmod N)$ for all $y \in Y$ and $\gamma^\bot_i = (-u^2_i \bmod N,\; -v^2_i \bmod N)$. After the protocol the receiver defines her shares $u^1_i, v^1_i$ to be the shares contained in her output of the PSM protocol, and then both parties add their shares locally to obtain a secret sharing of the size of the intersection and of the sum of the values, i.e., $U^j = \sum_{i=1}^{M} u^j_i \bmod N$ and $V^j = \sum_{i=1}^{M} v^j_i \bmod N$ for $j \in \{1, 2\}$. The parties then check whether $(U^1, U^2)$ is a sharing of 0 and, if not, they compute and reveal the result of the computation $\frac{V^1 + V^2}{U^1 + U^2}$. Both these operations can be performed using efficient two-party protocols for comparison and division such as the ones in [T + 07, DNT12].
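The share arithmetic can be traced with a toy simulation. This is only a sketch under stated assumptions: the PSM protocol is replaced by an ideal lookup that hands the receiver the output string matching her input, the sender's outputs are assumed to be additive shares modulo N of the membership bit and of $v(y)$, and the final zero test and division are done in the clear instead of inside a 2PC. The function name and the dictionary representation of $v$ are hypothetical.

```python
import secrets


def average_over_intersection(X, values, N):
    """Toy run of the statistics application.

    X      : receiver's list of ids
    values : sender's dict mapping each id y to v(y)
    N      : modulus, assumed to satisfy N > M and N > M * V
    """
    U1 = V1 = U2 = V2 = 0
    for x in X:
        # Sender keeps random shares (u2, v2) and places the matching
        # shares inside the PSM outputs.
        u2, v2 = secrets.randbelow(N), secrets.randbelow(N)
        gamma = {y: ((1 - u2) % N, (v - v2) % N) for y, v in values.items()}
        gamma_bot = ((-u2) % N, (-v2) % N)
        # Ideal PSM stand-in: the receiver learns gamma[x] or gamma_bot.
        u1, v1 = gamma.get(x, gamma_bot)
        U1, V1 = (U1 + u1) % N, (V1 + v1) % N
        U2, V2 = (U2 + u2) % N, (V2 + v2) % N
    size = (U1 + U2) % N    # reconstructed |X ∩ Y| (done inside 2PC in practice)
    total = (V1 + V2) % N   # reconstructed sum of v over the intersection
    if size == 0:
        return None
    return total / size
```

Because each party only adds its own shares, the per-element membership bits are never reconstructed; only the two aggregated sharings enter the final comparison and division.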

Threshold PSI
In this example we design a protocol $\Pi^t = (P^t_1, P^t_2)$ that securely computes the functionality $F^t = (F^t_{P_1}, F^t_{P_2})$ where $F^t_{P_1}(S_1, S_2) = \bot$ and $F^t_{P_2}(S_1, S_2) = S_1 \cap S_2$ if $|S_1 \cap S_2| \geq t$ and $\bot$ otherwise. That is, the sender and the receiver have as input two sets, $S_1$ and $S_2$ respectively, and the receiver should only learn the intersection between these two sets if the size of the intersection is greater than or equal to a fixed (public) threshold value $t$. In the case that the size of the intersection is smaller than $t$, no information about $S_1$ is leaked to $P^t_2$ and no information about $S_2$ is leaked to $P^t_1$. (This notion was recently considered in [HOS17] in the context of privacy-preserving ride-sharing.) As in the previous example, the sender and the receiver run $M$ executions of our protocol where the receiver inputs a different $x_i$ from $S_2$ in each execution. The sender always inputs the same set $S_1$, and chooses the two outputs $\gamma^0_i, \gamma^1_i$ in the following way: $\gamma^b_i$ contains two parts, namely an arithmetic secret sharing of the bit $b$ (1 if $x_i \in S_1$ and 0 otherwise), as well as an encryption of the same bit using a key $k$. The arithmetic secret sharing is performed using a modulus larger than $M$, so that the arithmetic secret sharings can be added to compute a secret sharing of the value $|S_1 \cap S_2|$ with the guarantee that no overflow will occur. Then, the sender and the receiver engage in a secure two-party computation of a function that outputs the key $k$ to the receiver if and only if $|S_1 \cap S_2| \geq t$. Therefore, if the size of the intersection reaches the threshold, the receiver can decrypt the ciphertext part of each $\gamma$ and learn which elements belong to the intersection. The required 2PC is a simple comparison with a known value (the threshold is public) which can be efficiently performed using protocols such as [GSV07, LT13].
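A toy walk-through of the threshold mechanism follows. It is a sketch under several assumptions that are not taken from the paper: the PSM is an ideal lookup, the comparison that releases the key is evaluated in the clear, the key is released when the size is at least t, and the "encryption of the bit under k" is modelled by a one-bit pad derived from SHA-256 of the key and the execution index. The helper names are hypothetical.

```python
import hashlib
import secrets


def pad_bit(k: bytes, i: int) -> int:
    """Per-execution one-bit pad derived from k (toy stand-in for encryption)."""
    return hashlib.sha256(k + i.to_bytes(4, "big")).digest()[0] & 1


def threshold_psi(S1, S2, t):
    """Receiver learns S1 ∩ S2 only if its size is at least t, else None."""
    M = len(S2)
    N = M + 1                      # modulus larger than M, so no overflow
    k = secrets.token_bytes(16)    # sender's key, released only on success
    U1 = U2 = 0
    ciphertexts = []
    for i, x in enumerate(S2):
        u2 = secrets.randbelow(N)  # sender's share of the membership bit
        # gamma[b] = (receiver's share of b, encryption of b under k)
        gamma = {b: ((b - u2) % N, b ^ pad_bit(k, i)) for b in (0, 1)}
        u1, ct = gamma[1 if x in S1 else 0]   # ideal PSM stand-in
        U1, U2 = (U1 + u1) % N, (U2 + u2) % N
        ciphertexts.append(ct)
    size = (U1 + U2) % N           # this comparison runs inside 2PC in practice
    if size < t:
        return None                # key withheld: ciphertexts stay hidden
    return {x for i, x in enumerate(S2) if ciphertexts[i] ^ pad_bit(k, i) == 1}
```

Until the key is released, the receiver holds only random-looking shares and masked bits, so the per-element membership results remain hidden when the threshold is not met.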

Acknowledgments
This research received funding from: COST Action IC1306; the Danish Independent Research Council under Grant-ID DFF-6108-00169 (FoCC); the European Union's Horizon 2020 research and innovation programme under grant agreements No 731583 (SODA) and No 780477 (PRIViLEDGE); "GNCS - INdAM". The work of the first author was done in part while visiting Aarhus University, Denmark.

A Complexity analysis
We focus our analysis on the protocol described in Sec. 4.1 without taking into account the many possible optimisations shown in Sec. 5. In $\Pi^\in$, sender and receiver run $\lambda$ executions of a 1-out-of-2 OT; in addition, they perform some symmetric key operations. More precisely, in order to compute the inputs for the $i$-th OT execution, with $i \in \{2, \ldots, \lambda\}$, $S$ computes $2 \cdot \min\{2^{i-1}, |Y|\}$ encryptions using the private-key encryption scheme Sym. We now observe that each encryption could contain a different key, and that this key needs to be generated by running Gen(·). This means that $4M$ represents an upper bound on the number of symmetric key operations performed by $S$ to compute the input of one OT execution. Moreover, in the last interaction with $R$, $S$ computes $M$ encryptions. Therefore, an upper bound on the number of symmetric key operations performed by $S$ is $(\lambda - 1) \cdot 4M + M + 2 \approx \lambda \cdot 4M$, where 2 represents the cost of running Gen(·) twice in order to compute the two keys required to feed the first OT execution. In every OT execution $i$, with $i \in \{2, \ldots, \lambda\}$, $R$ receives $\min\{2^{i-1}, |Y|\}$ encryptions, and tries to decrypt all of them. Moreover, in the last interaction with $S$, $R$ receives $M$ encryptions and tries to decrypt all of them as well. This means that the upper bound on the number of symmetric key operations made by $R$ is $(\lambda - 1) \cdot M + M = \lambda \cdot M$. Following [PSZ16] we assume that 3 symmetric key operations are required for one OT execution. Therefore the total number of symmetric key operations is $4\lambda M + 3\lambda$ for the sender and $\lambda M + 3\lambda$ for the receiver. In order to compare the efficiency of our protocol with the PSI protocols provided in [PSZ16] and to be consistent with their complexity analysis, we consider only the computation complexity for the party with the majority of the workload in the comparison. In Table 1 of Sec. 1 we have compared the computation (and the communication) complexity of our protocol with the circuit-based PSI approaches (which can be combined with further postprocessing) considered in [PSZ16]. More precisely, we compare the sort-compare-shuffle (SCS) circuit of [HEK12] and the pairwise-comparison (PWC) circuit proposed in [PSZ16] with our approach for PSM.
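The counts derived above can be evaluated for concrete parameters with a short helper. The function names are illustrative; the formulas are the upper bounds stated in the text, with the cost of 3 symmetric key operations per OT taken from the assumption borrowed from [PSZ16].

```python
def sender_sym_ops(lam, M, ot_cost=3):
    """Upper bound on the sender's symmetric key operations.

    (lam - 1) * 4M for the layers 2..lam, M for the last message,
    2 for the two initial Gen calls, plus ot_cost per OT execution.
    """
    return (lam - 1) * 4 * M + M + 2 + ot_cost * lam


def receiver_sym_ops(lam, M, ot_cost=3):
    """Upper bound on the receiver's symmetric key operations."""
    return (lam - 1) * M + M + ot_cost * lam


def sender_sym_ops_approx(lam, M, ot_cost=3):
    """Rounded sender bound 4*lam*M + 3*lam used in the comparison."""
    return 4 * lam * M + ot_cost * lam
```

For example, with 32-bit elements and a sender set of size 1024, the sender bound is 128098 operations against 32864 for the receiver, so the sender carries the majority of the workload.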
As shown in Table 1, our protocol has better performance than all the circuit-based PSI approaches (which can be combined with further postprocessing) considered in [PSZ16]. We note that, as described in Sec. 4.4 of [PSZ16], the approach based on evaluating the OPRF inside a circuit is faster than any other PSI protocol if one set is much smaller than the other (as in the case of PSM), but in this case the output will necessarily leak to the receiver, which prevents composition with further 2PC protocols. We refer the reader to Table 7 of [PSZ16] for a detailed efficiency comparison between different PSI protocols. Finally, we observe that the complexity analysis proposed in [PSZ16] is related to PSI protocols, while in this section we have only compared the efficiency of the PSM subprotocol.

A.1 Communication complexity
The communication complexity of our protocol is dominated by the communication complexity of the underlying OT protocol $\Pi = (S_{OT}, R_{OT})$. Let sOT($D$) be the amount of data exchanged between $S_{OT}$ and $R_{OT}$ when $S_{OT}$ uses an input of size $D$, and let sSYM($A$) be the size of a ciphertext for the encryption scheme Sym when a plaintext of size $A$ is used. Then the communication need to be known. Clearly those values can be efficiently computed since the randomness $r$ and the input $Y$ used to run $S$ are known.
We now show more formally how $S^S$ works. Let $S^S_{OT}$ be such that where $c_0, c_1 \in \{0, 1\}^\star$, $b \in \{0, 1\}$, and $s \in \mathbb{N}$. $S^S$, on input $Y$ and $1^s$, executes the following steps.
1. Pick $r \leftarrow \{0, 1\}^s$ and run $S$ on input $1^s$, $Y$ using $r$ as randomness.
2. For every OT execution $i$, with $i = 1, \ldots, \lambda$, run $S^S_{OT}$ on input $1^s$, $c^i_0$ and $c^i_1$, where $c^i_0$ and $c^i_1$ are computed using the same procedure that $S$ uses.
3. Continue the execution against $S$ as $R$ would do.
To conclude this first part of the proof we just need to prove the following lemma.
where $Y \in \{\{0, 1\}^\star\}^\star$, $x \in \{0, 1\}^\star$, and $s \in \mathbb{N}$.
Proof. The proof goes through hybrid arguments starting from the real execution of $\Pi^\in$. We gradually modify the execution until the input of $R$ is not needed anymore, in such a way that the final hybrid represents the simulator $S^S$. We denote with $OUT^{S}_{H_i}(1^s)$ the view of $S$ in the hybrid experiment $H_i$ with $i \in \{0, \ldots, \lambda\}$. The hybrid experiments that we consider are the following.
1. $H_0$ is identical to the real execution of $\Pi^\in$. More precisely, $H_0$ runs $S$ using fresh randomness and interacts with it as $R$ would do on input $x$.
2. $H_i$ proceeds according to $H_0$ with the difference that in the first $i$ OT executions $S^S_{OT}$ is used.
Since $F^\in$ is a deterministic function we have that
• Define an empty list $l$.
• Permute the elements inside $l$ and send it to $R$.
Since $F^\in$ is deterministic we have that Moreover we observe that Therefore there are two things that remain to argue: We now start by showing that if the first statement does not hold for $i = 1$, then we can construct an adversary $A^R_{OT}$ that breaks the security of $\Pi_{OT}$ against a malicious receiver. Let $C^R_{OT}$ be the challenger for the security game w.r.t. the security of $\Pi_{OT}$ against a malicious receiver. The reduction works as follows.
2. A R OT then acts as a proxy between C R OT and R.
3. When the interaction between $C^R_{OT}$ and $R$ is over, $A^R_{OT}$ continues the execution with $R$ according to $H_0$ ($H_1$).
This part of the security proof ends with the observation that if C R OT has used the simulator S R OT then the joint distribution of the view of R and F for i = 2, . . ., λ follows the same arguments.
In order to prove that thus concluding the lemma's security proof, we need to consider the following intermediate hybrid experiment H ⋆ y with y ∈ {1, . . ., λ}.The description of the hybrid experiment follows.
4.1. Define the empty list $c^i$ and for each $t \in$ Prefix($Y$, $i - 1$) execute the following steps.
5. For every $t \in$ Prefix($Y$, $\lambda$) compute and add Enc($k_t$, $\gamma_1$) to $l$.

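We now prove that $\{OUT^{R}_{H^\star_{y-1}}(1^s), F^\in(Y, \gamma_0, \gamma_1, x)\} \approx \{OUT^{R}_{H^\star_{y}}(1^s), F^\in(Y, \gamma_0, \gamma_1, x)\}$ for $y = 2, \ldots, \lambda$. The proof proceeds by contradiction. Suppose that there exists some $y \in \{2, \ldots, \lambda\}$ such that $\{OUT^{R}_{H^\star_{y-1}}(1^s), F^\in(Y, \gamma_0, \gamma_1, x)\} \not\approx \{OUT^{R}_{H^\star_{y}}(1^s), F^\in(Y, \gamma_0, \gamma_1, x)\}$; then we can construct an adversary $A^{Sym}$ that breaks the security of the encryption scheme Sym. Let $C^{Sym}$ be the challenger for the security game w.r.t. Sym. Our adversary runs $R$ with randomness $r$ and executes the following steps.
14. For every $t \in$ Prefix($Y$, $\lambda$) compute and add Enc($k_t$, $\gamma_1$) to $l$.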
16. Permute the elements inside l and send l to R.
This part of the security proof ends with the observation that if $C^{Sym}$ has used $m_0$ then the joint distribution of the view of $R$ and $F^\in(Y, \gamma_0, \gamma_1, x)$ corresponds to $\{OUT^{R}_{H^\star_{y-1}}(1^s), F^\in(Y, \gamma_0, \gamma_1, x)\}$, and to $\{OUT^{R}_{H^\star_{y}}(1^s), F^\in(Y, \gamma_0, \gamma_1, x)\}$ otherwise. Since the following two distributions coincide $\{OUT^{R}_{H^\star}(1^s), F^\in(Y, \gamma_0, \gamma_1, x)\}$. The indistinguishability between the two distributions can be proved by using arguments similar to the ones used above, that is, by proceeding by contradiction and constructing an adversary that breaks the security of the encryption scheme Sym.

C.1 Computational indistinguishability definition
Definition 2 (Computational indistinguishability). Let $X = \{X_s\}_{s \in \mathbb{N}}$ and $Y = \{Y_s\}_{s \in \mathbb{N}}$ be ensembles, where $X_s$ and $Y_s$ are probability distributions over $\{0, 1\}^l$, for some $l = \mathrm{poly}(s)$. We say that $X = \{X_s\}_{s \in \mathbb{N}}$ and $Y = \{Y_s\}_{s \in \mathbb{N}}$ are computationally indistinguishable, denoted $X \approx Y$, if for every ppt distinguisher $D$ there exists a negligible function $\nu$ such that for sufficiently large $s \in \mathbb{N}$, $\left| \mathrm{Prob}(t \leftarrow X_s : D(1^s, t) = 1) - \mathrm{Prob}(t \leftarrow Y_s : D(1^s, t) = 1) \right| < \nu(s)$.
We note that in the usual case where $|X_s| = \Omega(s)$ and $s$ can be derived from a sample of $X_s$, it is possible to omit the auxiliary input $1^s$. In this paper we also use the definition of statistical indistinguishability. This definition is the same as Definition 2 with the only difference that the distinguisher $D$ is unbounded. In this case we use $X \equiv Y$ to denote that two ensembles are statistically indistinguishable.

C.2 Two party computation
A two-party protocol problem is cast by specifying a random process that maps pairs of inputs to pairs of outputs (one for each party). We refer to such a process as a functionality and denote it as $F = (F_1, F_2)$. That is, for every pair of inputs $x, y \in \{0, 1\}^s$, the output-pair is a random variable $(F_1(x, y), F_2(x, y))$ ranging over pairs of strings. The first party (with input $x$) wishes to obtain $F_1(x, y)$ and the second party (with input $y$) wishes to obtain $F_2(x, y)$. We often denote such a functionality by $(x, y) \rightarrow (F_1(x, y), F_2(x, y))$.
Let $F = (F_1, F_2)$ be a probabilistic polynomial-time functionality and let $\Pi = (P_1, P_2)$ be a two-party protocol for computing $F$, where $P_1$ and $P_2$ denote the two parties. The view of the party $P_i$ ($i \in \{1, 2\}$) during an execution of $\Pi$ on $(x, y)$ and security parameter $s$ is denoted by $\mathsf{view}^{\Pi}_{P_i}(x, y, 1^s)$. The output of the party $P_i$ ($i \in \{1, 2\}$) during an execution of $\Pi$ on $(x, y)$ and security parameter $s$ is denoted by $\mathsf{output}^{\Pi}_{P_i}(1^s, x, y)$ and can be computed from its own view of the execution. We denote the joint output of both parties by $\mathsf{output}^{\Pi}(1^s, x, y) = (\mathsf{output}^{\Pi}_{P_1}(1^s, x, y), \mathsf{output}^{\Pi}_{P_2}(1^s, x, y))$.

Figure 1: Example of how the graph G appears when the sender holds the set Y .

Table 1: Computation and communication complexity comparison for the PSM case. $M$ represents the size of the set, $s$ is the security parameter and $\lambda$ is the bit-length of each element.
1.1. Define the empty list $c^i_0$ and for all $t \in$ Prefix($Y$, $i - 1$) execute the following steps. If $t||0 \in$ Prefix($Y$, $i$) then compute $k_{t||0} \leftarrow$ Gen($1^s$) and add Enc($k_t$, $k_{t||0}$) to the list $c^i_0$. Otherwise, if $t||0 \notin$ Prefix($Y$, $i$) then compute and add Enc($k_t$, $k^\star_i$) to the list $c^i_0$.
Define the empty list $c^i_1$ and for all $t \in$ Prefix($Y$, $i - 1$) execute the following step. If $t||1 \in$ Prefix($Y$, $i$) then compute $k_{t||1} \leftarrow$ Gen($1^s$) and add Enc($k_t$, $k_{t||1}$) to the list $c^i_1$.
2.1. Define the empty list $c^i$. For $j = 1, \ldots, \min\{2^i, |Y|\} - 1$ compute and add Enc(Gen($1^s$), 0) to $c^i$.
2.2. Compute $k_i \leftarrow$ Gen($1^s$), and add Enc($k_{i-1}$, $k_i$) to the list $c^i$.
2.3. Permute the elements inside $c^i$.
2.4. Run $S^R_{OT}$ on input ($1^s$, $x_{\lambda - i + 1}$, $c^i$).