EXO Labs:

So we face a dilemma: how can we maintain the privacy benefits of local models while still accessing real-time information?

[…]

The problem is that this database is hosted on a server somewhere in the cloud. The server's host can see both our query and the response to that query, compromising our privacy.

A potential solution would be to download all the data locally. That way, we could query the database locally, keeping both the query and the response private. But this is impractical – Twitter alone generates 200TB of text per year. Your phone can’t store that much data.

What we want is a way for the server to process our query without ever seeing what we were asking for or what results we got back. Let’s see if we can find a way to do that by looking at how search works.

How modern search engines work:

Modern search engines convert both documents and queries into normalized vectors such that similar items end up close together. This conversion is called “embedding”. For instance:


  "I love cats"      → A = [0.7, 0.71] (normalized)
  "I like felines"   → B = [0.5, 0.866] (normalized)
  "Buy cheap stocks" → C = [-0.8, 0.6] (normalized)

To find relevant documents, we look for vectors that point in similar directions – meaning they have a small angle between them. The cosine of this angle gives us a stable, efficient similarity measure ranging from -1 to 1:

  • cos(0°) = 1 (vectors point same direction)
  • cos(90°) = 0 (vectors are perpendicular)
  • cos(180°) = -1 (vectors point opposite directions)

The cosine similarity formula is shown below, where a and b are vectors with components a₁, a₂, etc. and b₁, b₂, etc., θ is the angle between them, · represents the dot product, and ||a|| denotes the length (magnitude) of vector a:

cos(θ) = (a·b)/(||a||·||b||)
       = (a₁b₁ + a₂b₂ + ...)/(√(a₁² + a₂² + ...) · √(b₁² + b₂² + ...))

However, when vectors are normalized (meaning their length equals 1), the denominator becomes 1·1 = 1, leaving us with just the dot product:

cos(θ) = a·b = a₁b₁ + a₂b₂ + a₃b₃ + ...

This is why working with normalized vectors is so efficient – the similarity calculation reduces to simply multiplying corresponding elements and adding up the results.
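
As a quick sanity check, here’s the full formula next to the dot-product shortcut (a sketch assuming NumPy); on the example vectors the two agree up to the small rounding in the hand-normalized components:

  import numpy as np

  def cosine(a, b):
      # Full formula: dot product divided by the product of the lengths.
      return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

  A = np.array([0.7, 0.71])   # "I love cats"
  B = np.array([0.5, 0.866])  # "I like felines"

  print(cosine(A, B))   # full formula
  print(np.dot(A, B))   # shortcut for unit vectors: just the dot product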

Let’s compute the similarities between our example vectors:

cos(θ₁) = A·B = (0.7 × 0.5) + (0.71 × 0.866) ≈ 0.965 // Similar!
cos(θ₂) = A·C = (0.7 × -0.8) + (0.71 × 0.6) = -0.134 // Quite different

The first, high value (0.965) tells us vectors A and B point in similar directions – just as we’d expect for two phrases about cats! The second value (-0.134) shows that vectors A and C point in quite different directions, indicating unrelated topics.
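
These hand calculations are easy to verify. Stacking the vectors into a matrix gives every pairwise similarity in a single matrix product (a NumPy sketch), which is also how this work is typically batched in practice:

  import numpy as np

  vecs = np.array([[ 0.7, 0.71 ],   # A: "I love cats"
                   [ 0.5, 0.866],   # B: "I like felines"
                   [-0.8, 0.6  ]])  # C: "Buy cheap stocks"

  sims = vecs @ vecs.T  # all pairwise dot products at once
  print(sims[0, 1])     # A·B ≈ 0.965 (similar)
  print(sims[0, 2])     # A·C = -0.134 (quite different)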

Putting it together: in a search engine, the relevance of a document to a query is just the dot product of their normalized embeddings:

relevance = query_vector · document_vector
          = q₁d₁ + q₂d₂ + q₃d₃ + ...
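
To make that concrete, here’s a toy ranking loop over the example corpus (a sketch assuming NumPy; a real engine would embed the query with a model and scan millions of stored vectors):

  import numpy as np

  # Toy corpus of (text, normalized embedding) pairs from the running example.
  documents = {
      "I love cats":      np.array([ 0.7, 0.71 ]),
      "I like felines":   np.array([ 0.5, 0.866]),
      "Buy cheap stocks": np.array([-0.8, 0.6  ]),
  }

  query = np.array([0.7, 0.71])  # stand-in for an embedded query about cats

  # Relevance of each document is one dot product; sort by score to rank.
  scores = {text: float(vec @ query) for text, vec in documents.items()}
  for text, score in sorted(scores.items(), key=lambda kv: -kv[1]):
      print(f"{score:+.3f}  {text}")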

This seemingly simple calculation – just multiplications and additions – is the key to private search.

Via X
