- Problem: You want to store a file of sensitive information on a cloud server
- You don't want anyone in the cloud to read the file contents
- Solution: Encrypt the file; store the encrypted file in the cloud
- Problem: But the file is actually a very large database
- You want the cloud server to carry out queries on the database and return only the relevant records
- Example: Medical information
Name   Date      Wgt  Sys  Dia  Pul  Glu
----------------------------------------
Bob    1/1/2011  160  120   80   86  145
Carol  2/2/2011  120  118   81   94  110
Ted    3/3/2011  330  184  103   98  222
Alice  4/4/2011  140  115   76   82  101
Bob    5/5/2011  165  120   80   86  150
Carol  6/6/2011  119  118   81   94  107
Ted    7/7/2011  340  184  103   98  245
Alice  8/8/2011  135  115   76   82  103
. . .
- Problem: You don't want the cloud server to know who each record is about -- private information
- Solution: Store the one-way hashes of the database keys (names), not the database keys themselves
- Client can query by specifying the hash of the desired name
- Cloud can't determine the actual name from the hash (preimage resistant)
SHA-1(Name)                               Date      Wgt  Sys  Dia  Pul  Glu
---------------------------------------------------------------------------
a4380269bf9d4679ba39fed110b70a25ee78c457  1/1/2011  160  120   80   86  145
0cec26df63beba9d92796eb399775ae9c7693e86  2/2/2011  120  118   81   94  110
26fd617c74c7af144d2a094a3ddf83cb703373a1  3/3/2011  330  184  103   98  222
915d944fa078db55496e637522f61a6763a318c7  4/4/2011  140  115   76   82  101
a4380269bf9d4679ba39fed110b70a25ee78c457  5/5/2011  165  120   80   86  150
0cec26df63beba9d92796eb399775ae9c7693e86  6/6/2011  119  118   81   94  107
26fd617c74c7af144d2a094a3ddf83cb703373a1  7/7/2011  340  184  103   98  245
915d944fa078db55496e637522f61a6763a318c7  8/8/2011  135  115   76   82  103
. . .
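The hashed-key scheme above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names `name_key` and `cloud_query` are hypothetical, names are assumed to be encoded as UTF-8 with no trailing newline (the digests in the table may have been produced under a different convention), and an in-memory list stands in for the cloud database.

```python
import hashlib

def name_key(name: str) -> str:
    """Return the hex SHA-1 digest of a name, used as the stored database key."""
    # Assumption: UTF-8 encoding, no salt, no trailing newline.
    return hashlib.sha1(name.encode("utf-8")).hexdigest()

# Client side: replace each name with its hash before uploading.
plain_rows = [
    ("Bob",   "1/1/2011", 160, 120, 80, 86, 145),
    ("Carol", "2/2/2011", 120, 118, 81, 94, 110),
    ("Bob",   "5/5/2011", 165, 120, 80, 86, 150),
]
cloud_rows = [(name_key(name), *rest) for name, *rest in plain_rows]

def cloud_query(rows, key_hash):
    """Cloud side: match on the hashed key only; the name itself is never sent."""
    return [row for row in rows if row[0] == key_hash]

# The client asks for Bob's records by sending only the hash of "Bob".
bob_rows = cloud_query(cloud_rows, name_key("Bob"))
```

Because the hash is deterministic, every record for the same name carries the same key, which is what lets the cloud answer the query without seeing the name.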
- Problem: But SHA-1 is a public function, so anyone who guesses a name can still look up that person's records, e.g., Bob's records simply by querying with key = SHA-1("Bob")
- Solution: Use a keyed hash function, i.e., a MAC, with a secret MAC key known only to the client
- Client can query by specifying the MAC of the desired name
- Cloud can't determine the actual name from the MAC (preimage resistant)
- Cloud can't trawl for data on specific names (without the secret MAC key it cannot compute the MAC of a guessed name)
HMAC-SHA-1(Name,MAC-key)                  Date      Wgt  Sys  Dia  Pul  Glu
---------------------------------------------------------------------------
5e2a92841025963ed27faac1beec172c37e74006  1/1/2011  160  120   80   86  145
ccd3a41ce33cf4d53891e5e822cf30de56d186ea  2/2/2011  120  118   81   94  110
5e59ce29123598164ff8acd924046dbd5b157836  3/3/2011  330  184  103   98  222
99347b93d9522b73928cf75b16984490bffc25b4  4/4/2011  140  115   76   82  101
5e2a92841025963ed27faac1beec172c37e74006  5/5/2011  165  120   80   86  150
ccd3a41ce33cf4d53891e5e822cf30de56d186ea  6/6/2011  119  118   81   94  107
5e59ce29123598164ff8acd924046dbd5b157836  7/7/2011  340  184  103   98  245
99347b93d9522b73928cf75b16984490bffc25b4  8/8/2011  135  115   76   82  103
. . .
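A keyed lookup along these lines might look as follows in Python, using the standard `hmac` module. This is only a sketch: `MAC_KEY`, `name_tag`, and `cloud_index` are hypothetical names, the key is generated randomly here rather than managed properly, and a dictionary stands in for the cloud database.

```python
import hashlib
import hmac
import os

# Hypothetical client secret; it stays on the client and is never uploaded.
MAC_KEY = os.urandom(20)

def name_tag(name: str, key: bytes = MAC_KEY) -> str:
    """Return HMAC-SHA-1(name, key) as hex; this replaces the name as the key column."""
    return hmac.new(key, name.encode("utf-8"), hashlib.sha1).hexdigest()

# Cloud side: records are indexed by the MAC of the name, not the name.
cloud_index = {}
for name, date in [("Bob", "1/1/2011"), ("Carol", "2/2/2011"), ("Bob", "5/5/2011")]:
    cloud_index.setdefault(name_tag(name), []).append(date)

# The client, who holds the key, can query for Bob.
client_hits = cloud_index.get(name_tag("Bob"), [])

# An outsider who guesses the name but lacks the key can only compute
# the unkeyed hash, which matches nothing in the index.
outsider_guess = hashlib.sha1(b"Bob").hexdigest()
outsider_hits = cloud_index.get(outsider_guess, [])
```

Here `client_hits` holds both of Bob's dates, while `outsider_hits` is empty: guessing the name no longer suffices, because the lookup key also depends on the secret.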
- Problem: The cloud might accidentally or maliciously alter the stored data
- Solution: Include a MAC of the entire record as an additional column
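- Any tampering with a stored record will be detected when the client recomputes and checks the MAC
- Not much we can do if the cloud alters the key field or deletes the entire record; the query then simply fails to return the record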
- Need redundant copies of the database to deal with that
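The per-record MAC column can be illustrated with a short Python sketch. The names `RECORD_MAC_KEY`, `record_mac`, and `verify_record` are hypothetical, the fixed demo key is for illustration only, and the length-prefixing of each field is an added assumption (not from the notes) so that field boundaries are unambiguous inside the MAC input.

```python
import hashlib
import hmac

# Hypothetical demo key; a real key would be random and kept secret by the client.
RECORD_MAC_KEY = b"demo-only-record-mac-key"

def record_mac(key: bytes, fields: list) -> str:
    """Return HMAC-SHA-1 over all fields of a record, as hex."""
    mac = hmac.new(key, digestmod=hashlib.sha1)
    for field in fields:
        data = str(field).encode("utf-8")
        # Assumption: prefix each field with its length so that
        # ["12", "3"] and ["1", "23"] produce different MACs.
        mac.update(len(data).to_bytes(4, "big"))
        mac.update(data)
    return mac.hexdigest()

def verify_record(key: bytes, fields: list, tag: str) -> bool:
    """Recompute the MAC on the client and compare in constant time."""
    return hmac.compare_digest(record_mac(key, fields), tag)

# Client builds the record and appends the MAC as an extra column.
row = ["5e2a92841025963ed27faac1beec172c37e74006", "1/1/2011", 160, 120, 80, 86, 145]
stored = row + [record_mac(RECORD_MAC_KEY, row)]

# An untouched record verifies; a record with an altered weight does not.
intact_ok = verify_record(RECORD_MAC_KEY, stored[:-1], stored[-1])
tampered = list(stored)
tampered[2] = 150
tampered_ok = verify_record(RECORD_MAC_KEY, tampered[:-1], tampered[-1])
```

As the notes say, this detects modification of a returned record but does nothing about a record that is never returned at all.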
- Problem: The cloud can't see the actual name, but it can see the other data
- Solution: Encrypt each record (except the first column), using a secret encryption key known only to the client
- Either encrypt each column separately
- Or encrypt the whole record into one binary "blob"
- Include the MAC inside the encryption, to detect tampering
- We can still use the cloud's enormous database retrieval power to query for (encrypted) records based on (MACs of) names
- Decryption of the records takes place in the client, not in the cloud
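The "blob" variant with the MAC inside the encryption can be sketched as below. Important assumption: Python's standard library has no block cipher, so this sketch substitutes a toy keystream built from HMAC-SHA-256 in counter mode purely to stay self-contained; it is a stand-in for illustration, and a real system would use a vetted cipher such as AES from an audited library. The names `keystream`, `seal_record`, and `open_record`, and the nonce-plus-ciphertext blob layout, are hypothetical.

```python
import hashlib
import hmac
import os

TAG_LEN = 20    # HMAC-SHA-1 output size in bytes
NONCE_LEN = 16  # fresh random nonce per record, stored in the clear

def keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (illustration only): HMAC-SHA-256(enc_key, nonce || counter)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(enc_key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def seal_record(enc_key: bytes, mac_key: bytes, record: bytes) -> bytes:
    """Client side: append the record's MAC, then encrypt record and MAC together."""
    tag = hmac.new(mac_key, record, hashlib.sha1).digest()
    nonce = os.urandom(NONCE_LEN)
    body = record + tag
    return nonce + xor_bytes(body, keystream(enc_key, nonce, len(body)))

def open_record(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    """Client side: decrypt the blob, then check the MAC found inside it."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("blob too short")
    nonce, ciphertext = blob[:NONCE_LEN], blob[NONCE_LEN:]
    body = xor_bytes(ciphertext, keystream(enc_key, nonce, len(ciphertext)))
    record, tag = body[:-TAG_LEN], body[-TAG_LEN:]
    expected = hmac.new(mac_key, record, hashlib.sha1).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("record failed MAC check")
    return record

# Hypothetical client secrets; neither is ever sent to the cloud.
enc_key, mac_key = os.urandom(32), os.urandom(20)
blob = seal_record(enc_key, mac_key, b"1/1/2011|160|120|80|86|145")
recovered = open_record(enc_key, mac_key, blob)
```

The cloud stores only the MAC of the name next to `blob`; both `seal_record` and `open_record` run on the client, so the cloud never holds either key.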
- There is a large and growing body of research on the topic of Private Information Retrieval
- One interesting example:
- E. Blass, R. Di Pietro, R. Molva, and M. Onen. PRISM -- privacy-preserving search in MapReduce. Cryptology ePrint Archive, Report 2011/244, May 9, 2012. http://eprint.iacr.org/2011/244
- Given a group of large files, containing encrypted words, stored in the cloud . . .
- Determine which file(s) contain a given target word . . .
- By running a map-reduce job over the files in the cloud . . .
- Without the cloud learning what the words in the files are (data privacy) . . .
- And without the cloud learning what the target word is (query privacy of the input) . . .
- And without the cloud learning which file(s) contain the target word (query privacy of the output)
- For further information and case studies:
- P. Wayner. Translucent Databases. Flyzone Press, 2002.
- Problem: We might want the cloud to do more than just store and query databases
- We might want to do computations in the cloud as well
- But we still don't want the cloud to breach the privacy of the data
- Solution . . .