Cloud storage continues to grow, but the biggest concern about using and relying on it is security.
That is where Wensheng Zhang, an associate professor of computer science at Iowa State University, comes in: he and his team are working to defend against one of those risks.
Zhang said cloud users can always encrypt sensitive data, but the way they access that data may still leave it vulnerable.
Reports of attacks on cloud storage that exploit access patterns are rare, Zhang said. Phishing attacks, including a hack that targeted professors and researchers, are the most common. A 2017 Google study identified as many as 12.4 million potential victims of phishing over the course of a year. However, if hackers can compromise a data storage service, Zhang said, it is only a matter of time before they try to exploit data access patterns.
“Cloud storage is very convenient, but there are privacy risks,” he said. “This kind of threat may be of greater concern to companies or agencies working with very sensitive data. For example, military agencies or some branches of the government.”
Here is a clear example of the threat Zhang is working to prevent: An agency uploads a large dataset to its cloud account. A team analyzing a specific subset of the data accesses it regularly, creating a pattern. Someone, such as a rogue employee or a hacker who has compromised the cloud service, could observe that pattern and make inferences about the data.
The idea may seem far-fetched to the average person who uses the cloud to store photos or other less sensitive information, but a user storing classified documents or research results in the cloud may feel differently. Zhang said that if an agency makes a major decision after accessing a particular subset of data, attackers can infer that the subset is valuable and focus their efforts on that section rather than trying to crack the entire file.
Developing technology to disguise access patterns is complex work. Zhang said the basic premise is an algorithm that mixes fake access requests with real ones, making it difficult to detect a pattern. That sounds simple, but time and cost are two barriers: the algorithm must be efficient enough that the fake requests do not delay work or cost too much in bandwidth and cloud service fees, he said.
Zhang and his team have detailed one technique that they say is among the most efficient algorithms proposed so far for protecting data access patterns.
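To make the premise concrete, here is a minimal toy sketch in Python, assuming a cloud store modeled as an in-memory dictionary of numbered blocks. The class name `ToyDummyClient` and its parameters are hypothetical, and the scheme (hide each real read among a few randomly chosen dummy reads) is a deliberately naive stand-in, not the algorithm Zhang's team developed.

```python
import random
from collections import Counter


class ToyDummyClient:
    """Hypothetical toy client that hides each real read among dummy reads.

    Illustration only: this is not the Iowa State algorithm, and it does not
    provide formal access-pattern privacy.
    """

    def __init__(self, store, dummies_per_read, rng=None):
        if dummies_per_read > len(store) - 1:
            raise ValueError("not enough blocks to draw distinct dummies from")
        self.store = store                  # stands in for the cloud server
        self.dummies_per_read = dummies_per_read
        self.rng = rng or random.Random()
        self.server_log = []                # what an observer at the server sees

    def _fetch(self, block_id):
        # Every request, real or fake, is visible to the server.
        self.server_log.append(block_id)
        return self.store[block_id]

    def read(self, block_id):
        # Choose distinct dummy blocks, none equal to the real one.
        others = [b for b in self.store if b != block_id]
        batch = self.rng.sample(others, self.dummies_per_read) + [block_id]
        # Shuffle so the real request is not always in the same position.
        self.rng.shuffle(batch)
        result = None
        for b in batch:
            data = self._fetch(b)
            if b == block_id:
                result = data
        return result


# Usage: 100 blocks, 3 dummies per real read, reading block 7 ten times.
store = {i: f"block-{i}".encode() for i in range(100)}
client = ToyDummyClient(store, dummies_per_read=3, rng=random.Random(0))
results = [client.read(7) for _ in range(10)]
counts = Counter(client.server_log)
print(len(client.server_log), counts[7])  # 40 10
```

The fourfold request count (40 requests for 10 real reads) is the bandwidth and fee overhead the paragraph warns about. The counter also exposes the weakness of naive mixing: block 7 appears in every batch while each dummy is drawn at random, so an observer who tallies requests over time can still pick out the frequently read block. Schemes in the oblivious RAM (ORAM) family address this by relocating and re-encrypting blocks after each access, at additional cost.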
The work is ongoing as the team looks for ways to improve performance and efficiency. Zhang said they are also exploring the pros and cons of splitting large datasets across multiple providers, so that no single provider's view of the access pattern reveals the full picture.
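A rough sketch of the splitting idea follows, assuming blocks are identified by integers and each provider is modeled as an in-memory dictionary that also records the requests it receives. The class `SplitStore` is hypothetical, and the placement rule (block number modulo the number of providers) is an arbitrary placeholder rather than anything the team has described.

```python
class SplitStore:
    """Hypothetical sketch: spread blocks over several independent providers.

    Each provider stores, and observes accesses to, only its own share.
    """

    def __init__(self, blocks, num_providers):
        self.providers = [dict() for _ in range(num_providers)]
        self.logs = [[] for _ in range(num_providers)]  # per-provider view
        for block_id, data in blocks.items():
            self.providers[self._provider_for(block_id)][block_id] = data

    def _provider_for(self, block_id):
        # Placeholder placement rule: block number modulo provider count.
        return block_id % len(self.providers)

    def read(self, block_id):
        p = self._provider_for(block_id)
        self.logs[p].append(block_id)  # only provider p sees this request
        return self.providers[p][block_id]


blocks = {i: f"record-{i}".encode() for i in range(20)}
split = SplitStore(blocks, num_providers=3)
accesses = [0, 4, 4, 8, 1, 4, 9]
values = [split.read(b) for b in accesses]
for i, log in enumerate(split.logs):
    print(f"provider {i} saw {len(log)} of {len(accesses)} requests")
# provider 0 saw 2 of 7 requests
# provider 1 saw 4 of 7 requests
# provider 2 saw 1 of 7 requests
```

No provider sees the whole access sequence, but provider 1 can still see that block 4 is read repeatedly. That illustrates one of the trade-offs: splitting limits each provider's view without hiding the pattern within any one provider's share.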
“Storage is now more affordable. Five years ago, it was expensive to buy a computer with several hundred gigabytes of storage, but today it is very common,” Zhang said. “If users are concerned about privacy, they can keep a small subset of data locally and export the remaining dataset to storage, which can save some cost for protecting the access pattern privacy.”
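To make the cost argument in the quote concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, a naive model in which every remote read costs one real request plus a fixed number of fake ones, while reads of blocks kept on the user's own disk cost nothing remotely; the function `remote_requests` and the sample numbers are hypothetical.

```python
def remote_requests(accesses, local_blocks, dummies_per_read):
    """Count requests sent to the cloud under a hypothetical hybrid layout.

    Blocks in local_blocks are served locally at no remote cost; every other
    read goes to the cloud along with dummies_per_read fake requests.
    """
    return sum(1 + dummies_per_read for b in accesses if b not in local_blocks)


accesses = [7, 7, 3, 7, 42, 7, 3]
all_remote = remote_requests(accesses, local_blocks=set(), dummies_per_read=3)
hybrid = remote_requests(accesses, local_blocks={7}, dummies_per_read=3)
print(all_remote, hybrid)  # 28 12
```

Keeping the most frequently used block local cuts remote traffic from 28 requests to 12 in this toy workload, and it also removes that block's accesses from the cloud provider's view entirely.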