Privacy in Cloud Healthcare Data
- Cloud Computing in the Healthcare Sector
- Pay as You Use
- Massive Scalability
- Resource Sharing
- Large Storage Space
- Cloud Infrastructure
- Data Life Cycle
- Major Data Privacy Challenges/Concerns
- Scalable Privacy Preserving Data Mining and Analysis
- Granular Access Control
- Cryptographically Enforced Data-Centric Security
- Balancing between Data Provenance and Security
Data privacy refers to the rightful ownership of one’s personal information from the time the data are generated until they are destroyed. In today’s rapidly evolving era of medical science, the privacy and security of electronic health records (EHRs) are given the utmost importance, since digital data, which are preferably stored in a cloud environment, attract hackers and intruders. Medical data come from a wide range of sources, such as mobile applications that track a person’s diabetes level or blood pressure, meditation apps that help relieve stress, menstrual cycle tracking apps, phone reminders for taking a particular medicine and water intake reminders, and the list does not end there. From getting up in the morning to going to bed at night, trillions of bytes of medical data are generated and processed by an individual, which need to be kept private from the outside world, since they contain highly sensitive and personal information. According to Padmaja et al., attacks on the healthcare domain take approximately 11 hours to mitigate and cost an average of $2800. Snooping attacks constitute the major form of invasion of privacy of users’ personal data.
Healthcare data include the various digital health records shown in Figure 2.1, and are not limited to these domains. Since healthcare data are stored in cloud servers, let us briefly discuss the cloud computing paradigm with its advantages and disadvantages.
Cloud Computing in the Healthcare Sector
Cloud computing is a network of systems/servers that are connected by means of the Internet and that offer five major characteristics to users: pay as you use, massive scalability, resource sharing, elasticity and large storage space.
Pay as You Use
The major benefit given to the users of a cloud is “pay as long as you use the service.” The users have the liberty to pay only for those resources which are required for a given period of time, without being worried about the extra incurred cost for underutilized resources [6, 7].
Massive Scalability
Since the user of a cloud service can be an individual or an organization with thousands of employees, the cloud environment can scale its services up or down to cater to the needs of the users in any situation. It can provide the necessary network bandwidth along with large storage space as required.
Resource Sharing
Also termed multitenancy, cloud computing offers the sharing of resources among different users at the same time: at the network level, the host level and the application level [4, 8].
Elasticity
Cloud service users can release their claimed resources when they are no longer needed, making them available to other users.
Large Storage Space
Users can ask for TBs of storage space from cloud service providers and pay on a per-use basis.
Cloud Infrastructure
Cloud computing is based on the concept of virtualization, where computing resources, storage and the network are virtualized to provide the benefits of optimized utilization of IT infrastructure, decreased management complexity and reduced deployment time of services.
The cloud infrastructure is shown in Figure 2.2.
FIGURE 2.3 Cloud services (IaaS, PaaS, SaaS).
The three main services offered by cloud computing as shown in Figure 2.3 are infrastructure as a service (IaaS), platform as a service (PaaS) and software as a service (SaaS).
Data Life Cycle
FIGURE 2.4 Data life cycle.
As shown in Figure 2.4, the privacy of a user’s data should be maintained at every stage of the data life cycle, that is, from the point of creation of the data until their destruction phase [10, 11]. Each phase consists of various components which need to be addressed to sustain data privacy. Let’s discuss these components briefly with respect to healthcare organizations.
a. Creation of data:
- • Data possession: Who is the owner of the sensitive information within the organization, and who owns it after it is shared over the cloud servers?
- • Data categorization: How are data classified in the organization to determine which data need to be sent to the cloud in contrast to sensitive information (e.g., personally identifiable information) which needs to be kept inside the organization?
b. Processing of data:
- • Data authentication: Are proper authentication techniques used to ensure rightful access to organization data (e.g., the company’s verified login credentials, biometrics, OTPs, etc.)?
- • Data relevance: Are the data being used only for the purpose for which they were created, and not misused after being deployed over the cloud?
- • Inter or intra data usage: Are the data used solely by the organization that created them, or are they also used by other organizations via a public cloud (e.g., insurance companies, private pathology labs)?
c. Transfer of data:
- • Cryptographic mechanism: When data travel from the organization server to the cloud server and vice-versa, are the data protected using any encryption mechanism to avoid any intruder attack?
- • Use of the private/public network: Does the organization use a public or private network to transfer the sensitive data over the cloud?
d. Data modification:
- • Data integrity: Is the data integrity maintained when the data are transmitted over the cloud and users start to access the data from the cloud through the services offered by the CSP (IaaS, PaaS, SaaS)?
- • Data originality: The processing of data generates new results and insights from existing data; therefore it is essential to ensure that the originality of the stored data is maintained after processing.
e. Data storage (data at rest):
- • Data confidentiality: How is data confidentiality achieved in the cloud to avoid privacy attacks such as sniffing of stored data?
- • Data integrity: How is data integrity achieved in the cloud? Is the use of digital signatures, hashing functions, etc., implemented for sensitive information [12, 13]?
- • Data availability: Ensuring that appropriate data are available at all times to the authentic users of the organization.
- • Data authorization: Ensuring that only authorized users can gain access to the system and that appropriate access controls (right to read, update, edit, etc.) are issued to every employee to limit unauthorized access.
f. Data archival:
- • Time period: To determine the time period for which the data need to be archived in cloud storage.
- • Future use of data: To determine if the stored data will be useful in the near future, and to determine the worthiness of archived data in terms of storage cost and maintenance cost.
g. Data destruction:
- • Complete: To ensure that the complete destruction of data is achieved and to determine whether the data can be recovered in case of mistaken destruction of data.
- • Data remanence: To ensure that the residual data do not pose any threat to the existing data and are kept in safe storage.
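The data integrity question raised under data storage (the use of hashing functions for sensitive information) can be illustrated with a minimal sketch. It assumes the organization keeps a secret key on its own premises and stores a keyed hash (HMAC-SHA256) next to each record in the cloud; the function names and record fields are hypothetical and are not taken from the chapter.

```python
import hashlib
import hmac
import json


def tag_record(record: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the record."""
    # sort_keys gives a stable byte representation regardless of field order
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_record(record: dict, key: bytes, expected_tag: str) -> bool:
    """Recompute the tag and compare it with the stored one in constant time."""
    return hmac.compare_digest(tag_record(record, key), expected_tag)
```

A keyed hash is used here rather than a plain digest because anyone able to modify a stored record could also recompute an unkeyed hash. A record whose tag no longer verifies has been altered, or was tagged under a different key.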
Major Data Privacy Challenges/Concerns
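The privacy challenges faced in a cloud environment are broadly classified into four groups: scalable privacy-preserving data mining and analysis, granular access control, cryptographically enforced data-centric security, and balancing between data provenance and security. Let us discuss each of the challenges in detail [15, 16].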
Scalable Privacy Preserving Data Mining and Analysis
With the advent of big data, legacy systems were unable to handle the storage and analysis of TBs of data. This called for powerful servers that can run analytics on big data in large organizations, supporting better decision making, market research, increased data consumption, the analysis and prediction of profits or losses, etc. [17, 18].
In the healthcare sector, data analytics and market research are used by insurance agencies to study the pattern of claims submitted by customers or to study the demographics of people (smoking habits, older age, large families, etc.) in order to acquire more customers [18, 19]. Pharmaceutical companies use analytics to predict future disease outbreaks, to develop vaccines for pandemics, etc.
Healthcare agencies possess personally identifiable information (PII) and protected health information (PHI) of doctors, patients, nurses and other stakeholders. Extensive market research on these data can lead to privacy invasions and breaches of data security, which are the two major concerns related to sensitive information. An attacker or malicious intruder can gain access to PII, which can result in serious threats to privacy. Keeping the data anonymous does not help in the healthcare sector.
Hence the major scenarios responsible for invasions of privacy include:
- a) An organization using third parties for market research needs to release critical information, which opens the possibility of an attack by an outsider and the misuse of individuals’ private information.
- b) Organizations using a cloud server for remote storage and analysis are at greater risk since the ownership lies in the hands of the cloud service provider (CSP), which is prone to disclosure of data [21, 22].
- c) An insider employee of an organization can also be a potential attacker, who can misuse their access rights to gain access to sensitive client information for personal benefit.
Many solutions have been proposed to face these challenges such as:
- • Differential privacy. Using this technique, the organization publicly shares information about a dataset in the form of aggregate patterns over groups of records, typically with calibrated random noise added so that the presence or absence of any single individual cannot be inferred from the output. The PII itself is not disclosed and is kept secure. This is shown in Figure 2.5.
- • Homomorphic encryption. Using this technique, the organization shares the information with analytics agencies in encrypted form. The data analytics are performed on the ciphertext only and the results are sent back to the organization in encrypted form as shown in Figure 2.6.
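As an illustration of the differential privacy idea, the sketch below releases a noisy count instead of the exact one. It assumes the Laplace mechanism, a common construction that the chapter itself does not name; the function names, the `epsilon` parameter and the record layout are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return the number of matching records plus Laplace noise of scale 1/epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one person changes a count by at most 1,
    # so the sensitivity of this query is 1 and the noise scale is 1/epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

For example, an insurer could publish `private_count(records, lambda r: r["smoker"], epsilon=0.5, rng=random.Random())` rather than the exact number of smokers. A smaller `epsilon` adds more noise and gives stronger privacy at the cost of accuracy.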
FIGURE 2.6 Homomorphic encryption.
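The homomorphic encryption workflow can be sketched with a toy version of the Paillier cryptosystem, an additively homomorphic scheme chosen here only for illustration (the chapter does not name a specific scheme). The tiny primes in the usage note are an assumption made for readability and offer no security; real deployments use a vetted library and large keys.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Build a Paillier key pair from two distinct primes (toy sizes only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1
    mu = pow(lam, -1, n)  # with g = n + 1, mu is the inverse of lam modulo n
    return (n, g), (lam, mu)


def encrypt(pub, m: int) -> int:
    """Encrypt an integer 0 <= m < n under the public key."""
    n, g = pub
    if not 0 <= m < n:
        raise ValueError("message out of range")
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in 1..n-1
        if math.gcd(r, n) == 1:
            break
    n2 = n * n
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c: int) -> int:
    """Recover the plaintext from a ciphertext using the private key."""
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(pub, c1: int, c2: int) -> int:
    """Multiply ciphertexts, which adds the underlying plaintexts modulo n."""
    n, _ = pub
    return (c1 * c2) % (n * n)
```

With `pub, priv = keygen(1009, 1013)`, an analytics agency holding only `pub` and the ciphertexts `encrypt(pub, 120)` and `encrypt(pub, 135)` can call `add_encrypted` on them, and only the organization holding `priv` can decrypt the result to 255. The agency never sees the individual readings.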
Granular Access Control
Granular access control corresponds to the allocating of access rights to the organization’s employees for each and every part of the system and also stating “what” can be done using those access rights. However, issuing the access rights for each employee as well as for each part of the system is a difficult and time-consuming task.
Granting access control can be discussed using the four Ws of the process.
- • “Who”
- • The “Who” of the process refers to the identification of each employee’s role and granting them appropriate access control, which requires a great deal of time and effort. For example, a system administrator should be given the rights to access the database of the company; however, a web designer need not be given these rights.
- • Giving controls for each part of the system poses challenges for the person distributing the roles, since an organization contains diverse data that differ in structure and in the security concerns linked to each type of data.
- • “What”
- • The “What” of the process refers to what kind of access controls are to be allocated to each person in a company.
- • For instance, in an insurance-based company, read, write and access rights are given to the claims department for analysing the claims submitted by the customers, while the marketing department is given only the reading right for the same.
- • However, giving such explicit rights to each individual is a challenging task.
- • “Where”
- • The granting of access rights to individuals and for each part of the system is not sufficient; rather the location from which an employee accesses the company resources plays an important role.
- • Employees logging into the system from a remote location can be a major threat to privacy and a breach in security. Hence, access rights should be given only to on-site employees and, if needed, may be granted for locations in close proximity to the employee’s residence.
- • “When”
- • It is crucial to decide the “time duration” for which the rights are given to the individuals. For instance, a full-time employee will require the rights from 9 am to 8 pm every working day, while a part-time employee will require rights for only 3-4 hours daily as long as the contract stays valid.
- • Any employee who wishes to access the system outside of their working hours can be considered a possible threat to the company. However, under special conditions, access rights can be granted, and they must be terminated after the job is completed.
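The four Ws can be combined into a single access decision. The following is a minimal sketch, assuming each rule ties a role (who) to a resource, a set of permitted actions (what), a set of permitted locations (where) and a daily time window (when); the class names, field names and sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class AccessRule:
    role: str             # who
    resource: str         # part of the system the rule covers
    actions: frozenset    # what: e.g. {"read", "write"}
    locations: frozenset  # where: e.g. {"on_site"}
    start: time           # when: start of the permitted window
    end: time             # when: end of the permitted window


@dataclass(frozen=True)
class AccessRequest:
    role: str
    resource: str
    action: str
    location: str
    at: time


def is_allowed(request: AccessRequest, rules) -> bool:
    """Grant access only if some rule matches on who, what, where and when."""
    for rule in rules:
        if (
            rule.role == request.role
            and rule.resource == request.resource
            and request.action in rule.actions
            and request.location in rule.locations
            and rule.start <= request.at <= rule.end
        ):
            return True
    return False  # deny by default
```

Following the insurance example above, a claims rule would carry `frozenset({"read", "write"})` and a marketing rule only `frozenset({"read"})`, both limited to `"on_site"` between `time(9)` and `time(20)`. Denying by default means that a request outside working hours needs an explicit, temporary rule that is removed once the job is completed.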
Cryptographically Enforced Data-Centric Security
In order to maintain the privacy of cloud users’ data, two main approaches are used. One is to minimize the visibility of the cloud servers by limiting the access control rights, and the other is a data-centric approach, which focuses on encrypting the data from end to end, at the storage level and during transmission in a network.
The first approach is simpler to execute; however it is prone to security attack by malicious users, who can misuse the unprotected data.
The second approach limits the attack area, but is still prone to attacks by adversaries.
Since healthcare data are highly confidential and vast in nature, users often store them in a cloud in encrypted form. As soon as the data are stored in the cloud, they are no longer in the sole control of the users. In order to perform any processing of the data, they need to be decrypted first, which creates the main loophole through which attackers can gain access to the data.
Data encryption alone cannot protect the data, since there is always a risk of data modification and data leakage by untrusted users in the network [17, 28]. Man-in-the-middle attacks, data sniffing and spoofing of data need to be dealt with; therefore, maintaining the confidentiality and integrity of data is the main concern to be addressed for data stored in a cloud server.
Balancing between Data Provenance and Security
Data provenance, or data lineage, is, put simply, “metadata”: data that describe the data. It contains all the information about the data: the origin/source of the data, their travel history, who accessed them, what changes have been made to them from their inception till date and who is responsible for making the changes.
The provenance of data is critical to organizations where data integrity and confidentiality are of prime concern. Insurance-based companies, the healthcare sector, supply chains and departments focusing on scientific research and development are deeply dependent on historical data records. In the absence of such records, serious security threats can hamper these types of companies.
Many companies are now aware of the importance of evaluating their data provenance through the process of backtracking. When an unexpected error occurs in a system, the general rule is to backtrack to the origin of the error and determine what caused it and what can be done to eliminate the problem at its root. This methodology is also practiced by organizations to gather the provenance of data [24, 29].
Since this metadata contains all the important information about the company’s data, it is prone to insider and outsider attacks. An insider of a company can manipulate, delete or forge the provenance records for personal benefit, thus invading the privacy of the user’s data. Similarly, an outsider can gain unauthorized access to the company’s system to delete or change the records and can access the user’s personal records.
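One way to make manipulation or deletion of provenance records detectable is to chain the records together by hash, so that each entry commits to the one before it. The sketch below is an illustrative assumption rather than a method described in the chapter; the function names and the event fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical encoding of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> None:
    """Append a provenance event (who did what to which data) to the chain."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev_hash": prev_hash, "hash": _entry_hash(prev_hash, event)})


def verify_log(log: list) -> bool:
    """Return False if any entry was altered, reordered or removed from the middle."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

This sketch detects edits to, or removal of, earlier entries, but it cannot on its own detect truncation of the most recent entries or a full rewrite of the chain by someone with write access to the whole log. Guarding against that would require keeping the latest hash somewhere the insider cannot modify.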