Data Collection of Diseases
Data collection is the process of gathering, analyzing, and interpreting different types of information related to a particular disease or healthcare need. Traditional patient records are collected from sources such as personal surveys, handwritten prescriptions, and hard copies of patient records from local hospitals. Before the emergence of digital data, healthcare records existed only in physical form, so the data were collected and managed within the hospital itself. Following recent advances in IT, however, patient records are now collected in digital format (Kaur and Siri 2006). Examples of digital data used in medicine include digital scan reports, videos recorded by laparoscopic cameras, digital X-ray reports, endoscopy videos, and ultrasound records. Medical data are among the fastest-growing data in the digital world. According to a survey by DELL EMC (2018), the volume of healthcare data has grown by 878% since 2016. The same survey projected that the total amount of healthcare data would reach 20,000 petabytes by 2020. In addition, new applications and databases that work with healthcare data are being developed every day.
Electronic health (eHealth) devices and other communication-enabled health devices are an important source of electronic medical records. These devices collect data from patients at frequent intervals and store them in cloud storage. Data collected directly from patients through such electronic devices are called Patient-Generated Health Data (PGHD). The Cloud Service Provider (CSP) maintains the patient's clinical data, such as demographics, progress notes, problems, and medications, in cloud storage. Digitizing patients' medical records also helps ensure that the data are accurate.
Electronic Medical Records (EMR) data collection can be classified as quantitative or qualitative. In quantitative data collection, information is collected from the patient as numeric variables, such as counts, measurements, and percentages. Qualitative data collection methods collect patient data in non-numeric form, through methods such as observation, one-to-one interviews, and online surveys. Qualitative data are also known as categorical data.
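To make the quantitative/qualitative distinction concrete, the following Python sketch sorts the fields of a single patient record by whether their values are numeric. The record and its field names (`heart_rate_bpm`, `spo2_percent`, `smoking_status`, `pain_description`) are hypothetical, and treating "numeric type" as "quantitative" is a simplifying assumption; a real EMR system would rely on a data dictionary rather than on the type of each value.

```python
from numbers import Number

# Hypothetical patient record mixing quantitative and qualitative fields.
record = {
    "heart_rate_bpm": 72,                      # quantitative: a count per minute
    "spo2_percent": 97.5,                      # quantitative: a percentage
    "smoking_status": "former",                # qualitative: a category
    "pain_description": "dull, intermittent",  # qualitative: an interview answer
}

def split_by_type(rec):
    """Separate numeric (quantitative) fields from non-numeric (qualitative) ones."""
    quantitative, qualitative = {}, {}
    for name, value in rec.items():
        # bool is a subclass of int in Python; a yes/no answer is categorical.
        if isinstance(value, Number) and not isinstance(value, bool):
            quantitative[name] = value
        else:
            qualitative[name] = value
    return quantitative, qualitative

quant, qual = split_by_type(record)
print(sorted(quant))  # ['heart_rate_bpm', 'spo2_percent']
print(sorted(qual))   # ['pain_description', 'smoking_status']
```

Booleans are routed to the qualitative side on purpose: a yes/no answer is a category even though Python stores it as an integer subtype.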
The main ways of collecting EMR data are eHealth devices, semantic data extraction, and patient chatbots.
EMR Data Collection through eHealth Devices
eHealth devices are also called self-monitoring healthcare devices. They use sensors to measure the patient's health parameters and wireless communication to transfer the readings to cloud storage. This allows both the patient and the physicians to monitor the patient's health remotely. Examples of eHealth monitoring devices available on the market include thermometers, heart-rate trackers, glucometers, oximeters, pulsometers, and blood pressure monitors. These IoT-based healthcare devices are considered an important advancement in healthcare management. As the use of cloud computing and wireless technology increases, the demand for eHealth devices is also growing rapidly. It has been predicted that by 2020, eHealth devices will account for 80% of wireless devices. The main advantages of these devices are their mobility and their accessibility through smartphones and tablets.
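A minimal sketch of what a self-monitoring device might prepare for cloud storage is shown below: raw sensor values are stamped at a fixed sampling interval and serialized as a JSON batch. The `Reading` fields, the device identifier, and the payload layout are hypothetical, and the actual network upload is omitted because it depends on the cloud provider's API.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timedelta, timezone

@dataclass
class Reading:
    """One timestamped sensor value (hypothetical schema)."""
    device_id: str
    metric: str
    value: float
    unit: str
    timestamp: str  # ISO 8601 in UTC

def sample_readings(device_id, metric, unit, values, start, interval_s):
    """Stamp raw sensor values at a fixed sampling interval starting from `start`."""
    return [
        Reading(device_id, metric, float(v), unit,
                (start + timedelta(seconds=i * interval_s)).isoformat())
        for i, v in enumerate(values)
    ]

def build_upload_payload(readings):
    """Serialize a batch of readings as JSON, ready to send to cloud storage."""
    return json.dumps({"readings": [asdict(r) for r in readings]})

# Example: a pulse oximeter reporting SpO2 once per minute.
start = datetime(2020, 1, 1, 8, 0, tzinfo=timezone.utc)
batch = sample_readings("oximeter-01", "spo2", "%", [97, 96, 98], start, 60)
payload = build_upload_payload(batch)
print(payload)
```

Batching several readings per upload is one way to reduce radio use on battery-powered devices; the batch size and interval here are illustrative only.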
Semantic Data Extraction from Healthcare Websites
Semantic extraction of healthcare information retrieves facts and attributes related to a particular disease from websites or other unstructured data. Its purpose in healthcare is to enable analysis of unstructured content such as electronic prescriptions, medical text documents, emails, digital images, and patient reports. The main objective of semantic analysis is to give structure to unstructured data (Wu et al. 2018).
Semantic data extraction from websites follows two major approaches: rule-matching data collection and machine learning-based data collection.
Rule-matching data collection: this approach collects information related to a particular word or phrase from websites. A rule-based matching algorithm is applied to raw medical web pages to gather information about a particular disease. Such rule-based matchers also provide access to the tokens within the document and their relationships.
Machine learning-based data collection: this approach performs a statistical analysis of the content, a potentially compute-intensive task that can benefit from using Hadoop. It derives relationships between terms from their statistical co-occurrence within the website.
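The rule-matching approach can be sketched with regular expressions from Python's standard library. This is an illustration under stated assumptions, not the method of any cited work: the `RULES` patterns, the `extract_facts` helper, and the sample page text are all hypothetical, and the sentence splitting is deliberately naive. Token-level matchers, such as spaCy's `Matcher`, operate on tokens and their attributes rather than on raw characters.

```python
import re

# Hypothetical rules: each label maps to a regular expression over raw text.
RULES = {
    "SYMPTOM": re.compile(r"\b(?:fever|cough|fatigue|shortness of breath)\b", re.IGNORECASE),
    "DOSAGE": re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|ml)\b", re.IGNORECASE),
}

def extract_facts(text, disease):
    """Return (label, matched text) pairs from sentences that mention `disease`."""
    facts = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if disease.lower() not in sentence.lower():
            continue  # keep only sentences about the target disease
        for label, pattern in RULES.items():
            facts.extend((label, m.group(0)) for m in pattern.finditer(sentence))
    return facts

# Invented page text for illustration.
page = ("Influenza often causes fever and cough. "
        "A hypothetical trial gave influenza patients 75 mg of drug X twice daily. "
        "Sunburn causes redness.")
print(extract_facts(page, "influenza"))
# [('SYMPTOM', 'fever'), ('SYMPTOM', 'cough'), ('DOSAGE', '75 mg')]
```

Rules like these are transparent and precise but must be written by hand for every new fact type, which is the main trade-off against the statistical approach.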
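As one simple way to derive relationships from co-occurrence, the sketch below scores pairs of terms by pointwise mutual information (PMI) computed over sentences. PMI is a statistical association measure rather than a full machine-learning model, and the sentences, vocabulary, and `cooccurrence_pmi` helper are invented for illustration. The per-sentence counting loop is the part that could be distributed as a Hadoop MapReduce job on a large crawl.

```python
import math
import re
from collections import Counter
from itertools import combinations

def cooccurrence_pmi(sentences, vocabulary):
    """Score pairs of vocabulary terms by PMI = log2(P(a, b) / (P(a) * P(b))).

    Probabilities are the fraction of sentences containing a term or a pair.
    """
    n = len(sentences)
    term_counts, pair_counts = Counter(), Counter()
    for sentence in sentences:
        # Counting step: which vocabulary terms appear in this sentence?
        present = sorted(set(re.findall(r"[a-z]+", sentence.lower())) & vocabulary)
        term_counts.update(present)
        pair_counts.update(combinations(present, 2))
    # Aggregation step: turn counts into PMI scores for co-occurring pairs.
    return {
        (a, b): math.log2((joint / n) / ((term_counts[a] / n) * (term_counts[b] / n)))
        for (a, b), joint in pair_counts.items()
    }

sentences = [
    "Insulin lowers blood glucose.",
    "High glucose is a sign of diabetes.",
    "Diabetes patients monitor glucose daily.",
    "Aspirin relieves headache.",
]
vocab = {"insulin", "glucose", "diabetes", "aspirin", "headache"}
for pair, score in sorted(cooccurrence_pmi(sentences, vocab).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Pairs that appear together more often than chance receive positive scores; on a real corpus, rare terms inflate PMI, so a minimum-count threshold is usually applied.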
EMR Data Collection through Patient Chatbots
To deliver quality services to patients, medical informatics organizations are using recent technologies such as artificial intelligence and predictive technologies in healthcare applications. In an emergency, a patient often cannot get advice from a physician immediately. To provide round-the-clock medical advice, the healthcare industry is investing heavily in automated medical chatbots. Medical chatbots are conversational software, typically available as smartphone applications. They provide more immediate service to patients and are capable of conversing with patients and gathering information from them. The collected information is fed to deep learning algorithms to improve the chatbots' intelligence. Medical chatbots are a recent trend in the healthcare industry, and some of the most popular chatbots today are healthcare-related.
Medical data exist in different forms, such as laboratory test results, physicians' notes, patients' lifestyle data, vital signs, and various forms of imaging data, including Magnetic Resonance Imaging (MRI), radiology, ultrasonography, and pathology slides. There is no single standard that encompasses all medical data, so it is important to understand what the data represent before processing them.
Structured data are organized and consistent in form, which makes them easier to analyze. Examples of structured medical data include numerical values, such as blood pressure, height, and weight, and categorical values, such as blood type and the diagnostic stage of a disease. However, structured data are a non-homogeneous, non-monolithic category: the fact that data are in structured form does not mean they are meaningful as they stand. Conversely, data with no formal structure are not necessarily hard to interpret.
Consistency and Quality of Structured Data
Structured data consist of two main parts: the variable name and the value. Consider a patient's height. In an electronic medical record, it might be stored as "height: 64", where the value is the height in inches. The same height could also be stored in meters as "height: 1.626m" or in yards as "height: 1.78 yards", and so on. The variable name itself might also be stored in different forms.
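One common remedy is to normalize every height to a single unit when the record is read. The sketch below assumes entries of the form `name: value[unit]`, treats a missing unit as inches (as in the "height: 64" example), and accepts a hypothetical set of variable-name aliases (`height`, `ht`, `body_height`). The inch and yard factors are the exact international definitions.

```python
import re

# Conversion factors to meters (exact for the international inch and yard).
TO_METERS = {"in": 0.0254, "cm": 0.01, "m": 1.0, "yd": 0.9144, "yards": 0.9144}

# Hypothetical variable-name aliases that different systems might use.
HEIGHT_ALIASES = {"height", "ht", "body_height"}

def height_in_meters(entry, default_unit="in"):
    """Parse a 'name: value[unit]' entry and return the height in meters."""
    name, _, raw = entry.partition(":")
    if name.strip().lower() not in HEIGHT_ALIASES:
        raise ValueError(f"not a height variable: {name!r}")
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", raw)
    if match is None:
        raise ValueError(f"cannot parse value: {raw!r}")
    value = float(match.group(1))
    unit = match.group(2).lower() or default_unit  # no unit given: assume inches
    if unit not in TO_METERS:
        raise ValueError(f"unknown unit: {unit!r}")
    return value * TO_METERS[unit]

for entry in ["height: 64", "height: 1.626m", "ht: 1.78 yards"]:
    print(entry, "->", round(height_in_meters(entry), 3), "m")
```

Rejecting unknown units and names, instead of guessing, keeps a silent mis-conversion from entering downstream analysis.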
Logical Observation Identifiers Names and Codes (LOINC) is a database and universal standard for identifying medical laboratory observations. Health Level Seven (HL7) is an international standard for transferring clinical and administrative data between the software applications used by various healthcare providers. Fast Healthcare Interoperability Resources (FHIR) is a standard, created by HL7, describing data elements, data formats, and an application programming interface (API) for exchanging medical data.
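To show how these standards remove the ambiguity in variable names and units, the sketch below builds a minimal FHIR Observation resource for a body-height measurement. It assumes FHIR R4 JSON, identifies the variable with LOINC code 8302-2 (Body height), and expresses the unit with the UCUM code `[in_i]` (international inch). UCUM is not mentioned in the text above but is the unit system FHIR quantities commonly use. The helper name and the patient reference are placeholders, and the resource has not been validated against any FHIR profile.

```python
import json

def body_height_observation(patient_id, height_in):
    """Build a minimal FHIR R4 Observation for body height, coded with LOINC and UCUM."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [{
                "system": "http://loinc.org",
                "code": "8302-2",          # LOINC code for body height
                "display": "Body height",
            }]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "valueQuantity": {
            "value": height_in,
            "unit": "in",
            "system": "http://unitsofmeasure.org",  # UCUM code system
            "code": "[in_i]",                        # UCUM: international inch
        },
    }

# "example" is a placeholder patient id.
print(json.dumps(body_height_observation("example", 64), indent=2))
```

Because the meaning is carried by the codes rather than by a free-text variable name, a receiving system can convert the value to meters without guessing the unit.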
Structured patient-generated data are usually collected from consumer-owned devices that may not be FDA-approved, so data from different devices cannot be compared with each other even though they are structured. For example, an accelerometer-based tracker measures the number of steps a consumer walks, but there is no standard algorithm for converting the raw accelerometer data into a step count. Although such data are inconsistent across devices, a clinician can still use them to track a patient's relative improvement over a period of time, provided the patient uses the same device for the entire period. Direct comparison between patients, however, would be implausible. To overcome these issues, standardized devices should be used.
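The within-patient comparison described above can be sketched as the percent change in mean daily steps between a baseline window and the most recent window, computed only when every reading comes from the same device. The `relative_improvement` function, the `(device_id, steps)` input format, and the 7-day window are illustrative assumptions.

```python
from statistics import mean

def relative_improvement(readings, window=7):
    """Percent change in mean daily steps: last `window` days vs. first `window` days.

    `readings` is a chronological list of (device_id, steps) tuples, one per day.
    """
    devices = {device for device, _ in readings}
    if len(devices) != 1:
        # Different devices use different step-counting algorithms.
        raise ValueError(f"expected one device, got {len(devices)}")
    if len(readings) < 2 * window:
        raise ValueError("need at least two full windows of data")
    steps = [s for _, s in readings]
    baseline = mean(steps[:window])
    if baseline == 0:
        raise ValueError("baseline mean is zero; percent change undefined")
    recent = mean(steps[-window:])
    return 100.0 * (recent - baseline) / baseline

# Two weeks from one tracker: 4,000 steps/day, then 5,000 steps/day.
history = [("tracker-A", 4000)] * 7 + [("tracker-A", 5000)] * 7
print(relative_improvement(history))  # 25.0
```

Refusing mixed-device input encodes the caveat in the text: a device-specific counting bias that stays constant over time affects both windows alike, whereas readings from different devices carry different biases.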