Management Methods and Software Tools

In this section, we give a brief description of some of the well-known methods/tools that are currently being used for our project and file management. We will describe how we utilize these tools to develop a sample data/project management framework. Please keep in mind that the framework we will describe in the upcoming section (Section 2.6) of this chapter can (and should) be modified to meet the specific needs of each research team for different projects.

There are many project management tools; in this section, we briefly describe the three most popular ones. We should emphasize that the latter two are mainly used as platforms for collaborative work and for sharing files and content between teams and among team members, and hence are not considered pure project management software. Nevertheless, they have enough capability to serve as a project management framework.

  • Microsoft Project: This is a well-known project management software with some advanced features. However, because it is designed to run on local computers, it is not recommended for EHR project management. EHR data research teams are usually big, and some team members need to be able to work remotely.
  • Slack for Project Management: Slack is a cloud-based project management/collaboration software. Currently, it is the second most common project management/collaboration software available. The ability of team members to share and work on files, together with messaging, group chat, and video conferencing, is the major selling point of this software. Setting up Slack is easier than setting up the next tool in this list.
  • Microsoft Teams: This cloud-based project management/collaboration tool is the most common one. The features of Microsoft Teams are similar to those of Slack, but it has fewer limitations. For example, the number of users who can participate in a video conference is higher in Microsoft Teams than in Slack. Furthermore, it is compatible with other Microsoft products such as the Microsoft Office package, and a growing number of add-ons can be used within Microsoft Teams. As a result, Microsoft Teams is a more favorable project management/collaboration tool for larger organizations.

As discussed in the last section, if it is mandated by the EHR data owner, team members should refrain from sharing patient data files on any cloud-based platform (such as Slack or Microsoft Teams). Sharing reports on a group of patients, or papers, is usually acceptable, but team leaders should be familiar with the details of the contract with the EHR data owner to prevent legal and ethical issues.

For a single project, the selection of project management software is usually mandated by the organization. Most of the time, purchasing a new project management software or tool for one or a small number of projects is neither time-efficient nor cost-efficient (keep in mind that new software also requires training the team members). As a result, the choice of project management software is very limited. If the project manager is free to choose the project management software/tool, Microsoft Teams can be a good choice for the reasons given above.

An Example of a Data Management Framework

In the remainder of this chapter, we describe a data management framework that is being used by our team at the School of Public Health, University of Texas Health Science Center at Houston. We have multiple teams working on different EHR research projects and other Big Data projects (e.g., Cerner, UK Biobank, GEO). Each team focuses on one or more cohorts of patients with particular diseases (e.g., the Subarachnoid Hemorrhage (SAH) project or the HIV/AIDS project based on the Cerner EHR database), and each project has one or more sub-projects (e.g., the vasopressor comparison and mortality prediction sub-projects within the SAH project).

The SAH cohort consists of all patients in the EHR database who had a history of Subarachnoid Hemorrhage (SAH) diagnosis between 2000 and 2018 and includes around 50,000 patients. The AIDS cohort consists of all patients in the EHR database who had a history of a positive HIV test during the same period and includes around 100,000 patients.

In this section, we will describe a data management framework based on our experience with these ongoing EHR research projects. We assume that your data management team has already set up a reliable data loss prevention structure and that your organization has a collaboration/project management tool ready. We begin with a proposed folder structure (a folder is called a directory in non-Windows operating systems) and then set a convention for naming the folders and files.

Folder Management

The following naming and folder structure convention should be considered an example. Each organization, depending on its needs, should create its own file and folder structure and naming convention. The following rules can be a good starting point.

Naming

There are two main methods for the general format of your file and folder names. The first method is to separate_each_word_with_underscore; the second method is to use CamelCaseTheFirstLetter. The first method is usually recommended because it makes the name more readable. In this framework, we will use the first method. In the proposed method, each word in the folder name should begin with a capital letter followed by lower case letters (except for "and", "of", etc.). The acceptable characters are the case-sensitive alphanumeric characters (A-Z, a-z, 0-9) and "_" (underscore).

Avoid using " " (space) in the name since it may lead to errors in some code or software. The use of abbreviated words reduces the length of the names and is highly recommended. Do not use the date as part of the folder name, because it is usually not meaningful; if a date is necessary, it can be added separately in the data management framework that we will describe later. The following examples show some acceptable and unacceptable names for folders:

  • SAH, Cerner, UK_Biobank, HT_and_HF: Acceptable and recommended.
  • Hypertension_and_Heart_Failure, SubarachnoidHemorrhage: Acceptable but not recommended.
  • UK Biobank, SAH_20190128, CernerS: Unacceptable.
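The character rule and the no-date rule above can be checked mechanically. The following is a minimal sketch, not part of the framework itself: the function names are hypothetical, and treating any eight-digit YYYYMMDD token as a date is an assumption, since the text does not define a date format. The capitalization and abbreviation recommendations are matters of style and are not checked.

```python
import re
from datetime import datetime

# Allowed characters from the naming rules: A-Z, a-z, 0-9 and underscore.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9_]+")


def looks_like_date(token: str) -> bool:
    """Return True if token is eight digits that parse as YYYYMMDD.

    Assumption: dates appear in this format; the text does not specify one.
    """
    if not re.fullmatch(r"[0-9]{8}", token):
        return False
    try:
        datetime.strptime(token, "%Y%m%d")
    except ValueError:
        return False
    return True


def is_acceptable_name(name: str) -> bool:
    """Check the character rule and the no-date rule for a folder name."""
    if not ALLOWED_NAME.fullmatch(name):
        return False  # contains a space or another disallowed character
    # Reject names in which any underscore-separated part looks like a date.
    return not any(looks_like_date(part) for part in name.split("_"))
```

For example, `is_acceptable_name("UK_Biobank")` passes, while `is_acceptable_name("UK Biobank")` and `is_acceptable_name("SAH_20190128")` are rejected because of the space and the date, respectively.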

Structure

You should create a folder to include all of the files for your unit or research center. For security and maintenance reasons, it is better not to include the actual database names in these folders. In this example, the main folder that contains all the data is named CBD_HS, which indicates our Center for Big Data in Health Sciences (CBD-HS). The general structure of the sub-folders is presented in Figure 2.3. Except for the end folders, other folders should not contain files.

An example of the folder structure for the SAH project is presented in Figure 2.4.

Figure 2.4 shows part of the mortality prediction sub-project from the SAH project, together with the main folders above and below it. In the following part, the details of each type of sub-folder are presented.

In this framework, we define four user groups. The details of each user group are described in later sections, but the access level of each group is given alongside each folder description. These user groups are:

  • Super_admin.
  • Admin.
  • Project_manager.
  • Regular_user.

FIGURE 2.3

The general structure of the folders.

FIGURE 2.4

Part of the SAH project and its main up and down sub-folders.

Main Folders

CBD_HS This is the main folder that contains all project data.

  • User groups with read/execute access: Super_admin, Admin, Project_manager, Regular_user.
  • User groups with write access: Super_admin, Admin.

Each working group or team has its own folder in CBD_HS. Only admins can create a new working group (by creating a new folder). Project managers have write permission for the contents of their corresponding working group, but they cannot create a new working group.

Each working group or team should have at least one main project, and each main project should have at least one sub-project. For instance, in Figure 2.5, we created a new research working group called "Brownsville". We are in the exploratory phase of this study. To keep the correct structure for our CBD_HS folder, we create a folder named "Exploration" as the main project and create another folder named "Explore_1" inside it. It is crucial to keep this convention when creating the tree of folders; otherwise, our data management framework may miscategorize them.

FIGURE 2.5

Adding a new workgroup or team to the folder structure.
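The rule that every working group has at least one main project, and every main project at least one sub-project, can be verified with a short script. This is only a sketch under stated assumptions: the function names are hypothetical, the main folders Public_Folder, Admin, and Useful_Info are assumed to sit next to the working groups inside CBD_HS, and the shared raw-data folder inside a project is assumed to be named Main_Raw_File. It is not the DMF described later in the chapter.

```python
from pathlib import Path

# Assumption: these main folders live directly inside CBD_HS and are not groups.
NON_GROUP_FOLDERS = {"Public_Folder", "Admin", "Useful_Info"}
# Assumption: name of the shared raw-data folder, which is not a sub-project.
NON_SUB_PROJECT_FOLDERS = {"Main_Raw_File"}


def subdirs(folder: Path, skip: set[str]) -> list[Path]:
    """Return the sub-folders of folder, sorted, ignoring the names in skip."""
    return sorted(p for p in folder.iterdir() if p.is_dir() and p.name not in skip)


def find_structure_problems(root: Path) -> list[str]:
    """List working groups without a project and projects without a sub-project."""
    problems = []
    for group in subdirs(root, NON_GROUP_FOLDERS):
        projects = subdirs(group, set())
        if not projects:
            problems.append(f"{group.name}: no main project")
        for project in projects:
            if not subdirs(project, NON_SUB_PROJECT_FOLDERS):
                problems.append(f"{group.name}/{project.name}: no sub-project")
    return problems
```

Calling `find_structure_problems(Path("CBD_HS"))` returns an empty list when the tree follows the convention, and one message per offending folder otherwise.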

Public_Folder Project managers do not have access to the individual user folders, so the Public_Folder acts as a shared medium for all working groups or research teams. Users can store here the files they want to place in the CBD_HS folder and inform their corresponding project manager, who can then move these files to the appropriate location. It is recommended to do this as soon as possible so that the Public_Folder does not become crowded with files. Alternatively, you may want to use your server's shared folder for this purpose.

  • User groups with read/execute access: Super_admin, Admin, Project_manager, Regular_user.
  • User groups with write access: Super_admin, Admin, Project_manager, Regular_user.

Admin This folder is for administrators:

  • User groups with read access: Super_admin, Admin.
  • User groups with write access: Super_admin, Admin.

The Admin folder contains three sub-folders (Figure 2.6):

1. The data management framework (DMF). This folder contains:

a. The DMF core.

b. Archived sub-folder to store previous versions of DMF core.

2. Network. This folder contains:

a. Current network status.

b. Current network status graph object.

c. Archived sub-folder to store previous versions of network structure and status.

3. Manual. This folder contains:

a. The manual for data management framework set up and use for admins.

b. Extras sub-folder for the images and extra files used in the manual.

c. Archived sub-folder to store previous versions of manuals.

Useful_Info This folder is created to share the educational and other materials such as ICD-9 or ICD-10 dictionaries that are useful for all of the working groups or teams.

  • User groups with read/execute access: Super_admin, Admin, Project_manager, Regular_user.
  • User groups with write access: Super_admin, Admin.

The DMF will not map the contents of this folder. If any file in this folder is used in a project, a copy of the file should be placed in the "CBD_HS/Working_Group/Project/Sub_Project/Raw_Data/Extras/" folder, and this path should be used when addressing the dependencies. The concept of dependencies and the DMF will be discussed later.
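Copying a shared file into a sub-project's Extras folder can be scripted so that the copy always lands in the same place. The sketch below is illustrative only: `copy_to_extras` is a hypothetical helper, and the default raw-data folder name `01_Raw_Data` is an assumption taken from the numbered sub-project folders described later in this section (the path quoted above writes it as `Raw_Data`).

```python
import shutil
from pathlib import Path


def copy_to_extras(
    useful_file: Path, sub_project: Path, raw_folder: str = "01_Raw_Data"
) -> Path:
    """Copy a file from Useful_Info into <sub_project>/<raw_folder>/Extras.

    Returns the path of the copy, which is the path to use for dependencies.
    """
    extras = sub_project / raw_folder / "Extras"
    extras.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    # copy2 keeps the file's timestamps and returns the destination path.
    return Path(shutil.copy2(useful_file, extras))
```

The original file in Useful_Info is left untouched, so other working groups can continue to use it.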

Group Folders

One folder should be created for each research group or team (e.g., Cerner, UK_Biobank). These folders contain all of the individual projects and data.

  • User groups with read/execute access: Super_admin, Admin, corresponding Project_manager.
  • User groups with write access: Super_admin, Admin.

Project Folders

This folder is for each project in the working group. It contains at least one sub-project and the Main_Raw_File folder to store the data extracted from the database. This main raw file is shared between all sub-projects.

  • User groups with read access: Super_admin, Admin, corresponding Project_manager, users of the same project.
  • User groups with write access: Super_admin, Admin, corresponding Project_manager.

Sub_Project Folders

  • User groups with read access: Super_admin, Admin, corresponding Project_manager, users of the same project.
  • User groups with write access: Super_admin, Admin, corresponding Project_manager.
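The access lists given so far for each folder level can be collected in a single lookup table, which makes them easier to review and to compare. The sketch below is only an illustration: the names `ACCESS` and `can` are hypothetical, "users of the same project" is mapped to Regular_user as an assumption, and the "corresponding" scoping (a project manager or user being tied to one particular group or project) is not modeled.

```python
EVERYONE = {"Super_admin", "Admin", "Project_manager", "Regular_user"}
ADMINS = {"Super_admin", "Admin"}
MANAGERS = ADMINS | {"Project_manager"}

# Folder level -> groups with read and write access, as listed in the text.
# "Group", "Project" and "Sub_Project" stand for any folder at that level.
ACCESS = {
    "CBD_HS": {"read": EVERYONE, "write": ADMINS},
    "Public_Folder": {"read": EVERYONE, "write": EVERYONE},
    "Admin": {"read": ADMINS, "write": ADMINS},
    "Useful_Info": {"read": EVERYONE, "write": ADMINS},
    "Group": {"read": MANAGERS, "write": ADMINS},
    "Project": {"read": EVERYONE, "write": MANAGERS},
    "Sub_Project": {"read": EVERYONE, "write": MANAGERS},
}


def can(user_group: str, action: str, folder_level: str) -> bool:
    """Return True if user_group may perform action ("read" or "write")."""
    return user_group in ACCESS[folder_level][action]
```

For instance, `can("Project_manager", "write", "Project")` holds, while `can("Project_manager", "write", "Group")` does not, which matches the rule that only admins can create a new working group.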

The following sub-folders should be created for each sub-project (the numbers at the beginning of folders are added to show the stepwise process of the data):

1. 01_Raw_Data:

a. It contains multiple folders, one for each category of the extracted raw data. It is recommended to create separate folders for the different categories of raw data (they can be empty), both as a reminder of the tables that may need to be extracted from the database and to make the structure suitable for future expansion.

If the sub-project is using the Main_Raw_Data, these folders will be empty. But if the sub-project needs its own specific data, then the data should be placed in these folders to prevent confusion between multiple sub-projects. Even if there is only one sub-project within the main project, the same rule should be applied because other sub-projects may be added in the future. Each of these categories should have an Archived sub-folder to store the previous versions (Figure 2.6).

b. Extras sub-folder to store miscellaneous files or files copied from the Useful_Info main folder.

FIGURE 2.6

Admin folder structure.

2. 02_Cleaned_Data:

a. As with Raw_Data, it contains multiple folders, one for each category of the cleaned data.

b. Extras sub-folder to store miscellaneous files.

3. 03_Prepared_Data:

a. As with Raw_Data, it contains multiple folders, one for each category of the prepared data. If a file is prepared from multiple cleaned or raw files and contains information about multiple categories of data, the corresponding project manager should judge which category the majority of the fields belong to and place the file in the appropriate folder. It is better to avoid creating new folders or placing such files in the "Extras" folder.

b. Extras sub-folder to store miscellaneous files.

4. Reports:

a. It contains multiple sub-folders for each report. Each of these folders should have an Archived sub-folder to store the previous versions.

b. Extras sub-folder to store miscellaneous files.

5. Papers:

a. It contains multiple sub-folders for each paper. Each of these folders should have an Archived sub-folder to store the previous versions.

b. Extras sub-folder to store miscellaneous files (Figure 2.7).
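The sub-folder list above can be turned into a small script that creates the skeleton of a new sub-project, so that every sub-project starts from the same layout. This is a minimal sketch under stated assumptions: the function names are hypothetical, the category names are supplied by the caller (the text does not name them), and giving the cleaned and prepared categories an Archived sub-folder, as the raw categories have, is an assumption.

```python
from pathlib import Path

DATA_STAGES = ["01_Raw_Data", "02_Cleaned_Data", "03_Prepared_Data"]
OUTPUT_FOLDERS = ["Reports", "Papers"]


def create_sub_project(sub_project: Path, categories: list[str]) -> None:
    """Create the five sub-folders of a sub-project with their Extras folders."""
    for stage in DATA_STAGES:
        for category in categories:
            # One folder per data category, each with an Archived sub-folder
            # (assumed for the cleaned and prepared stages as well).
            (sub_project / stage / category / "Archived").mkdir(
                parents=True, exist_ok=True
            )
        (sub_project / stage / "Extras").mkdir(parents=True, exist_ok=True)
    for folder in OUTPUT_FOLDERS:
        (sub_project / folder / "Extras").mkdir(parents=True, exist_ok=True)


def add_output_item(sub_project: Path, folder: str, name: str) -> Path:
    """Add a folder for one report or paper, with its Archived sub-folder."""
    item = sub_project / folder / name
    (item / "Archived").mkdir(parents=True, exist_ok=True)
    return item
```

Because every call uses `exist_ok=True`, the script can be re-run on an existing sub-project to add a missing category without touching the folders that are already there.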

 