Discussion and Summary

It is essential to manage the data, code, files, reports, and products of each EHR research project efficiently, so that resources can be shared and reused and research results can be reproduced. We summarize some tips for EHR project management below.

Secure your data. Protect your data with backups, a Version Control System (VCS), antivirus software, and hardware and software firewalls.

Develop the conventions early. As discussed in this chapter, it is best to develop file, folder, and coding conventions early. Reorganizing files and rewriting code after you have accumulated a large number of files can be challenging.

Seek team members' opinions about the conventions. Every data scientist has their own habits for storing files and writing code, and some habits are messier than others. Adapting to new routines for coding and for naming variables, files, and folders can be hard, and some team members may be reluctant to do so. Involve as many team members as you can in developing the conventions at the beginning of a project. Doing so yields better conventions and makes them easier for the team to adopt.

Follow the convention. Once the convention is finalized, everyone must follow it. If some members do not, it can discourage the others. Keep in mind that any project and data management framework relies on a predefined convention; files or folders that break it can make the framework operate sub-optimally or, even worse, damage the project.
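One way to make a convention easier to follow is to encode it as a pattern and check file names automatically. The sketch below assumes a hypothetical convention of the form `<Project>_<Subproject>_<Content>_<YYYYMMDD>.<ext>`, inferred from file names such as `SAH_Mortality_Meds_20200210.csv` in the appendix. The pattern and the function name `follows_convention` are illustrative assumptions, not part of the framework described in this chapter.

```python
import re
from datetime import datetime

# Hypothetical convention: <Project>_<Subproject>_<Content>_<YYYYMMDD>.<ext>
# Adjust the pattern to match the convention your team has agreed on.
FILENAME_PATTERN = re.compile(
    r"^(?P<project>[A-Za-z0-9]+)_(?P<subproject>[A-Za-z0-9]+)_"
    r"(?P<content>[A-Za-z0-9]+)_(?P<date>\d{8})\.(?P<ext>[A-Za-z0-9]+)$"
)


def follows_convention(filename: str) -> bool:
    """Return True if the name matches the pattern and its date is a real calendar date."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return False
    try:
        # Rejects impossible dates such as 20200230.
        datetime.strptime(match.group("date"), "%Y%m%d")
    except ValueError:
        return False
    return True
```

A check like this could run whenever a file is submitted, so that a non-conforming name is caught before it enters the shared folders.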

Try to use OOP. As discussed, use Object-Oriented Programming (OOP) where you can. Object-oriented code is easier to integrate into the project and data management framework.

Take Access Management seriously. EHR data is sensitive. It is strongly recommended to deny users access to files they do not need. Furthermore, reserving the write privilege for project managers and administrators helps ensure that the data structure follows the finalized file and folder convention.
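As a small illustration of restricting write access, the sketch below clears the write permission bits on every file under a folder using Python's standard library. It is only one layer of protection and rests on assumptions: real access control for EHR data normally relies on operating-system accounts, groups, and access control lists managed by an administrator, and on Windows `os.chmod` can only toggle the read-only flag. The function names `make_read_only` and `lock_tree` are hypothetical.

```python
import os
import stat
from pathlib import Path

# Write permission bits for owner, group, and others.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_read_only(path: Path) -> None:
    """Clear the write permission bits on a single file."""
    current = stat.S_IMODE(path.stat().st_mode)
    os.chmod(path, current & ~WRITE_BITS)


def lock_tree(root: Path) -> int:
    """Make every regular file under root read-only; return how many files were processed."""
    count = 0
    for file_path in root.rglob("*"):
        if file_path.is_file():
            make_read_only(file_path)
            count += 1
    return count
```

An administrator might run `lock_tree` on a finalized data folder so that accidental edits fail loudly instead of silently changing the data.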

Regularly check the file and folder structure. The administrator should check that project managers are following the convention. If a project manager has misunderstood the convention, it is better to correct it early, before the error spreads.

Check the code rigorously. Check all code and make sure it will not change the data in unwanted ways. Pay special attention to your Data Management Framework (DMF) core and test it under different scenarios. If the DMF core makes errors, you may lose part or all of your data, and the Version Control System (VCS) or backups may become the only way to restore it. Optimize the code as much as possible.
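One simple way to test that a processing step does not alter its input is to compare a checksum of the file before and after the step runs. The following is a minimal sketch under that assumption; the helper names `sha256_of` and `run_without_modifying` are hypothetical and not part of any DMF described here.

```python
import hashlib
from pathlib import Path
from typing import Callable


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_without_modifying(path: Path, step: Callable[[Path], object]) -> object:
    """Run step(path) and raise RuntimeError if the file's contents changed."""
    before = sha256_of(path)
    result = step(path)
    if sha256_of(path) != before:
        raise RuntimeError(f"{path} was modified by the processing step")
    return result
```

The check detects a change only after it has happened, so it complements backups and version control rather than replacing them.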

Make sure you have enough information. You will frequently need to re-run code. A clear description of the code and files is extremely helpful when you come back and cannot remember what you did before. EHR research teams are usually large, and staff turnover is inevitable. Detailed, well-organized documentation saves time and prevents unnecessary difficulties.
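One lightweight way to keep such descriptions is a machine-readable record stored next to each file. The sketch below mirrors a few fields of the File Submission Form in the appendix and uses the values shown there. The class name `FileRecord`, its field names, the ISO date format, and the choice of JSON are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class FileRecord:
    """Hypothetical metadata record mirroring part of the File Submission Form."""

    title: str
    filename: str
    file_date: str  # ISO 8601 date, e.g. "2020-02-10" (an assumed format)
    file_type: str  # e.g. "Raw Data", "Cleaned Data", "Prepared Data"
    users: List[str] = field(default_factory=list)
    coding_environment: str = ""
    install_commands: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the record to a JSON string."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "FileRecord":
        """Rebuild a record from a JSON string produced by to_json."""
        return cls(**json.loads(text))


record = FileRecord(
    title="Cleaned Medication Data for Mortality Prediction Sub-project, SAH Project",
    filename="SAH_Mortality_Meds_20200210.csv",
    file_date="2020-02-10",
    file_type="Cleaned Data",
    users=["User-2", "User-3"],
    coding_environment="Python",
    install_commands=["pip install plotly", "pip install python-docx"],
)
```

Writing `record.to_json()` to a file alongside the data keeps the description with the file it describes, which helps when a new team member inherits the project.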

This is an example. The Data Management Framework (DMF) described in this chapter is only an example; each organization may need to adapt it, or even build its own from scratch, to fit its projects. The methods described in this chapter should be treated as groundwork for a more detailed and sophisticated DMF.

Project and data management are important issues, especially in this era of Big Data. As the scope of big data research changes over time, the need for better Data Management Frameworks becomes more evident. In the future, we may update this chapter to reflect new challenges and present newly developed strategies.

Appendix: File Submission Form

Center for Big Data in Health Sciences

File Submission Form version 1.1

Title: Cleaned Medication Data for Mortality Prediction Sub-project, SAH Project

Main file:

  • Filename: SAH_Mortality_Meds_20200210.csv
  • File date: 2/10/2020
  • Type of file:

    ( ) Raw Data   (x) Cleaned Data   ( ) Prepared Data   ( ) Report   ( ) Paper   ( ) Software   ( ) Other


Users associated:

1- User-2 2- User-3

The files used to create this file:

No  File name                               File address
1   SAH_Mortality_Demographic_20200203.csv  /CBD_HS/Cerner/SAH/Mortality_Prediction/Cleaned_Data/Demographics/
2   SAH_Mortality_Meds_20200203.csv         /CBD_HS/Cerner/SAH/Mortality_Prediction/Raw Data/Labs/



Code file:

  • Did you use codes:

    (x) Code used   ( ) Code not used

  • Coding environment: Python
  • OS environment: Windows
  • Non-standard packages/libraries used:


Package Name    Installation Command
                pip install plotly
                pip install python-docx



If the code is callable, put the command to generate the result:

The brief description of the methods:


1 Redundant Array of Independent Disks

