Data Analytics and Collaborations
Research Informatics makes data usable by combining data from multiple sources, transforming it from one state to another (e.g., through format conversion), cleaning it to remove noise, and applying validation and quality-control procedures. The prepared data support exploratory data analysis, which produces the insights that constitute research findings. Research Informatics also provides instruments for analyzing and investigating data sets and for summarizing their main characteristics, often with data visualization methods. The instruments and methods used for analysis should be documented, and code written for data analysis and visualization may need to be preserved and made available in support of research results.
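As a rough illustration of these steps, the following Python sketch combines two small made-up sources (a CSV extract and a JSON extract), converts types, flags implausible values as a quality-control step, and summarizes what remains. The field names, the blood-pressure range, and the function names are hypothetical; actual Research Informatics pipelines may use entirely different tools.

```python
import csv
import io
import json
import statistics

# Made-up sources describing the same patients in two formats (hypothetical fields).
CSV_SOURCE = "patient_id,age\np1,34\np2,\np3,51\n"
JSON_SOURCE = '[{"patient_id": "p1", "sbp": 120}, {"patient_id": "p2", "sbp": 135}, {"patient_id": "p3", "sbp": 900}]'


def combine(csv_text, json_text):
    """Read both formats and merge them into one record per patient_id."""
    ages = {row["patient_id"]: row["age"] for row in csv.DictReader(io.StringIO(csv_text))}
    sbp = {rec["patient_id"]: rec["sbp"] for rec in json.loads(json_text)}
    return [
        {"patient_id": pid, "age": ages.get(pid), "sbp": sbp.get(pid)}
        for pid in sorted(set(ages) | set(sbp))
    ]


def clean(records):
    """Convert ages from text to int; empty or missing values become None."""
    return [{**r, "age": int(r["age"]) if r["age"] else None} for r in records]


def validate(records, sbp_range=(60, 250)):
    """QC step: split records into plausible and implausible systolic BP values."""
    lo, hi = sbp_range
    valid, rejected = [], []
    for r in records:
        ok = r["sbp"] is not None and lo <= r["sbp"] <= hi
        (valid if ok else rejected).append(r)
    return valid, rejected


def summarize(records):
    """Summarize main characteristics for exploratory analysis."""
    ages = [r["age"] for r in records if r["age"] is not None]
    sbps = [r["sbp"] for r in records]
    return {
        "n": len(records),
        "mean_age": statistics.mean(ages) if ages else None,
        "median_sbp": statistics.median(sbps) if sbps else None,
    }


if __name__ == "__main__":
    valid, rejected = validate(clean(combine(CSV_SOURCE, JSON_SOURCE)))
    print(summarize(valid), "rejected:", [r["patient_id"] for r in rejected])
```

Keeping each step as a separate function makes it easy to document and preserve the exact cleaning and validation rules alongside the results they produced.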
Databases with Analytical Capabilities
The Cosmos data set is unlike any other used in health research today. Cosmos combines billions of clinical data points into a high-quality, representative, and integrated data set that can be used to improve the health and lives of people everywhere.
The Digital Research Platform (DRP) was designed by the Research Informatics Group. It makes research more agile, efficient, collaborative, and secure, while helping us provide higher-quality data to researchers. Now it’s easier to make scientific discoveries!
The DRP is a virtual research platform where we can centralize enterprise research data in an environment that facilitates bringing analytics to the data and reduces the redundancy of enterprise data warehouses. It also allows us to provide industry-standard data science tools to all our researchers, lets resources scale when needed, and supports consistent security and governance. The DRP is built on a hybrid infrastructure that relies heavily on Microsoft Azure and Databricks, but it includes other elements as well (and will incorporate an electronic lab notebook in the future).
There are many ways the DRP can benefit your research. The following are a few examples:
Big data: Most data in the DRP are stored as Spark tables. Spark is a distributed computing platform that can process very large datasets efficiently. If you have big datasets that your computer might struggle to load, they can be brought into the DRP for analysis.
Machine learning and AI: The DRP uses Databricks, a data science platform that lets you use tools such as PyTorch, TensorFlow, and MLflow without struggling to install them properly. They just work.
Scalable compute: Using the DRP, we can provide the computational power you need for a study, without purchasing it in advance or needing to maintain it permanently. Compute is provided on demand. We can provide a letter of support for your grant describing the infrastructure.
Privacy: Sensitive data can be stored in the DRP and easily managed in a HIPAA-compliant way. The DRP offers more granular control than folder permissions, because access can be restricted down to individual columns in the data.
Consistency: Instead of sending datasets to an analyst, we can grant access to your dataset in an environment that allows analysis. Not only does this help to protect privacy, it means you can make sure everyone is working off the same version of the data.
Availability: Your data and analysis are available from wherever you can log in, so you don’t have to worry if your computer goes down or you unexpectedly need to work from another machine.
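Column-level access control can be pictured as giving each user a view that contains only the columns their role permits. The Python sketch below shows that idea with a hypothetical role-to-columns policy and made-up column names; it is an illustration of the concept only and does not represent how the DRP actually enforces access.

```python
from typing import Dict, List, Set

# Hypothetical policy: the columns each role is allowed to see.
COLUMN_POLICY: Dict[str, Set[str]] = {
    "analyst": {"study_id", "age_group", "diagnosis_code"},
    "study_coordinator": {"study_id", "age_group", "diagnosis_code", "mrn", "name"},
}


def column_view(rows: List[dict], role: str) -> List[dict]:
    """Return copies of rows restricted to the columns the role may see.

    Roles missing from the policy see no columns (deny by default).
    """
    allowed = COLUMN_POLICY.get(role, set())
    return [{col: val for col, val in row.items() if col in allowed} for row in rows]


if __name__ == "__main__":
    rows = [{"study_id": "S001", "mrn": "000000", "name": "Test Patient",
             "age_group": "40-49", "diagnosis_code": "C50.9"}]
    print(column_view(rows, "analyst"))
```

Denying by default means a role that was never added to the policy sees nothing, which is the safer failure mode for sensitive data.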
Access a network of sites, de-identified data, and research resources. Conduct high-volume, efficient trial and outcomes research. Gain the opportunity to generate revenue through funded research studies.
Green HERON is a highly protected health data analytics space where approved users can work with de-identified health information. Green HERON simplifies obtaining EMR data from HERON and supports external researchers with approved NetIDs. The analytics space offers a rich set of the tools, services, and resources that research requires. Within the protected environment, Green HERON users can choose analytic tools such as R, SAS, and Python.
SlicerDicer is a self-service reporting tool that clinicians, managers, and other roles can use to explore the data in Epic’s enterprise data warehouse. Out of the box, users can explore data in Epic-released data models. You can control which data models users can access and whether they can see line-level details or only summarized counts. SlicerDicer includes powerful data exploration abilities for clinical, access, and revenue subject areas. In SlicerDicer, users can investigate a hunch and then refine their searches on the fly.
Ingenuity Pathways Analysis enables scientists (e.g., biologists, geneticists, bioinformaticians) to identify the biological mechanisms, pathways, and functions most relevant to their experimental datasets or genes of interest.
The KU Cancer Center built a curated data warehouse called OPTIK (Organize and Prioritize Trends to Inform KU Cancer Center). OPTIK retrieves data and structures it around a common denominator so that the data can be used meaningfully in a standard, consistent format. This streamlines the process of synthesizing data on Kansas and Missouri demographics, cancer risk factors, and cancer incidence and mortality rates.
OPTIK standardizes these diverse data sources to enable analyses of the cancer burden at local, regional and national levels while upholding a strict standard of patient privacy. The OPTIK database enables researchers to use available data and create heat maps and other visualizations to aid in funding proposals, presentations and research activities.
Furthermore, using knowledge provided by OPTIK, the KU Cancer Center is able to prioritize action items for research and outreach and more effectively communicate the impact of those efforts.
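Bringing diverse sources into a common format is what makes comparable rates possible. The Python sketch below assumes one extract of case counts and one of county populations that share a county FIPS key but use different field names and types; it standardizes the key, joins the two, and computes crude rates per 100,000 (cases / population × 100,000). All values are made up for illustration, and OPTIK's actual schema and methods (for example, any age adjustment) are not shown here.

```python
from typing import Dict, List

# Made-up extracts (illustrative values, not real data): same county key,
# but different field names and types in each source.
CASE_COUNTS = [
    {"County FIPS": "20001", "Cases": 120},
    {"County FIPS": "29001", "Cases": 95},
    {"County FIPS": "29003", "Cases": 10},  # no matching population row
]
POPULATION = [
    {"fips": 20001, "population": 600000},
    {"fips": 29001, "population": 165000},
]


def fips_key(value) -> str:
    """Standardize a county FIPS code as a zero-padded 5-character string."""
    return str(value).zfill(5)


def crude_rates_per_100k(cases: List[dict], population: List[dict]) -> Dict[str, float]:
    """Join counts to population denominators and compute crude rates per 100,000."""
    denominators = {fips_key(r["fips"]): r["population"] for r in population}
    rates: Dict[str, float] = {}
    for r in cases:
        key = fips_key(r["County FIPS"])
        pop = denominators.get(key)
        if pop:  # skip counties without a usable denominator
            rates[key] = r["Cases"] / pop * 100_000
    return rates


if __name__ == "__main__":
    for fips, rate in crude_rates_per_100k(CASE_COUNTS, POPULATION).items():
        print(fips, round(rate, 1))
```

Rates keyed by a standard county code can then feed directly into a choropleth or heat map.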
The Accrual Prediction Program provides accrual information, including the predicted completion date, predicted number of accrued subjects during the pre-specified accrual period, and the probability of achieving accrual targets for all KU Cancer Center clinical trials.
Offered through the Department of Biostatistics & Data Science, the clinical trial sample size tool helps predict the time it will take to reach your study’s desired sample size.
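Both accrual tools answer questions of the form "how many subjects, and by when?" A common simple starting point, assumed here and not necessarily what either KU tool uses, is to treat enrollment as a Poisson process with a constant rate estimated from accrual so far. The Python sketch below returns the expected total at the end of the accrual period, the probability of reaching the target, and the expected time at which the target is reached; the function names are hypothetical.

```python
import math


def poisson_sf(k: int, mu: float) -> float:
    """P(X >= k) for X ~ Poisson(mu); terms are built in log space to avoid underflow."""
    if k <= 0:
        return 1.0
    if mu <= 0:
        return 0.0
    cdf = sum(math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1)) for i in range(k))
    return min(1.0, max(0.0, 1.0 - cdf))


def accrual_forecast(accrued: int, months_elapsed: float,
                     accrual_months: float, target: int) -> dict:
    """Forecast accrual assuming a constant (homogeneous Poisson) enrollment rate."""
    if accrued <= 0 or months_elapsed <= 0:
        raise ValueError("need some observed accrual to estimate a rate")
    rate = accrued / months_elapsed                      # subjects per month so far
    remaining = max(accrual_months - months_elapsed, 0)  # months left in the period
    still_needed = max(target - accrued, 0)
    return {
        "rate_per_month": rate,
        "expected_total": accrued + rate * remaining,
        "prob_reach_target": poisson_sf(still_needed, rate * remaining),
        # The waiting time for n more Poisson arrivals has mean n / rate.
        "expected_month_target_reached": months_elapsed + still_needed / rate,
    }


if __name__ == "__main__":
    print(accrual_forecast(accrued=30, months_elapsed=10, accrual_months=24, target=60))
```

For example, a trial that has accrued 30 subjects in 10 months, with a 24-month accrual period and a target of 60, gets an estimated rate of 3 subjects per month, an expected total of 72, and an expected 20 months to reach the target. A constant rate is a strong assumption; Bayesian variants place a prior on the rate, which tempers estimates early in a trial.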
The National COVID Cohort Collaborative (N3C) is a partnership among the NCATS-supported Clinical and Translational Science Awards (CTSA) Program hubs, the National Center for Data to Health (CD2H), and the NIGMS-supported Institutional Development Award Networks for Clinical and Translational Research (IDeA-CTR), with overall stewardship by NCATS. As a partner, KU Medical Center is contributing COVID-19 clinical data to the N3C data enclave. KU Medical Center researchers can request access to the N3C data enclave to conduct studies that answer critical COVID-19 research questions.
The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international consortium for EHR data-driven studies of the COVID-19 pandemic. Its goal is to inform doctors, epidemiologists, and the public about COVID-19 patients using data acquired through the health care process. 4CE uses a distributed learning framework: researchers submit queries through the coordinating center, and participating sites run the queries locally, so raw EHR data never leave their institutions.
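The distributed pattern described here, where queries travel to the data and only aggregate results come back, can be sketched in a few lines of Python. The Site class, the coordinate function, and the age-group query are hypothetical and greatly simplified; they do not represent 4CE's actual software.

```python
from collections import Counter
from typing import Callable, Dict, List

Record = Dict[str, str]
Query = Callable[[List[Record]], Dict[str, int]]


def count_by_age_group(records: List[Record]) -> Dict[str, int]:
    """Example query executed at each site; it returns counts, not patient rows."""
    return dict(Counter(r["age_group"] for r in records))


class Site:
    """A participating institution that runs queries against its local records."""

    def __init__(self, name: str, records: List[Record]):
        self.name = name
        self._records = records  # private by convention; only run() results are shared

    def run(self, query: Query) -> Dict[str, int]:
        return query(self._records)


def coordinate(sites: List[Site], query: Query) -> Dict[str, int]:
    """Coordinating center: send the query to every site and add up the results."""
    total: Counter = Counter()
    for site in sites:
        total.update(site.run(query))  # Counter.update adds counts from a mapping
    return dict(total)


if __name__ == "__main__":
    sites = [
        Site("A", [{"age_group": "18-49"}, {"age_group": "50+"}]),
        Site("B", [{"age_group": "50+"}, {"age_group": "50+"}]),
    ]
    print(coordinate(sites, count_by_age_group))
```

Because each site returns only counts, the coordinator can combine results across institutions without ever receiving patient-level rows.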