Title of the Seminar: Epigenetic Study of Socioeconomic Hardship from EHR-linked Biodata Bank

Date: 10/13/2023

Local Host: Indiana University School of Medicine

Seminar Series: Center for Computational Biology and Bioinformatics

Presenter: Dr. Yaomin Xu, Ph.D. MA

Reported by: Swaraj Thorat

The topic of this seminar report is "Epigenetic Study of Socioeconomic Hardship from EHR-linked Biodata Bank," presented by Dr. Yaomin Xu. He holds a Ph.D. in Statistics from Case Western Reserve University and focuses on developing statistical and computational methods to extract new knowledge from large biological datasets. His recent research emphasizes integrating patient phenome and genome data to identify complex mechanistic biomarkers for precision medicine. In this seminar, he discussed the CSDH framework, biobanks, DNA methylation, data analysis, and unpublished results.


This seminar covered how social determinants of health (SDOH), the conditions in which people are born, live, work, and age, can affect how our genes are regulated and, in turn, our health. It examined how life circumstances such as lack of income, employment, education, or healthcare, together with crime, food access, community, and environment, can leave marks on our genes. Electronic health record (EHR) data linked to a biobank were used to examine how these factors can lead to changes in gene expression that could potentially affect health.

CSDH Conceptual Framework

The CSDH (Commission on Social Determinants of Health) was established by the World Health Organization. Its primary purpose was to improve the conditions of daily life and to tackle the inequitable distribution of power, money, and resources. To understand the biological process involved: epigenetics is the study of changes in gene expression that do not involve changes in the DNA sequence itself. Such changes can be caused by environmental factors such as diet, stress, and the social factors mentioned above. Dr. Yaomin Xu used large-scale genomic and electronic health record (EHR) datasets to study the relationship between SDOH and epigenetics. The study found that people with greater exposure to adverse SDOH had different epigenetic profiles from those with less exposure.

BioVU/SD, the biobanking program of Vanderbilt University Medical Center, provided the data for the analysis. Its Synthetic Derivative (SD) includes more than 3.6 million de-identified electronic medical records. Each record spans 30 pages and contains 27 distinct codes.

DNA Methylation

DNA methylation is an epigenetic process that plays an essential role in gene expression and genome stability. It involves the addition of a methyl group (CH3) to cytosine nucleotides within the DNA molecule. This alteration changes the structure of DNA without modifying the genetic code, and it plays a key role in determining which genes are turned on or off in different cell types. In mammals, it occurs primarily at cytosines that are followed by a guanine (CpG sites).
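Because methylation in mammals occurs mostly at CpG dinucleotides (a cytosine immediately followed by a guanine), the "specific spots" that methylation arrays measure are CpG sites. The short sketch below locates such sites in a DNA string. The function name and the toy sequence are hypothetical and only illustrate the idea; they are not part of the study's pipeline.

```python
def find_cpg_sites(sequence: str) -> list[int]:
    """Return 0-based positions of every C that is immediately followed by a G."""
    seq = sequence.upper()  # accept lower-case input as well
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


# Hypothetical toy sequence, not real genomic data.
example = "ATCGGACGTTCCGA"
print(find_cpg_sites(example))  # [2, 6, 11]
```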

Because adding a methyl group to a cytosine changes how that stretch of DNA is read, methylation patterns record information about a person's exposures. Blood methylation therefore has translational value for measuring adversity and deprivation among biobank participants. Studies show that methylation changes found in blood DNA can help researchers understand and address the social and environmental factors affecting health and well-being. By studying these changes, researchers can infer exposures and behaviors such as smoking, drinking, diet, and stress, and can estimate an individual's risk of health problems. This makes methylation helpful for understanding and predicting disease.
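Methylation in blood samples is typically measured with arrays that report, for every CpG site, a methylated and an unmethylated signal intensity. These are commonly summarised as a beta value (the methylated fraction, between 0 and 1, with an offset of 100 in the denominator following Illumina's convention) or as an M-value (the log2 ratio of the two signals). The seminar did not state which summary the team used, so the sketch below is a generic illustration; the function names and the intensities are hypothetical.

```python
import math


def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Fraction of methylated signal at one CpG site, between 0 and 1.

    The offset of 100 follows Illumina's convention and stabilises the
    ratio when both intensities are low.
    """
    if meth < 0 or unmeth < 0:
        raise ValueError("intensities must be non-negative")
    return meth / (meth + unmeth + offset)


def m_value(meth: float, unmeth: float, offset: float = 1.0) -> float:
    """log2 ratio of methylated to unmethylated signal; the offset avoids log(0)."""
    return math.log2((meth + offset) / (unmeth + offset))


# Hypothetical intensities for a single CpG site in one blood sample.
print(beta_value(3000, 900))  # 3000 / (3000 + 900 + 100) = 0.75
print(round(m_value(3000, 900), 3))
```

Beta values are easy to interpret as a percentage methylated, while M-values behave better in statistical tests near 0 and 1, which is why both are in common use.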

BioVU Study Design and Analysis

The study design is a framework for conducting research with the BioVU biobanking program. Dr. Xu focused on 570 cases of socioeconomic hardship, identified from International Classification of Diseases (ICD) codes, and 385 controls. DNA methylation patterns were measured with the Illumina MethylationEPIC array, a tool that examines nearly a million specific sites in the DNA. In summary, Dr. Xu examined how individuals' DNA methylation differs between these groups in order to understand how such modifications can affect health and well-being. The analysis faces several challenges, including phenotype noise, the array's limited coverage, and a low signal-to-noise ratio.
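A case-control methylation study ultimately compares, site by site, the methylation level in cases with that in controls. The seminar did not describe the team's statistical model, so the sketch below uses a swapped-in, deliberately simple technique: Welch's t-test at a single CpG site, with a normal approximation for the p-value that is only reasonable for large groups such as 570 cases and 385 controls. The function name and data are hypothetical; published analyses usually use regression models that also adjust for covariates such as age, sex, blood cell composition, and batch.

```python
import math
from statistics import NormalDist, mean, variance


def welch_test(cases: list[float], controls: list[float]) -> tuple[float, float]:
    """Welch's t statistic and an approximate two-sided p-value for one CpG site.

    The p-value uses a normal approximation to the t distribution, which is
    only reasonable when both groups are large (hundreds of samples each).
    Each group needs at least two values, and not all values may be identical
    in both groups, so that the standard error is non-zero.
    """
    se = math.sqrt(variance(cases) / len(cases) + variance(controls) / len(controls))
    t = (mean(cases) - mean(controls)) / se
    p = 2 * NormalDist().cdf(-abs(t))  # two-sided tail area
    return t, p


# Hypothetical beta values at one CpG site; real groups would hold 570 and 385 samples.
cases = [0.60, 0.62, 0.64, 0.61, 0.63]
controls = [0.50, 0.52, 0.54, 0.51, 0.53]
t, p = welch_test(cases, controls)
print(f"t = {t:.2f}, approximate p = {p:.2e}")
```

In a genome-wide analysis this test would be repeated for every site on the array, producing one p-value per CpG site.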

Analysis Overview

Analysis Result

The preliminary results involve the detection of important signals from the biobank DNA samples. The analysis was conducted on approximately 1,000 DNA samples, and Dr. Xu's team was able to extract valuable information about DNA methylation patterns. QQ plots of the results show associations with p-values below 0.0001, meaning the observed patterns are unlikely to be due to random chance alone. To strengthen these findings, the team plans to use larger datasets from the EHR. The preliminary findings identify specific DNA sites connected to functions related to the brain, stress, and mental health. The team compared two groups, African American and Caucasian participants, and these sites were relevant in both.
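A QQ plot compares the sorted observed p-values with the values expected if no site were truly associated (uniform quantiles), both on a -log10 scale; points rising above the diagonal suggest real signal. The sketch below computes the plot coordinates, counts sites below the 0.0001 threshold mentioned above, and adds the genomic inflation factor (lambda), a standard QQ diagnostic that the seminar did not mention. All function names and p-values are hypothetical, and no actual study results are reproduced.

```python
import math
from statistics import NormalDist, median


def qq_points(pvalues: list[float]) -> list[tuple[float, float]]:
    """(expected, observed) -log10(p) pairs for a QQ plot, from strongest to weakest.

    Expected values are uniform quantiles, i.e. what p-values look like when no
    site is truly associated. All p-values must be strictly greater than 0.
    """
    n = len(pvalues)
    observed = sorted(pvalues)
    expected = [(i + 0.5) / n for i in range(n)]
    return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]


def genomic_inflation(pvalues: list[float]) -> float:
    """Median observed 1-df chi-square divided by its null median (about 0.455)."""
    std_normal = NormalDist()
    chi2 = [std_normal.inv_cdf(p / 2) ** 2 for p in pvalues]  # two-sided p -> chi-square
    null_median = std_normal.inv_cdf(0.25) ** 2
    return median(chi2) / null_median


def count_below(pvalues: list[float], threshold: float = 1e-4) -> int:
    """Number of sites whose p-value falls below the reporting threshold."""
    return sum(p < threshold for p in pvalues)


# Hypothetical p-values; a real EPIC analysis would have roughly 850,000 of them.
pvals = [0.5, 0.05, 0.00005, 0.3, 0.8, 0.00002]
print(qq_points(pvals)[:2])
print(count_below(pvals))  # 2
print(round(genomic_inflation(pvals), 2))
```

A lambda close to 1 indicates that the bulk of p-values follow the null expectation, so points above the diagonal are less likely to reflect systematic bias such as batch effects.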


Dr. Xu has made substantial progress in this research; however, he noted that his team needs to conduct further studies in this area. He also highlighted the power of blood methylation data for understanding effects on health and well-being: it can help to understand and predict risk factors and diseases and to identify actionable targets. The challenges of data noise and data quality must be overcome, and larger datasets will be key to confirming these potential results.