This is my PM566 Final Project website.
You can download my report from this link: Download Report
Note: Click the Table button at the top to see the three summary tables on the Table page, and click the Wilcoxon Test button to see the significance test results. The pages sometimes take a while to load; if you run into loading problems, please open the link in a new tab. Thanks.
Every year, more than 795,000 people in the United States have a stroke[1]. With improvements in technology, a large amount of data is now stored as electronic health records, which allows us to analyze these clinical features. In this project, I use the medical records from the article "A predictive analytics approach for stroke prediction using machine learning and neural networks" (Paper). The goal is to find whether there is an association between average glucose level and stroke status, and whether average glucose level is related to age level or obesity level.
The stroke data were downloaded from the EHR features dataset.
This data set contains 5110 observations and 12 features, as follows:
In this study, I want to explore the association between average glucose level and stroke. I also want to know whether age and BMI are related to average glucose level. This can be broken down into the following questions:
sum(is.na(stroke))/nrow(stroke)
## [1] 0.03934234
Only about 3.9% of the rows have a missing value, so imputation is acceptable. The mice library was used for the imputation.
library(mice)
imputed_data <- mice(stroke, m = 3, maxit = 5, method = "pmm", seed = 123)
##
## iter imp variable
## 1 1 bmi
## 1 2 bmi
## 1 3 bmi
## 2 1 bmi
## 2 2 bmi
## 2 3 bmi
## 3 1 bmi
## 3 2 bmi
## 3 3 bmi
## 4 1 bmi
## 4 2 bmi
## 4 3 bmi
## 5 1 bmi
## 5 2 bmi
## 5 3 bmi
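The call above returns a multiply imputed object rather than a data frame, so a completed data set still has to be extracted from it. Here is a minimal sketch of that step, assuming the `mice` package is installed and using its bundled `nhanes` data as a stand-in for `stroke`. Taking the first of the three imputations is an assumption for illustration, not necessarily the choice made in this project.

```r
library(mice)

# Stand-in data: nhanes ships with mice and has missing values in several columns.
toy <- nhanes

# Count missing values per column before imputation.
missing_before <- colSums(is.na(toy))

# Same settings as above: 3 imputations, 5 iterations, predictive mean matching.
imputed_toy <- mice(toy, m = 3, maxit = 5, method = "pmm",
                    seed = 123, printFlag = FALSE)

# Extract one completed data set (assumption: the first of the three imputations).
completed_toy <- complete(imputed_toy, 1)

# Count missing values per column after imputation.
missing_after <- colSums(is.na(completed_toy))
```

Comparing `missing_before` with `missing_after` gives a numeric version of the check that the missingness map provides visually.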
library(Amelia)  # provides missmap()
missmap(completestroke)
As the image shows, there are no missing values after imputation.
The summary tables are saved on the Tables page. There are three tables: one summarizing average glucose level by stroke status, one by age level, and one by obesity level.
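A summary table of this kind can be sketched in base R. The data below are made up for illustration, and the BMI cut-offs (18.5, 25, 30) are the commonly used WHO categories. They are assumed here because the thresholds behind the obesity levels are not stated on this page.

```r
# Hypothetical toy data; the column names mirror the stroke data set.
toy <- data.frame(
  bmi               = c(17.0, 22.5, 27.3, 31.8, 35.2, 24.9),
  avg_glucose_level = c(85.0, 92.4, 101.7, 130.2, 210.5, 88.1)
)

# Bin BMI into obesity levels; right = FALSE makes each interval [lower, upper).
toy$obesity_level <- cut(
  toy$bmi,
  breaks = c(-Inf, 18.5, 25, 30, Inf),
  labels = c("underweight", "normal", "overweight", "obese"),
  right  = FALSE
)

# Median average glucose level within each obesity level.
glucose_summary <- aggregate(
  avg_glucose_level ~ obesity_level,
  data = toy,
  FUN  = median
)
print(glucose_summary)
```

The same pattern applies to stroke status or age level by swapping the grouping variable on the right-hand side of the formula.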
A Wilcoxon test is performed to check whether the difference between two groups is significant. The results are stored on the "Wilcoxon Test" page.
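For readers unfamiliar with the test, here is a minimal sketch of a two-group Wilcoxon rank-sum comparison using `wilcox.test()` from base R. The data are simulated and the group means are arbitrary assumptions, so the result says nothing about the actual stroke data.

```r
set.seed(566)  # arbitrary seed so the simulated data are reproducible

# Simulated data: 0 = no stroke, 1 = stroke; the glucose values are made up.
toy <- data.frame(
  stroke            = rep(c(0, 1), each = 50),
  avg_glucose_level = c(rnorm(50, mean = 100, sd = 20),
                        rnorm(50, mean = 140, sd = 30))
)

# Two-sided Wilcoxon rank-sum test comparing glucose between the two groups.
result <- wilcox.test(avg_glucose_level ~ stroke, data = toy)

# A small p-value suggests the two glucose distributions differ.
result$p.value
```

The rank-based test is a reasonable choice when the variable is skewed, as glucose levels often are, because it does not assume normality.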
This interactive density plot displays the age distribution of stroke patients and non-stroke patients. In this figure, strokes begin to appear at about 30 years of age and the density then increases fairly steadily. Above 70 years of age, the density rises sharply and peaks at approximately 79 years.
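The location of a density peak can also be read off numerically instead of from the plot. Here is a minimal sketch using base R's `density()` on simulated ages. The values are made up and are not the stroke data.

```r
set.seed(566)  # arbitrary seed so the simulated data are reproducible

# Simulated ages, centered in later life (made-up values).
ages <- rnorm(200, mean = 75, sd = 8)

# Gaussian kernel density estimate of the age distribution.
dens <- density(ages)

# Age at which the estimated density is highest.
peak_age <- dens$x[which.max(dens$y)]
peak_age
```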
This interactive density plot shows the density of average glucose level (mg/dl) by stroke status. Most subjects who have not had a stroke have an average glucose level below 150 mg/dl, while the density above 150 mg/dl is higher among stroke patients.
This is an interactive box plot with average glucose level (mg/dl) on the y axis and age level on the x axis, colored by stroke status. In this figure, elderly and adult stroke patients have the highest avg_glucose_level values, at 110.85 mg/dl and 97.2 mg/dl respectively. For elderly stroke patients, the Q3 value reaches 200 mg/dl.
This interactive box plot displays the distribution of average glucose level by obesity level and stroke status. There are no underweight or overweight stroke patients. Obese stroke patients have a median average glucose level of 107.26 mg/dl, and all other groups are below 100 mg/dl.
Copyright © 2022, Yutian (Margery) Liu