Statistical Analysis Presentation

0:00

S… Speaker 1 (Statistical Analysis Presentation)

Hello, Dr. Thayer. I'm here to do a presentation for the numerical analysis module. My presentation is about diabetes risk factors, and I will look at insights from the Pima Indians Diabetes Database. Before I do that, let me give three...

0:23

S… Speaker 1 (Statistical Analysis Presentation)

One is to say that diabetes is a major public health concern across the world. Type 2 diabetes is the most prevalent form,

0:35

S… Speaker 1 (Statistical Analysis Presentation)

due to factors like lifestyle and demographic changes. Early identification of those people that are at high risk of developing diabetes is very important, so that preventative and management strategies can be put in place to control diabetes. The agenda of my presentation.

0:56

S… Speaker 1 (Statistical Analysis Presentation)

Today, we'll take the following format. I will look at the introduction, the purpose of my assignment, and then the data exploration and descriptive statistics, followed by the inferential statistics. Sections three and four actually answer the questions that were raised in the assignment. I will then give the conclusion and recommendations, and lastly provide the references that were used to develop the presentation.

1:21

S… Speaker 1 (Statistical Analysis Presentation)

As an introduction, I want to state that DeFronzo and others, in their 2015 paper, state that environmental factors, including obesity, unhealthy diet, and lack of physical activity, together with genetic factors, are the multiple contributors to impaired glucose homeostasis, which results in type 2 diabetes

1:46

S… Speaker 1 (Statistical Analysis Presentation)

across the world. I also want to indicate that diabetes is a growing health concern across the world, especially type 2 diabetes, which results from lifestyle and demographic changes.

2:03

S… Speaker 1 (Statistical Analysis Presentation)

And then in 2021, it was observed that 537 million people across the world were living with diabetes. This number is expected to increase drastically over the coming decades. The Pima Indian database is a good source for us to use to explore this condition, because the population in that area has a high prevalence of...

2:32

S… Speaker 2 (Statistical Analysis Presentation)

of diabetes across the world.

2:34

S… Speaker 1 (Statistical Analysis Presentation)

Looking at this map here on the side, it's a map that estimates the increase in diabetes from 2019 to 2045. If you look at the world picture, it shows that between 2019 and 2045 there will be an increase of 51%. It also shows the different regions or continents across the world. Looking at Africa itself, it shows that between 2019 and 2045 there will be an increase of 145%, which is drastically high.

3:03

S… Speaker 1 (Statistical Analysis Presentation)

It really needs serious intervention to control it. The purpose of the assignment is to look at the Pima Indian diabetes database, and this will help us to explore the risk factors, highlight predictive modeling that we can use to analyze the data,

3:24

S… Speaker 1 (Statistical Analysis Presentation)

and discuss implications for early identification and prevention of diabetes. I need to indicate that R was used to analyze the data set, and all the results are based on the output from R. The data set itself has nine variables, which include pregnancies, glucose, blood pressure, skin thickness,

3:47

S… Speaker 1 (Statistical Analysis Presentation)

insulin, BMI, diabetes pedigree function, and age. The last one is the outcome, which indicates whether one has diabetes or not. So this is the data set that will be analyzed to provide answers to the questions.

4:06

S… Speaker 1 (Statistical Analysis Presentation)

Now looking at the data exploration and descriptive statistics. The first question raised there was how many individuals are in the sample. The analysis gave an answer of 768 individuals. Of those, to answer question two, 34.9% were diagnosed with diabetes. Then a histogram was developed to check the distribution of age.
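The prevalence figure can be reproduced with simple arithmetic. This is a minimal Python sketch, not the R code used for the talk; the total of 768 is stated above, and the case count of 268 is the one quoted later in the presentation. The function name is made up for illustration.

```python
def prevalence(cases: int, total: int) -> float:
    """Share of the sample with a positive diabetes outcome."""
    if total <= 0:
        raise ValueError("total must be positive")
    return cases / total


# 768 individuals in the sample; 268 diagnosed (count quoted later in the talk).
n_total = 768
n_diabetic = 268

rate = prevalence(n_diabetic, n_total)
print(f"Prevalence: {rate:.1%}")
```

The result rounds to the 34.9% quoted in the talk.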

4:33

S… Speaker 1 (Statistical Analysis Presentation)

It shows that age is right-skewed, with a tail towards the older ages, meaning that the data set is concentrated among younger individuals. That is, the majority of the sample are young people, although the age range is between 21 and 81. The fourth question was about grouping the women in terms of pregnancy.

5:00

S… Speaker 1 (Statistical Analysis Presentation)

Answering question four, it was observed that those who have never been pregnant number 111, and those who have been pregnant before number 657 in the data set. The fifth question asked us to calculate the mean, the median, the mode, the minimum, the maximum, the range, and the standard deviation across these variables. Looking at these statistics, one would realize that

5:29

S… Speaker 1 (Statistical Analysis Presentation)

the average age is about 33 years, though the age range is between 21 and 81. The BMI has a mean of 32, indicating that the population tends towards overweight and obesity. Glucose has an average of 81.

5:53

S… Speaker 1 (Statistical Analysis Presentation)

The average blood pressure is about 69, and the number of pregnancies has an average of four, with a maximum of 17, which is really something that one needs to investigate in the data set. Question six on the data exploration looks at

6:14

S… Speaker 1 (Statistical Analysis Presentation)

developing histograms for age, BMI, glucose, blood pressure, and number of pregnancies, and asks what these plots indicate to us. The plots indicate that most individuals in the data set are relatively young, with ages concentrated in the 20s. BMI values cluster around 30 to 40, indicating that overweight and obesity are a common problem in the data set. Glucose levels peak

6:43

S… Speaker 1 (Statistical Analysis Presentation)

around 100, suggesting that a typical individual's glucose is close to the threshold between normal and at risk. Blood pressure values are centered around 70 to 80, which aligns with a typical diastolic reading but highlights variation across the sample. The number of pregnancies is skewed towards the lower counts, with most participants reporting zero

7:11

S… Speaker 2 (Statistical Analysis Presentation)

to three pregnancies.

7:15

S… Speaker 1 (Statistical Analysis Presentation)

The next question asked us to identify outliers and unusual values in the data set. The variables are listed in this column, these are the possible outliers, and this is the discussion of those outliers. For pregnancies, there are values ranging from 14 to 17, and this is physiologically unlikely or impossible, so almost certainly

7:43

S… Speaker 1 (Statistical Analysis Presentation)

these can be described as values that are standing in for missing data in the dataset instead of being real pregnancies. For glucose, there are values of zero. This is also not physiologically correct, as one cannot have zero glucose. Probably this also stands for missing values in the dataset. Blood pressure has zero values, and

8:07

S… Speaker 1 (Statistical Analysis Presentation)

this is also questionable because one cannot have a blood pressure of zero, so probably these are also missing values. Skin thickness has an extreme value of 99, which is unusually high compared to the rest of the distribution, and this can again be flagged as a missing or invalid value in the data set. Insulin has a number of extreme values, greater than or equal to 300,

8:36

S… Speaker 1 (Statistical Analysis Presentation)

increasing up to 846. These are far above typical physiological ranges but can represent true extreme values, and they are often treated as outliers in the dataset for consistency. For BMI, we look at records that have zero

8:55

S… Speaker 1 (Statistical Analysis Presentation)

values and a few records that have values from 59 up to 67 in this variable. Again, the values of zero are invalid in terms of BMI, and those from 59 to 67 can be flagged as outliers. For diabetes pedigree function, the outliers identified are values from 2.0 up to a maximum of 2.42, and these, again, can be flagged as outliers in the data set. And lastly, age.

9:24

S… Speaker 1 (Statistical Analysis Presentation)

The outliers identified in age are those between 67 and 81, but one needs to indicate that these are not true outliers, because you can have individuals aged 67 to 81 in the data set. They are flagged as outliers only because the majority of the population sits at a younger age.
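The talk does not say which rule produced the outlier flags. One common choice, assumed here purely for illustration, is the 1.5 × IQR fence, combined with recoding physiologically impossible zeros as missing. The following Python sketch uses made-up ages and hypothetical helper names; it is not the R code behind the slides.

```python
import statistics


def iqr_outliers(values):
    """Flag values outside the 1.5 * IQR fences (an assumed rule, not confirmed by the talk)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    flagged = [v for v in values if v < lower or v > upper]
    return lower, upper, flagged


def zeros_to_missing(values):
    """Recode zeros as None for variables where zero cannot be a real measurement."""
    return [None if v == 0 else v for v in values]


# Hypothetical, right-skewed ages: most people are young, a few are much older.
ages = [21, 22, 22, 24, 25, 27, 29, 31, 33, 36, 41, 45, 52, 69, 81]
lower, upper, flagged = iqr_outliers(ages)
print(f"fences: ({lower}, {upper}), flagged: {flagged}")

# Hypothetical blood pressure readings with zeros standing in for missing data.
print(zeros_to_missing([72, 0, 66, 0, 80]))
```

With a young-heavy sample the upper fence sits low, so a legitimate older age can be flagged even though it is a real value, which is the point made about ages 67 to 81.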

9:47

S… Speaker 1 (Statistical Analysis Presentation)

Data exploration continues with the diabetes rate. We were requested to calculate the diabetes rate across these variables. These are the rates calculated and the counts for each group.

10:00

S… Speaker 1 (Statistical Analysis Presentation)

From the analysis in the above table, one can see that individuals younger than 30 have the lowest diabetes rate at 21.2%, despite being the largest group at 396. The rate rises sharply to 46.1% for the age group 30 to 39,

10:27

S… Speaker 1 (Statistical Analysis Presentation)

and peaks at about 55% for the age group 40 to 49. This is interesting because, if you look at the 50 plus age group, the rate drops to 48.83%,

10:42

S… Speaker 2 (Statistical Analysis Presentation)

which is lower than the 40-49 age group. In summary, one can say that the data highlights that diabetes risk increases significantly with age, particularly from the age of 30 onwards, even though the number of participants declines in the older age groups.
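The age-band rates discussed above come from grouping individuals and dividing cases by group size. A minimal Python sketch of that calculation follows; the band boundaries mirror the ones named in the talk, while the records and function names are invented for illustration.

```python
from collections import defaultdict


def age_band(age):
    """Map an age to the bands used in the talk: <30, 30-39, 40-49, 50+."""
    if age < 30:
        return "<30"
    if age < 40:
        return "30-39"
    if age < 50:
        return "40-49"
    return "50+"


def rate_by_group(records, key):
    """Return {group: (count, cases, rate)} where outcome is coded 0 or 1."""
    tally = defaultdict(lambda: [0, 0])  # group -> [count, cases]
    for rec in records:
        group = key(rec)
        tally[group][0] += 1
        tally[group][1] += rec["outcome"]
    return {g: (n, cases, cases / n) for g, (n, cases) in tally.items()}


# Hypothetical records, not the real data set.
toy = [
    {"age": 22, "outcome": 0}, {"age": 25, "outcome": 0},
    {"age": 28, "outcome": 1}, {"age": 29, "outcome": 0},
    {"age": 33, "outcome": 1}, {"age": 37, "outcome": 0},
    {"age": 42, "outcome": 1}, {"age": 48, "outcome": 1},
    {"age": 55, "outcome": 0}, {"age": 63, "outcome": 1},
]

rates = rate_by_group(toy, key=lambda rec: age_band(rec["age"]))
for band, (n, cases, rate) in rates.items():
    print(f"{band}: n={n}, cases={cases}, rate={rate:.1%}")
```

The same helper works for BMI categories or pregnancy groups by passing a different `key` function.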

11:02

S… Speaker 2 (Statistical Analysis Presentation)

Next is the diabetes rate across different groups of BMI. These are the groups that were created: underweight, normal, overweight, and obese. These are the total counts, those with diabetes, and the diabetes rate. Looking at the results for the BMI categories, we see the following.

11:29

S… Speaker 2 (Statistical Analysis Presentation)

Individuals with normal weight had the lowest diabetes rate at 6.9%. Those that are underweight have a slightly higher rate at 13.9%, and the rate increases for those that are overweight to 23.9%. It gets worse for those that are obese, at about 46.1%. It also shows that, overall,

11:56

S… Speaker 2 (Statistical Analysis Presentation)

the total prevalence of diabetes in this group is 34.9%, looking at the 268 cases versus the total of 768. This highlights a strong positive association between higher BMI and diabetes: the higher the BMI, the higher the risk of diabetes.

12:21

S… Speaker 1 (Statistical Analysis Presentation)

If diabetes is to be controlled, obesity and overweight must be taken into consideration. Continuing with the exploration of the data, question 10 looked at the diabetes rate across pregnancies. These are the categories for pregnancy, the counts, the diabetes cases,

12:45

S… Speaker 1 (Statistical Analysis Presentation)

and the diabetes rate. Looking at the results across the pregnancy groups, women with no pregnancies had a diabetes rate of 34.2%, and for those that had one to two pregnancies the rate dropped to 20.2%. For those that had three to four pregnancies it increased to 35.0%, and for those that had five and above the rate increased to 47%.

13:16

S… Speaker 1 (Statistical Analysis Presentation)

So this reflects a strong association between a higher pregnancy count and elevated diabetes rates. It also highlights that having one to two pregnancies appears relatively protective, but as the number of pregnancies goes higher, the rate of having diabetes also increases.

13:43

S… Speaker 2 (Statistical Analysis Presentation)

Looking at the second section, on inferential statistics, the first question wanted us to test whether there is a difference in glucose levels between those with and without diabetes. The null hypothesis says the mean glucose of those with diabetes is the same as that of those without, and the alternative says the means of the two groups are not the same.

14:08

S… Speaker 2 (Statistical Analysis Presentation)

We run the test at the 95% confidence level, and the test method is the Welch two-sample t-test. The test variables are glucose and outcome. These are the test results: the t statistic at that value, the degrees of freedom, and the p-value, which is far less than 0.05. Then there are the estimates of the sample mean of glucose. For those without diabetes it sits at 110. For those

14:38

S… Speaker 1 (Statistical Analysis Presentation)

with diabetes, the mean glucose sits at 141. Concluding on the results, the t statistic of minus 13.72 indicates a very strong difference between the groups. Since the test produces a p-value which is far less than 0.05,

15:00

S… Speaker 1 (Statistical Analysis Presentation)

the null hypothesis is rejected, meaning that the evidence strongly supports that glucose levels for those with and those without diabetes are different. The second question was testing whether there is a difference in the number of pregnancies between diabetic and non-diabetic individuals. Again, the null hypothesis says

15:25

S… Speaker 1 (Statistical Analysis Presentation)

the mean number of pregnancies of those with and without diabetes is the same. The alternative hypothesis says the means of the two groups are not the same. We're doing this at the 95% confidence level.

15:43

S… Speaker 1 (Statistical Analysis Presentation)

For the inferential statistics, the method used is also the Welch two-sample t-test. We have the test statistic at that value, the degrees of freedom at that value, and the p-value far less than 0.05. The pregnancy mean estimate for those without diabetes is 3.3. For those with diabetes, the pregnancy mean is 4.9.
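For reference, the Welch two-sample t statistic and its Welch-Satterthwaite degrees of freedom can be computed by hand. This is a Python sketch of the textbook formulas with invented samples, not the R output shown on the slides. The statistic is negative when the first group (here, those without diabetes) has the lower mean, which matches the sign of the values quoted in the talk.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Does not assume equal variances in the two groups.
    """
    na, nb = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.fmean(sample_a), statistics.fmean(sample_b)
    # Squared standard errors of each group mean (sample variance / n).
    se2_a = statistics.variance(sample_a) / na
    se2_b = statistics.variance(sample_b) / nb
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical pregnancy counts for two small groups.
without_diabetes = [1, 2, 3, 3, 4, 5]
with_diabetes = [3, 4, 5, 6, 6, 8]

t_stat, dof = welch_t(without_diabetes, with_diabetes)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

The p-value would then come from a t distribution with `dof` degrees of freedom, which needs a statistics library and is left out of this sketch.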

16:10

S… Speaker 1 (Statistical Analysis Presentation)

Concluding on the results, the large test statistic of minus 4.907 indicates a strong difference between the two groups. The p-value is essentially zero, far less than 0.05, meaning that the difference is highly statistically significant, suggesting that pregnancy history

16:32

S… Speaker 1 (Statistical Analysis Presentation)

is an important factor associated with diabetes risk, meaning that as the number of pregnancies increases, the risk of diabetes also increases. Inferential statistics continues. We're examining the correlation matrix between the variables, from pregnancies, glucose, blood pressure, and skin thickness up to age. When we built this matrix, there were

16:58

S… Speaker 1 (Statistical Analysis Presentation)

different categories that were identified. The first one was strong positive correlation. Here we discovered that glucose and insulin show a strong positive relationship: higher glucose levels tend to align with higher insulin values. BMI and skin thickness also correlate positively, reflecting how body fat and skinfold thickness are related. Under moderate positive correlation, the variables are age and pregnancies,

17:24

S… Speaker 1 (Statistical Analysis Presentation)

meaning that older participants tend to have more pregnancies.

17:31

S… Speaker 1 (Statistical Analysis Presentation)

Continuing with the moderate positive correlations, there is BMI and diabetes pedigree function: family history of diabetes is moderately associated with higher BMI. Among those with weak or near-zero correlation, blood pressure has relatively weak correlation with most other variables, suggesting it varies independently. Skin thickness and age show little relationship, indicating

18:01

S… Speaker 1 (Statistical Analysis Presentation)

that age does not affect skinfold thickness in the dataset. Continuing with the last category, those variables that have negative or very weak negative correlation: based on the analysis, no strong negative correlations appear in the dataset, and most of the relationships are positive or close to zero based on this correlation matrix.
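A correlation matrix like the one described is built from pairwise Pearson coefficients. The Python sketch below assumes Pearson correlation (the talk does not name the coefficient) and uses made-up columns; the function names are illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a {name: values} mapping."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical values, chosen only to show positive and weak relationships.
columns = {
    "age": [22, 25, 31, 38, 45, 52],
    "pregnancies": [0, 1, 2, 4, 5, 7],
    "blood_pressure": [70, 64, 78, 66, 74, 70],
}

matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Values near 1 indicate a strong positive relationship, values near 0 indicate little linear relationship, and the diagonal is always 1.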

18:29

S… Speaker 1 (Statistical Analysis Presentation)

The next question on the inferential statistics wanted us to test the association between diabetes outcome and the categories for age and BMI. We'll start with the variables age group and outcome. These are the age groups that were calculated, with the counts for those without diabetes and those with diabetes. The chi-square test is used to test this. The chi-square value comes out at 69.915,

19:00

S… Speaker 1 (Statistical Analysis Presentation)

the degrees of freedom are 3, and the p-value is far less than 0.05. Concluding on the results for age group and outcome, the p-value is far below 0.05, meaning that the association between age group and diabetes outcome is statistically significant. What does that mean? It means that diabetes prevalence increases with age, with the proportion of diabetes cases rising with age,

19:28

S… Speaker 1 (Statistical Analysis Presentation)

where the age group younger than 30 sits at 21% cases, age group 30 to 39 at 46%, age group 40 to 49 at 55%, and age group 50 plus at 48%. The chi-square test confirms that this pattern is unlikely to be caused by chance in the dataset.
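The chi-square test of independence compares observed counts with the counts expected if group and outcome were unrelated. Here is a Python sketch of the Pearson chi-square statistic; the contingency table is hypothetical, not the one from the slides. With four age groups and two outcomes the degrees of freedom are (4 - 1) × (2 - 1) = 3, which matches the value quoted in the talk.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    `table` is a list of rows of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: rows are age groups, columns are [no diabetes, diabetes].
observed = [
    [30, 10],  # <30
    [20, 20],  # 30-39
    [10, 20],  # 40-49
    [10, 10],  # 50+
]

chi2, dof = chi_square_independence(observed)
print(f"chi-square = {chi2:.3f}, df = {dof}")
```

A large statistic relative to the chi-square distribution with `dof` degrees of freedom gives a small p-value, which is what leads to rejecting independence.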

19:54

S… Speaker 2 (Statistical Analysis Presentation)

Next, looking at the variables BMI category and outcome, we

20:00

S… Speaker 1 (Statistical Analysis Presentation)

check if there is an association. The method again is chi-square. The chi-square comes out at 78.249, with 3 degrees of freedom and a p-value lower than 0.05. Concluding on the results for the categories of BMI versus outcome:

20:23

S… Speaker 1 (Statistical Analysis Presentation)

The p-value is far below 0.05, confirming that a highly significant association between BMI and diabetes outcome exists, meaning that the diabetes prevalence increases sharply as the...

20:38

S… Speaker 1 (Statistical Analysis Presentation)

the BMI increases. For example, if you look at underweight, they have 13% cases, normal weight 7% cases, overweight 22%, and obese sits at 46%, which shows that as the BMI increases, the risk of diabetes also increases. The chi-square of 78.249 shows that the link between BMI and diabetes is highly significant.

21:08

S… Speaker 1 (Statistical Analysis Presentation)

This means that weight status is a major driver of diabetes. And when we want to control diabetes, we need to look at those people that are overweight and those that are obese.

21:25

S… Speaker 2 (Statistical Analysis Presentation)

Question five wanted us to compare the mean glucose levels across the different age groups. This is the graph that shows the mean across the groups: age group less than 30, 30 to 39, 40 to 49, and 50 plus. Looking at the results, the mean glucose level rises steadily with age, with the highest average observed in the 50 plus group, at 39.6

21:53

S… Speaker 2 (Statistical Analysis Presentation)

average glucose, and the increase is already noticeable at age 30, suggesting that the older

22:02

S… Speaker 2 (Statistical Analysis Presentation)

adults carry a higher risk of elevated glucose levels, since the average is at 39.6. The pattern in the graph on the right reinforces that age is a critical factor in glucose regulation. So prevention and monitoring interventions should be prioritized from the early age group of

22:25

S… Speaker 2 (Statistical Analysis Presentation)

30 plus, to make sure that you put a control mechanism in place to control diabetes. Question six on the inferential statistics wanted us to develop a multiple

22:37

S… Speaker 1 (Statistical Analysis Presentation)

linear regression model to evaluate the predictors across these variables. The model to predict glucose is identified as this model. Then, testing which variables are most significant in predicting glucose, BMI, with its many dummy variables, was identified as the one that is statistically significant in predicting glucose, since the p-value is less than

23:06

S… Speaker 1 (Statistical Analysis Presentation)

0.05. As for variables that are not significant in predicting glucose, age, with a coefficient of 0.39, was not significant, and the p-value was greater than 0.25, meaning that age did not independently predict glucose once the BMI categories were included. Other variables, pregnancies.

23:27

S… Speaker 1 (Statistical Analysis Presentation)

Pregnancies, blood pressure, skin thickness, insulin, and diabetes pedigree function were dropped due to singularities. Once the BMI variables were introduced, these were automatically dropped. Then question 6.4 asked us how much variance in glucose levels the model...

23:48

S… Speaker 1 (Statistical Analysis Presentation)

can explain. Looking at the R squared, it sits at 0.9513. This means that the model explains about 95% of the variance in glucose in the data set. But looking at the adjusted R squared, which sits at 0.4429, this shows that

24:10

S… Speaker 2 (Statistical Analysis Presentation)

this accounts for the very large number of predictors, those dummy variables for BMI. Statistically, it means that the model only explains about 44% of the variance in the dataset. Question seven in the inferential statistics asked us to create a logistic model to predict diabetes outcome, zero or one, using

24:37

S… Speaker 2 (Statistical Analysis Presentation)

BMI, age, and glucose as predictors. This is the model that will be used, where p is the probability of the diabetes outcome. We calculated the coefficients of the variables, and we also calculated the odds ratios for the three variables, including the intercept. And then when we

25:00

S… Speaker 2 (Statistical Analysis Presentation)

when we

25:01

S… Speaker 1 (Statistical Analysis Presentation)

look at the outcome of the test, we look at the chi-square, which comes out at 8.9222 with 8 degrees of freedom, and the p-value at 0.49. Since the p-value is greater than 0.05, the Hosmer-Lemeshow test suggests that the logistic regression model fits the data reasonably well. For the performance metrics that we were supposed to calculate, accuracy comes out at 0.

25:32

S… Speaker 1 (Statistical Analysis Presentation)

at 95% confidence interval. This model correctly classifies 77% of the cases in the dataset. Sensitivity comes out at 0.5634: the model detects about 56% of the true diabetes cases in the dataset. Specificity comes out

25:54

S… Speaker 1 (Statistical Analysis Presentation)

at 0.878: the model correctly identifies about 88% of the true negative, non-diabetic cases in the data set. The balanced accuracy, which comes out at

26:08

S… Speaker 1 (Statistical Analysis Presentation)

0.7207, is the average of the sensitivity and the specificity, used where the classes are not balanced. So when you combine the two using the balanced accuracy, it shows that the model performs at about 72% in the dataset.
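The four performance figures are all derived from one confusion matrix. In the Python sketch below, the cell counts are assumptions back-calculated from the rates quoted in the talk and the 268 versus 500 class split (768 minus 268); the confusion matrix itself was not read out, so treat the counts as illustrative.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and balanced accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # share of true diabetes cases detected
    specificity = tn / (tn + fp)          # share of non-diabetic cases correctly identified
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
    }


# Assumed counts, back-calculated from the reported rates (not shown in the talk):
# 268 diabetic cases split into 151 detected and 117 missed,
# 500 non-diabetic cases split into 439 correct and 61 false alarms.
metrics = classification_metrics(tp=151, fn=117, tn=439, fp=61)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because roughly two thirds of the sample is non-diabetic, plain accuracy is pulled towards the specificity, while balanced accuracy weights both classes equally.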

26:33

S… Speaker 1 (Statistical Analysis Presentation)

Next is investigating whether there is a significant interaction between BMI and age in predicting the diabetes risk. We first look at the likelihood ratio test for the models. We produce the analysis of deviance table, where model one is given by this formula and model two is given by that formula, with the interaction between BMI and age. Then we calculated the residual deviances

27:01

S… Speaker 1 (Statistical Analysis Presentation)

for the two, and we also look at the interaction model coefficients for all the variables,

27:11

S… Speaker 1 (Statistical Analysis Presentation)

including the interaction. We also use the likelihood ratio test to compare the models. We look at model one, which has no interaction, with that formula, and model two, with the interaction, with that formula. Looking at the test statistics, the deviance sits at 0.636 with one degree of freedom and a p-value higher than 0.05.

27:36

S… Speaker 1 (Statistical Analysis Presentation)

Since the p-value is greater than 0.05, the BMI and age interaction term does not significantly improve the model fit. Therefore, the model without the interaction is preferred over the one with the interaction. The second part of question 8 asked us to compute odds ratios for the interaction model. We calculated the odds ratios. We also look at the confidence intervals of the odds ratios.
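The likelihood ratio comparison of the two nested logistic models rests on the drop in deviance, which is referred to a chi-square distribution with degrees of freedom equal to the number of added terms (one, for the single interaction). The Python sketch below covers only the one-degree-of-freedom case, where the tail probability has a closed form; the function names are illustrative and the input is the deviance difference quoted in the talk.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Uses the identity P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for x >= 0.
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(x / 2.0))


def lrt_single_term(deviance_drop):
    """Likelihood ratio test for adding exactly one term to a nested model."""
    return deviance_drop, chi2_sf_df1(deviance_drop)


# Deviance difference of 0.636 on one degree of freedom, as quoted in the talk.
statistic, p_value = lrt_single_term(0.636)
print(f"LRT statistic = {statistic:.3f}, p-value = {p_value:.3f}")
```

A p-value well above 0.05 means the extra interaction term does not improve the fit enough to justify the more complex model.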

28:04

S… Speaker 1 (Statistical Analysis Presentation)

These run from 2.5% to 97.5%. The odds ratios come out this way: for BMI, with that odds ratio and that confidence interval, a unit increase in BMI is associated with approximately 5% higher odds of diabetes, but the confidence interval crosses one, which means the effect is not statistically significant.

28:32

S… Speaker 1 (Statistical Analysis Presentation)

For age, with that odds ratio and that confidence interval, this means that age has almost no effect on the diabetes odds in the model. The confidence interval is wide and includes one, so the effect is also not significant. For glucose, with that odds ratio and that confidence interval, this means that the odds of diabetes increase as

29:02

S… Speaker 1 (Statistical Analysis Presentation)

the glucose increases, by about 3% per unit, and the interval does not cross one. That means glucose is a highly significant predictor of diabetes. Then the BMI interaction with age,

29:23

S… Speaker 1 (Statistical Analysis Presentation)

with that odds ratio and that confidence interval: this means that the odds ratio is extremely close to one and the interval also overlaps one, meaning that the interaction effect is negligible and not significant. Comparing the two models: the last question was to compare the two logistic regression models. We look at

29:49

S… Speaker 1 (Statistical Analysis Presentation)

model one without the interaction and model two with the interaction. We calculated the deviance table first, and then we did the summarization

30:00

S… Speaker 1 (Statistical Analysis Presentation)

of the interaction model coefficients for all the variables. We did the odds ratios specifically for model two, the one with the interaction. And then we also did the confidence intervals for the odds ratios, at 2.5% to 97.5%.

30:19

S… Speaker 1 (Statistical Analysis Presentation)

Concluding on the results where the models are compared, model one comes out at a residual deviance of 742.10 and model two at a residual deviance of 742.60. The likelihood ratio test has a deviance of 0.039 with a p-value greater than 0.2.

30:44

S… Speaker 1 (Statistical Analysis Presentation)

The p-value is much greater than 0.05, meaning that the interaction between age and BMI does not significantly improve the model fit, so adding the interaction does not help us to increase the performance of the model. Coefficients of model 2.

31:07

S… Speaker 3 (Statistical Analysis Presentation)

These come out at that level: BMI, with that coefficient and that p-value.

31:16

S… Speaker 2 (Statistical Analysis Presentation)

And the odds ratio: this suggests that higher BMI increases the diabetes odds by about 8% per unit, but it is of marginal significance. Age, with that coefficient and that odds ratio, has a p-value greater than 0.05. This means that age alone is not significant for the odds.

31:39

S… Speaker 2 (Statistical Analysis Presentation)

Glucose, with that coefficient and that odds ratio, has a p-value less than 0.05. This means that glucose is a strong, highly significant predictor, and each unit increase in glucose raises the odds of diabetes by about 3.4%. Pregnancies, with that coefficient and that odds ratio, has a p-value less than 0.05. Again, this means that the number of pregnancies is significant.

32:07

S… Speaker 2 (Statistical Analysis Presentation)

And each additional pregnancy increases the odds by about 12%. And lastly, the interaction between BMI and age, with that coefficient and that odds ratio, has a p-value greater than 0.05. This means that the interaction is not significant and its effect is also negligible in predicting diabetes.
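The "percent higher odds per unit" statements follow from exponentiating the logistic regression coefficients. In this Python sketch the coefficient is chosen so that it reproduces the 3.4% figure quoted for glucose, and the standard errors are hypothetical. The interval shown is a simple Wald interval, which is an assumption; the talk does not say how its confidence intervals were computed.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in the predictor."""
    return math.exp(coefficient)


def percent_change_in_odds(coefficient):
    """Percentage change in the odds for a one-unit increase."""
    return (math.exp(coefficient) - 1.0) * 100.0


def wald_ci(coefficient, std_error, z=1.96):
    """Approximate 95% Wald confidence interval for the odds ratio (assumed method)."""
    return (math.exp(coefficient - z * std_error),
            math.exp(coefficient + z * std_error))


# Coefficient chosen to match the quoted 3.4% per unit of glucose; the SE is hypothetical.
beta_glucose = math.log(1.034)
print(f"OR = {odds_ratio(beta_glucose):.3f}, "
      f"change = {percent_change_in_odds(beta_glucose):.1f}%")
print("95% CI:", wald_ci(beta_glucose, std_error=0.004))

# A coefficient near zero gives an odds ratio near one and an interval that spans one.
print("interaction CI:", wald_ci(0.0, std_error=0.01))
```

An interval that lies entirely above one indicates a significant positive effect, while an interval that contains one indicates no significant effect, which is the reading applied to BMI, age, and the interaction term.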

32:37

S… Speaker 3 (Statistical Analysis Presentation)

Concluding on the results, we also look at the odds ratios and confidence intervals. For BMI, they come out at that level, which means that BMI is borderline, with an interval that includes one, so it is not significant. Then age, at that odds ratio and confidence interval: this means that it is not significant as well. Glucose, at that odds ratio and that confidence interval, is a

33:05

S… Speaker 1 (Statistical Analysis Presentation)

strong variable with a precise effect. That means glucose is the strong variable for predicting

33:14

S… Speaker 2 (Statistical Analysis Presentation)

diabetes in the sample data set. Pregnancies, with that odds ratio and that confidence interval, is a significant and meaningful variable in predicting diabetes. Lastly, for the interaction between BMI and age, with that odds ratio and that confidence interval, this means that the interaction has essentially no effect, since the confidence interval sits very close to one, so the effect is not

33:42

S… Speaker 2 (Statistical Analysis Presentation)

recognizable and is not significant. In conclusion, I want to state that Klein and others, in their 2022 paper, argue that the accumulation of an excessive amount of body fat can cause type 2 diabetes, and that the risk of type 2 diabetes increases

34:04

S… Speaker 2 (Statistical Analysis Presentation)

linearly, or in direct proportion, with the increase in body mass index. This is borne out because, when I did the analysis, it indicated that an increase in body mass index is indeed associated with an increase in the risk of diabetes.

34:21

S… Speaker 3 (Statistical Analysis Presentation)

The key findings for diabetes risk from the analysis start with prevalence. Summing up the results, about 34.9% of participants in the sample data set are diagnosed with diabetes. For age, the risk of diabetes rises after age 30, peaking from...

34:50

S… Speaker 3 (Statistical Analysis Presentation)

to 55 at age 40 to 49. Then BMI: there is a strong association, at 46% prevalence

35:00

S… Speaker 1 (Statistical Analysis Presentation)

with the obese group sitting at 46%, while the normal-weight group sits at 7%. This strongly indicates that the higher the body mass index, the higher the risk of diabetes. Pregnancies: a higher number of pregnancies, meaning five and above, is linked to a diabetes prevalence of nearly 48%.

35:24

S… Speaker 1 (Statistical Analysis Presentation)

So age is also one of those that can be controlled to manage diabetes; I mean, pregnancies can be managed to control diabetes. And glucose is a strong predictor: each one-unit increase raises the odds of diabetes by about 3%.
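[Editor's sketch] To make the odds-ratio reading concrete, here is a minimal Python illustration (the analysis in the talk was done in R; this is not the speaker's code). It assumes the "3% per unit" statement corresponds to an odds ratio of about 1.03 for glucose. The function names are hypothetical, and the confidence intervals shown are made-up placeholders, since the talk does not state the actual intervals.

```python
def odds_multiplier(or_per_unit: float, delta: float) -> float:
    """Multiplicative change in the odds for a `delta`-unit increase.

    In logistic regression the per-unit odds ratio compounds
    multiplicatively, so a 10-unit rise is OR**10, not 10 * 3%.
    """
    return or_per_unit ** delta


def ci_excludes_one(lower: float, upper: float) -> bool:
    """An odds-ratio interval that contains 1 is compatible with no effect."""
    return lower > 1.0 or upper < 1.0


glucose_or = 1.03  # assumption: "about 3% per unit" from the talk
print(round(odds_multiplier(glucose_or, 10), 3))  # 1.344

# Hypothetical intervals, for illustration only (not from the talk):
print(ci_excludes_one(1.02, 1.04))  # True  -> reads as significant
print(ci_excludes_one(0.98, 1.05))  # False -> includes 1, not significant
```

Under that assumption, a 10-unit rise in glucose multiplies the odds by roughly 1.34, that is, about a 34% increase in the odds rather than 30%.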

35:47

S… Speaker 1 (Statistical Analysis Presentation)

And then model performance: for the logistic regression that was fitted on the data, the Hosmer-Lemeshow test had a p-value greater than 0.5. The accuracy was sitting at 77%, specificity at 88%, and sensitivity at 56%.
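[Editor's sketch] The three performance figures can be cross-checked against each other, because overall accuracy is the prevalence-weighted average of sensitivity and specificity. The Python snippet below is a minimal illustration, assuming the metrics were computed on the full sample at the 34.9% prevalence quoted earlier in the talk; the function name is hypothetical.

```python
def expected_accuracy(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Accuracy implied by sensitivity, specificity and class prevalence.

    accuracy = P(diabetic) * sensitivity + P(non-diabetic) * specificity
    """
    return prevalence * sensitivity + (1.0 - prevalence) * specificity


# Figures quoted in the talk: 34.9% prevalence, 56% sensitivity, 88% specificity.
acc = expected_accuracy(0.349, 0.56, 0.88)
print(f"{acc:.3f}")  # 0.768
```

The result, about 77%, matches the stated accuracy, and it shows that the accuracy is carried mostly by the non-diabetic majority: the model misses roughly 44% of the diabetic cases.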

36:04

S… Speaker 1 (Statistical Analysis Presentation)

The interaction of age and BMI was observed to have no effect; the effect is actually negligible. So instead of the model with the interaction, the model without the interaction is preferred. Lastly, glucose, BMI, and pregnancy history are the most critical predictors of diabetes risk in the data set, while age amplifies risk but adds little

36:31

S… Speaker 2 (Statistical Analysis Presentation)

impact beyond the risk factors identified here.

36:37

S… Speaker 1 (Statistical Analysis Presentation)

For recommendations, I am putting forward the following, based on my analysis of the Pima dataset. One is to prioritize glucose monitoring. Since glucose is the strongest predictor, routine screening and early detection programs should focus on elevated glucose levels, especially in adults over age 30,

37:05

S… Speaker 1 (Statistical Analysis Presentation)

that is, from age 30 and above. Target obesity prevention: with nearly half of the obese individuals in the data set being diabetic, interventions should emphasize weight management through diet, physical exercise, and community health programs that keep people active, so that they can keep their body mass index in check

37:31

S… Speaker 1 (Statistical Analysis Presentation)

at the required level. Support maternal health: a higher number of pregnancies is linked to a higher risk of diabetes, so targeted education and monitoring for women with multiple pregnancies can be introduced

37:50

S… Speaker 1 (Statistical Analysis Presentation)

to try and control the long-term risk of diabetes in different communities across the world. Age-focused intervention: diabetes prevalence rises sharply from age 30 and above, so preventive strategies should start as early as age 30 to try and control diabetes.

38:09

S… Speaker 1 (Statistical Analysis Presentation)

Data quality improvements: values were identified in the data set, such as zero glucose and zero BMI, that should be looked at because they don't make sense. So when we

38:24

S… Speaker 1 (Statistical Analysis Presentation)

do analysis, these can be removed beforehand, so that we strengthen the reliability of the predictive models and ensure that accurate clinical insights are produced. Lastly, model application: use regression models for risk stratification in clinical or community settings.
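[Editor's sketch] The data-quality step described above (dropping records with a zero glucose or zero BMI before fitting) could look like the following in plain Python. This is an illustration only, not the speaker's R code. The column names `Glucose` and `BMI` and the function name are assumptions, and the rows are made-up placeholders rather than records from the dataset.

```python
def drop_implausible_zeros(rows, columns=("Glucose", "BMI")):
    """Keep only rows where none of the given measurements is zero.

    A zero here is treated as a missing-value placeholder, since a glucose
    or BMI of exactly 0 is not physiologically plausible.
    """
    return [row for row in rows if all(row[col] != 0 for col in columns)]


# Hypothetical rows for illustration (not taken from the dataset):
rows = [
    {"Glucose": 148, "BMI": 33.6, "Outcome": 1},
    {"Glucose": 0, "BMI": 25.0, "Outcome": 0},    # zero glucose -> dropped
    {"Glucose": 110, "BMI": 0.0, "Outcome": 0},   # zero BMI -> dropped
    {"Glucose": 85, "BMI": 26.6, "Outcome": 0},
]

clean = drop_implausible_zeros(rows)
print(len(rows), "->", len(clean))  # 4 -> 2
```

Dropping rows is the simplest option and matches what the talk proposes; imputing the affected values would be an alternative that keeps the sample size.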

38:43

S… Speaker 1 (Statistical Analysis Presentation)

Balancing sensitivity and specificity to identify high-risk individuals is very important, so these models can be continually refined to get better insights.
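[Editor's sketch] One common way to balance sensitivity and specificity is to move the probability cut-off used to label someone as high risk. The Python snippet below is a minimal illustration with made-up labels and predicted probabilities (not results from the talk); the function name is hypothetical, and it assumes both classes are present in the labels.

```python
def sensitivity_specificity(y_true, y_prob, threshold):
    """Sensitivity and specificity when predicting 1 if prob >= threshold."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical outcomes and fitted probabilities, for illustration only:
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.45, 0.3, 0.6, 0.4, 0.35, 0.2, 0.1, 0.05]

for threshold in (0.5, 0.3):
    sens, spec = sensitivity_specificity(y_true, y_prob, threshold)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
# threshold=0.5: sensitivity=0.50, specificity=0.83
# threshold=0.3: sensitivity=1.00, specificity=0.50
```

Lowering the cut-off catches more true cases at the cost of more false alarms, which is the trade-off a screening programme would need to weigh.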

38:56

S… Speaker 1 (Statistical Analysis Presentation)

References: these are the references that were used to produce the presentation. Some cover readings about diabetes, and the others cover using R for data analysis. These references were very important and helpful to me in developing this presentation.

39:18

S… Speaker 1 (Statistical Analysis Presentation)

Lastly, I want to say thank you so much, Dr. Tahir, for all your support and guidance throughout the module. Really appreciate your support and your help. Thank you.
