Kareem Saheed | Freelancer Analysis Using Spss And Excel

ANALYSIS USING SPSS AND EXCEL

CHAPTER ONE

INTRODUCTION

1.0. PREAMBLE

This chapter comprises the Background of the Study, Statement of the Problem, Aim and Objectives, Scope of the Study, Organization of the Study and Definition of Terms.

1.1. BACKGROUND OF THE STUDY

Coronaviruses are a large family of viruses that cause respiratory illnesses ranging from the common cold to more severe diseases such as Middle East respiratory syndrome (MERS) and severe acute respiratory syndrome (SARS). Coronaviruses are zoonotic, meaning they can be transmitted between animals and humans. The common symptoms include cough, fever, shortness of breath and tiredness, while the less common symptoms are aches and pains, sore throat, diarrhoea, conjunctivitis, headache, loss of taste or smell, a skin rash, or discolouration of fingers or toes. Symptoms may appear as few as two days or as long as 14 days after infection.

COVID-19 is caused by SARS-CoV-2, a new strain of coronavirus that had not previously been identified in humans. It was first identified in Wuhan, China. The World Health Organization (WHO) and other national health agencies confirm that the virus usually spreads from infected persons to others through:

1. The air, when an infected person coughs or sneezes.
2. Close personal contact, such as touching or shaking hands.
3. Touching a surface with the virus on it and then touching one's eyes, mouth or nose before washing the hands.

The national health laboratory, the Nigeria Centre for Disease Control (NCDC) and the World Health Organization (WHO) (2020) issued recommendations in response to the outbreak in Nigeria. According to the NCDC, Nigeria recorded its first case of coronavirus, in an Italian national, on February 27, 2020. This sent a wave of panic across the country because of the unpreparedness of its health sector to handle the deadly situation.
Upon detection of the index case, the NCDC instituted a multi-sectoral National Emergency Operations Centre (EOC) to oversee the national response to COVID-19. Subsequently, the Presidential Task Force (PTF), inaugurated on March 9, 2020, announced guidelines, the risks, and a travel ban restricting entry from 13 high-risk COVID-19 countries. In addition, the Port Health Service and the NCDC monitored the isolation of travellers returning from the listed nations. Each returnee was required to isolate for 14 days, but most did not comply, which led to the large outbreak in the country (Jimoh Amzat et al., 2020). Furthermore, the NCDC disclosed that in the first 30 days most of the infected were returnees, and that 70% of the individuals who tested positive were male while 30% were female. Lagos State recorded the highest number of cases.

Thus, in this study, we attempt to address the relationship between the Deaths - cumulative total of COVID-19 patients in Africa and the factors that contribute to it (such as WHO Region, Cases - cumulative total, Cases - cumulative total per 100000 population, Cases - newly reported in last 7 days, Cases - newly reported in last 7 days per 100000 population, Cases - newly reported in last 24 hours, Deaths - cumulative total per 100000 population, Deaths - newly reported in last 7 days, Deaths - newly reported in last 7 days per 100000 population, Deaths - newly reported in last 24 hours).

1.2. STATEMENT OF THE PROBLEM

This study observed that the whole world was curious about the low number of deaths among COVID-19 patients in Africa, yet little attention has been paid to the factors that might increase deaths among these patients. These factors might also determine the extent of COVID-19 deaths. This prompted the present study, which uses the available data.

1.3. AIM AND OBJECTIVES

The aim of this study is to investigate the current clinical variables that contribute to the death of COVID-19 patients. The specific objectives are to:

i. Examine violations of the assumptions of multiple linear regression in the COVID-19 data.
ii. Estimate a model on the basis of the statistical variable(s) that contribute to the cumulative deaths of COVID-19 patients.
iii. Evaluate the performance of the model using some criteria.
iv. Identify the variables that contribute significantly to cumulative deaths.

1.4. SIGNIFICANCE OF THE STUDY

Knowing the current statistical variables associated with a rapid increase in deaths among COVID-19 patients will help other countries and continents to avoid the conditions that drive those variables up, and so significantly enhance their strategic plans to battle the virus humanely.

1.5. SCOPE OF THE STUDY

The scope of this study is limited to the relationship between some current statistical variables and the number of deaths among COVID-19 patients, and to the variables that contribute significantly to that number.

1.6. PROBLEM ENCOUNTERED

Despite every effort to collect data on coronavirus in some states in Nigeria, I was informed that the available data were spurious or that no data were available at all. Even after approval from the offices of the Commissioner for Health in the selected states, the requested data were unfortunately not released by the appropriate department.

1.7. SOURCE OF DATA

The data used for this project are secondary data extracted from the World Health Organization (WHO) COVID-19 dashboard (https://covid19.who.int/table), released May 25th, 2021.

1.8. ORGANIZATION OF THE STUDY

The project is organized as follows: Chapter One presents the Background of the Study, which includes the Introduction of the Subject of Study, Statement of the Problem, the Aim and Objectives, Significance of the Study, Scope of the Study and Definition of Terms.
Chapter Two covers the Literature Review, a detailed explanation of COVID-19 and the Methodology of the Research Work, while Chapter Three comprises Data Presentation, Data Analysis and Interpretation of Results. Chapter Four presents the Summary, Conclusion and Recommendations of the Study.

CHAPTER TWO

LITERATURE REVIEW AND METHODOLOGY

2.0. INTRODUCTION

This chapter comprises two sections. The first section covers the Introduction and History of COVID-19 in the world, in Africa and in Nigeria, Signs and Symptoms of COVID-19, Diagnosis of COVID-19, Prevention, Screening and Management of COVID-19 patients, and the Empirical Literature Review. The second section describes the methodology used to meet the Aim and Objectives of the Study.

2.1. INTRODUCTION OF CORONA VIRUS

The recent outbreak began in Wuhan, a city in the Hubei province of China. Reports of the first COVID-19 cases started in December 2019, before the disease spread to nearly every country. Coronaviruses are common in certain species of animals, such as cattle and camels. Although transmission of coronaviruses from animals to humans is rare, this new strain likely came from bats, though one study suggests pangolins may be the origin. However, it remains unclear exactly how the virus first spread to humans. Some reports trace the earliest cases back to a seafood and animal market in Wuhan. It may have been from here that SARS-CoV-2 started to spread to humans.

The CDC recommends that people wear cloth face masks in public places where it is difficult to maintain physical distancing. This will help slow the spread of the virus from people who do not know that they have contracted it, including those who are asymptomatic. People should wear cloth face masks while continuing to practice physical distancing.
Note: It is critical that surgical masks and N95 respirators are reserved for healthcare workers.

How it spreads

1. SARS-CoV-2 spreads from person to person through close communities. When people with COVID-19 breathe out or cough, they expel tiny droplets that contain the virus. These droplets can enter the mouth or nose of someone without the virus, causing an infection to occur.
2. The most common way that this illness spreads is through close contact with someone who has the infection. Close contact is within around 6 feet.
3. The disease is most contagious when a person's symptoms are at their peak. However, it is possible for someone without symptoms to spread the virus. A new study suggests that 10% of infections are from people exhibiting no symptoms.
4. Droplets containing the virus can also land on nearby surfaces or objects. Other people can pick up the virus by touching these surfaces or objects. Infection is likely if the person then touches their nose, eyes, or mouth (Aaron Kandola, 2020).
5. It is important to note that COVID-19 is new, and research is still ongoing. There may also be other ways that the new coronavirus can spread.

Is it more dangerous than other viruses?

Most cases of COVID-19 are not serious. However, it can cause symptoms that become severe, leading to death in some cases. The outbreak of COVID-19 has been sudden. This makes it difficult to estimate how often the disease becomes severe or the exact rate of mortality (Aaron Kandola, 2020). One report suggests that out of 1,099 people with confirmed cases in China, around 16% became severe. Another report estimates that about 3.6% of the confirmed cases in China led to death. These figures are likely to change as the situation evolves. However, they suggest that COVID-19 is more deadly than influenza. For example, seasonal influenza typically leads to death in less than 0.1% of cases.
When testing becomes easier and more widespread, health experts will have a more accurate insight into the exact number of severe cases and deaths. SARS is caused by another type of coronavirus, and it too spread internationally. Around 9.6% of SARS cases led to death. However, COVID-19 is more contagious, and it is already the cause of more deaths worldwide.

Symptoms

Common symptoms of COVID-19 include:

1. A fever
2. Breathlessness
3. A cough
4. A sore throat
5. A headache
6. Muscle pain
7. Chills
8. New loss of taste or smell

These symptoms are likely to occur 2–14 days after exposure to the virus (Aaron Kandola, 2020).

2.1.1. IN AFRICA

World Economic Situation and Prospects (2020) indicated that the COVID-19 pandemic arrived at a moment when prospects for many African countries were promising. At the beginning of 2020, Africa was on track to continue its economic expansion, with growth projected to rise from 2.9 per cent in 2019 to 3.2 per cent in 2020, and 3.5 per cent in 2021. Important gains were being registered in poverty reduction and health indicators. Technology and innovation were being increasingly embraced across the continent, with young Africans acting as early adopters of new platforms such as mobile money (United Nations, 2020).

Progress had also been made with respect to political unity and economic integration. The entry into force of the African Continental Free Trade Area (AfCFTA) in May 2019 promised to boost intra-African trade by as much as 25 per cent by 2040 (UNCTAD, 2019). Furthermore, Olusola, A.F. (2018) stated that Africa enjoyed some of the highest global returns on foreign direct investment (FDI) (United Nations, 2020).

The first case of COVID-19 on the continent of Africa was reported on 14 February 2020. By 13 May, cases had been reported in all 54 countries (WHO COVID-19 Situation Reports, 2020).
The African Union acted swiftly, endorsing a joint continental strategy in February, and complementing efforts by Member States and Regional Economic Communities by providing a public health platform. The African Union Chairperson, President Cyril Ramaphosa of South Africa, appointed four Special Envoys to mobilize international support for Africa's efforts to address the economic fallout of COVID-19. Regional Economic Communities have also been proactive: the East African Community, the Southern African Development Community, the Economic Community of West African States and the Intergovernmental Authority on Development (2020) unveiled initiatives within their respective regions. Most African countries moved swiftly, enforcing quarantines, lockdowns and border closures. So far, countries with higher levels of testing have experienced lower infection rates, but limited capacity has made it difficult to discern accurate transmission, hospitalization and mortality rates (United Nations, 2020).

The Africa Centres for Disease Control and Prevention (Africa CDC), established in 2017, is curating real-time information in close collaboration with the World Health Organization (WHO). The Africa CDC's new Partnership on Accelerated COVID-19 Testing (PACT), which aims to test 10 million people within six months, will complement government efforts while building important inroads into promoting knowledge-based pandemic management. WHO support for a significant ramp-up to achieve this target will be vital, given that, to date, there is limited availability of test kits across the continent.
The Africa CDC has also established the Africa COVID-19 Response Fund, in collaboration with the public-private Afro Champions initiative, to raise an initial $150 million for immediate needs and up to $400 million to support a sustained health response and socio-economic assistance to the most vulnerable populations in Africa (United Nations, 2020). However, relatively few countries have articulated initiatives to mitigate the socio-economic impacts of COVID-19 (see Figure 1.0).

Figure 1.0: Country responses to COVID-19 in Africa as of 2020 (number of countries taking each measure). Panel labels: Socio-Economic; Macroeconomic (Exchange Rate, Monetary Policy, Fiscal Policy); Governance (State of Emergency, Lockdown or Curfew, Closed Borders, Travel Bans). Source: Index Mundi, 2020 (www.indexmundi.com)

Figure 1.1: Chart illustrating the consequences of COVID-19 in Africa (hospital beds per 1,000 population).

2.1.2. ECONOMIC IMPACT

The COVID-19 pandemic began to impact African economies heavily and destroy livelihoods well before it reached the shores of the continent. Among the factors were: falling demand for Africa's commodities; capital flight from Africa; a virtual collapse of tourism and air transport associated with lockdowns and border closures; and depreciation of local currencies as a result of a deterioration in the current account balance. African countries cannot afford to wait until the virus is contained before implementing socio-economic support programmes. Africa's many informal sector workers (85.8 per cent of the workforce; ILO, 2018) cannot comply with social distancing and stay-at-home orders without severe consequences for their lives and livelihoods. Many household earners would be forced to choose between the virus and putting food on the table. Additionally, almost 90% of women employed in Africa work in the informal sector, with no social protections. Female-headed households are particularly at risk.
The July 2020 start date of trade under the AfCFTA has been postponed due to the pandemic, delaying the promise of opportunities for new exports, jobs, investments in infrastructure and financing for Africa's development. While negotiations for the AfCFTA are on hold, there is an opportunity for African countries to assess the potential impact of a prolonged delay and to lay the technical ground for its implementation.

According to UNCTAD, Small Island Developing States (SIDS) are the most vulnerable to a tourism collapse, as the sector accounts for nearly 30% of their GDP. This share is over 50% for Seychelles. A decline in tourism receipts of 25% would result in a $7.4 billion, or 7.3%, fall in GDP in SIDS. As elsewhere in the world, the African airline industry, which supports 6.2 million people, and tourism, which accounts for a significant share of GDP (in particular in SIDS), have been severely disrupted. South African Airways is on the brink of collapse, Ethiopian Airlines had lost an estimated US$550 million by early April, Air Mauritius has been placed under voluntary administration, and RwandAir has cut salaries by 8 per cent for its lowest-paid employees and by 65 per cent for its top earners. The resulting financing challenges will likely spill over to the rest of the economy as the risk of non-performing loans rises.

2.1.3. RECOMMENDATIONS FOR FOOD SECURITY

• Focus where risks are most acute, strengthen social protection systems and safeguard access to food for the most vulnerable groups, especially young children, pregnant and breastfeeding women, older people and other at-risk groups.
• Release food from government grain reserves to counter potential food shortages.
• Enforce anti-hoarding and anti-price-gouging policies on food and other essential goods through measures such as informant hotlines.
• Set up food banks in major cities and other affected areas and create mechanisms to identify those in need and to mobilize and receive donations (monetary or in-kind) from local and diaspora sources.
• Designate the agriculture sector an essential economic activity that must continue regardless of pandemic-related emergency restrictions.
• In addition to supporting smallholder farmers' ability to increase food production and maintain sufficient liquidity, focus on urgent measures to reduce post-harvest loss through improved storage methods for key food staples.
• Establish and protect food supply corridors (for collection, transport and distribution to markets), especially for land-locked and island states.
• Encourage measures, such as temporary reduction of VAT and other taxes on food, to keep food prices affordable.
• Africa's development partners should ease existing export restrictions, including export bans on food.

2.2. COVID-19 IN NIGERIA

The human cost of COVID-19 will be high: beyond the loss of life, as the economy contracts and per capita incomes fall, the pandemic is projected to leave 5 million more Nigerians living in poverty in 2020 relative to the pre-COVID forecast. Household circumstances already leave Nigerians highly exposed to the pandemic, a reality that is hard to mitigate without certain reforms within Nigeria's economy. In 2019, about 83 million people (equivalent to 4 in 10 Nigerians) were already living below the national poverty line, with millions only barely above it, making them vulnerable to falling into poverty when shocks occur. Over 75 percent of poor Nigerians live in the north of the country, most of whom depend on the informal economy or on smallholder farming. Household incomes are higher in central and southern Nigeria, where job creation has traditionally been concentrated.
Before COVID-19, the poverty rate was expected to increase by about 0.1 percentage points from 40.1 percent in 2019 to 40.2 percent in 2020, implying that the number of poor Nigerians would rise by 2.3 million, largely due to population growth. However, due to the recession, the poverty rate is now projected to increase by 2.4 percentage points to 42.5 percent in 2020, implying that the number of poor Nigerians would rise by 7.2 million. Thus, the COVID-19 shock alone is projected to push an additional 4.9 million Nigerians (7.2 million less 2.3 million) into poverty in 2020.

Fig 2.2: Confirmed COVID-19 cases in Nigeria by state, as of 18 May 2021 (legend classes: <100, 100-500, 500-1,000, 1,000-5,000, 5,000-10,000, ≥10,000). Source: Nigeria Centre for Disease Control, https://covid19.ncdc.gov.ng/

Fig 2.3: Confirmed COVID-19 related deaths in Nigeria by state, as of 18 May 2021 (legend classes: <10, 10-50, 50-100, ≥100). Source: Nigeria Centre for Disease Control, https://covid19.ncdc.gov.ng/

2.3. PREVENTION OF COVID-19

Finding effective ways to prevent the spread of SARS-CoV-2 remains a global challenge. Many viral diseases are preventable through vaccination. However, it takes time to develop and distribute safe and effective vaccines. A vaccine for COVID-19 is likely to be available soon. The best way to prevent the virus from spreading is by avoiding close contact with people with COVID-19 and washing the hands regularly. The Centers for Disease Control and Prevention (CDC) recommends washing the hands with soap and water for at least 20 seconds each time. This is particularly important after being in public places. When soap is not available, use a hand sanitizer with at least 60% alcohol. Avoid touching the face before washing the hands. Governments, public bodies, and other organizations are also taking measures to prevent the spread of SARS-CoV-2. Look out for announcements of any new measures to stay up to date.
People with COVID-19 should stay at home and avoid contact with other people to prevent the illness from spreading. Keep surrounding surfaces as clean as possible and avoid sharing household items. Always cover the mouth and nose when coughing or sneezing. Face masks are generally necessary for people who have the illness. Anyone who has regular contact with people with COVID-19 should also wear a face mask.

2.4. EMPIRICAL REVIEW OF LITERATURE

Since the emergence of the coronavirus, many authors have studied it. One such work is briefly summarized below.

Ajao LO et al. (2020) applied vector autoregressive (VAR) models to model and forecast COVID-19 variables, with special focus on Nigerian cases from 1st March to 10th June 2020. At lag order 2, the hypothesis of non-stationarity was rejected at the 5% level for all the variables using the augmented Dickey-Fuller and Phillips-Perron unit root tests. The Granger causality test results rejected the null hypothesis of no Granger causality, indicating a bivariate causal relationship among the variables. The determinants of confirmed cases, new cases and total deaths from COVID-19 were generally significant at the 5% level, with p-value 0.0001 in each of the three derived models. The AIC and log-likelihood criteria confirmed that the VAR model of order 2 gives a better model for predicting and forecasting COVID-19 cases in Nigeria. The paper recommends a suitable model for handling multivariate time series data and suggests a reliable approach for forecasting future values of COVID-19 variables in the country, to help health policy makers find solutions to the unceasing upward trend in cases of the pandemic.

2.5. REGRESSION ANALYSIS

Regression analysis is a statistical process for estimating the relationships among variables.
It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors'). More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables – that is, the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function [Wikipedia, 2020]. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function which can be described by a probability distribution. A related but distinct approach is necessary condition analysis (NCA), which estimates the maximum (rather than average) value of the dependent variable for a given value of the independent variable (ceiling line rather than central line) in order to identify what value of the independent variable is necessary but not sufficient for a given value of the dependent variable [Wikipedia, 2020]. Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. 
In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables. However, this can lead to illusions or false relationships, so caution is advisable [Wikipedia, 2020].

The performance of regression analysis methods in practice depends on the form of the data-generating process and how it relates to the regression approach being used. Since the true form of the data-generating process is generally not known, regression analysis often depends to some extent on making assumptions about this process. These assumptions are sometimes testable if a sufficient quantity of data is available. Regression models for prediction are often useful even when the assumptions are moderately violated, although they may not perform optimally. However, in many applications, especially with small effects or questions of causality based on observational data, regression methods can give misleading results [Wikipedia, 2020].

In a narrower sense, regression may refer specifically to the estimation of continuous response variables, as opposed to the discrete response variables used in classification. The case of a continuous output variable may be more specifically referred to as metric regression to distinguish it from related problems [Wikipedia, 2020].

Regression models involve the following:

i. The unknown parameters, denoted as $\beta$, which may represent a scalar or a vector.
ii. The independent variables, $X$.
iii. The dependent variable, $Y$.

A regression model relates $Y$ to a function of $X$ and $\beta$:

$$ Y \approx f(X, \beta) \qquad (1) $$

The approximation is usually formalized as $E(Y \mid X) = f(X, \beta)$. To carry out regression analysis, the form of the function $f$ must be specified. Sometimes the form of this function is based on knowledge about the relationship between $Y$ and $X$ that does not rely on the data. If no such knowledge is available, a flexible or convenient form for $f$ is chosen. Assume now that the vector of unknown parameters $\beta$ is of length $k$.
In order to perform a regression analysis, the user must provide information about the dependent variable $Y$:

i. If $N$ data points of the form $(Y, X)$ are observed, where $N < k$, most classical approaches to regression analysis cannot be performed: the system of equations defining the regression model is underdetermined, so there are not enough data to recover $\beta$.
ii. If exactly $N = k$ data points are observed and the function $f$ is linear, the equations $Y = f(X, \beta)$ can be solved exactly rather than approximately.
iii. The most common situation is where $N > k$ data points are observed. In this case, there is enough information in the data to estimate a unique value for $\beta$ that best fits the data in some sense, and the regression model when applied to the data can be viewed as an overdetermined system in $\beta$ [Wikipedia, 2020].

In the last case, regression analysis provides the tools for:

1. Finding a solution for the unknown parameters $\beta$ that will, for example, minimize the distance between the measured and predicted values of the dependent variable $Y$ (also known as the method of least squares) [Wikipedia, 2020].
2. Under certain statistical assumptions, using the surplus of information to provide statistical information about the unknown parameters $\beta$ and the predicted values of the dependent variable $Y$ [Wikipedia, 2020].

In linear regression, the model specification is that the dependent variable, $y_i$, is a linear combination of the parameters (but need not be linear in the independent variables). For example, in simple linear regression for modeling $n$ data points there is one independent variable, $x_i$, and two parameters, $\beta_0$ and $\beta_1$. The regression model is given by the straight-line equation

$$ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \dots, n \qquad (2) $$

In multiple linear regression, there are several independent variables or functions of independent variables. Adding a term in $x_i^2$ to the preceding regression gives the parabola:

$$ y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i, \quad i = 1, \dots, n $$

This is still linear regression.
Although the expression on the right-hand side is quadratic in the independent variable $x_i$, it is linear in the parameters $\beta_0$, $\beta_1$ and $\beta_2$.

Returning to the straight-line case: given a random sample from the population, we estimate the population parameters and obtain the sample linear regression model:

$$ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i $$

The residual, $e_i = y_i - \hat{y}_i$, is the difference between the true value of the dependent variable, $y_i$, and the value predicted by the model, $\hat{y}_i$. One method of estimation is ordinary least squares. This method obtains parameter estimates that minimize the sum of squared residuals, SSE, also sometimes denoted RSS:

$$ SSE = \sum_{i=1}^{n} e_i^2 \qquad (3) $$

Minimization of this function results in a set of normal equations, a set of simultaneous linear equations in the parameters, which are solved to yield the parameter estimators $\hat{\beta}_0$ and $\hat{\beta}_1$. In the case of simple regression, the formulas for the least squares estimates are

$$ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad (4) $$

$$ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \qquad (5) $$

where $\bar{x}$ is the mean (average) of the $x$ values and $\bar{y}$ is the mean of the $y$ values.

Under the assumption that the population error term has a constant variance, the estimate of that variance is given by:

$$ \hat{\sigma}_{\varepsilon}^2 = \frac{SSE}{n - 2} $$

This is called the mean square error (MSE) of the regression. The denominator is the sample size reduced by the number of model parameters estimated from the same data: $(n - p)$ for $p$ regressors, or $(n - p - 1)$ if an intercept is used. In this case, $p = 1$, so the denominator is $n - 2$.

The standard errors of the parameter estimates are given by

$$ \hat{\sigma}_{\beta_1} = \hat{\sigma}_{\varepsilon} \sqrt{\frac{1}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}, \qquad \hat{\sigma}_{\beta_0} = \hat{\sigma}_{\varepsilon} \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}} $$

Under the further assumption that the population error term is normally distributed, the researcher can use these estimated standard errors to create confidence intervals and conduct hypothesis tests about the population parameters.

2.6. MULTIPLE REGRESSION

This is a statistical tool that examines how multiple independent variables are related to a dependent variable.
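As a brief numerical aside before the multiple-predictor case is developed, the closed-form least squares estimates for simple linear regression described above can be checked with a few lines of code. The following is a minimal sketch only: the function name `simple_ols` and the five data points are illustrative assumptions, not part of the study or of the WHO data, and the analysis in this project is otherwise carried out in SPSS and Excel.

```python
from math import sqrt


def simple_ols(x, y):
    """Closed-form least squares fit of y = b0 + b1 * x (hypothetical helper)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired samples with n >= 3")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                # slope estimate
    b0 = y_bar - b1 * x_bar       # intercept estimate
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    mse = sse / (n - 2)           # two parameters estimated, so n - 2
    se_b1 = sqrt(mse / sxx)
    se_b0 = sqrt(mse * (1 / n + x_bar ** 2 / sxx))
    return {"b0": b0, "b1": b1, "sse": sse, "mse": mse,
            "se_b0": se_b0, "se_b1": se_b1}


# Made-up illustrative data (NOT taken from the WHO dashboard)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
fit = simple_ols(x, y)
print(fit)
```

For these made-up points the slope is 0.6 and the intercept 2.2, with SSE = 2.4 and MSE = 0.8, which can be confirmed by hand from the sums of squares (the sum of squared deviations of x is 10 and the sum of cross-products is 6).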
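Once one has identified how these multiple variables relate to the dependent variable, one can take information about all of the independent variables and use it to make much more powerful and accurate predictions about why things are the way they are. This latter process is called "Multiple Regression" [Wikipedia, 2020].

2.7. ESTIMATION OF MODEL PARAMETERS

Recall the regression equation model

$$ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + e \qquad (1) $$

where $Y$ is the explained (response) variable, $X_1, X_2, \dots, X_k$ are the $k$ explanatory variables, $\beta_0, \beta_1, \dots, \beta_k$ are unknown parameters to be estimated (known as regression coefficients) and $e$ is the error term. The above model can be written in matrix notation as

$$ Y = X\beta + e \qquad (2) $$

where $Y$ is an $n \times 1$ vector of the response variable, $X$ is an $n \times (k+1)$ matrix of explanatory variables whose first column is a column of ones for the intercept, $\beta$ is a $(k+1) \times 1$ vector of unknown parameters and $e$ is an $n \times 1$ vector of error terms. Making $e$ the subject of equation (2), we have

$$ e = Y - X\beta \qquad (3) $$

We wish to find the vector of least squares estimators $\hat{\beta}$ that minimizes

$$ Q = e'e = (Y - X\beta)'(Y - X\beta) \qquad (4) $$

$$ Q = e'e = Y'Y - 2\beta'X'Y + \beta'X'X\beta \qquad (5) $$

Differentiating $Q$ with respect to $\beta$ and setting the result equal to zero at $\beta = \hat{\beta}$ gives

$$ \frac{\partial Q}{\partial \beta} = -2X'Y + 2X'X\hat{\beta} = 0 \qquad (6) $$

This simplifies to

$$ X'X\hat{\beta} = X'Y \qquad (7) $$

This is the set of least squares normal equations in matrix form. To solve the equation, multiply both sides by the inverse of $X'X$. Thus, the least squares estimator of $\beta$ for OLS is

$$ \hat{\beta} = (X'X)^{-1}X'Y \qquad (8) $$

2.8. PROPERTIES OF LEAST SQUARE ESTIMATOR

i. Least squares estimators are unbiased and have variance-covariance matrix $\sigma^2 (X'X)^{-1}$.

Proof: Recall that $\hat{\beta} = (X'X)^{-1}X'Y$. Thus,

$$ E(\hat{\beta}) = E\left[(X'X)^{-1}X'(X\beta + e)\right] = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'E(e) = \beta, \quad \text{where } (X'X)^{-1}X'X = I \text{ and } E(e) = 0 \qquad (1) $$

The variance-covariance matrix of the model parameters follows by treating $\hat{\beta}$ as an affine transformation of $Y$:

$$ Var(\hat{\beta}) = (X'X)^{-1}X' \, Var(Y) \, X(X'X)^{-1} = \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1}, \quad \text{where } Var(Y) = E(ee') = \sigma^2 I \qquad (2) $$

2.9. ASSUMPTIONS OF CLASSICAL LINEAR REGRESSION MODEL

Standard linear regression models with standard estimation techniques make a number of assumptions about the predictor variables, the response variable and their relationship. The following are the major assumptions.

i. The regression model is linear in parameters.
ii.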
The expected value of the residual given any value of the explanatory variable is zero, i.e. $E(e_i \mid X_i) = 0$.
iii. The variances of the residual terms given any value of the explanatory variable are equal, i.e. $Var(e_i \mid X_i) = \sigma^2$; this is known as homoscedasticity.
iv. The values of the explanatory variables are fixed in repeated sampling.
v. There is no correlation between the residual terms given any values of the explanatory variable, i.e. $Cov(e_i, e_j \mid X_i, X_j) = 0$ for $i \neq j$.
vi. There should be no specification error.
vii. There is no linear relationship between the residual term and the explanatory variable, i.e. $Cov(e_i, X_i) = 0$.
viii. The explanatory variables must not be collinear.
ix. The residual term is normally distributed with mean zero and variance $\sigma^2$, i.e. $e_i \sim N(0, \sigma^2)$ [Gujarati, 2004].

2.9.1. Q-Q PLOT FOR NORMALITY ASSUMPTION

A Q–Q plot is a plot of the quantiles of two distributions against each other, or a plot based on estimates of the quantiles [Wikipedia, 2020]. The main step in constructing a Q–Q plot is calculating or estimating the quantiles to be plotted. If one or both of the axes in a Q–Q plot is based on a theoretical distribution with a continuous cumulative distribution function (CDF), all quantiles are uniquely defined and can be obtained by inverting the CDF. If a theoretical probability distribution with a discontinuous CDF is one of the two distributions being compared, some of the quantiles may not be defined, so an interpolated quantile may be plotted. If the Q–Q plot is based on data, there are multiple quantile estimators in use. Rules for forming Q–Q plots when quantiles must be estimated or interpolated are called plotting positions [Wikipedia, 2020].

A simple case is where one has two data sets of the same size. In that case, to make the Q–Q plot, one orders each set in increasing order, then pairs off and plots the corresponding values. A more complicated construction is the case where two data sets of different sizes are being compared.
To construct the Q–Q plot in this case, it is necessary to use an interpolated quantile estimate so that quantiles corresponding to the same underlying probability can be constructed [Wikipedia, 2020].

More abstractly, given two cumulative probability distribution functions F and G, with associated quantile functions $F^{-1}$ and $G^{-1}$ (the inverse function of the CDF is the quantile function), the Q–Q plot draws the q-th quantile of F against the q-th quantile of G for a range of values of q [Gibbons et al, 2003]. Thus, the Q–Q plot is a parametric curve indexed over [0,1] with values in the real plane $R^2$ [Wikipedia, 2020].

Interpretation of Q-Q Plots

The use of Q–Q plots that is of interest in this study is to compare the distribution of a sample to a theoretical distribution, such as the standard normal distribution N(0,1), as in a normal probability plot. As in the case of comparing two samples of data, one orders the data (formally, computes the order statistics), then plots them against certain quantiles of the theoretical distribution [Thode, 2002].

2.9.2 SHAPIRO-WILK'S TEST FOR NORMALITY

Theory: The Shapiro–Wilk test is a test of normality in frequentist statistics. It was published in 1965 by Samuel Sanford Shapiro and Martin Wilk. The test statistic is

$$W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

where:

i. $x_{(i)}$ (with parentheses enclosing the subscript index i; not to be confused with $x_i$) is the ith order statistic, i.e., the ith-smallest number in the sample;
ii. $\bar{x} = (x_1 + \dots + x_n)/n$ is the sample mean;
iii. the constants $a_i$ are given by

$$(a_1, \dots, a_n) = \frac{m^{T} V^{-1}}{\left(m^{T} V^{-1} V^{-1} m\right)^{1/2}}$$

where $m = (m_1, \dots, m_n)^{T}$, and $m_1, \dots, m_n$ are the expected values of the order statistics of independent and identically distributed random variables sampled from the standard normal distribution, and V is the covariance matrix of those order statistics [Shapiro et al, 1965].
Test of Hypothesis: The test is carried out under the following hypotheses:

H0: Error term is normally distributed
H1: Error term is not normally distributed

If the p-value is less than the chosen alpha level, then the null hypothesis is rejected and there is evidence that the data tested are not from a normally distributed population; in other words, the data are not normal (e.g., for an alpha level of 0.05, a data set with a p-value of 0.02 rejects the null hypothesis that the data are from a normally distributed population). On the contrary, if the p-value is greater than the chosen alpha level, then the null hypothesis that the data came from a normally distributed population cannot be rejected [JMP, 2004]. As with most statistical tests, in large samples the test may flag even small departures from a normal distribution as statistically significant. Thus a Q–Q plot is useful for verification in addition to the test [Wikipedia, 2018].

2.9.3 VARIANCE INFLATION FACTOR

The variance inflation factor (VIF) is the ratio of the variance of a coefficient estimate in a model with multiple terms to the variance of that estimate in a model with one term alone. It quantifies the severity of multicollinearity in an ordinary least squares regression analysis [Gujarati, 2004].

Procedure: We calculate the VIF for $\hat{\beta}_i$ with the following formula:

$$VIF_i = \frac{1}{1 - R_i^2}$$

where $R_i^2$ is the coefficient of determination of the regression equation in step one, with $X_i$ on the left-hand side and all the other predictor variables on the right-hand side.

2.9.4 DURBIN-WATSON TEST

The Durbin–Watson statistic is a test statistic used to detect the presence of autocorrelation at lag 1 in the residuals (prediction errors) from a regression analysis. Durbin and Watson [1950, 1951] applied this statistic to the residuals from least squares regressions, and developed bounds tests for the null hypothesis that the errors are serially uncorrelated against the alternative that they follow a first order autoregressive process.
Later, John Denis Sargan and Alok Bhargava developed several von Neumann–Durbin–Watson type test statistics for the null hypothesis that the errors on a regression model follow a process with a unit root against the alternative hypothesis that the errors follow a stationary first order autoregression [Sargan and Bhargava, 1983]. Note that the distribution of this test statistic does not depend on the estimated regression coefficients and the variance of the errors.

Procedure and Interpretation: If $e_t$ is the residual associated with the t-th observation, then the test statistic is

$$d = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}$$

where T is the number of observations. If one has a lengthy sample, then this can be linearly mapped to the Pearson correlation of the time-series data with its lags. Since d is approximately equal to 2(1 − r), where r is the sample autocorrelation of the residuals, d = 2 indicates no autocorrelation. The value of d always lies between 0 and 4. If the Durbin–Watson statistic is substantially less than 2, there is evidence of positive serial correlation. As a rough rule of thumb, if Durbin–Watson is less than 1.0, there may be cause for alarm. Small values of d indicate that successive error terms are positively correlated. If d > 2, successive error terms are negatively correlated. In regression, this can imply an underestimation of the level of statistical significance [Durbin and Watson; 1950, 1951].

To test for positive autocorrelation at significance α, the test statistic d is compared to lower and upper critical values ($d_{L,\alpha}$ and $d_{U,\alpha}$):

i. If $d < d_{L,\alpha}$, there is statistical evidence that the error terms are positively autocorrelated.
ii. If $d > d_{U,\alpha}$, there is no statistical evidence that the error terms are positively autocorrelated.
iii. If $d_{L,\alpha} < d < d_{U,\alpha}$, the test is inconclusive.

Positive serial correlation is serial correlation in which a positive error for one observation increases the chances of a positive error for another observation.
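The statistic d and its approximate relation to the lag-1 autocorrelation r can be illustrated with a short Python sketch. It is not part of the study's SPSS/Excel workflow; the function names and the two residual series are hypothetical and chosen only for illustration, and r is computed here under the assumption of zero-mean residuals.

```python
def durbin_watson(residuals):
    """Return d = sum_{t=2..T} (e_t - e_{t-1})^2 / sum_{t=1..T} e_t^2."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    den = sum(e ** 2 for e in residuals)
    if den == 0:
        raise ValueError("residuals are all zero")
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    return num / den


def lag1_autocorrelation(residuals):
    """Lag-1 sample autocorrelation r, assuming zero-mean residuals."""
    den = sum(e ** 2 for e in residuals)
    num = sum(residuals[t] * residuals[t - 1]
              for t in range(1, len(residuals)))
    return num / den


# Hypothetical residual series (illustration only, not the study data).
trending = [1.0, 0.8, 0.9, 0.5, 0.2, -0.1, -0.4, -0.6, -0.9, -1.0]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

d_pos = durbin_watson(trending)      # well below 2: positive autocorrelation
d_neg = durbin_watson(alternating)   # above 2: negative autocorrelation

# Exact identity behind d ~ 2(1 - r): the two differ only by an end term,
# (e_1^2 + e_T^2) / sum(e_t^2), which becomes negligible in long samples.
end_term = (trending[0] ** 2 + trending[-1] ** 2) / sum(e ** 2 for e in trending)
approx = 2 * (1 - lag1_autocorrelation(trending))

print("d (trending):", round(d_pos, 3))
print("2(1 - r) - end term:", round(approx - end_term, 3))
print("d (alternating):", round(d_neg, 3))
```

In this sketch the smoothly drifting series gives a small d and the sign-alternating series gives a d close to 4, matching the interpretation above. The end-term identity also shows why the approximation d ≈ 2(1 − r) is stated for lengthy samples.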
To test for negative autocorrelation at significance α, the test statistic (4 − d) is compared to lower and upper critical values ($d_{L,\alpha}$ and $d_{U,\alpha}$):

i. If $(4 - d) < d_{L,\alpha}$, there is statistical evidence that the error terms are negatively autocorrelated.
ii. If $(4 - d) > d_{U,\alpha}$, there is no statistical evidence that the error terms are negatively autocorrelated.
iii. If $d_{L,\alpha} < (4 - d) < d_{U,\alpha}$, the test is inconclusive.

Negative serial correlation implies that a positive error for one observation increases the chance of a negative error for another observation and a negative error for one observation increases the chances of a positive error for another [Durbin and Watson; 1950, 1951]. The critical values, $d_{L,\alpha}$ and $d_{U,\alpha}$, vary by level of significance (α), the number of observations, and the number of predictors in the regression equation.

2.9.5 F-TEST AND T-TEST

These two measures are used to check for model adequacy. Under the null hypothesis, which says the model is inadequate, the test statistic

$$F = \frac{MSR}{MSE} \qquad (1)$$

where MSR and MSE are the regression and residual mean squares, is compared with the tabulated value $F_{\alpha}$ at the regression and residual degrees of freedom. If the F-statistic is greater than the F-value from the table, then the null hypothesis of model inadequacy is rejected; otherwise we do not reject the null hypothesis [Abiodun, 2017].

When we reject the null hypothesis, i.e., the model is adequate, we can use the t-test to check which of the variables contribute to the model, i.e., to the rejection of the null hypothesis. Thus, we set the individual hypotheses $H_0: \beta_j = 0$ against $H_1: \beta_j \neq 0$, and the test statistic

$$t = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)} \qquad (2)$$

is compared with the tabulated value $t_{\alpha/2}$ at the residual degrees of freedom. If the |t-statistic| is greater than the t-value from the table, then we reject the null hypothesis, i.e., the variable contributes to the model; otherwise we do not reject the null hypothesis [Abiodun, 2017].

2.9.6 CRITERIA FOR EVALUATING PERFORMANCE OF THE MODEL

The evaluation of the performance of the model will be done with the aid of the following criteria.
These criteria include mean square error and the coefficient of multiple determination.

CHAPTER THREE

3.1 DATA PRESENTATION AND DATA ANALYSIS

3.2 Introduction

This section comprises the presentation and analysis of data. The data used for this project work are secondary data, extracted from the World Health Organization (WHO) COVID-19 dashboard (https://covid19.who.int/table), released May 25th, 2021. The variables present are used to carry out a multiple regression analysis to determine which independent variables explain or have a significant impact on the dependent variable.

Table 3.0: Presentation of data

Columns: Name; Cases - cumulative total; Cases - cumulative total per 100000 population; Cases - newly reported in last 7 days; Cases - newly reported in last 7 days per 100000 population; Cases - newly reported in last 24 hours; Deaths - cumulative total; Deaths - cumulative total per 100000 population; Deaths - newly reported in last 7 days; Deaths - newly reported in last 24 hours.

Rows: South Africa, Tunisia, Ethiopia, Egypt, Libya, Kenya, Nigeria, Algeria, Ghana, Zambia, Cameroon, Mozambique, Botswana, Namibia, Côte d’Ivoire, Uganda, Senegal, Madagascar, Zimbabwe, Sudan, Malawi, Angola, Cabo Verde, Rwanda, Gabon, Réunion, Guinea, Mayotte, Mauritania, Eswatini, Mali, Burkina Faso, Togo, Congo, Lesotho, South Sudan, Seychelles, Equatorial Guinea, Benin, Central African Republic, Gambia, Niger, Chad, Burundi, Sierra Leone, Comoros, Eritrea, Guinea-Bissau, Sao Tome and Principe, Liberia, Mauritius, United Republic of Tanzania, Saint Helena.

3.3 PRESENTATION AND INTERPRETATION OF RESULTS

This section presents the analysis and results of this study using the methodology discussed in the previous chapter.

Model specification:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_9 X_9 + e$$

where Y = Deaths (cumulative total).

Fitting the regression model using OLS:

Table 3.1: Estimates for the regression coefficients.

| Model | B (unstandardized) | Std. Error (unstandardized) | Beta (standardized) |
|---|---|---|---|
| (Constant) | - | - | |
| Cases - cumulative total | .022 | .003 | .634 |
| Cases - cumulative total per 100000 population | -.205 | .197 | -.050 |
| Cases - newly reported in last 7 days | -.718 | .426 | -.301 |
| Cases - newly reported in last 7 days per 100000 population | - | - | .216 |
| Cases - newly reported in last 24 hours | - | - | .194 |
| Deaths - cumulative total per 100000 population | - | - | .045 |
| Deaths - newly reported in last 7 days | - | - | .682 |
| Deaths - newly reported in last 7 days per 100000 population | - | - | -.215 |
| Deaths - newly reported in last 24 hours | - | - | -.226 |

The fitted regression model using OLS: (3.2)

3.2.1 TEST FOR MODEL ADEQUACY

The following tests are used to check whether the ordinary least squares regression model applied is adequate and which variable(s) contribute to the adequacy of the model, using the table below.

Table 3.2: ANOVA table.

| Model | Sum of Squares | Df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| 1 Regression | - | - | - | - | - |
| Residual | - | - | - | | |
| Total | - | - | | | |

Test for model significance:

Hypothesis: $H_0: \beta_1 = \beta_2 = \dots = \beta_9 = 0$ against $H_1: \beta_j \neq 0$ for at least one j.

Decision Rule: Reject the null hypothesis if p-value < α (0.05), otherwise do not reject the null hypothesis.

Decision: Since p-value = 0.000 < 0.05, we reject the null hypothesis.

Interpretation: We are able to conclude, based on the available data, that at least one of the regressors is significant, i.e., it has an impact on the dependent variable at the α = 0.05 level of significance. Since at least one independent variable contributes to the response variable, the regression model is said to be adequate. To test which variable(s) actually contribute to the dependent variable, we test each parameter individually for significance.

Individual t-test for the parameters:

Hypothesis: $H_0: \beta_j = 0$ against $H_1: \beta_j \neq 0$, for j = 1, ..., 9.

Table 3.3: T-test table for individual parameters.

| Variables | T | Sig. |
|---|---|---|
| Constant | - | - |
| Cases - cumulative total (X1) | - | - |
| Cases - cumulative total per 100000 population (X2) | - | - |
| Cases - newly reported in last 7 days (X3) | - | - |
| Cases - newly reported in last 7 days per 100000 population (X4) | - | - |
| Cases - newly reported in last 24 hours (X5) | - | - |
| Deaths - cumulative total per 100000 population (X6) | - | - |
| Deaths - newly reported in last 7 days (X7) | - | - |
| Deaths - newly reported in last 7 days per 100000 population (X8) | - | - |
| Deaths - newly reported in last 24 hours (X9) | - | - |

Decision rule: Reject H0 if |tcal| > ttab, or if p-value < α.

Conclusion: Deaths - newly reported in last 7 days per 100000 population (X8), Deaths - newly reported in last 24 hours (X9), Cases - newly reported in last 7 days per 100000 population (X4), Deaths - newly reported in last 7 days (X7) and Cases - cumulative total (X1) are significant in the model, while Cases - newly reported in last 7 days (X3), Cases - cumulative total per 100000 population (X2), Cases - newly reported in last 24 hours (X5) and Deaths - cumulative total per 100000 population (X6) are not significant. We observed from the individual tests that only variables X1, X4, X7, X8 and X9 contribute significantly to the model.

3.4. Fitting and Testing for the adequacy of the reduced model.

From the individual test for each model parameter, it was found that some independent variables contribute significantly to the model while others do not. The significant independent variables are therefore used to fit a new model using the method of OLS.

Table 3.6: Estimates for the regression coefficients.

| Model | B | Std. Error | Beta |
|---|---|---|---|
| (Constant) | - | - | |
| Deaths (newly reported in last 24 hours) (X9) | - | - | -0.193 |
| Deaths (newly reported in last 7 days per 100000 population) (X8) | - | - | -0.173 |
| Cases (cumulative total) (X1) | - | - | - |
| Cases (newly reported in last 7 days per 100000 population) (X4) | - | - | - |
| Deaths (newly reported in last 7 days) (X7) | - | - | - |

Reduced model: (3.3)

Table 3.7: Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | Durbin-Watson |
|---|---|---|---|---|---|
| 1 | 0.994 | 0.988 | 0.987 | - | 1.390 |

From the summary table, we find that R is 0.994, showing a very strong positive relationship between the dependent variable and the independent variables. The adjusted R2 of the model in equation (3.3) is 0.987, with R2 = 0.988. This means that the linear regression model explains 98.8% of the variance in the data, i.e., 98.8% of the variation in the dependent variable is explained by the independent variables.

3.4.1. Testing for Model adequacy of the reduced model

The following tests are used to check whether the ordinary least squares regression model applied is adequate and which variable(s) contribute to the adequacy of the model, using the table below.

Table 3.8: ANOVA table

| Model | Sum of Squares | Df | Mean Square | F | p-value |
|---|---|---|---|---|---|
| Regression | - | - | - | - | .000 |
| Residual | - | - | - | | |
| Total | - | - | | | |

F-test for model significance:

Hypothesis: $H_0: \beta_1 = \beta_4 = \beta_7 = \beta_8 = \beta_9 = 0$ against $H_1: \beta_j \neq 0$ for at least one j.

Decision Rule: Reject the null hypothesis if p-value < α (0.05), otherwise do not reject the null hypothesis.

Decision: Since p-value = 0.000 < 0.05, we reject the null hypothesis.

Interpretation: We are able to conclude, based on the available data, that at least one of the regressors is significant, i.e., it has an impact on the dependent variable at the α = 0.05 level of significance. Since at least one independent variable contributes to the response variable, the regression model is said to be adequate. To test which variable(s) actually contribute to the dependent variable, we test each parameter individually for significance.

Individual t-test for the parameters:

Hypothesis: $H_0: \beta_j = 0$ against $H_1: \beta_j \neq 0$, for j = 1, 4, 7, 8, 9.

Table 3.9: T-test for individual parameters.
| Variables | T | Sig. |
|---|---|---|
| Constant | - | - |
| Cases - cumulative total (X1) | - | - |
| Cases - newly reported in last 7 days per 100000 population (X4) | - | - |
| Deaths - newly reported in last 7 days (X7) | - | - |
| Deaths - newly reported in last 7 days per 100000 population (X8) | - | - |
| Deaths - newly reported in last 24 hours (X9) | - | - |

Decision rule: Reject H0 if |tcal| > ttab, or if p-value < α.

Conclusion: Deaths - newly reported in last 7 days per 100000 population (X8), Deaths - newly reported in last 7 days (X7), Deaths - newly reported in last 24 hours (X9), Cases - newly reported in last 7 days per 100000 population (X4) and Cases - cumulative total (X1) all contribute to the cumulative death of COVID-19 patients in Africa.

3.5. TEST FOR VALIDATION OF ASSUMPTIONS

3.5.1. Testing for Linearity Assumption

The following graphs depict the relationship between the dependent variable (cumulative death of COVID-19 patients in Africa) and each independent variable.

Fig 3.6: Plot of cumulative death against each of the independent variables.

Figure 3.7: Plot of cumulative death against cumulative total.

It was observed from the plots in Figures 3.6 and 3.7 that the variables Cases - cumulative total, Cases - newly reported in last 7 days per 100000 population, Deaths - newly reported in last 7 days, Deaths - newly reported in last 7 days per 100000 population and Deaths - newly reported in last 24 hours are linearly related to the dependent variable.

3.5.2. Testing for Homoscedastic (equality of variance) Assumption

Fig 3.8: Scatterplot of regression standardized residual against standardized predicted value.

3.5.3. Testing for Normality Assumption

1. QQ plot.

Fig 3.9: QQ plot.

As can be seen from the Q-Q plot, many of the studentized residuals fall along the straight line, which suggests that the assumption of normality is met.

2. Histogram

Figure 3.9.1: Plot of standardized residuals against the observed cumulative probability for normality check.

The histogram is bell shaped, which suggests that the normality assumption is met.
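The coordinates behind a normal Q-Q plot of residuals, as described in Section 2.9.1, can be sketched in a few lines of Python using only the standard library. This is an illustration, not the SPSS procedure used in the study: the function name, the residual values and the plotting positions (i − 0.5)/n are all assumptions made for the example.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(residuals):
    """Pair each ordered standardized residual with a standard normal quantile."""
    n = len(residuals)
    centre, spread = mean(residuals), stdev(residuals)
    # Order statistics of the standardized residuals (sample quantiles).
    ordered = sorted((r - centre) / spread for r in residuals)
    std_normal = NormalDist()  # N(0, 1)
    # Plotting positions (i - 0.5) / n are one common choice; others exist.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical residuals for illustration only (not the study's residuals).
residuals = [-1.8, -0.9, -0.6, -0.3, -0.1, 0.0, 0.2, 0.4, 0.7, 1.1, 1.3]

for q_theory, q_sample in normal_qq_points(residuals):
    print(f"{q_theory:7.3f}  {q_sample:7.3f}")
```

Plotting the second column against the first gives the Q-Q plot; points lying close to the line y = x are consistent with approximately normal residuals, while systematic curvature or stray tail points suggest departures from normality.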
3.9.1 Autocorrelation of Error Term Assumption:

Durbin Watson: 1.390

i. Durbin Watson Test:

Hypothesis: H0: There is positive autocorrelation; H1: H0 is not true.

Test Statistic: d = 1.390

Decision Rule: Reject H0 if DW is equal to or greater than 2, otherwise do not reject H0.

Decision: Since the DW value is < 2, we do not reject H0.

Conclusion: There is enough evidence to support the claim that there is positive autocorrelation at the α = 0.05 level of significance.

3.5.6 Multicollinearity Assumption:

This assumption is tested using the Variance Inflation Factor (VIF) values, which are given below.

Table 3.9.2: VIF values for the independent variables

| Independent Variables | VIF Values |
|---|---|
| Cases - cumulative total | 10.644 |
| Cases - newly reported in last 7 days per 100000 population | 13.353 |
| Deaths - newly reported in last 7 days | 31.205 |
| Deaths - newly reported in last 7 days per 100000 population | 11.876 |
| Deaths - newly reported in last 24 hours | 13.024 |

It can be seen from the VIF values that there is high multicollinearity among the independent variables, i.e., all of the values are greater than 10.

CHAPTER FOUR

SUMMARY, CONCLUSION AND RECOMMENDATION

4.0. SUMMARY

This project work is on regression modelling of some statistical variables of COVID-19 patients in Africa. The statistical tool used in this project work is multiple regression. From the test for model adequacy, based on the available data, it was found that at least one of the regressors is significant, i.e., it has an impact on the dependent variable at the α = 0.05 level of significance. Fitting the model using all the independent variables, we find that the adjusted R2 of the model is 0.989 and R = 0.995, showing a strong correlation between the dependent variable (cumulative death) and the independent variables. The R2 = 0.989 shows that the linear regression explains 98.90% of the variance in the data, i.e., 98.90% of the variation in the dependent variable is explained by the independent variables.
Individual tests for the independent variables reveal that only variables X1, X4, X7, X8 and X9 contribute significantly to the model; the other independent variables were removed and the model was refitted using the method of OLS. This means that Deaths - newly reported in last 7 days per 100000 population (X8), Deaths - newly reported in last 24 hours (X9), Cases - newly reported in last 7 days per 100000 population (X4), Deaths - newly reported in last 7 days (X7) and Cases - cumulative total (X1) contribute significantly to the cumulative death of COVID-19 patients, while Cases - newly reported in last 7 days (X3), Cases - cumulative total per 100000 population (X2), Cases - newly reported in last 24 hours (X5) and Deaths - cumulative total per 100000 population (X6) are not significantly related to the cumulative death of COVID-19 patients in Africa.

The regression equation using OLS is given by:

The data met all assumptions except for that of the test for multicollinearity, and a new model was fitted using the significant variables.

The reduced model:

To assess how well the model fits, we find that R is 0.994, showing a very strong positive relationship between the dependent variable and the independent variables. The adjusted R2 of the model is 0.987, with R2 = 0.988. This means that the linear regression model explains 98.8% of the variance in the data, i.e., 98.8% of the variation in the dependent variable is explained by the independent variables.

4.1. CONCLUSION

This study has illustrated that the available COVID-19 data satisfy the conditions to be used for a multiple regression analysis, and that the variables Deaths - newly reported in last 7 days per 100000 population (X8), Deaths - newly reported in last 24 hours (X9), Cases - newly reported in last 7 days per 100000 population (X4), Deaths - newly reported in last 7 days (X7) and Cases - cumulative total (X1) can significantly predict the cumulative death of COVID-19 patients in Africa.
4.2. RECOMMENDATION

From the analysis and findings obtained in this project work, I recommend that statistical techniques be employed in the various health sectors to model relationships between variables, so as to proffer solutions and to reduce the risk of spread of communicable diseases in Africa.

References

Aaron Kandola (2020). Medically reviewed by Meredith Goodwin, MD, FAAFP. Coronavirus transmission: How it spreads and how to avoid it (medicalnewstoday.com).

Abiodun A.A. "STA 333 – Introduction to Regression Analysis Notes". Department of Statistics, University of Ilorin.

Ajao I.O., Awogbemi C.A. and Ilugbusi A.O. (202). Vector and autoregressive models for multivariate time series analysis on COVID-19 pandemic in Nigeria.

Ajulo H.K. (2018). "A study of the effects of some clinical variables on prostate cancer".

Amzat J., Aminu K., Kolo V.I., Akinyele A.A., Ogundairo J.A., Danjibo C.M. (2020). Coronavirus Outbreak in Nigeria: Burden and Socio-Medical Response during the First 100 Days. International Journal of Infectious Diseases. doi: https://doi.org/10.1016/j.ijid-.

Breusch, T.S.; Pagan, A.R. (1979). "A Simple Test for Heteroscedasticity and Random Coefficient Variation". Econometrica. 47(5).

Cook, R. Dennis (February 1977). "Detection of Influential Observations in Linear Regression". Technometrics. American Statistical Association. 19 (1): 15–18.

Cook, R. Dennis (March 1979). "Influential Observations in Linear Regression".

Damodar N. Gujarati (2004). "Basic Econometrics, Fourth Edition".

… Data". 39th Hawaii International Conference on System Sciences (2006).

Durbin, J.; Watson, G.S. (1950). "Testing for Serial Correlation in Least Squares Regression, I". Biometrika. 37 (3-4): 409-428. doi:10.1093/biomet/-.

Gibbons, Jean Dickinson; Chakraborti, Subhabrata (2003). Nonparametric statistical inference (4th ed.). CRC Press.

… Health Professional Version".

ILO (2018). "Women and men in the informal economy: A statistical picture".

Jason W. Osborne and Amy Overbay (March 2004). "The Power of Outlier". Peer-reviewed Electronic Journal. Volume 9, Pg 6.

NCDC (2020). https://covid19.ncdc.gov.ng/. Accessed 25th of May 2021.

Oranusi C.K. (2012). "Prostate cancer awareness and screening among male public servants in Anambra, Nigeria". https://doi.org/10.1016/j.afju-.

Royston, Patrick (1992). "Approximating the Shapiro-Wilk test for normality". Statistics and Computing. 2(3): 117-119.

Shapiro, S.S.; Wilk, M.B. (1965). "An analysis of variance test for normality".

Stamey, T.A., Kabalin, J.N., McNeal, J.E., Johnstone, I.M., … and treatment of adenocarcinoma of the prostate: II. Radical prostatectomy treated patients. Journal of Urology 141(5).

The World Bank Group (2020). Nigeria in Times of COVID-19: Laying Foundations for a Strong Recovery. The World Bank Group, 1818 H Street NW, Washington, DC 20433, USA.

United Nations (202). Policy brief: Impact of COVID-19 in Africa. https://www.bing.com/search?q=journals+on+corona+virus&cvid=335b9388b4484fbeb905cec30477ce08&aqs=edge

Wikipedia, The free encyclopedia. (2004, July 22). FL: Wikimedia Foundation, Inc.

World Health Organization (2020). Coronavirus disease 2019 (COVID-19) Situation Report – 37. https://www.who.int/docs/default-source/coronaviruse/situation-reports/--sitrep-37- covid-19.pdf?sfvrsn=-e_2 [Accessed on March 15, 2020].
APPENDIX

Columns: Name; Cases - cumulative total; Cases - cumulative total per 100000 population; Cases - newly reported in last 7 days; Cases - newly reported in last 7 days per 100000 population; Cases - newly reported in last 24 hours; Deaths - cumulative total; Deaths - cumulative total per 100000 population; Deaths - newly reported in last 7 days; Deaths - newly reported in last 24 hours.

Rows: South Africa, Tunisia, Ethiopia, Egypt, Libya, Kenya, Nigeria, Algeria, Ghana, Zambia, Cameroon, Mozambique, Botswana, Namibia, Côte d’Ivoire, Uganda, Senegal, Madagascar, Zimbabwe, Sudan, Malawi, Angola, Cabo Verde, Rwanda, Gabon, Réunion, Guinea, Mayotte, Mauritania, Eswatini, Mali, Burkina Faso, Togo, Congo, Lesotho, South Sudan, Seychelles, Equatorial Guinea, Benin, Central African Republic, Gambia, Niger, Chad, Burundi, Sierra Leone, Comoros, Eritrea, Guinea-Bissau, Sao Tome and Principe, Liberia, Mauritius, United Republic of Tanzania, Saint Helena.