RESEARCH ON CORONAVIRUS
CHAPTER ONE
INTRODUCTION
1.0. BACKGROUND OF THE STUDY
Coronaviruses are a large family of viruses that cause respiratory illness ranging from
the common cold to more severe diseases such as Middle East respiratory syndrome
(MERS) and severe acute respiratory syndrome (SARS). Coronaviruses are usually
transmitted between animals and humans. The common symptoms include cough, fever,
shortness of breath, and tiredness, while the less common symptoms are aches and pains,
sore throat, diarrhea, conjunctivitis, headache, loss of taste or smell, a rash on the skin, or
discolouration of fingers or toes, which may appear as few as two days after infection or
as long as 14 days after infection (Kandola, 2020).
COVID-19 is caused by SARS-CoV-2, a new strain of coronavirus that had not been
previously identified in humans. It was first identified in Wuhan, China. The World
Health Organization (WHO) and some other national health agencies confirm that the
coronavirus usually spreads from infected persons to others through:
1. The air, when an infected person coughs or sneezes.
2. Close personal contact, such as touching or shaking hands.
3. Touching surfaces with the virus on them, then touching one's eyes, nose, or mouth
before washing the hands.
According to the Nigeria Centre for Disease Control (NCDC), Nigeria recorded its first
case of coronavirus through an Italian national on February 27th, 2020, which sent a wave
of panic across the country due to the unpreparedness of her health sector to handle the
deadly situation. Upon detection of the index case, the NCDC instituted a multi-sectoral
National Emergency Operations Centre (EOC) to oversee the national response to
COVID-19. Subsequently, the Presidential Task Force (PTF), inaugurated on March 9,
2020, announced the guidelines, the risks, and a travel ban restricting entry from 13
COVID-19 high-risk countries. The Port Health Service and the NCDC monitored the
isolation of travellers returning from the listed affected nations; the isolation period for
each returnee lasted 14 days, which most of them did not comply with, leading to the
large outbreak in the country (Jimoh Amzat et al., 2020).
Furthermore, the NCDC disclosed that in the first 30 days most of the infected were
returnees, and 70% of the individuals who tested positive were males while 30% were
females, with Lagos State recording the highest number of cases.
Thus, in this study, we attempt to address the relationship between factors (such as WHO
region, cumulative total cases, cumulative total cases per 100,000 population, newly
reported cases in the last 7 days, newly reported cases in the last 7 days per 100,000
population, newly reported cases in the last 24 hours, cumulative total deaths per 100,000
population, newly reported deaths in the last 7 days, newly reported deaths in the last 7
days per 100,000 population, and newly reported deaths in the last 24 hours) that
contribute to the cumulative total deaths of COVID-19 patients in Africa.
1.1. STATEMENT OF THE PROBLEM
This study observed that the whole world was concerned about the increasing number of
deaths of coronavirus patients in Africa, yet little focus was placed on the factors that
might influence this increase. These factors might directly or inversely dictate the extent
of death among COVID-19 patients. This pertinent reason prompted this study using the
available data.
1.2. AIM AND OBJECTIVES
The aim of this study is to investigate some determinants that contribute to COVID-19
cumulative deaths. The specific objectives are to:
i. Estimate a model on the basis of the statistical variable(s) that contribute to the
cumulative deaths of COVID-19 patients.
ii. Evaluate the performance of the model using some criteria.
iii. Identify the variables that contribute significantly to cumulative deaths.
iv. Examine violations of the assumptions of multiple linear regression on the
COVID-19 data.
1.3. SIGNIFICANCE OF THE STUDY
Knowing the statistical variables that cause a rapid increase in the deaths of COVID-19
patients matters because countries and continents that have adopted methods to control
the factors driving these variables have significantly enhanced their strategic plans to
battle the virus humanely.
1.4. SCOPE OF THE STUDY
The scope of this study is limited to the relationship between some determinants that are
significantly associated with COVID-19 cumulative deaths.
1.5. SOURCE OF DATA
The data used for this project is secondary data extracted from the World Health
Organization (WHO) COVID-19 dashboard (https://covid19.who.int/table), released
May 25th, 2021.
1.6. ORGANIZATION OF THE STUDY
The project is organized as follows: Chapter One presents the Background of the Study,
which includes the Introduction of the Subject of Study, Statement of the Problem, Aim
and Objectives, Significance of the Study, and Scope of the Study. Chapter Two covers
the Literature Review, a detailed explanation of the coronavirus, and the Methodology
used in the analysis of data, while Chapter Three comprises Data Presentation, Data
Analysis, and Interpretation of Results. Chapter Four presents the Summary, Conclusion,
and Recommendations of the Study.
CHAPTER TWO
LITERATURE REVIEW AND METHODOLOGY
2.0. INTRODUCTION
This chapter comprises two sections: the first presents an empirical review of literature
related to COVID-19 in Nigeria, and the second describes the methodology used to meet
the aim and objectives of the study.
2.1. EMPIRICAL REVIEW OF RELATED LITERATURE
Since the advent of the study of corona virus, several works have been done by many
authors. Some of which are briefly summarized below:
Ajao et al. (2020), in their study on vector autoregressive (VAR) models for modelling
and forecasting COVID-19 variables, with special focus on Nigerian cases from 1st
March to 10th June 2020, used a time-series approach. At lag of order 2, the hypothesis
of non-stationarity (i.e., that the selected COVID-19 variables either increase or decrease)
is rejected at the 5% level for all the multivariate variables using the augmented
Dickey-Fuller and Phillips-Perron unit root tests. The Granger causality test results
indicate a bivariate causal relationship among the variables, rejecting the null hypothesis
of no Granger causality. The determinants of confirmed cases, new cases, and total deaths
from COVID-19 are generally significant at the 5% level, with a p-value of 0.0001 in
each of the three derived models. The AIC and log-likelihood criteria applied to the
models confirmed that the VAR model of order 2 gives a better model for predictions
and forecasts of COVID-19 cases in Nigeria. They also recommend a suitable model for
handling multivariate time series data and suggest a reliable approach for forecasting
future cases of COVID-19 variables in the country that will help health policy makers
find solutions to the unceasing upward trend in the cases of the pandemic.
2.2. REGRESSION ANALYSIS
Regression analysis is a statistical process for estimating the relationships among
variables. It includes many techniques for modeling and analyzing several variables,
when the focus is on the relationship between a dependent variable and one or more
independent variables (or 'predictors'). More specifically, regression analysis helps one
understand how the typical value of the dependent variable (or 'criterion variable')
changes when any one of the independent variables is varied, while the other independent
variables are held fixed. Most commonly, regression analysis estimates the conditional
expectation of the dependent variable given the independent variables – that is, the
average value of the dependent variable when the independent variables are fixed. Less
commonly, the focus is on a quantile, or other location parameter of the conditional
distribution of the dependent variable given the independent variables. In all cases, the
estimation target is a function of the independent variables called the regression
function.
Regression analysis is widely used for prediction and forecasting, where its use has
substantial overlap with the field of machine learning. Regression analysis is also used to
understand which among the independent variables are related to the dependent variable,
and to explore the forms of these relationships. In restricted circumstances, regression
analysis can be used to infer causal relationships between the independent and dependent
variables. However, this can lead to illusory or false relationships, so caution is advisable
[Wikipedia, 2020].
The performance of regression analysis methods in practice depends on the form of the
data generating process, and how it relates to the regression approach being used. Since
the true form of the data-generating process is generally not known, regression analysis
often depends to some extent on making assumptions about this process.
These assumptions are sometimes testable if a sufficient quantity of data is available.
Regression models for prediction are often useful even when the assumptions are
moderately violated, although they may not perform optimally. However, in many
applications, especially with small effects or questions of causality based on observational
data, regression methods can give misleading results.
Regression models involve the following:
i. The unknown parameters, denoted as β, which may represent a scalar or a vector.
ii. The independent variables, X.
iii. The dependent variable, Y.
A regression model relates Y to a function of X and β.
Y = β0 + Σ_{i=1}^{k} βiXi        (2.0)
The approximation is usually formalized as E(Y | X) = f(X, β). To carry out regression
analysis, the form of the function f must be specified. Sometimes the form of this function
is based on knowledge about the relationship between Y and X that does not rely on the
data. If no such knowledge is available, a flexible or convenient form for f is chosen.
Assume now that the vector of unknown parameters β is of length k. In order to perform
a regression analysis, the user must provide information about the dependent variable Y:
i. If N data points of the form (Y, X) are observed, where N < k, most classical
approaches to regression analysis cannot be performed, since the system of
equations defining the regression model is underdetermined and there are not
enough data to recover β.
ii. If exactly N = k data points are observed and the function f is linear, the equations
Y = f(X, β) can be solved exactly rather than approximately, provided the
observations are linearly independent.
iii. In the most common case, N > k data points are observed. In this case, there is
enough information in the data to estimate a unique value for β that best fits the
data in some sense, and the regression model, when applied to the data, can be
viewed as an overdetermined system in β.
In the last case, the regression analysis provides the tools for:
1. Finding a solution for unknown parameters β that will, for example, minimize the
distance between the measured and predicted values of the dependent variable Y
(also known as method of least squares).
2. Under certain statistical assumptions, the regression analysis uses the surplus of
information to provide statistical information about the unknown parameters β and
predicted values of the dependent variable Y.
In linear regression, the model specification is that the dependent variable is a linear
combination of the parameters (but need not be linear in the independent variables). For
example, in simple linear regression for modelling n data points there is one independent
variable, X1, and two parameters, β0 and β1.
The regression model is given by the straight-line equation

Y = β0 + β1X1 + ε        (2.1)
In multiple linear regression, there are several independent variables or functions of
independent variables. Adding a quadratic term to the preceding regression gives the
parabola:

Y = β0 + β1X1 + β2X2² + ε        (2.2)

This is still linear regression: although the expression on the right-hand side is quadratic
in the independent variable X2, it is linear in the parameters β0, β1, and β2.
Returning to the straight-line case: given a random sample from the population, we
estimate the population parameters and obtain the sample linear regression model. The
residual, ei = Yi − Ŷi, is the difference between the true value of the dependent variable,
Yi, and the value predicted by the model, Ŷi. One method of estimation is ordinary least
squares. This method obtains parameter estimates that minimize the sum of squared
residuals, SSE (also sometimes denoted RSS):

SSE = Σ_{i=1}^{n} ei²        (2.3)

Minimization of this function results in a set of normal equations, a set of simultaneous
linear equations in the parameters, which are solved to yield the parameter estimators
β̂0 and β̂1.
In the case of simple regression, the formulas for the least squares estimates are

β̂1 = [n ΣXiYi − (ΣXi)(ΣYi)] / [n ΣXi² − (ΣXi)²]        (2.4)

and β̂0 = Ȳ − β̂1X̄        (2.5)

where X̄ is the mean (average) of the X values and Ȳ is the mean of the Y values.
Under the assumption that the population error term has a constant variance, the estimate
of that variance is given by:

σ̂e² = SSE / (n − 2)        (2.6)

This is called the mean square error (MSE) of the regression. The denominator is the
sample size reduced by the number of model parameters estimated from the same data,
(n − p) for p regressors or (n − p − 1) if an intercept is used. In this case p = 1, so the
denominator is n − 2.

The standard errors of the parameter estimates are given by:

σ̂β0 = σ̂e √(1/n + X̄² / Σ(Xi − X̄)²)        (2.7)

σ̂β1 = σ̂e √(1 / Σ(Xi − X̄)²)        (2.8)
Under the further assumption that the population error term is normally distributed, the
researcher can use these estimated standard errors to create confidence intervals and
conduct hypothesis tests about the population parameters.
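As an illustration, the estimates in equations (2.4)–(2.8) can be computed directly in a few lines of Python; the data below are illustrative, not the study's, and all variable names are our own:

```python
import numpy as np

# Illustrative data (not from this study)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Slope and intercept, equations (2.4) and (2.5)
b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
b0 = y.mean() - b1 * x.mean()

# Residuals, SSE (2.3) and MSE (2.6)
resid = y - (b0 + b1 * x)
sse = np.sum(resid**2)
mse = sse / (n - 2)

# Standard errors of the estimates, equations (2.7) and (2.8)
sxx = np.sum((x - x.mean())**2)
se_b0 = np.sqrt(mse) * np.sqrt(1.0 / n + x.mean()**2 / sxx)
se_b1 = np.sqrt(mse) / np.sqrt(sxx)
```

The same estimates can be cross-checked against a library routine such as `np.polyfit(x, y, 1)`.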
2.3 MULTIPLE REGRESSION
This is a statistical tool that examines how multiple independent variables are related to
a dependent variable. Once one has identified how these multiple variables relate to the
dependent variable, one can take the information about all of the independent variables
and use it to make much more powerful and accurate predictions about why things are
the way they are. This latter process is called "multiple regression".
2.4 ESTIMATION OF MODEL PARAMETERS
Recall the regression equation model

Y = β0 + β1X1 + β2X2 + … + βkXk + e        (2.9)

where:
Y = explained (response) variable,
X1, X2, …, Xk = (k) explanatory variables,
β0, β1, …, βk = unknown parameters to be estimated (known as regression coefficients),
and e is the error term.

The above model can be written in matrix notation as

Y = Xβ + e        (2.9.1)
where:
Y is an n × 1 vector of the response variable,
X is an n × k matrix of explanatory variables,
β is a k × 1 vector of unknown parameters, and
e is an n × 1 vector of error terms.

Making e the subject of equation (2.9.1), we have

e = Y − Xβ        (2.9.2)

We wish to find the vector of least squares estimators β̂ that minimizes

Q = e′e = (Y − Xβ)′(Y − Xβ)        (2.9.3)

Q = e′e = Y′Y − 2β′X′Y + β′X′Xβ        (2.9.4)

∂Q/∂β = −2X′Y + 2X′Xβ̂ = 0        (2.9.5)

This simplifies to

X′Xβ̂ = X′Y        (2.9.6)

This is the set of least squares normal equations in matrix form. To solve the equations,
multiply both sides by the inverse of X′X. Thus, the least squares estimator of β for OLS
is

β̂ = (X′X)⁻¹X′Y        (2.9.7)
2.5. PROPERTIES OF THE LEAST SQUARES ESTIMATOR
i. The least squares estimator is unbiased, E(β̂) = β, and has variance-covariance
matrix V(β̂) = σ²(X′X)⁻¹.

Proof:
Recall β̂ = (X′X)⁻¹X′Y. Thus,

E(β̂) = E((X′X)⁻¹X′Y)
= (X′X)⁻¹X′E(Y), where E(Y) = Xβ
= (X′X)⁻¹(X′X)β
= Iβ = β        (2.9.8)

Below is the proof for the variance-covariance matrix of the model parameters.
Substituting Y = Xβ + ε into β̂ = (X′X)⁻¹X′Y gives β̂ − β = (X′X)⁻¹X′ε, so

V(β̂) = E((β̂ − β)(β̂ − β)′)
= E((X′X)⁻¹X′εε′X(X′X)⁻¹)
= (X′X)⁻¹X′E(εε′)X(X′X)⁻¹, where E(εε′) = σ²I
= σ²(X′X)⁻¹        (2.9.9)
2.6. ASSUMPTIONS OF THE CLASSICAL LINEAR REGRESSION MODEL
Standard linear regression models with standard estimation techniques make a number of
assumptions about the predictor variables, the response variable and their relationship.
The following are the major assumptions made by standard linear regression models with
standard estimation techniques.
1. The regression model is linear in parameters.
2. The expected value of the residual given any value of the explanatory variable is
zero, i.e., E(ei | xi) = 0.
3. The variance of the residual terms given any value of the explanatory variable is
constant, i.e., Var(ei | xi) = σ²; this is known as homoscedasticity.
4. The values of the explanatory variables are fixed in repeated sampling.
5. There is no correlation between the residual terms given any values of the
explanatory variable, i.e., Cor(ei, ej | xi, xj) = 0 for i ≠ j.
6. There should be no specification error.
7. There is no linear relationship between the residual term and the explanatory
variable, i.e., Cor(ei, xj) = 0.
8. The explanatory variables must not be collinear.
9. The residual term is normally distributed with mean zero and variance σ², i.e.,
e ~ N(0, σ²) [Gujarati, 2004].
2.7. Q-Q PLOT FOR THE NORMALITY ASSUMPTION
A Q–Q plot is a plot of the quantiles of two distributions against each other, or a plot
based on estimates of the quantiles.
The main step in constructing a Q–Q plot is calculating or estimating the quantiles to be
plotted. If one or both of the axes in a Q–Q plot is based on a theoretical distribution
with a continuous cumulative distribution function (CDF), all quantiles are uniquely
defined and can be obtained by inverting the CDF. If a theoretical probability distribution
with a discontinuous CDF is one of the two distributions being compared, some of the
quantiles may not be defined, so an interpolated quantile may be plotted. If the Q–Q plot
is based on data, there are multiple quantile estimators in use. Rules for forming Q–Q
plots when quantiles must be estimated or interpolated are called plotting positions.
A simple case is where one has two data sets of the same size. In that case, to make the
Q–Q plot, one orders each set in increasing order, then pairs off and plots the
corresponding values. A more complicated construction is the case where two data sets
of different sizes are being compared. To construct the Q–Q plot in this case, it is
necessary to use an interpolated quantile estimate so that quantiles corresponding to the
same underlying probability can be constructed.
More abstractly, given two cumulative probability distribution functions F and G, with
associated quantile functions F⁻¹ and G⁻¹ (the inverse function of the CDF is the quantile
function), the Q–Q plot draws the q-th quantile of F against the q-th quantile of G for a
range of values of q [Gibbons et al., 2003]. Thus, the Q–Q plot is a parametric curve
indexed over [0, 1] with values in the real plane R².
Interpretation of Q-Q Plots
The use of Q–Q plots of interest in this study is to compare the distribution of a sample
to a theoretical distribution, such as the standard normal distribution N(0, 1), as in a
normal probability plot. As in the case of comparing two samples of data, one orders the
data (formally, computes the order statistics), then plots them against certain quantiles of
the theoretical distribution [Thode, 2002].
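As a sketch of how such a normal probability plot can be produced, `scipy.stats.probplot` pairs the sample order statistics with the corresponding theoretical normal quantiles; the residuals below are simulated stand-ins, not the study's:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
residuals = rng.normal(size=200)   # stand-in for regression residuals

# probplot returns the theoretical quantiles (osm), the ordered sample
# values (osr), and a least-squares line fitted through the points
(osm, osr), (slope, intercept, r) = stats.probplot(residuals, dist="norm")
# For approximately normal data the points fall close to a straight
# line, so the correlation coefficient r of the fit is near 1
```

Passing `plot=plt` (with matplotlib) would draw the plot itself; here only the fitted-line diagnostics are inspected.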
2.8. SHAPIRO-WILK TEST FOR NORMALITY
Theory:
The Shapiro–Wilk test is a test of normality in frequentist statistics. It was published in
1965 by Samuel Sanford Shapiro and Martin Wilk. The test statistic is

W = (Σ_{i=1}^{n} ai x(i))² / Σ_{i=1}^{n} (xi − x̄)²        (2.9.9.1)

where:
i. x(i) (with parentheses enclosing the subscript index i, not to be confused with xi)
is the ith order statistic, i.e., the ith-smallest number in the sample;
ii. x̄ = (x1 + x2 + … + xn)/n is the sample mean;
iii. the constants ai are given by

(a1, …, an) = m′V⁻¹ / (m′V⁻¹V⁻¹m)^{1/2}        (2.9.9.2)

where

m = (m1, …, mn)′        (2.9.9.3)

and m1, …, mn are the expected values of the order statistics of independent and
identically distributed random variables sampled from the standard normal distribution,
and V is the covariance matrix of those order statistics [Shapiro et al., 1965].
Test of Hypothesis:
Under the following hypotheses:
H0: the error term is normally distributed
H1: the error term is not normally distributed
If the p-value is less than the chosen alpha level, then the null hypothesis is rejected and
there is evidence that the data tested are not from a normally distributed population; in
other words, the data are not normal. On the contrary, if the p-value is greater than the
chosen alpha level, then the null hypothesis that the data came from a normally distributed
population cannot be rejected (e.g., for an alpha level of 0.05, a data set with a p-value of
0.02 rejects the null hypothesis that the data are from a normally distributed population)
[JMP, 2004]. As with most statistical tests, in large samples the test may flag even trivial
departures from normality as statistically significant. Thus, a Q–Q plot is useful for
verification in addition to the test [Wikipedia, 2020].
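The test is available as `scipy.stats.shapiro`; the sketch below applies it to a simulated normal sample and a simulated skewed sample (both are illustrative, not the study's residuals):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
normal_sample = rng.normal(size=100)
skewed_sample = rng.exponential(size=100)   # clearly non-normal

# Each call returns the W statistic and the p-value for H0: normality
w1, p1 = stats.shapiro(normal_sample)
w2, p2 = stats.shapiro(skewed_sample)

# At alpha = 0.05, the exponential sample is expected to reject H0
# (small p-value, W well below 1), while the normal sample typically
# does not; W is always between 0 and 1
```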
2.9.0 VARIANCE INFLATION FACTOR
The variance inflation factor is the ratio of the variance in a model with multiple terms
to the variance of a model with one term. It quantifies the severity of multicollinearity
in an ordinary least squares regression analysis [Gujarati, 2004].
Procedure
1. Step one:
First we run an ordinary least squares regression that has Xi as a function of all the
other explanatory variables in the first equation. If i = 1, for example, the equation
would be

X1 = c0 + α2X2 + … + αkXk + e

where c0 is the constant and e is the error term.
2. Step two:
We calculate the VIF for β̂i with the following formula:

VIF = 1 / (1 − Ri²)

where Ri² is the coefficient of determination of the regression equation in step one, with
Xi on the left-hand side and all the other predictor variables on the right-hand side.
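The two-step procedure above can be sketched as follows, using simulated predictors; the function and data are illustrative, not the study's:

```python
import numpy as np

def vif(X):
    """VIF for each column of X via the auxiliary-regression definition."""
    n, k = X.shape
    out = []
    for i in range(k):
        xi = X[:, i]
        # Step one: regress X_i on a constant and all other columns
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(others, xi, rcond=None)
        resid = xi - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((xi - xi.mean())**2)
        # Step two: VIF_i = 1 / (1 - R_i^2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)                 # independent of x1: VIF near 1
x3 = x1 + 0.05 * rng.normal(size=100)     # nearly collinear with x1: large VIF
X = np.column_stack([x1, x2, x3])
```

Calling `vif(X)` flags the collinear pair (first and third columns) with large values, while the independent column stays near 1.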
2.9.1 DURBIN-WATSON TEST
The Durbin–Watson statistic is a test statistic used to detect the presence of autocorrelation
at lag 1 in the residuals (prediction errors) from a regression analysis. [Durbin and Watson;
1950, 1951] applied this statistic to the residuals from least squares regressions, and
developed bounds tests for the null hypothesis that the errors are serially uncorrelated
against the alternative that they follow a first order autoregressive process. Later, John
Denis Sargan and Alok Bhargava developed several von Neumann–Durbin–Watson type
test statistics for the null hypothesis that the errors on a regression model follow a process
with a unit root against the alternative hypothesis that the errors follow a stationary first
order autoregression [Sargan and Bhargava, 1983]. Note that the distribution of this test
statistic does not depend on the estimated regression coefficients and the variance of the
errors.
Procedure and Interpretation:
If et is the residual associated with the observation at time t, then the test statistic is

d = Σ_{t=2}^{T} (et − et−1)² / Σ_{t=1}^{T} et²

where T is the number of observations. If one has a lengthy sample, then this can be
linearly mapped to the Pearson correlation of the time-series data with its lags. Since d
is approximately equal to 2(1 − r), where r is the sample autocorrelation of the residuals,
d = 2 indicates no autocorrelation. The value of d always lies between 0 and 4. If the
Durbin–Watson statistic is substantially less than 2, there is evidence of positive serial
correlation. As a rough rule of thumb, if Durbin–Watson is less than 1.0, there may be
cause for alarm. Small values of d indicate that successive error terms are positively
correlated. If d > 2, successive error terms are negatively correlated. In regression, this
can imply an underestimation of the level of statistical significance [Durbin and Watson;
1950, 1951].
To test for positive autocorrelation at significance α, the test statistic d is compared to
lower and upper critical values (dL,α and dU,α):
i. If d < dL,α, there is statistical evidence that the error terms are positively
autocorrelated.
ii. If d > dU,α, there is no statistical evidence that the error terms are positively
autocorrelated.
iii. If dL,α < d < dU,α, the test is inconclusive.
Positive serial correlation is serial correlation in which a positive error for one observation
increases the chances of a positive error for another observation.
To test for negative autocorrelation at significance α, the test statistic (4 − d) is compared
to lower and upper critical values (dL,α and dU,α):
i. If (4 − d) < dL,α, there is statistical evidence that the error terms are negatively
autocorrelated.
ii. If (4 − d) > dU,α, there is no statistical evidence that the error terms are negatively
autocorrelated.
iii. If dL,α < (4 − d) < dU,α, the test is inconclusive.
Negative serial correlation implies that a positive error for one observation increases the
chance of a negative error for another observation and a negative error for one observation
increases the chances of a positive error for another [Durbin and Watson; 1950, 1951].
The critical values, dL,α and dU,α, vary by level of significance (α), the number of
observations, and the number of predictors in the regression equation.
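The statistic d is straightforward to compute from a residual series; the sketch below contrasts uncorrelated residuals with positively autocorrelated ones (both simulated, not the study's):

```python
import numpy as np

def durbin_watson(e):
    """d = sum_{t=2}^T (e_t - e_{t-1})^2 / sum_{t=1}^T e_t^2."""
    return np.sum(np.diff(e)**2) / np.sum(e**2)

rng = np.random.default_rng(4)
white = rng.normal(size=500)        # uncorrelated residuals: d near 2

ar1 = np.zeros(500)                 # AR(1) residuals with rho = 0.8:
for t in range(1, 500):             # positive serial correlation, d well below 2
    ar1[t] = 0.8 * ar1[t - 1] + rng.normal()
```

Since d is approximately 2(1 − r), the AR(1) series with r near 0.8 yields d around 0.4, signalling positive autocorrelation.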
2.9.2 F-TEST AND T-TEST
These two measures are used to check for model adequacy. Under the hypotheses

H0: the model is not adequate
H1: the model is adequate

the test statistic

F = MS_Regression / MS_Error = [SS_Regression / (k − 1)] / [SS_Error / (n − k)] ~ F(k−1),(n−k)

is compared with the tabulated value F_TAB = F(k−1),(n−k). If the F-statistic is greater
than the F-value from the table, then the null hypothesis of model inadequacy is rejected;
otherwise we do not reject the null hypothesis [Abiodun, 2017].
In the case where we reject the null hypothesis, i.e., the model is adequate, we can then
use the t-test to check which of the variables contributes to the model, i.e., to the rejection
of the null hypothesis. Thus, we set up the individual hypotheses

H0: βi = 0 vs H1: βi ≠ 0

and the test statistic

t = β̂i / s.e.(β̂i) ~ t(n−k)

is compared with the tabulated value t_TAB = t(n−k),α/2. If |t| is greater than the t-value
from the table, then we reject the null hypothesis, i.e., the variable contributes to the
model; otherwise we do not reject the null hypothesis [Abiodun, 2017].
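The F and t statistics above can be computed from the sums of squares of a fitted model; the following sketch uses simulated data, and all names and figures are ours, not the study's:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, k = 60, 3                               # n observations, k parameters
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

beta = np.linalg.solve(X.T @ X, X.T @ y)   # OLS fit
resid = y - X @ beta
ss_err = resid @ resid                     # SS_Error
ss_reg = np.sum((X @ beta - y.mean())**2)  # SS_Regression

# F = [SS_Reg / (k-1)] / [SS_Err / (n-k)], compared with F_{k-1, n-k}
F = (ss_reg / (k - 1)) / (ss_err / (n - k))
F_crit = stats.f.ppf(0.95, k - 1, n - k)

# t_i = beta_i / s.e.(beta_i), with df = n - k, compared with t_{n-k, alpha/2}
mse = ss_err / (n - k)
se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))
t = beta / se
t_crit = stats.t.ppf(0.975, n - k)
```

Because the simulated slope on the first regressor is large relative to the noise, both the overall F-test and that coefficient's t-test come out significant.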
CHAPTER THREE
3.0 DATA PRESENTATION AND DATA ANALYSIS
3.1 Introduction
This section comprises the presentation and analysis of data. The data used for this
project is secondary data extracted from the World Health Organization (WHO)
COVID-19 dashboard (https://covid19.who.int/table), released May 25th, 2021. The
variables present are used to carry out a multiple regression analysis to determine which
independent variables explain or have a significant impact on the dependent variable.
Table 3.0: Presentation of data

Name | Cases - cumulative total | Cases - cumulative total per 100000 population | Cases - newly reported in last 7 days | Cases - newly reported in last 7 days per 100000 population | Cases - newly reported in last 24 hours | Deaths - cumulative total | Deaths - cumulative total per 100000 population | Deaths - newly reported in last 7 days | Deaths - newly reported in last 24 hours
South Africa | - | 2757.55 | 21737 | 36.65 | 2893 | 55802 | 94.09 | 592 | 30
Tunisia | 335345 | 2837.43 | 8773 | 74.23 | 1246 | 12236 | 103.53 | 387 | 54
Ethiopia | 269194 | 234.16 | 2930 | 2.55 | 293 | 4076 | 3.55 | 80 | 8
Egypt | 253835 | 248.04 | 8114 | 7.93 | 1145 | 14721 | 14.39 | 394 | 51
Libya | 183311 | 2667.78 | 1901 | 27.67 | 412 | 3111 | 45.28 | 23 | 6
Kenya | 168432 | 313.24 | 2967 | 5.52 | 324 | 3059 | 5.69 | 56 | 10
Nigeria | 166019 | 80.54 | 310 | 0.15 | 40 | 2067 | 1 | 1 | 0
Algeria | 126860 | 289.3 | 1549 | 3.53 | 209 | 3418 | 7.79 | 44 | 7
Ghana | 93620 | 301.29 | 287 | 0.92 | 37 | 783 | 2.52 | 0 | 0
Zambia | 93201 | 506.97 | 765 | 4.16 | 95 | 1268 | 6.9 | 8 | 1
Cameroon | 76756 | 289.14 | 0 | 0 | 0 | 1230 | 4.63 | 0 | 0
Mozambique | 70590 | 225.85 | 148 | 0.47 | 22 | 831 | 2.66 | 5 | 0
Botswana | 54151 | 2302.7 | 1989 | 84.58 | 0 | 784 | 33.34 | 23 | 0
Namibia | 52946 | 2083.75 | 1728 | 68.01 | 234 | 765 | 30.11 | 47 | 2
Côte d’Ivoire | 46942 | 177.96 | 286 | 1.08 | 0 | 298 | 1.13 | 0 | 0
Uganda | 43734 | 95.61 | 955 | 2.09 | 227 | 356 | 0.78 | 9 | 6
Senegal | 41062 | 245.24 | 212 | 1.27 | 39 | 1130 | 6.75 | 5 | 1
Madagascar | 40876 | 147.61 | 735 | 2.65 | 96 | 800 | 2.89 | 37 | 7
Zimbabwe | 38682 | 260.26 | 122 | 0.82 | 3 | 1586 | 10.67 | 4 | 0
Sudan | 34889 | 79.57 | 0 | 0 | 0 | 2446 | 5.58 | 0 | 0
Malawi | 34284 | 179.22 | 70 | 0.37 | 10 | 1153 | 6.03 | 0 | 0
Angola | 32441 | 98.71 | 1804 | 5.49 | 292 | 725 | 2.21 | 66 | 10
Cabo Verde | 29334 | 5276.02 | 1166 | 209.72 | 136 | 256 | 46.04 | 7 | 0
Rwanda | 26688 | 206.05 | 712 | 5.5 | 264 | 349 | 2.69 | 5 | 1
Gabon | 24107 | 1083.1 | 308 | 13.84 | 0 | 147 | 6.6 | 4 | 0
Réunion | 23566 | 2632.16 | 922 | 102.98 | 0 | 176 | 19.66 | 7 | 0
Guinea | 22988 | 175.04 | 254 | 1.93 | 0 | 158 | 1.2 | 7 | 0
Mayotte | 20176 | 7395.49 | 0 | 0 | 0 | 171 | 62.68 | 0 | 0
Mauritania | 19149 | 411.84 | 321 | 6.9 | 35 | 458 | 9.85 | 1 | 0
Eswatini | 18551 | 1599 | 31 | 2.67 | 1 | 672 | 57.92 | 0 | 0
Mali | 14241 | 70.32 | 51 | 0.25 | 5 | 514 | 2.54 | 3 | 2
Burkina Faso | 13415 | 64.18 | 18 | 0.09 | 1 | 165 | 0.79 | 1 | 0
Togo | 13374 | 161.55 | 99 | 1.2 | 22 | 125 | 1.51 | 0 | 0
Congo | 11476 | 207.97 | 133 | 2.41 | 0 | 150 | 2.72 | 2 | 0
Lesotho | 10822 | 505.17 | 32 | 1.49 | 16 | 326 | 15.22 | 6 | 6
South Sudan | 10670 | 95.32 | 18 | 0.16 | 0 | 115 | 1.03 | 0 | 0
Seychelles | 10669 | - | 928 | 943.6 | 236 | 38 | 38.64 | 8 | 0
Equatorial Guinea | 8436 | 601.29 | 742 | 52.89 | 0 | 113 | 8.05 | 1 | 0
Benin | 8025 | 66.2 | 41 | 0.34 | 0 | 101 | 0.83 | 0 | 0
Central African Republic | 7079 | 146.57 | 213 | 4.41 | 69 | 97 | 2.01 | 2 | 1
Gambia | 5978 | 247.37 | 32 | 1.32 | 0 | 178 | 7.37 | 3 | 0
Niger | 5383 | 22.24 | 50 | 0.21 | 19 | 212 | 0.88 | 20 | 20
Chad | 4924 | 29.98 | 20 | 0.12 | 1 | 173 | 1.05 | 0 | 0
Burundi | 4546 | 38.23 | 199 | 1.67 | 52 | 6 | 0.05 | 0 | 0
Sierra Leone | 4121 | 51.66 | 16 | 0.2 | 4 | 79 | 0.99 | 0 | 0
Comoros | 3942 | 453.31 | 9 | 1.03 | 2 | 146 | 16.79 | 0 | 0
Eritrea | 3932 | 110.87 | 88 | 2.48 | 0 | 14 | 0.39 | 2 | 0
Guinea-Bissau | 3751 | 190.6 | 5 | 0.25 | 2 | 68 | 3.46 | 1 | 0
Sao Tome and Principe | 2336 | 1065.89 | 9 | 4.11 | 2 | 37 | 16.88 | 2 | 1
Liberia | 2142 | 42.35 | 13 | 0.26 | 0 | 85 | 1.68 | 0 | 0
Mauritius | 1322 | 103.95 | 34 | 2.67 | 0 | 17 | 1.34 | 0 | 0
United Republic of Tanzania | 509 | 0.85 | 0 | 0 | 0 | 21 | 0.04 | 0 | 0
Saint Helena | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
3.2 PRESENTATION AND INTERPRETATION OF RESULTS
This section comprises the analysis and presentation of the results of this study using the
methodology discussed in the previous chapter.
Model specification:

Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + β7X7 + β8X8 + β9X9 + ε

where:
Y = Deaths (cumulative total)
X1 = Cases (cumulative total)
X2 = Cases (cumulative total per 100000 population)
X3 = Cases (newly reported in last 7 days)
X4 = Cases (newly reported in last 7 days per 100000 population)
X5 = Cases (newly reported in last 24 hours)
X6 = Deaths (cumulative total per 100000 population)
X7 = Deaths (newly reported in last 7 days)
X8 = Deaths (newly reported in last 7 days per 100000 population)
X9 = Deaths (newly reported in last 24 hours)
ε = random error term
Fitting the regression model using OLS:

Table 3.1: Estimates of the regression coefficients

Variable | Unstandardized B | Std. Error | Standardized Beta
(Constant) | -168.008 | - |
Cases - cumulative total (X1) | 0.022 | - | 0.634
Cases - cumulative total per 100000 population (X2) | -0.205 | - | -0.050
Cases - newly reported in last 7 days (X3) | -0.718 | 0.426 | -0.301
Cases - newly reported in last 7 days per 100000 population (X4) | 12.950 | 5.001 | 0.216
Cases - newly reported in last 24 hours (X5) | 3.417 | 2.200 | 0.194
Deaths - cumulative total per 100000 population (X6) | 15.843 | 15.404 | -0.045
Deaths - newly reported in last 7 days (X7) | 49.998 | - | 0.682
Deaths - newly reported in last 7 days per 100000 population (X8) | - | - | -0.215
Deaths - newly reported in last 24 hours (X9) | -162.691 | 40.723 | -0.226

The fitted regression model using OLS:

Ŷ = −168.008 + 0.022X1 − 0.205X2 − 0.718X3 + 12.950X4 + 3.417X5 + 15.843X6 + 49.998X7 − β̂8X8 − 162.691X9        (3.2)
3.2.1 TEST FOR MODEL ADEQUACY
The following tests are used to check whether the ordinary least squares regression model
applied is adequate, and which variable(s) contribute to the adequacy of the model.
Table 3.2: ANOVA table

Model | Sum of Squares | df | Mean Square | F | Sig.
Regression | - | 9 | - | - | 0.000
Residual | - | 43 | - | |
Total | - | 52 | | |
Test for model significance:
Hypothesis:
H0: β1 = β2 = β3 = β4 = β5 = β6 = β7 = β8 = β9 = 0
H1: H0 is not true
Decision Rule: Reject the null hypothesis if p-value < α (0.05); otherwise do not reject
the null hypothesis.
Decision: Since the p-value = 0.000 < 0.05, we reject the null hypothesis.
Interpretation:
Based on the available data, we conclude that at least one of the regressors is significant, i.e., it has an impact on the dependent variable at the α = 0.05 level of significance. Since at least one independent variable contributes to the response variable, the regression model is said to be adequate.
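The overall F-test above can be sketched on synthetic data with the same degrees of freedom as Table 3.2 (9 regressors, 43 residual df, 52 total). The data and coefficients below are invented; only the mechanics of the test mirror the project's:

```python
import numpy as np
from scipy import stats

# Overall F-test sketch: n = 53 observations and k = 9 regressors, matching
# the ANOVA degrees of freedom (9, 43) in Table 3.2. Data are synthetic.
rng = np.random.default_rng(1)
n, k = 53, 9
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.zeros(k + 1)
beta[1] = 3.0                                  # one genuinely non-zero slope
y = X @ beta + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
sse = ((y - X @ beta_hat) ** 2).sum()          # residual SS, df = n - k - 1 = 43
ssr = ((X @ beta_hat - y.mean()) ** 2).sum()   # regression SS, df = k = 9
F = (ssr / k) / (sse / (n - k - 1))
p_value = stats.f.sf(F, k, n - k - 1)          # upper-tail F probability
```

A p-value below 0.05 rejects H0 that all slopes are zero, exactly the decision reported in the table.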
Table 3.2.1: Summary table

R | R Square | Adjusted R Square
0.995 | 0.989 | 0.987

From the R² value above, 98.9% of the variation in the dependent variable (cumulative deaths) is explained by the independent variables. Thus, the model is adequate.
Now, to determine which variable(s) actually contribute to the dependent variable, we test the significance of each parameter individually.
Individual t-test for the parameters:
Hypothesis:
H0: βi = 0
H1: βi ≠ 0
Test statistic:
t = β̂i / s.e.(β̂i) ~ t(n−k)
Table 3.3: t-test for the individual parameters.

Variables | T | Sig.
Constant | -1.023 | 0.312
Cases - cumulative total (X1) | 7.118 | 0.000
Cases - cumulative total per 100000 population (X2) | -1.038 | 0.305
Cases - newly reported in last 7 days (X3) | -1.687 | 0.099
Cases - newly reported in last 7 days per 100000 population (X4) | 2.589 | 0.013
Cases - newly reported in last 24 hours (X5) | 1.553 | 0.128
Deaths - cumulative total per 100000 population (X6) | 1.028 | 0.309
Deaths - newly reported in last 7 days (X7) | 4.949 | 0.000
Deaths - newly reported in last 7 days per 100000 population (X8) | -2.985 | 0.005
Deaths - newly reported in last 24 hours (X9) | -3.995 | 0.000
Decision rule: Reject H0 if |tcal| > ttab, or if p-value < α.
Conclusion: Cases - cumulative total (X1), Cases - newly reported in last 7 days per 100000 population (X4), Deaths - newly reported in last 7 days (X7), Deaths - newly reported in last 7 days per 100000 population (X8) and Deaths - newly reported in last 24 hours (X9) are significant in the model, while Cases - cumulative total per 100000 population (X2), Cases - newly reported in last 7 days (X3), Cases - newly reported in last 24 hours (X5) and Deaths - cumulative total per 100000 population (X6) are not significant.
We therefore observe from the individual tests that only variables X1, X4, X7, X8 and X9 contribute significantly to the model.
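The individual t-statistics in Table 3.3 follow from the same OLS fit: each coefficient is divided by its standard error, which comes from the diagonal of σ̂²(XᵀX)⁻¹. A minimal sketch on synthetic data (one regressor is deliberately pure noise, so the numbers are illustrative only):

```python
import numpy as np
from scipy import stats

# Individual t-tests t_i = beta_hat_i / s.e.(beta_hat_i) ~ t(n - k), as in
# Table 3.3. Synthetic data: regressor 2's true coefficient is zero.
rng = np.random.default_rng(2)
n = 53
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 4.0, 0.0, -3.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
df = n - X.shape[1]                              # residual degrees of freedom
sigma2 = ((y - X @ beta_hat) ** 2).sum() / df    # residual variance estimate
se = np.sqrt(sigma2 * np.diag(XtX_inv))          # standard errors of beta_hat
t_stat = beta_hat / se
p_vals = 2 * stats.t.sf(np.abs(t_stat), df)      # two-sided p-values
```

Regressors with p-values below 0.05 would be kept, mirroring the decision rule applied above.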
3.4. Fitting and Testing for the adequacy of the reduced model
From the individual tests of the model parameters, it was discovered that some independent variables contribute significantly to the model while others do not. The significant independent variables are therefore used to fit a new model by the method of OLS.
Table 3.4: Estimates for the regression coefficients of the reduced model.

Model | B | Std. Error | Beta
(Constant) | -227.176 | – | –
Cases - cumulative total (X1) | -0.020 | – | –
Cases - newly reported in last 7 days per 100000 population (X4) | 8.954 | – | –
Deaths - newly reported in last 7 days (X7) | 43.029 | 6.469 | 0.587
Deaths - newly reported in last 7 days per 100000 population (X8) | -1126.215 | 387.247 | -0.173
Deaths - newly reported in last 24 hours (X9) | -138.650 | – | -0.193
Reduced model:

Ŷ = -227.176 - 0.020X1 + 8.954X4 + 43.029X7 - 1126.215X8 - 138.650X9    (3.3)
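The step from the full model to equation (3.3) is a simple selection-and-refit: drop every regressor whose individual p-value exceeded 0.05, then rerun OLS on the remaining columns. A sketch on synthetic data (the regressors and coefficients are invented; regressor 2 is pure noise and so should be dropped):

```python
import numpy as np
from scipy import stats

# Reduced-model sketch: fit the full model, keep the intercept plus every
# regressor with p < 0.05, and refit by OLS -- the procedure behind eq. (3.3).
rng = np.random.default_rng(2)
n = 53
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 4.0, 0.0, -3.0]) + rng.normal(size=n)

def ols_pvalues(X, y):
    """OLS coefficients and two-sided p-values for each parameter."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    df = len(y) - X.shape[1]
    s2 = ((y - X @ b) ** 2).sum() / df
    t = b / np.sqrt(s2 * np.diag(XtX_inv))
    return b, 2 * stats.t.sf(np.abs(t), df)

_, p = ols_pvalues(X, y)
keep = [0] + [j for j in range(1, X.shape[1]) if p[j] < 0.05]  # intercept stays
X_red = X[:, keep]
beta_red, p_red = ols_pvalues(X_red, y)
```

Note that one-shot selection by individual p-values is the project's own procedure; stepwise or criterion-based selection would be alternatives.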
3.4.1. Testing for Model adequacy of the reduced model
The following tests check whether the reduced ordinary least squares regression model is adequate and which variable(s) contribute to its adequacy.
Table 3.5: Model Summary

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | Durbin-Watson
1 | 0.994 | 0.988 | 0.987 | – | 1.390
From the summary table, R = 0.994, showing a very strong positive relationship between the dependent variable and the independent variables. The adjusted R² of the model is 0.987, with R² = 0.988. This means the linear regression model explains 98.8% of the variance in the data, i.e., 98.8% of the variation in the dependent variable is explained by the independent variables.
Table 3.6: ANOVA table

Model | Sum of Squares | Df | Mean Square | F | p-value
Regression | – | 5 | – | 792.960 | 0.000
Residual | – | 47 | – | |
Total | – | 52 | | |
F-test for model significance:
Hypothesis:
H0: β1 = β4 = β7 = β8 = β9 = 0
H1: H0 is not true
Decision Rule:
Reject the null hypothesis if p-value < α (0.05); otherwise do not reject the null hypothesis.
Decision:
Since p-value = 0.000 < 0.05, we reject the null hypothesis.
Interpretation:
Based on the available data, we conclude that at least one of the regressors is significant, i.e., it has an impact on the dependent variable at the α = 0.05 level of significance. Since at least one independent variable contributes to the response variable, the regression model is said to be adequate.
Now, to determine which variable(s) actually contribute to the dependent variable, we test the significance of each parameter individually.
Individual t-test for the parameters:
Hypothesis:
H0: βi = 0
H1: βi ≠ 0
Test statistic:
t = β̂i / s.e.(β̂i) ~ t(n−k)
Table 3.7: t-test for the individual parameters of the reduced model.

Variables | T | Sig.
Constant | – | –
Cases - cumulative total (X1) | – | –
Cases - newly reported in last 7 days per 100000 population (X4) | – | –
Deaths - newly reported in last 7 days (X7) | – | –
Deaths - newly reported in last 7 days per 100000 population (X8) | -2.908 | –
Deaths - newly reported in last 24 hours (X9) | -3.241 | –
Decision rule: Reject H0 if |tcal| > ttab, or if p-value < α.
Conclusion: Cases - cumulative total (X1), Cases - newly reported in last 7 days per 100000 population (X4), Deaths - newly reported in last 7 days (X7), Deaths - newly reported in last 7 days per 100000 population (X8) and Deaths - newly reported in last 24 hours (X9) all contribute to the cumulative deaths of COVID-19 patients in Africa.
3.5. TEST FOR VALIDATION OF ASSUMPTIONS
3.5.1. Testing for Linearity Assumption
The following graphs depict the relationship between the dependent variable (cumulative deaths of COVID-19 patients in Africa) and each independent variable.
Fig 3.0: Plot of cumulative death against deaths newly reported in the last 7 days per 100000 population.
Fig 3.1: Plot of cumulative death against newly reported cases in the last 7 days.
Figure 3.2: Plot of cumulative death against cumulative total.
From the plots in figures 3.0, 3.1 and 3.2, it was observed that the variables cases - cumulative total, cases - newly reported in last 7 days per 100000 population, deaths - newly reported in last 7 days, deaths - newly reported in last 7 days per 100000 population and deaths - newly reported in last 24 hours are linearly related to the dependent variable.
3.5.2. Testing for Homoscedasticity (equality of variance) Assumption
Fig 3.3: Scatterplot of regression standardized residual against standardized predicted
value.
The plot above indicates non-constant error variance (heteroscedasticity): the residuals follow a pattern of increasing spread, and one point on the plot stands out as an outlier in the data.
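The visual check above can be complemented numerically with the Breusch-Pagan test cited in the references: regress the squared OLS residuals on the regressors; under homoscedasticity the statistic LM = n·R² of that auxiliary regression is chi-square distributed. A sketch on synthetic, deliberately heteroscedastic data (not the project's data):

```python
import numpy as np
from scipy import stats

# Breusch-Pagan sketch: regress squared OLS residuals on the regressors;
# LM = n * R^2 ~ chi-square(k) under homoscedasticity. The synthetic data
# here have error variance growing with x, so the test should flag it.
rng = np.random.default_rng(3)
n = 200
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 0.5 * x + rng.normal(size=n) * x     # error s.d. proportional to x

b, *_ = np.linalg.lstsq(X, y, rcond=None)
e2 = (y - X @ b) ** 2                          # squared OLS residuals
g, *_ = np.linalg.lstsq(X, e2, rcond=None)     # auxiliary regression on X
r2_aux = 1 - ((e2 - X @ g) ** 2).sum() / ((e2 - e2.mean()) ** 2).sum()
lm = n * r2_aux
p_bp = stats.chi2.sf(lm, df=1)                 # one non-constant regressor
```

A small `p_bp` rejects homoscedasticity, matching the pattern-of-increasing-variance reading of Fig 3.3.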
3.5.3. Testing for Normality Assumption
1. QQ plot.
Fig 3.4: QQ plot.
As can be seen from the Q-Q plot, most of the studentized residuals fall close to the straight line, which shows that the assumption of normality is met.
2. Histogram
Figure 3.5: Plot of standardized residuals against the observed cumulative probability for normality check.
The histogram is bell-shaped, which shows that the normality assumption is met.
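The graphical checks above can be backed by the Shapiro-Wilk test (Shapiro & Wilk, 1965; Royston, 1992 - both cited in the references). The residuals below are synthetic normal draws, sized to match the 53 observations, purely to show the mechanics:

```python
import numpy as np
from scipy import stats

# Shapiro-Wilk normality test as a numeric companion to the Q-Q plot and
# histogram. W close to 1 and p > 0.05 are consistent with normality.
rng = np.random.default_rng(4)
residuals = rng.normal(loc=0.0, scale=1.0, size=53)   # synthetic residuals
W, p_sw = stats.shapiro(residuals)
normality_ok = p_sw > 0.05        # fail to reject H0: residuals are normal
```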
3.5.4. Autocorrelation of Error Term Assumption
Durbin-Watson statistic: d = 1.390
Durbin-Watson Test:
Hypothesis:
H0: There is positive autocorrelation
H1: H0 is not true
Test Statistic: d = 1.390
Decision Rule:
Reject H0 if the Durbin-Watson value is 2 or greater; otherwise do not reject H0.
Decision:
Since the Durbin-Watson value is less than 2, we do not reject H0.
Conclusion:
There is enough evidence to support the claim that there is positive autocorrelation at the α = 0.05 level of significance.
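The Durbin-Watson statistic itself is d = Σ(eₜ − eₜ₋₁)² / Σeₜ², which sits near 2 for uncorrelated residuals and sinks toward 0 under positive autocorrelation. A sketch contrasting white-noise residuals with a strongly autocorrelated AR(1) series (synthetic data, illustrating why the project's d = 1.390 < 2 points to positive autocorrelation):

```python
import numpy as np

# Durbin-Watson statistic d = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2.
def durbin_watson(e):
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(5)
white = rng.normal(size=500)            # independent residuals -> d near 2
ar1 = np.empty(500)                     # AR(1) residuals with rho = 0.9
ar1[0] = white[0]
for t in range(1, 500):
    ar1[t] = 0.9 * ar1[t - 1] + rng.normal()

d_white = durbin_watson(white)          # close to 2
d_ar1 = durbin_watson(ar1)              # well below 2
```

For an AR(1) process with coefficient ρ, d ≈ 2(1 − ρ), so ρ = 0.9 pushes d toward 0.2.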
3.5.5. Multicollinearity Assumption:
This assumption is tested using the Variance Inflation Factor (VIF), given as:

VIF_j = 1 / (1 − R_j²)

where R_j² is the coefficient of determination from regressing the j-th independent variable on the remaining independent variables.
Table 3.9: VIF values for the independent variables

Independent Variables | Tolerance | VIF
Cases - cumulative total | 0.094 | 10.644
Cases - newly reported in last 7 days per 100000 population | 0.075 | 13.353
Deaths - newly reported in last 7 days | 0.032 | 31.205
Deaths - newly reported in last 7 days per 100000 population | 0.070 | 11.876
Deaths - newly reported in last 24 hours | 0.084 | 13.024
It can be seen from the VIF values that there is high multicollinearity among the independent variables, i.e., the majority of the values are greater than 10. This might be due to the fact that the data are drawn from a population rather than a sample.
The tolerance values, each the reciprocal of the corresponding VIF, tell the same story: tolerances below 0.10 indicate strong multicollinearity, and every tolerance in Table 3.9 falls below that threshold.
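The VIF computation is just one auxiliary regression per variable. A sketch on synthetic data in which the first two regressors are nearly collinear (mimicking the high VIFs in Table 3.9) while the third is independent:

```python
import numpy as np

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on all
# the other columns. VIF > 10 (tolerance 1/VIF < 0.1) flags strong
# multicollinearity, as observed in Table 3.9.
def vif(X):
    out = []
    for j in range(X.shape[1]):
        yj = X[:, j]
        Xj = np.column_stack([np.ones(X.shape[0]), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(Xj, yj, rcond=None)
        r2 = 1 - ((yj - Xj @ b) ** 2).sum() / ((yj - yj.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(6)
z = rng.normal(size=200)
X = np.column_stack([
    z + 0.1 * rng.normal(size=200),    # near-duplicate of the next column
    z + 0.1 * rng.normal(size=200),
    rng.normal(size=200),              # unrelated regressor -> VIF near 1
])
vifs = vif(X)
tolerances = 1.0 / vifs
```

Count-based case data naturally produce collinear regressors (a 7-day total largely determines its per-100000 version), which is one reason the project's VIFs are large.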
CHAPTER FOUR
SUMMARY, CONCLUSION AND RECOMMENDATION
4.0. SUMMARY
This project work is on regression modelling of some determinants of COVID-19 cumulative deaths in Africa. The statistical tool used in this project work is multiple regression. From the test for model adequacy, based on the available data, it was discovered that at least one of the regressors is significant, i.e., it has an impact on the dependent variable at the α = 0.05 level of significance. Fitting the model using all the independent variables, we found R = 0.995, showing a strong correlation between the dependent variable (cumulative deaths) and the independent variables, with R² = 0.989 and adjusted R² = 0.987. This means the linear regression explains 98.9% of the variance in the data, i.e., 98.9% of the variation in the dependent variable is explained by the independent variables. Individual tests for the independent variables revealed that only variables X1, X4, X7, X8 and X9 contribute significantly to the model; the other independent variables were removed and a new model was fitted using the method of OLS.
This means that Cases - cumulative total (X1), Cases - newly reported in last 7 days per 100000 population (X4), Deaths - newly reported in last 7 days (X7), Deaths - newly reported in last 7 days per 100000 population (X8) and Deaths - newly reported in last 24 hours (X9) contribute significantly to the cumulative deaths of COVID-19 patients, while Cases - cumulative total per 100000 population (X2), Cases - newly reported in last 7 days (X3), Cases - newly reported in last 24 hours (X5) and Deaths - cumulative total per 100000 population (X6) are not significantly related to the cumulative deaths of COVID-19 patients in Africa.
The regression equation using OLS is given by:

Ŷ = -168.008 + 0.022X1 - 0.205X2 - 0.718X3 + 12.950X4 + 3.417X5 + 15.843X6 + 49.998X7 - ( – )X8 - 162.691X9
The data met the linearity and normality assumptions, but the homoscedasticity, autocorrelation and multicollinearity checks indicated violations; a new model was then fitted using only the significant variables.
The reduced model:

Ŷ = -227.176 - 0.020X1 + 8.954X4 + 43.029X7 - 1126.215X8 - 138.650X9
Testing the extent to which the reduced model fits, we found R = 0.994, showing a very strong positive relationship between the dependent variable and the independent variables. The adjusted R² of the model is 0.987, with R² = 0.988; that is, the linear regression model explains 98.8% of the variance in the data, or 98.8% of the variation in the dependent variable is explained by the independent variables.
4.1. CONCLUSION
This study has illustrated that the available COVID-19 data satisfy the conditions for a multiple regression analysis, and that the variables Deaths - newly reported in last 7 days per 100000 population (X8), Deaths - newly reported in last 24 hours (X9), Cases - newly reported in last 7 days per 100000 population (X4), Deaths - newly reported in last 7 days (X7) and Cases - cumulative total (X1) can significantly predict the cumulative deaths of COVID-19 patients in Africa.
References

Aaron Kandola (2020). Coronavirus transmission: How it spreads and how to avoid it. Medically reviewed by Meredith Goodwin, MD, FAAFP. medicalnewstoday.com.

Abiodun A.A. "STA 333 – Introduction to Regression Analysis Notes". Department of Statistics, University of Ilorin.

Ajao I.O., Awogbemi C.A. and Ilugbusi A.O. (202). Vector and autoregressive models for multivariate time series analysis on COVID-19 pandemic in Nigeria.

Ajulo H.K. (2018). "A study of the effects of some clinical variables on prostate cancer".

Amzat J., Aminu K., Kolo V.I., Akinyele A.A., Ogundairo J.A., Danjibo C.M. (2020). Coronavirus Outbreak in Nigeria: Burden and Socio-Medical Response during the First 100 Days. International Journal of Infectious Diseases. doi: https://doi.org/10.1016/j.ijid-.

Breusch, T.S.; Pagan, A.R. (1979). "A Simple Test for Heteroscedasticity and Random Coefficient Variation". Econometrica. 47(5):-.

Cook, R. Dennis (February 1977). "Detection of Influential Observations in Linear Regression". Technometrics. American Statistical Association. 19(1): 15–18.

Cook, R. Dennis (March 1979). "Influential Observations in Linear Regression". Journal of the American Statistical Association.

Damodar N. Gujarati (2004). "Basic Econometrics, Fourth Edition".

Durbin, J.; Watson, G.S. (1950). "Testing for Serial Correlation in Least Squares Regression, I". Biometrika. 37(3-4): 409–428.

Gibbons, Jean Dickinson; Chakraborti, Subhabrata (2003). Nonparametric Statistical Inference (4th ed.). CRC Press.

ILO (2018). "Women and men in the informal economy: A statistical picture".

Jason W. Osborne and Amy Overbay (March 2004). "The Power of Outliers". Peer-reviewed Electronic Journal. Volume 9, p. 6.

NCDC (2020). https://covid19.ncdc.gov.ng/. Accessed 25th of May 2021.

Oranusi C.K. (2012). "Prostate cancer awareness and screening among male public servants in Anambra, Nigeria". https://doi.org/10.1016/j.afju-.

Royston, Patrick (1992). "Approximating the Shapiro-Wilk W-test for non-normality". Statistics and Computing. 2(3): 117–119.

Shapiro, S.S.; Wilk, M.B. (1965). "An analysis of variance test for normality (complete samples)". Biometrika.

Stamey, T.A., Kabalin, J.N., McNeal, J.E., Johnstone, I.M., et al. "Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate: II. Radical prostatectomy treated patients". Journal of Urology. 141(5).

The World Bank Group (2020). Nigeria in Times of COVID-19: Laying Foundations for a Strong Recovery. The World Bank Group, 1818 H Street NW, Washington, DC 20433, USA.

United Nations (202). Policy brief: Impact of COVID-19 in Africa.

Wikipedia: The free encyclopedia. (2004, July 22). FL: Wikimedia Foundation, Inc.

World Health Organization (2020). Coronavirus disease 2019 (COVID-19) Situation Report – 37. https://www.who.int/docs/default-source/coronaviruse/situation-reports/--sitrep-37-covid-19.pdf?sfvrsn=-e_2 [Accessed March 15, 2020].
APPENDIX

Name | Cases - cumulative total | Cases - cumulative total per 100000 population | Cases - newly reported in last 7 days | Cases - newly reported in last 7 days per 100000 population | Cases - newly reported in last 24 hours | Deaths - cumulative total | Deaths - cumulative total per 100000 population | Deaths - newly reported in last 7 days | Deaths - newly reported in last 24 hours
South Africa | – | 2757.55 | 21737 | 36.65 | 2893 | 55802 | 94.09 | 592 | 30
Tunisia | 335345 | 2837.43 | 8773 | 74.23 | 1246 | 12236 | 103.53 | 387 | 54
Ethiopia | 269194 | 234.16 | 2930 | 2.55 | 293 | 4076 | 3.55 | 80 | 8
Egypt | 253835 | 248.04 | 8114 | 7.93 | 1145 | 14721 | 14.39 | 394 | 51
Libya | 183311 | 2667.78 | 1901 | 27.67 | 412 | 3111 | 45.28 | 23 | 6
Kenya | 168432 | 313.24 | 2967 | 5.52 | 324 | 3059 | 5.69 | 56 | 10
Nigeria | 166019 | 80.54 | 310 | 0.15 | 40 | 2067 | 1 | 1 | 0
Algeria | 126860 | 289.3 | 1549 | 3.53 | 209 | 3418 | 7.79 | 44 | 7
Ghana | 93620 | 301.29 | 287 | 0.92 | 37 | 783 | 2.52 | 0 | 0
Zambia | 93201 | 506.97 | 765 | 4.16 | 95 | 1268 | 6.9 | 8 | 1
Cameroon | 76756 | 289.14 | 0 | 0 | 0 | 1230 | 4.63 | 0 | 0
Mozambique | 70590 | 225.85 | 148 | 0.47 | 22 | 831 | 2.66 | 5 | 0
Botswana | 54151 | 2302.7 | 1989 | 84.58 | 0 | 784 | 33.34 | 23 | 0
Namibia | 52946 | 2083.75 | 1728 | 68.01 | 234 | 765 | 30.11 | 47 | 2
Côte d’Ivoire | 46942 | 177.96 | 286 | 1.08 | 0 | 298 | 1.13 | 0 | 0
Uganda | 43734 | 95.61 | 955 | 2.09 | 227 | 356 | 0.78 | 9 | 6
Senegal | 41062 | 245.24 | 212 | 1.27 | 39 | 1130 | 6.75 | 5 | 1
Madagascar | 40876 | 147.61 | 735 | 2.65 | 96 | 800 | 2.89 | 37 | 7
Zimbabwe | 38682 | 260.26 | 122 | 0.82 | 3 | 1586 | 10.67 | 4 | 0
Sudan | 34889 | 79.57 | 0 | 0 | 0 | 2446 | 5.58 | 0 | 0
Malawi | 34284 | 179.22 | 70 | 0.37 | 10 | 1153 | 6.03 | 0 | 0
Angola | 32441 | 98.71 | 1804 | 5.49 | 292 | 725 | 2.21 | 66 | 10
Cabo Verde | 29334 | 5276.02 | 1166 | 209.72 | 136 | 256 | 46.04 | 7 | 0
Rwanda | 26688 | 206.05 | 712 | 5.5 | 264 | 349 | 2.69 | 5 | 1
Gabon | 24107 | 1083.1 | 308 | 13.84 | 0 | 147 | 6.6 | 4 | 0
Réunion | 23566 | 2632.16 | 922 | 102.98 | 0 | 176 | 19.66 | 7 | 0
Guinea | 22988 | 175.04 | 254 | 1.93 | 0 | 158 | 1.2 | 7 | 0
Mayotte | 20176 | 7395.49 | 0 | 0 | 0 | 171 | 62.68 | 0 | 0
Mauritania | 19149 | 411.84 | 321 | 6.9 | 35 | 458 | 9.85 | 1 | 0
Eswatini | 18551 | 1599 | 31 | 2.67 | 1 | 672 | 57.92 | 0 | 0
Mali | 14241 | 70.32 | 51 | 0.25 | 5 | 514 | 2.54 | 3 | 2
Burkina Faso | 13415 | 64.18 | 18 | 0.09 | 1 | 165 | 0.79 | 1 | 0
Togo | 13374 | 161.55 | 99 | 1.2 | 22 | 125 | 1.51 | 0 | 0
Congo | 11476 | 207.97 | 133 | 2.41 | 0 | 150 | 2.72 | 2 | 0
Lesotho | 10822 | 505.17 | 32 | 1.49 | 16 | 326 | 15.22 | 6 | 6
South Sudan | 10670 | 95.32 | 18 | 0.16 | 0 | 115 | 1.03 | 0 | 0
Seychelles | 10669 | – | 928 | 943.6 | 236 | 38 | 38.64 | 8 | 0
Equatorial Guinea | 8436 | 601.29 | 742 | 52.89 | 0 | 113 | 8.05 | 1 | 0
Benin | 8025 | 66.2 | 41 | 0.34 | 0 | 101 | 0.83 | 0 | 0
Central African Republic | 7079 | 146.57 | 213 | 4.41 | 69 | 97 | 2.01 | 2 | 1
Gambia | 5978 | 247.37 | 32 | 1.32 | 0 | 178 | 7.37 | 3 | 0
Niger | 5383 | 22.24 | 50 | 0.21 | 19 | 212 | 0.88 | 20 | 20
Chad | 4924 | 29.98 | 20 | 0.12 | 1 | 173 | 1.05 | 0 | 0
Burundi | 4546 | 38.23 | 199 | 1.67 | 52 | 6 | 0.05 | 0 | 0
Sierra Leone | 4121 | 51.66 | 16 | 0.2 | 4 | 79 | 0.99 | 0 | 0
Comoros | 3942 | 453.31 | 9 | 1.03 | 2 | 146 | 16.79 | 0 | 0
Eritrea | 3932 | 110.87 | 88 | 2.48 | 0 | 14 | 0.39 | 2 | 0
Guinea-Bissau | 3751 | 190.6 | 5 | 0.25 | 2 | 68 | 3.46 | 1 | 0
Sao Tome and Principe | 2336 | 1065.89 | 9 | 4.11 | 2 | 37 | 16.88 | 2 | 1
Liberia | 2142 | 42.35 | 13 | 0.26 | 0 | 85 | 1.68 | 0 | 0
Mauritius | 1322 | 103.95 | 34 | 2.67 | 0 | 17 | 1.34 | 0 | 0
United Republic of Tanzania | 509 | 0.85 | 0 | 0 | 0 | 21 | 0.04 | 0 | 0
Saint Helena | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0