Rizki Mayandi Hasibuan

Paper 9

QUANTITATIVE METHODS FOR INTERNATIONAL MARKETS

Psid_60

EXERCISE 1

We start by modeling individual income as a function of years of schooling and socio-demographic factors. To this end, we estimate by OLS a regression model where the dependent variable is “income” and the regressors are given by “educ”, “age”, “gender”, “married” and “nkids”.

a) Report your regression results and interpret the coefficient associated to “educ”. Is it statistically significant?

The regression model as a whole is statistically significant, since its p-value falls below the 5% significance level. The coefficient of "educ" measures the expected change in income associated with one additional year of schooling, holding the other regressors constant. Its p-value is likewise below the 5% threshold, so "educ" is statistically significant.

b) Estimate the previous model using ln(income) as the dependent variable. Report your regression results and interpret the coefficient associated to “educ”. Is it statistically significant?

The regression model is again significant, given that its p-value is below the 5% significance level. Because the dependent variable is now ln(income), the coefficient of "educ", multiplied by 100, approximates the percentage change in income associated with one additional year of schooling, holding the other regressors constant. The p-value associated with "educ" is below the 5% threshold, so this variable is statistically significant. In fact, all independent variables are statistically significant, with p-values below 0.05.

c) Considering the model in b), include educ^2 as an additional regressor. Does this model provide a better fit with respect to the model in b)? Please consider both R² and adjusted R² in your comments.

Based on the results, the regression model is significant, as its p-value is below the 5% significance level. Except for "nkids", with a p-value of -, all independent variables are statistically significant, with p-values below 0.05.
R-squared (R2) and adjusted R-squared both measure how well a regression model fits the data, that is, how much of the variation in the dependent variable is explained by the independent variables. R-squared never falls when a regressor is added, whereas adjusted R-squared penalizes the number of predictors and therefore gives a more realistic evaluation of fit. Adjusted R-squared is preferred when comparing models with different numbers of predictors. If adjusted R-squared does not rise when a regressor is added, the additional regressor is not improving the fit enough to justify the extra parameter. In our results the adjusted R-squared is 0.2871, meaning that approximately 28.71% of the variability in the dependent variable is explained by the independent variables once the number of predictors is taken into account. The fit is reasonably in line with that of the other models.

d) Considering the model in c), test the null hypothesis of no effect of the years of schooling on income.

Null hypothesis: Years of schooling have no effect on income.
Alternative hypothesis: Years of schooling have an effect on income.

The results indicate that "educ" has a significant effect on income, as evidenced by a p-value less than 0.05. The t-statistic of 10.147 exceeds the critical value of 1.96, which supports rejecting the null hypothesis. Since schooling enters the model in c) through both "educ" and educ^2, a joint test that both coefficients are zero would be the complete version of this check.

e) How much is an additional year of schooling worth when the individual has 12 years of schooling? And when she has 16 years of schooling?

We focus on the coefficient Beta1 of "educ", which gives the estimated change in ln(income) linked to a one-year increase in education. Because the model in c) also contains educ^2, the value of an additional year depends on the level of schooling: it equals the coefficient of "educ" plus two times the coefficient of educ^2 times the years of schooling. For an individual with 12 years of schooling this expression is evaluated at educ = 12, and for an individual with 16 years of schooling it is evaluated at educ = 16. The coefficient is equal to-*-=-*-=-

f) Test the null hypothesis that marriage/civil partnership and having children have the same effect on income.

Null Hypothesis: Marriage/civil partnership and having children have the same effect on income.
Alternative Hypothesis: Marriage/civil partnership and having children have different effects on income.

The results indicate that "married" significantly influences income: its p-value is below 0.05 and its t-statistic of 10.652 exceeds 1.96, so the null hypothesis of no effect is rejected for "married". Conversely, for "nkids" the p-value is 0.4321, above 0.05, and the t-statistic is 0.786, below 1.96. Consequently, "nkids" is not statistically significant, whereas "married" is. Hence, there isn't enough justification to reject the null hypothesis.

EXERCISE 2

This part refers to the model estimated in 1.c).

a) Test the null hypothesis that the coefficients estimated using the subset of employees are equal to the coefficients estimated using the subset of self-employed. What is your conclusion?

Null Hypothesis: The coefficients for employees are equal to the coefficients for self-employed.
Alternative Hypothesis: The coefficients are different between employees and self-employed.

The results show that the coefficients differ between the two groups. In the employee group, all variables are statistically significant, with p-values below 0.05. In the self-employed group, the p-values for "educ" and "married" are both below 0.05 (0.007 and 0.0000). This leads to the rejection of the null hypothesis, indicating that the coefficients for employees are not equal to those for the self-employed.
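The comparison in 2.a) is commonly carried out with a Chow test, which compares the residual sum of squares of the pooled regression with those of the two subgroup regressions. The sketch below only illustrates the arithmetic of the F-statistic. The function name `chow_f_statistic` and every number in the usage lines are hypothetical placeholders, not results from the Psid_60 data; k = 7 assumes an intercept plus the six regressors of the model in 1.c).

```python
def chow_f_statistic(ssr_pooled, ssr_group1, ssr_group2, n1, n2, k):
    """Chow F-statistic for equality of all k coefficients across two groups.

    ssr_* are residual sums of squares, n1 and n2 are the group sizes,
    and k is the number of estimated coefficients including the intercept.
    """
    ssr_unrestricted = ssr_group1 + ssr_group2
    df_denominator = n1 + n2 - 2 * k
    if df_denominator <= 0:
        raise ValueError("not enough observations for 2*k parameters")
    numerator = (ssr_pooled - ssr_unrestricted) / k
    denominator = ssr_unrestricted / df_denominator
    return numerator / denominator


# Hypothetical inputs, for illustration only (not taken from the data).
n_employees, n_self_employed, k = 60, 54, 7
f_stat = chow_f_statistic(ssr_pooled=120.0, ssr_group1=50.0, ssr_group2=60.0,
                          n1=n_employees, n2=n_self_employed, k=k)
df2 = n_employees + n_self_employed - 2 * k
print(f"Chow F = {f_stat:.3f} with ({k}, {df2}) degrees of freedom")
```

The statistic is compared with the critical value of the F(k, n1 + n2 - 2k) distribution; a value above it leads to rejecting equal coefficients. This form of the test relies on homoskedastic errors.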
b) Using the approach by Breusch-Pagan, test the null hypothesis of homoskedasticity. What is your conclusion?

Null Hypothesis: The errors are homoskedastic (constant variance).
Alternative Hypothesis: The errors are heteroskedastic (non-constant variance).

The Breusch-Pagan test was used to examine homoskedasticity. The test statistic is 23.253, and the associated p-value is less than 0.05. Therefore, we reject the null hypothesis. The p-value from the studentized Breusch-Pagan test indicates the presence of heteroskedasticity in the model.

c) Considering the statistical evidence from 2.a) and 2.b), do you believe the model in 1.c) provides reliable estimates of the parameters and that the t-ratio and F tests are valid? If not, how would you proceed?

We can draw conclusions and compare models based on the R2 values. The R2 for Model (a) is 0.35, while the R2 for Models (b) and (c) is 0.27, which indicates that Model (a) is more reliable. In addition, since homoskedasticity is rejected in 2.b), the usual t-ratio and F tests are not valid as computed, and heteroskedasticity-robust standard errors would be a natural way to proceed. Since equality of coefficients is rejected in 2.a), the pooled estimates may also hide differences between employees and the self-employed.

EXERCISE 3

a) Explain using your own words why the model considered so far may suffer from endogeneity and the resulting consequences on the statistical properties of the OLS estimator. Please indicate the endogenous variable(s) and the included exogenous variable(s). Estimate the model in 1.b) by TSLS using the mother’s and father’s years of schooling as instrumental variables.

Endogeneity occurs when a regressor in a statistical model is correlated with the error term, violating a fundamental assumption. This compromises the Ordinary Least Squares (OLS) estimator, resulting in biased and inconsistent parameter estimates. It is essential to identify which variables are endogenous (correlated with the error term) and which are exogenous (uncorrelated with it). Techniques such as instrumental variables are then needed to address endogeneity and obtain more reliable results.
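The consequence of endogeneity described above can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not on the Psid_60 sample: the variable names, the data-generating process, and the true coefficient of 1.0 are all assumptions chosen for illustration. With one instrument and one endogenous regressor, the simple IV ratio used here coincides with TSLS.

```python
import random


def cov(a, b):
    """Sample covariance of two equally long lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


random.seed(0)
n = 20000
true_beta = 1.0  # assumed true effect of x on y

z = [random.gauss(0, 1) for _ in range(n)]   # instrument
u = [random.gauss(0, 1) for _ in range(n)]   # unobserved factor
x = [zi + ui for zi, ui in zip(z, u)]        # regressor depends on z and u
e = [ui + random.gauss(0, 1) for ui in u]    # error term also contains u
y = [true_beta * xi + ei for xi, ei in zip(x, e)]

beta_ols = cov(x, y) / cov(x, x)   # biased: x is correlated with e through u
beta_iv = cov(z, y) / cov(z, x)    # uses only the variation in x driven by z

print(f"true = {true_beta}, OLS = {beta_ols:.3f}, IV = {beta_iv:.3f}")
```

In this design the OLS estimate stays away from the true value however large the sample, while the IV estimate settles close to it, because the instrument is correlated with the regressor but not with the error term.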
In the presence of an endogenous variable, the OLS estimator becomes biased. The direction and magnitude of this bias depend on the nature of the endogeneity, potentially leading to either overestimation or underestimation of the true relationships between variables. This bias undermines the reliability of the statistical inferences drawn from the regression analysis.

Dependent variable: ln(income)
Endogenous variable: educ
Included exogenous variables: age, gender, married, nkids

To estimate the model by TSLS, the number of instrumental variables must be at least as large as the number of endogenous regressors. With two instruments (the mother’s and father’s years of schooling) and one endogenous regressor (educ), the model in 1.b) satisfies this condition. In summary, the model suggests that education (educ) and gender are significant predictors of the log of income after accounting for potential endogeneity using instrumental variables. The overall model fit is statistically significant, as indicated by the Wald test.

b) Explain using your own words why we might expect the instruments to be both relevant and exogenous.

For instrumental variable regression to be valid and reliable, the selected instruments must satisfy two key criteria. Firstly, they must be relevant, that is, correlated with the endogenous variable. This ensures that the instruments capture meaningful variation in the variable under consideration. Secondly, the instruments must be exogenous, meaning they are uncorrelated with the error term in the regression equation. This requirement ensures that the instruments do not introduce bias by being correlated with unobserved factors influencing the dependent variable. When both conditions hold, the instruments isolate the exogenous part of the variation in the endogenous variable and offer a dependable solution to the endogeneity issue in the regression model.

c) Using appropriate statistical tests, determine whether the instruments are relevant and exogenous.
Null Hypothesis: The instruments are not jointly relevant in explaining the variation in the endogenous variable.
Alternative Hypothesis: The instruments are jointly relevant in explaining the variation in the endogenous variable.

We use the F-test to check relevance. However, we saw that the fit of the model (R-squared) is not good, so we do not regard the relevance test as valid.

Null Hypothesis: The instruments are jointly exogenous.
Alternative Hypothesis: The instruments are not jointly exogenous.

We employ the Wu-Hausman test to assess exogeneity. Strictly, this test has as its null hypothesis that "educ" is exogenous, so that OLS is consistent; the exogeneity of the instruments themselves would be checked with an overidentification test such as Sargan's. Given the small p-value, we reject the Wu-Hausman null hypothesis. This suggests there is evidence of endogeneity in the model and, as a result, the instrumental variable estimates are favored over the OLS estimates.

d) Considering 3.b)-3.c), should we prefer the OLS or the TSLS estimates? If appropriate, use a statistical test to support your answer.

The choice between Ordinary Least Squares (OLS) and Two-Stage Least Squares (TSLS) estimates depends on the diagnostic tests from the instrumental variable (IV) analysis. The Wu-Hausman test provides evidence of endogeneity, as it rejects the null hypothesis of consistent and efficient OLS estimates. Since the p-value from the test is below 0.05, we reject the null hypothesis in favor of the alternative. We therefore prefer the TSLS estimates over OLS.

e) Using your preferred estimates, i.e., OLS or TSLS, indicate and interpret the coefficient associated with the regressor “gender”. Is it statistically significant?

Null Hypothesis: Gender has no effect on income.
Alternative Hypothesis: Gender has an effect on income.

Given that the p-value is 0.000, which is less than the conventional significance level of 0.05, we reject the null hypothesis. Consequently, gender is statistically significant.

f) Suppose we have a panel of observations for 2015 and 2018. Also, note that the survey contains information on adults who have completed their education by the time of the first survey. How would you use the panel structure to eliminate endogeneity due to educ if the variable of interest is “age”? Would such a methodology be helpful if the primary variable of interest is “gender”, instead?

Arranging the observations as a panel for 2015 and 2018 offers a strategy to mitigate the endogeneity associated with "educ" (education) when studying the impact of "age". Panel data allow us to study changes within individuals over time and so to account for individual-specific characteristics that may be linked to both education and age. Since education is completed by the first survey, "educ" is constant within each individual, so taking first differences (or using individual fixed effects) removes it together with any other time-invariant characteristic, while age still changes between 2015 and 2018. The same approach would not help if the variable of interest is "gender": gender is also time-invariant and would be differenced out as well.

EXERCISE 4

Let us define a new dummy variable, “low_income”, equal to 1 if “income” is less than or equal to 9500 USD (the 10th percentile of the actual income distribution). Estimate a Probit model using the regressors in 1.b).

a) Explain using your own words why we do not use the linear regression model and resort to the Probit model?

The choice of the Probit model over the linear regression model is driven by the nature of the dependent variable. Linear regression is well suited to continuous outcomes, where the response variable can take any value. With a binary outcome, such as success or failure, the linear regression model can produce predictions outside the valid probability range of 0 to 1. The Probit model is designed for binary outcomes: it uses a probit link function (the standard normal distribution function), which keeps the predicted probabilities between 0 and 1. This makes the Probit model more suitable when the dependent variable is binary.

b) What is the estimated coefficient of “married”? Is it statistically significant?
The estimated coefficient of “married”: -

Null Hypothesis: “married” has no effect on low income.
Alternative Hypothesis: “married” has an effect on low income.

Because the p-value is less than 0.05, we reject the null hypothesis. Therefore, “married” is statistically significant.

c) Compute the predicted probability of being a low-income earner for a single 40-year-old male with 12 years of schooling and no kids and for a single 40-year-old female with 12 years of schooling and no kids.

d) Test the null hypothesis that gender and the number of children are jointly irrelevant for being a “low_income” worker.

Null Hypothesis: The coefficients of "gender" and "nkids" are jointly equal to zero.
Alternative Hypothesis: At least one of the coefficients of "gender" and "nkids" is not equal to zero.

We use a Wald test to check the hypothesis. Because the p-value is less than 0.05, we reject the null hypothesis. Therefore, at least one of the coefficients of "gender" and "nkids" is not equal to zero.
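The predicted probabilities asked for in 4.c) follow from the Probit formula P(low_income = 1 | x) = Phi(x'b), where Phi is the standard normal distribution function. The sketch below shows only the mechanics. The coefficient values are hypothetical placeholders, not the estimates from the Psid_60 data, and the coding gender = 1 for female and married = 0 for single is an assumption.

```python
import math


def normal_cdf(value):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(value / math.sqrt(2.0)))


def probit_probability(coefficients, regressors):
    """P(low_income = 1 | x) = Phi(x'b) for a Probit model."""
    index = coefficients["const"] + sum(
        coefficients[name] * value for name, value in regressors.items()
    )
    return normal_cdf(index)


# Hypothetical coefficients, for illustration only (not the estimates).
b = {"const": 0.5, "educ": -0.10, "age": -0.01,
     "gender": 0.30, "married": -0.20, "nkids": 0.05}

# Assumed coding: gender = 1 for female, married = 0 for single.
male = {"educ": 12, "age": 40, "gender": 0, "married": 0, "nkids": 0}
female = dict(male, gender=1)

p_male = probit_probability(b, male)
p_female = probit_probability(b, female)
print(f"P(low_income | male)   = {p_male:.4f}")
print(f"P(low_income | female) = {p_female:.4f}")
```

Replacing the placeholder dictionary `b` with the estimated Probit coefficients gives the two probabilities requested; their difference is the estimated effect of gender for this particular profile.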
