CAPSTONE PROJECT
Data Science for Accounting
Portfolio Optimization Strategy
using Phyton Regression Model
3
My Project Background
(Business Problem)
Problem
How to Solve
Invalid columns in datasets of the weight for
stock picking concept
Eliminate the data (not use for modeling)
Assets diversification data is not ready
Calculate Traynor and Sharpe Ratio and
determine the portfolio can be well diversified or
not.
Small size and unclear source of datasets
Unsolved. Try to eliminate the barrier by
accurately preprocessing the data for a better
model.
4
Method & Workflow Project
Method used
- Portfolio Optimization Strategy Theory
- Data Science Steps using Python
- Multivariate Linear Regression (based on theory)
has been canceled and changed to Multiple Linear
Regression (for better modelling)
Workflow
-.
Import package
Load Dataset and data pre-processing
Define problem
EDA and Feature Engineering
Final Data Pre-Processing
Modeling
Evaluation
Load and Predict New Data
Conclusion and Recommendation
5
6
Result of Project
Python offer a powerful tool for
portfolio optimization and risk management.
The regression model well
surpasses in training and testing,
although struggles to generalize
same pattern to new data.
Portfolio weight is vital for calculating excess
return, invalid weight data source caused it
should be 100% removed, indicating a reason of
imperfect model
Portfolio optimization relies on various
indicators like macroeconomic fundamentals,
financial statements, asset prices, and
technical analysis, market value,
diversification, dividend-paying stocks, invest
in non-correlating assets (e.g., real estate,
currency, bonds), put option, stop-loss order,
and unforeseen risk. Invalid/insufficient
above data makes bias and variance.
7
Recommendations:
Python regression models provides objective and data-driven insights that guide investors
towards optimal investment decisions.
Variance between testing model and prediction model
suggesting exploration of making other data models using new data of porthfolio
•
•
•
•
Expand the Y-related dataset, including continuous and categorical data that influence
Y(dependent variable). Ensure data validity, e.g., audited ROE data, listed stock prices.
Increase the size of datasets and also diversity.
Implement Polynomial Regression and/or Artificial Neural Networks due to complex datasets
is significantly needed for portfolio optimization strategy.
Use Ridge Regression and Lasso Regression to address inappropriate bias and variance in
the modeling step..
8
>>>
Thank You!