Data Analytics: Report on analysis of SSLC data
SSLC EXAMINATION DATA:
- Azad Krishna Triapthi
Exploratory Analysis:
UDUPI
Performance based on total marks and gender:
At the higher end of the total-marks range, the density of girls is higher than that of boys.
Performance based on total marks and gender:
Urban girls perform significantly better than rural girls, but the same is not true for boys.
Variation of all marks with respect to total marks:
L1 marks are significantly higher, and S2 marks are at the lower end.
[Figure panels: L1, S2]
Age and fail/pass proportions:
Impact of individual marks on the total marks and correlation between them:
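One common way to quantify how strongly each subject's marks move with the total is the Pearson correlation coefficient. The sketch below is illustrative only: the marks are made up (not the SSLC data), the total is simply the sum of three subjects, and the helper name `pearson` is an assumption rather than anything taken from the original analysis.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up marks for five students (NOT the SSLC data).
marks = {
    "L1": [90, 85, 70, 95, 60],
    "S1": [40, 55, 30, 80, 20],
    "S2": [35, 50, 25, 75, 15],
}
# For this illustration the total is just the sum of the three subjects.
total = [sum(row) for row in zip(*marks.values())]

correlations = {subject: pearson(values, total) for subject, values in marks.items()}

# List subjects from most to least correlated with the total.
for subject, r in sorted(correlations.items(), key=lambda kv: -kv[1]):
    print(f"{subject}: r = {r:.3f}")
```

Subjects whose coefficient is closest to 1 are the natural candidates for predictors in a regression on the total.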
Regression Analysis:
Total marks => Dependent variable
Linear regression:
Because S2 and S1 marks were highly correlated with the total marks, we take each of them in turn as the independent variable.
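Fitting one predictor at a time can be sketched with ordinary least squares and an RMSE check. This is a minimal illustration, not the original analysis: the numbers are invented, the helper names `fit_line` and `rmse` are assumptions, and the RMSE is computed on the same points used for fitting rather than on a held-out set.

```python
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Invented marks for five students (NOT the SSLC data).
s1 = [20, 35, 50, 65, 80]
s2 = [15, 40, 45, 70, 75]
total = [210, 300, 380, 470, 540]

results = {}
for name, xs in (("S1", s1), ("S2", s2)):
    slope, intercept = fit_line(xs, total)
    predicted = [slope * x + intercept for x in xs]
    results[name] = rmse(total, predicted)
    print(f"{name}: total = {slope:.2f} * {name} + {intercept:.2f}, RMSE = {results[name]:.2f}")
```

Comparing the two RMSE values shows which single predictor explains the total better on the given data.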
Linear Regression:
RMSE Error = 54.3641
RMSE Error =-
SVM:
RMSE Error = 51.93
SVM: Tuning and improved performance
Best performance is achieved at:
Epsilon = 0.302
RMSE Error = 51.8
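Assuming the Epsilon above is the width of the epsilon-insensitive tube used in support vector regression, the sketch below shows what that parameter controls: residuals smaller than epsilon cost nothing, and larger ones are penalised only for the excess. The function name and the sample values are hypothetical and are not taken from the original model.

```python
def epsilon_insensitive_loss(actual, predicted, epsilon):
    """Mean epsilon-insensitive loss: errors inside the epsilon tube cost nothing."""
    losses = [max(0.0, abs(a - p) - epsilon) for a, p in zip(actual, predicted)]
    return sum(losses) / len(losses)

# Hypothetical target values and predictions (NOT from the SSLC model).
actual = [10.0, 12.0, 15.0, 20.0]
predicted = [10.2, 11.5, 15.1, 18.0]

# A wider tube ignores more of the small residuals.
for epsilon in (0.0, 0.302, 1.0):
    loss = epsilon_insensitive_loss(actual, predicted, epsilon)
    print(f"epsilon = {epsilon}: mean loss = {loss:.4f}")
```

Tuning epsilon therefore trades tolerance of small errors against sensitivity to the data, which is why a search over its values can change the resulting RMSE.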
Classification:
Decision Tree:
Pass/Fail classification based on age
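A decision tree on a single numeric feature such as age repeatedly looks for the threshold that best separates the classes. The sketch below finds only the first such split using Gini impurity, which is one common criterion and an assumption here, since the source does not say which criterion or library was used. The ages and labels are invented purely to exercise the code.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_age_split(ages, labels):
    """Return (threshold, weighted_gini) for the best 'age <= threshold' split."""
    best = None
    values = sorted(set(ages))
    # Candidate thresholds are midpoints between consecutive distinct ages.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [lab for age, lab in zip(ages, labels) if age <= threshold]
        right = [lab for age, lab in zip(ages, labels) if age > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Invented records (NOT the SSLC data), used only to exercise the function.
ages = [15, 15, 16, 16, 17, 18, 19, 20]
labels = ["PASS", "PASS", "PASS", "PASS", "FAIL", "PASS", "FAIL", "FAIL"]

threshold, score = best_age_split(ages, labels)
print(f"best split: age <= {threshold}, weighted Gini = {score:.4f}")
```

A full tree would apply the same search recursively to each side of the split until a stopping rule is met.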
Decision Tree:
0 – 25 => Low
25 – 50 => Med
50 – 75 => Good
75 – 100 => High
NRC_CLASS classification based on marks.
L3_CAT = Good, High
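The mark bands listed above can be sketched as a small binning function. Two points are assumptions, because the source does not state them: marks are treated as percentages from 0 to 100, and a boundary value (25, 50, 75) is placed in the higher band. The function name `mark_category` is hypothetical.

```python
def mark_category(percent):
    """Map a percentage mark (0 to 100) to Low / Med / Good / High."""
    if not 0 <= percent <= 100:
        raise ValueError(f"mark out of range: {percent}")
    # Assumption: boundary values belong to the higher band.
    if percent < 25:
        return "Low"
    if percent < 50:
        return "Med"
    if percent < 75:
        return "Good"
    return "High"

# Hypothetical L3 percentages for a few students.
l3_marks = [12, 25, 49.5, 74, 75, 100]
l3_cat = [mark_category(m) for m in l3_marks]
print(list(zip(l3_marks, l3_cat)))
```

A category column built this way, such as L3_CAT, can then be used as a categorical input to the decision tree.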
Random Forest:
NRC_CLASS classification based on marks.
Clustering:
Clusters based on all marks: L1, L2, L3, S1, S2, S3
Choosing the number of clusters:
The lower the within-cluster sum of squares, the tighter the clusters, but it keeps falling as clusters are added.
Judging from the graph, 3 to 4 clusters is a good choice that keeps complexity low.
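The elbow reasoning above can be sketched by running k-means for several values of k and recording the within-cluster sum of squares (WSS). Everything here is an illustrative assumption: the points are synthetic 2-D data with three obvious groups (the real analysis used six marks per student), the initial centres are picked with a simple farthest-point rule to keep the run deterministic, and the function names are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_point_init(points, k):
    """Deterministic start: each new centre is the point farthest from those chosen."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centers)))
    return centers

def kmeans_wss(points, k, iterations=50):
    """Run Lloyd's k-means and return the within-cluster sum of squares."""
    centers = farthest_point_init(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[i])  # keep an empty cluster's centre
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(squared_distance(p, c) for c in centers) for p in points)

# Synthetic points in three well-separated groups (NOT the SSLC marks).
offsets = [(-2, -1), (0, 0), (2, 1), (-1, 2), (1, -2)]
points = [(cx + dx, cy + dy) for cx, cy in ((20, 20), (50, 55), (85, 80)) for dx, dy in offsets]

wss_by_k = {k: kmeans_wss(points, k) for k in range(1, 7)}
for k, wss in wss_by_k.items():
    print(f"k = {k}: WSS = {wss:.1f}")
```

On data like this the WSS drops sharply up to k = 3 and only slightly afterwards, and that bend is the "elbow" used to pick the number of clusters.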
Clusters based on all marks: L1, L2, L3, S1, S2, S3
NRC_CLASS => D > 1 > 2 > PASS > FAIL
Clusters => 2 > 1 > 4 > 3
Clusters based on all marks: L1, L2, L3, S1, S2, S3
P_H
MED
Association Rules:
Finding rare events: We analyzed physical conditions.
After grouping all the physical conditions as either PHY_HANDICAPT or not, 4 rules are generated, as follows.
The rules are sorted by lift.
NRC_CLASS => D > 1 > 2 > PASS > FAIL
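Lift compares how often the consequent occurs together with the antecedent against how often it would be expected if the two were independent; a value above 1 suggests a positive association. The sketch below computes support, confidence and lift for two rules and sorts them by lift. The records are invented solely to show the arithmetic, the item strings are hypothetical encodings, and `rule_metrics` is an assumed helper name.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent => consequent."""
    n = len(transactions)
    count_a = sum(1 for t in transactions if antecedent <= t)
    count_c = sum(1 for t in transactions if consequent <= t)
    count_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = count_both / n
    confidence = count_both / count_a
    lift = confidence / (count_c / n)
    return support, confidence, lift

# Invented records (NOT the SSLC data); each record is a set of items.
transactions = (
    [{"PHY_HANDICAPT", "NRC_CLASS=PASS"}, {"PHY_HANDICAPT", "NRC_CLASS=1"}]
    + [{"NRC_CLASS=1"}] * 4
    + [{"NRC_CLASS=PASS"}] * 2
    + [{"NRC_CLASS=FAIL"}] * 2
)

rules = [
    ({"PHY_HANDICAPT"}, {"NRC_CLASS=1"}),
    ({"PHY_HANDICAPT"}, {"NRC_CLASS=PASS"}),
]

# Score every rule, then sort by lift, highest first.
scored = [(a, c, rule_metrics(transactions, a, c)) for a, c in rules]
scored.sort(key=lambda item: item[2][2], reverse=True)

for antecedent, consequent, (support, confidence, lift) in scored:
    print(f"{sorted(antecedent)} => {sorted(consequent)}: "
          f"support = {support:.2f}, confidence = {confidence:.2f}, lift = {lift:.3f}")
```

Because rare antecedents have low support by construction, lift is a more informative ranking key for them than support alone.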
Thank You !!