K.JAYA
Mobile:-
Email-
Technically savvy machine learning/deep learning specialist with 5 years of research and
implementation experience and 12+ years of IT development experience.
Strong knowledge of Python, Machine Learning, NLP, Deep Learning, Active
Learning, and Survival Analysis.
Developed POCs for Data Science/Machine Learning projects, providing feasible
solutions that ultimately won the projects for the company.
Developed and presented POCs and participated in technical discussions with clients.
Led, managed, and completed multiple projects from inception through deployment.
Kaggle Profile : https://www.kaggle.com/jkarayil
Git : https://github.com/jkarayil
Work Experience
❖ Dec 2018 – Till Date: Technical Lead - Data Scientist: CIMB Malaysia
❖ June 2018 – Dec 2018: Data Scientist: MEDILENZ INNOVATIONS PRIVATE LIMITED
❖ 2003 – 2013: Automation Architect.
❖ Mar 16 – Dec 16: DreamWorks Dedicated Unit, Bangalore as MTS
❖ Aug 15 – Mar 16: Rokittech Inc, Bangalore as Member Technical Staff
Core Qualification
Statistical Skills:
● Data Manipulation: ETL (Extract-Transform-Load) techniques with large datasets.
● Modeling: machine learning (regression, classification, clustering), categorical data analysis, time
series sampling, dimensionality reduction, ensembling, Artificial Intelligence, Natural Language
Processing, optimization algorithms (Genetic Programming – DEAP), Active Learning, Survival
Analysis
Computer Skills:
● Programming: Java, Python (scikit-learn, pandas, NumPy, NLTK), R, SQL, Perl, Hive, REST API
● Software: Python, scikit-learn, NLTK, Keras, DEAP, MOSES, Matplotlib, Linux, Windows, Jupyter,
Elasticsearch, Postman
● Cloud: Google Cloud, AWS
IT Experience:
● Worked as automation architect for 12 years.
● Led and mentored a team following agile methodology. Expertise in the programming languages
Perl, Python, and Java.
ANNEXURE – PROJECTS HANDLED
Technical Lead: CIMB
Project: Author Name Disambiguation
Duration: 6 Months
Clients wanted a way to group authors and their publications in PubMed and ORCID so that they
could use the results in downstream tasks, such as contacting authors working on particular research
topics and identifying authors who could evaluate their products.
I provided a solution that combines clustering, classification, and Active Learning, built with
Python and NLTK. The main challenge was the small amount of labelled data, which was addressed using
Active Learning. The data, in XML and JSON formats, was stored and processed in AWS and searched through
Elasticsearch. Achieved an F1 score of 0.91.
Project : Survival Analysis
Duration: 3 Months(POC)
Implemented a POC for diabetes prediction using Survival Analysis. We developed an analytical
solution so that clients (payers) can predict which patients will develop diabetes in the coming month. This
insight helps clients save $20k per person, which in turn saves $327 billion. Implemented the
solution with a Weibull AFT (accelerated failure time) model, which can be used to forecast diabetic
patients every month. We used 1 million records with a follow-up time of 5 years. The data included
censored observations.
DATA SCIENTIST: CIMB Malaysia, Bangalore, India
Project: Reduce CASA Attrition
Duration: 4 months
Built a classification model to identify customers whose CASA balance will diminish by
60% or more in the next 3 months.
Data Set Details
Customer data size: 300k (300,30,24)
Duration: Aug 2018 to Feb 2019
Total no. of customers : 256,801
Total CASA loss: 2k million
With the model, 1k million was saved: 22.9% among prime & preferred customers,
27.4% among mass customers.
Features used:
Features - demographics, transaction tables (savings, current, debit, credit), 6-month inflow, outflow,
click transactions, products closed, and min & max balance drop over the customer's past 6 months.
Top features - ttl_aum, l6m_db_txn, l6_cr_txn, inflow_1lm, inflow_lm, txn_db_l1m, loan_bal_1m,
sav_bal_l6m, invt_bal_l1m, outflow_l6m
Project: Wealth Propensity Model
Duration: 4 months
CIMB offers many wealth products. This project concentrates on the 'Preferred Customers'
segment and on identifying who can become a customer of wealth products. These wealth products
include Unit Trust, Structured Product, Gold Deposit Account PRS, Amanah Saham National
Berhad Funds, Retail Bond, Max Invest Save, and Dual Currency.
There are 150k preferred customers, of which only 35k hold a wealth product.
Of all the wealth products, the most preferred ones (Unit Trust, Structured Product, Gold
Deposit Account PRS, Amanah Saham National Berhad Funds, Retail Bond) constitute 80% of
customer holdings.
The requirement of the model is therefore to identify high-propensity customers who are willing to buy
a wealth product.
Training dataset:
Data - Jan 2017 to Nov 2018 was considered.
Top Features - Fixed Income Monthly Returns, Tenure, outflow, inflow
There are two channels for communicating with the customer: SMS or telephone call. Customer
response is higher by telephone than through SMS, but telephone calls are
expensive. So for the past two months an SMS campaign has been run, increasing revenue
for the UT product from 90m to 270m.
Project: InBound Call Deflection
Duration: 4 months
The aim of the project is to reduce the number of SMS messages that need to be sent in order to
reduce customer calls to the call center, in turn reducing the workload of the call-center agents.
Customers call for various reasons, and the call log records a product/reason for each call.
We developed a propensity model to identify the customers who will call and used a multi-class
classifier to label the reason. Customer calls for the past year, demographics,
transactions for the past week, and call trends with respect to transactions were used as features for IBCD
prediction. In addition to the F1 score, decile analysis was done to select the top 2 deciles and
predict the most probable customers.
Training dataset:
Data - Jan 2017 to Feb 2019 was considered.
Top Features - Tenure, call count (for previous 7 days), transaction count (for previous 7 days),
amount transacted (for previous 7 days), mobile app user
Achieved: calls were reduced by 20%.
Project: Improve customer engagement based on transaction information.
Duration: 2 months
CIMB had 400 million transactions in a span of 6 months. The business requirement
was to increase customer engagement based on these transactions.
The solution was to group transactions by location and mall and to target the merchants
with the highest transaction volume. This exercise aims to improve offers to customers, in turn increasing
the bank's business.
We provided a solution that grouped the 400 million transactions on a month-on-month basis and
identified 415 merchants that cover 90% of the transactions.
Project: Tagging of tweets to identify trends in tweets.
Duration: 2 months
We developed an end-to-end framework for categorizing large-scale bank Twitter data
without the use of labeled data. CIMB had a large volume of Twitter data that required
labelling in order to get information on the top tweets in a particular month. LDA was used to generate
topics from the tweets, and generic labels were obtained by keyword mapping. This
framework achieved a coherence score of 0.75, and we labeled 100 million tweets
spanning a year. Produced visualizations of tweet trends on a monthly basis, product-wise,
etc.
DATA SCIENTIST: MEDILENZ INNOVATIONS PRIVATE LIMITED, Bangalore, India
Project : Medical Record Labelling
Designation: Data Scientist Duration: June 2018 – Dec 2018
● Developed a model to classify medical records that have multiple hierarchical labels.
● Converted the medical record (image) files to text using Tesseract OCR.
● Created a corpus from the manually labeled records, which required cleaning: removing
duplicates and streamlining the labels.
● Used TF-IDF and Count-vectorizer to create features.
● Used scikit-learn Naïve Bayes to construct the base model. The base model performed
better with Count-vectorizer than with TF-IDF.
● The base model improved further when features reflecting the characteristics
of the record were added.
● Created various models using Logistic Regression, Random Forest Classifier, and XGBoost,
and improved them by fine-tuning their parameters.
● Implemented an LSTM (Keras) for this multilabel, multiclass classification problem.
● Used metrics: accuracy score, Hamming loss.
● Used Python, NLTK, scikit-learn, NumPy.
MTS: DreamWorks Dedicated Unit, Bangalore
Project: Automated Machine Learning Framework Development.
Designation: MTS Duration: Mar 16 – Dec 16
Description: Involved in developing a Python framework that automates machine learning
model generation. Developed modules that speed up the ML
development process. With this framework, we were able to automate 66% of ML tasks,
so that data scientists can concentrate on data cleaning and feature engineering rather than mundane
tasks.
MTS: Rokittech Inc, Bangalore, India
Project : Improving Bug Triage Accuracy
Designation: MTS Duration: Aug 15 – Mar 16
Description: Involved in a bug triaging project whose main aim was to reduce duplicate
bugs, and enhanced the project by adding/improving the handling of bug priority and severity. This
was implemented with KNN, and we later tried LDA. We achieved a 10% performance improvement in
bug de-duplication.
Paper Published
Exploration of Corpus Augmentation Approach for English-Hindi Bidirectional Statistical Machine Translation System
(http://iaesjournal.com/online/index.php/IJECE/article/view/8904)
M.Tech Project –
PROFESSIONAL ABRIDGEMENT - QA
12+ years of diverse experience in system integration, software testing and test automation framework
development.
EMPLOYMENT SCAN
❖ Dec ’10 – Oct ’12: HP Software India Pvt. Ltd., Bangalore as MTS.
❖ Jul ’05 – Dec ’10: Juniper Networks Pvt Ltd., Bangalore
❖ Oct ’03 – Jul ’05: Cyberwerx Software Solutions Ltd., Bangalore (Cisco Systems through
Offshore Development Center, Bangalore)
❖ Feb ’03 – Oct ’03: Syntel India [P] Ltd. – Chennai as Sr. Software Engineer
SCHOLASTICS
- B. Tech, Computer Science and Engineering
Pondicherry Engineering College, Pondicherry.
Secured 72% (First Class with Distinction)
- M.Tech, Computer Science and Engineering
Amrita School of Engineering, Bangalore.
Secured CGPA – 8.9
Completed 32 hours of PMP training.
PERSONAL DOSSIER
Address: Flat no. 107, Sai Poorna High End, Haralur, Bangalore-102