Jaya K | Freelancer Resume

K.JAYA
Mobile:
Email:

Technically savvy machine learning/deep learning practitioner with 5 years of research and implementation experience and 12+ years of IT development experience. Strong knowledge of Python, Machine Learning, NLP, Deep Learning, Active Learning and Survival Analysis. Developed proofs of concept (POCs) for Data Science/Machine Learning projects, providing feasible solutions that ultimately won the projects for the company. Presented the POCs and participated in technical discussions with clients. Led, managed and completed multiple projects from inception to deployment.

Kaggle Profile : https://www.kaggle.com/jkarayil
Git : https://github.com/jkarayil

Work Experience
❖ Dec 2018 – Till Date: Technical Lead - Data Scientist: CIMB Malaysia
❖ June 2018 – Dec 2018: Data Scientist: MEDILENZ INNOVATIONS PRIVATE LIMITED
❖ Mar 16 – Dec 16: DreamWorks Dedicated Unit, Bangalore as MTS
❖ Aug 15 – Mar 16: Rokittech Inc, Bangalore as Member Technical Staff
❖ 2003 – 2013: Automation Architect

Core Qualification

Statistical Skill:
● Data Manipulation : ETL (Extract-Transform-Load) techniques with large datasets.
● Modeling : machine learning (regression, classification, clustering), categorical data analysis, time series sampling, dimensionality reduction, ensembling, Artificial Intelligence, Natural Language Processing, optimization algorithms (Genetic Programming – DEAP), Active Learning, Survival Analysis

Computer Skill:
● Programming : Java, Python (scikit-learn, pandas, NumPy, NLTK), R, SQL, Perl, Hive, REST API
● Software : Python, scikit-learn, NLTK, Keras, DEAP, MOSES, Matplotlib, Linux, Windows, Jupyter, Elasticsearch, Postman
● Cloud : Google Cloud, AWS

IT Experience:
● Worked as automation architect for 12 years.
● Led and mentored a team. Followed agile methodology. Expertise in Perl, Python and Java.
ANNEXURE – PROJECTS HANDLED

Technical Lead: CIMB

Project: Author Name Disambiguation
Duration: 6 Months
The client wanted a way to group authors and their publications in PubMed and ORCID, so that the results could feed downstream tasks such as contacting authors working on particular research or identifying authors who can evaluate the client's products. I provided a solution that combines clustering, classification and Active Learning, using Python and NLTK. The main challenge was the small amount of labelled data, which was resolved using Active Learning. The data, in XML and JSON, is stored and processed in AWS and searched through Elasticsearch. Achieved an F1 score of 0.91.

Project: Survival Analysis
Duration: 3 Months (POC)
Implemented a POC for diabetes prediction using Survival Analysis. We developed an analytical solution so that clients (payers) can predict which patients will develop diabetes in the coming month. This insight helps the clients save $20k per person, which in turn saves $327 billion. Implemented the solution with a Weibull accelerated failure time (AFT) model, which can be used to forecast diabetic patients every month. We used 1 million records with a follow-up time of 5 years. The data included censored observations.

DATA SCIENTIST: CIMB Malaysia, Bangalore, India

Project: Reduce CASA Attrition
Duration: 4 months
Built a classification model to identify customers whose CASA balance will diminish by 60% or more in the next 3 months.
Data set details:
● Data size: 300k (300,30,24) customers
● Duration: Aug 2018 to Feb 2019
● Total no. of customers: 256,801
● Total CASA loss: 2k million
● With the model, 1k million saved: 22.9% in prime & preferred customers, 27.4% in mass customers
Features used: demography, transaction tables (savings, current, debit, credit), inflow over 6 months, outflow, click transactions, products closed, and the minimum and maximum balance drop over the customer's past 6 months.
Top features: ttl_aum, l6m_db_txn, l6_cr_txn, inflow_1lm, inflow_lm, txn_db_l1m, loan_bal_1m, sav_bal_l6m, invt_bal_l1m, outflow_l6m

Project: Wealth Propensity Model
Duration: 4 months
CIMB offers many wealth products. This project concentrates on the 'Preferred Customers' segment and on identifying who can become a customer of wealth products. The wealth products include Unit Trust, Structured Product, Gold Deposit Account PRS, Amanah Saham National Berhad Funds, Retail Bond, Max Invest Save and Dual Currency. There are 150k preferred customers, of whom only 35k hold a wealth product. Of all the wealth products, the most preferred ones (Unit Trust, Structured Product, Gold Deposit Account PRS, Amanah Saham National Berhad Funds, Retail Bond) constitute 80% of customer holdings. The requirement of the model is therefore to identify high-propensity customers who will be willing to buy a wealth product.
Training dataset: data from Jan 2017 to Nov 2018.
Top features: Fixed Income Monthly Returns, Tenure, outflow, inflow
There are two channels for communicating with the customer: SMS and telephone call. The customer response to telephone calls is higher than to SMS, but telephone calls are expensive. So for the past two months an SMS campaign has been run, which increased revenue for the UT product from 90m to 270m.

Project: InBound Call Deflection
Duration: 4 months
The aim of the project is to reduce the number of SMS messages that must be sent to reduce customer calls to the call center, in turn reducing the workload of the call-center agents. Customers call for various reasons; the call log records a product/reason for each call. We developed a propensity model to identify the customers who will call and used a multi-class classifier to label the reason. Customer calls over the past year, demographics, transactions over the past week and call trends with respect to transactions were used as features for IBCD prediction.
In addition to the F1 score, decile analysis was used to select the top 2 deciles as the most probable customers.
Training dataset: data from Jan 2017 to Feb 2019.
Top features: Tenure, call count (previous 7 days), transaction count (previous 7 days), amount transacted (previous 7 days), mobile app user
Achieved: calls reduced by 20%.

Project: Improve customer engagement based on transaction information
Duration: 2 months
CIMB had 400 million transactions in a span of 6 months. The business requirement is to increase customer engagement based on these transactions. The solution is to group transactions by location and mall, and to target the merchants with the highest transaction volumes. This exercise is intended to improve offers to customers, in turn increasing the bank's business. We provided a solution that grouped the 400 million transactions on a month-on-month basis, with 415 merchants covering 90% of the transactions.

Project: Tagging of tweets to identify trends in tweets
Duration: 2 months
We developed an end-to-end framework for categorizing large-scale bank Twitter data without the use of labelled data. CIMB had a large volume of Twitter data that required labelling in order to get information on the top tweets in a particular month. LDA is used to generate topics from the tweets, and generic labels are obtained by keyword mapping. This framework achieved a coherence score of 0.75, and we labelled 100 million tweets spanning a year. Produced visualizations of tweet trends on a monthly basis, product-wise, etc.

DATA SCIENTIST: MEDILENZ INNOVATIONS PRIVATE LIMITED, Bangalore, India

Project: Medical Record Labelling
Designation: Data Scientist
Duration: June 2018 – Dec 2018
● Developed a model to classify medical records that have multiple hierarchical labels.
● Converted the medical record (image) files to text using Tesseract OCR.
● Created a corpus from the manually labelled records, which required cleaning: removing duplicates and streamlining the labels.
● Used TF-IDF and CountVectorizer to create features.
● Used scikit-learn Naïve Bayes to build the base model.
● The base model performed better with CountVectorizer than with TF-IDF.
● The base model improved further when features reflecting the characteristics of the record were added.
● Created various models using Logistic Regression, Random Forest Classifier and XGBoost, and improved them by fine-tuning their parameters.
● Implemented an LSTM (Keras) for this multilabel, multiclass classification problem.
● Used metrics: accuracy score, Hamming loss.
● Used Python, NLTK, scikit-learn, NumPy.

MTS: DreamWorks Dedicated Unit, Bangalore

Project: Automate Machine Learning Framework Development
Designation: MTS
Duration: Mar 16 – Dec 16
Description: Involved in the development of a Python framework in which machine learning model generation can be automated. Developed modules that speed up the ML development process. With this framework we were able to automate 66% of the ML tasks, so data scientists can concentrate on data cleaning and feature engineering rather than mundane tasks.

MTS: Rokittech Inc, Bangalore, India

Project: Improving Bug Triage Accuracy
Designation: MTS
Duration: Aug 15 – Mar 16
Description: Involved in a bug triaging project with the main aim of reducing duplicate bugs, and enhanced the project by adding/improving the priority and severity of bugs. This was implemented with KNN, and we later tried LDA. We achieved a 10% improvement in bug de-duplication performance.

Paper Published
Exploration of Corpus Augmentation Approach for English-Hindi Bidirectional Statistical Machine Translation System (http://iaesjournal.com/online/index.php/IJECE/article/view/8904)
M.Tech Project –

PROFESSIONAL ABRIDGEMENT - QA
12+ years of diverse experience in system integration, software testing and test automation framework development.
EMPLOYMENT SCAN
❖ Dec ’10 – Oct ’12: HP Software India Pvt. Ltd., Bangalore as MTS.
❖ Jul ’05 – Dec ’10: Juniper Networks Pvt Ltd., Bangalore
❖ Oct ’03 – Jul ’05: Cyberwerx Software Solutions Ltd., Bangalore (Cisco Systems through Offshore Development Center, Bangalore)
❖ Feb ’03 – Oct ’03: Syntel India [P] Ltd. – Chennai as Sr. Software Engineer

SCHOLASTICS
● B. Tech, Computer Science and Engineering, Pondicherry Engineering College, Pondicherry. Secured 72% (First Class with Distinction)
● M.Tech, Computer Science and Engineering, Amrita School of Engineering, Bangalore. Secured CGPA – 8.9
● Completed 32 hours of PMP training.

PERSONAL DOSSIER
Address : Flat no. 107, Sai Poorna High End, Haralur, Bangalore-102