Subhajit Nayak

$12/hr
Data Science, Machine Learning, Deep Learning, NLP, GCP, AWS
Reply rate:
-
Availability:
Hourly ($/hour)
Age:
37 years old
Location:
Bangalore, Karnataka, India
Experience:
9 years
Subhajit Nayak
Data Scientist
Address: Bangalore, India
Phone: -
Email: -
LinkedIn Profile: https://www.linkedin.com/in/subhajitnayak/

Business-minded data scientist with 3 years of experience executing data-driven solutions to increase the efficiency, accuracy, and utility of internal data processing. Experienced in creating ML models using predictive data modelling and data mining algorithms to deliver insights while following AI ethics.

Experience Summary
- 9 years of IT experience, including 3 years of experience and strong knowledge in Data Science, predictive modelling, Natural Language Processing, machine learning and deep learning algorithms, feature engineering and predictive analysis; worked mostly on internal metrics forecasting and anomaly detection, model evaluation and data visualization using Python, R, dashboards and Tableau.
- Follow Agile methodology (JIRA board), collaboration tools like GitLab and the CRISP-DM data science life cycle framework, and use several open-source Python packages to solve analytical and business problems.
- Used various IDEs and environments for managing source code, such as Anaconda, Jupyter, Spyder, PyCharm, etc.
- Experienced in analyzing large volumes of structured and unstructured data, data transformation (deriving new fields) and building classification models using various algorithms.
- Hands-on experience in applying feature extraction and data balancing techniques, implementing supervised and unsupervised learning models, and boosting model performance.
- Textual data mining: implemented different text pre-processing, vectorization and log parsing techniques to gain insight and experiment with different ML models.
- Implemented state-of-the-art deep learning algorithms for machine- and human-generated sequential (textual) data, such as seq2seq modeling, encoder-decoder architecture, RNN, LSTM and Transformer.
- Strong knowledge of and experience with model evaluation techniques for regression, classification and deep learning models, such as confusion matrix, ROUGE, etc.
- Deployed models on on-premises servers and in cloud environments such as AWS (Lambda, SageMaker, S3 bucket). Used deployment techniques like dockerization of the solution, pickle, Flask, APIs and dashboard applications. Hands-on with Docker commands.
- Worked on Google Cloud Platform (GCP): wrote model code in JupyterLab on AI Platform, wrote and optimized complex queries in BigQuery, and built Dataflow pipelines.
- Experienced in web scraping to parse data from web URLs using Beautiful Soup.
- Experience with and in-depth understanding of boosting and ensemble modelling.
- Good understanding of the mathematical foundations behind machine learning algorithms.
- Follow AI ethics principles in model building, data collection and feature selection.
- Analyzed and implemented an algorithm for event correlation using techniques such as K-means clustering.
- Extensively experienced in developing REST web APIs and microservices using C# and JSON.
- Extensive experience and good understanding of Business Intelligence client tools like MSBI and ETL tools like SSIS packages; hands-on experience with SSIS packages and SSRS reporting.
- Worked extensively on SQL Server 2005, 2008 & 2012 and Oracle, writing SQL queries, stored procedures, views and functions.
- Worked on the Liquid Metal Project, an internal project based on the concept of Smart Offices and part of the BT Challenge Cup, mostly on applications such as parking detection and cafeteria seat availability.
- Hold a USA B1 visa. Traveled onshore (Dallas) for 3 months to attend business meetings with clients.
- Excellent presentation and stakeholder management skills.
- Able to work well both in a team environment and individually.
- Good work ethic with excellent communication, interpersonal and collaboration skills.

Skills
Data Analytics/Programming: Python (scikit-learn, NumPy, SciPy, pandas, NLTK, spaCy, PySpark), C#, JavaScript; IDEs: Anaconda, Jupyter Notebooks, Spyder, Google Colaboratory, PyCharm
Database: SQL Server, Oracle, MySQL
Visualisation: Python (Seaborn, Matplotlib), Plotly, Tableau
Deployment & version control tools: Python pickle & Flask, collaboration/version control in GitLab, SVN, AWS, GCP
Specialization: Natural Language Processing (reviews and log analysis), encoder-decoder architecture, RASA Framework
Machine Learning Algorithms: Classification, Regression, Decision Tree, ensemble learning, Boosting, Random Forest, Clustering, Anomaly detection, Seq2Seq modeling, TensorFlow, Keras
Statistical Skills: CRISP-DM, EDA, feature engineering, feature selection, hypothesis testing, regression testing, error analysis, confidence intervals, etc.
Others: Apache Spark, HiveQL, Sqoop, Deep Learning, Text Mining, MVC architecture, REST API, SSIS, SSRS, Microsoft Excel, Angular, Azure IoT, Agile methodology, JIRA, GCP

Experience
Oct 2017 – present: Data Scientist, British Telecom

Project: Fault detection and forecasting in devices
Environment: Python (sklearn, pandas, numpy), anomaly detection, classification, Spyder, Jupyter Notebook, dashboard, GitLab, Flask API, Agile (JIRA)
Responsibilities:
- Extracted tabular data from different data sources like Oracle and SQL Server.
- Performed exploratory data analysis to extract insights.
- Applied feature engineering and data transformation techniques using the variance threshold algorithm and manual feature selection.
- Scaled the data, applied balancing techniques like SMOTE, standard scaler, etc., and extracted important features.
- Built classification models using Random Forest and XGBoost, with hyperparameter tuning via cross-validation.
- Built anomaly detection models using Isolation Forest, PCA and k-means clustering.
- Validated the models using metrics like confusion matrix, precision, recall, PR curve, accuracy, AUC, etc.
- Built a dashboard using Plotly and a Flask API.

Project: Server log analysis, anomaly detection and prediction
Environment: Python (sklearn, pandas, numpy), classification, Spyder, Jupyter Notebook, dashboard, GitLab, Flask API, time series, GCP
Responsibilities:
- Wrote scripts to fetch textual log data from databases and mails, and a log-parsing algorithm to standardise the logs.
- Performed vectorization using TF-IDF and word2vec.
- Built anomaly detection models using one-class SVM, Isolation Forest, PCA, etc.
- Forecast the number of anomalies using AR and MA models.
- Evaluated the models using manual checks as well as confusion matrix, precision and recall.
- Used JupyterLab and BigQuery to process and fetch the data and write the model-building logic.
- Visualised the anomalies using a Plotly dashboard.
Project: Fraud messaging analysis
Environment: Python (sklearn, pandas, numpy), unsupervised learning, Spyder, Jupyter Notebook, dockerization, GitLab, Flask API, AWS
Responsibilities:
- Implemented clustering techniques such as k-means clustering and PCA to segment the fraud messages.
- Wrote scripts to fetch CSV file data from Outlook mails.
- Applied rules for detecting fraud.
- Dockerized the deployment files and pushed them to an S3 bucket.
- Wrote shell scripts for the Lambda function and for running the model.
- Created scheduled cron jobs to run the model at particular times.

Project: Survey text summarization tool
Environment: Python (sklearn, pandas, numpy, nltk, spacy), TensorFlow, Keras, Transformer, Spyder, Jupyter Notebook, GitLab
Responsibilities:
- Applied various text pre-processing techniques using regular expressions, lemmatization, stemming, tokenization, etc.
- Followed the encoder-decoder architecture to build the major components of the transformer model.
- Wrote custom functions for injecting the position of each word into the word embedding (positional encoding) and for masking the input vectors.
- Created the multi-head attention and scaled dot-product attention components as TensorFlow custom layers, along with the feed-forward networks.

Sept 2016 – Sept 2017: Senior Developer, Epsilon
Client: Hilton Group
Project: Loyal customer segmentation and classification
Environment: Ensemble learning, Random Forest, Decision Tree, XGBoost, Tableau, k-means clustering
Problem statement: Classifying and predicting active and inactive loyalty members for email campaigns across the Hilton group.
Responsibilities:
- Applied exploratory data analysis and feature engineering techniques to analyse large-scale customer data sets and extract features. Detected anomalies.
- Applied Random Forest and XGBoost to build and evaluate the model.
- Managed, facilitated and implemented the whole model to meet the client's requirements.
- Analysed and applied clustering techniques (k-means clustering and PCA) for customer segmentation, to send the right message to the right customers at the right time and achieve the marketing objective.

Sept 2014 – Sept 2016: Consultant, Capgemini
Client: Energy Future Holdings
Responsibilities:
- Wrote complex DB queries, stored procedures and functions in SQL Server for reporting purposes.
- Developed microservices using C#, Angular, MVC architecture and REST techniques.
- Worked heavily on MSBI tools, developing SSIS packages (automated jobs) to transfer data from IBM DB2 to a SQL Server database, and on SSRS reporting.
- Created data integration solutions and ETL packages.
- Followed the Agile process, attended scrum meetings and maintained JIRA tasks.

Jan 2014 – Sept 2014: Software Engineer, Aspire Software Consultancy
Responsibilities:
- Developed REST APIs to expose data to various third-party clients in XML and JSON using MVC.
- Wrote complex SQL and PL/SQL queries (procedures, functions, etc.) in SQL Server and Oracle databases.

June 2011 – Dec 2013: Software Engineer, SMARTEDGE SOFTWARE PVT. LTD.
Responsibilities:
- Wrote complex PL/SQL queries, triggers, functions, procedures, etc.
- Developed web applications using JavaScript, jQuery, HTML and CSS.
- Worked heavily on UI pages.

Education
Mar 2019-20: Master of Science, Liverpool John Moores University
Field of Study: Data Science

Mar 2020: Post Graduate Diploma, International Institute of Information Technology
Field of Study: Data Science
Credential ID: -
Credential URL: https://www.credential.net/rt4ff3wp

June 2010: B.Tech, Biju Patnaik University of Technology
Field of Study: Computer Science and Engineering

Language
ENGLISH: Proficient
HINDI: Proficient

Additional info
Name: Subhajit Nayak
Nationality: Indian
Sex: Male
Marital Status: Married
Location: Bangalore
Passport No: L-
USA B1 Visa: K-