Snehitha Pothina
Data Scientist
Dallas, TX – 75081
PROFESSIONAL SUMMARY
Data Scientist with 5 years of experience designing and deploying end-to-end machine learning and AI solutions across
finance, healthcare, and technology sectors. Skilled in Python, Spark, Hadoop, AWS, and Snowflake, with proven
expertise in developing predictive models, NLP systems, and time series forecasts for structured and unstructured data.
Adept at building scalable data pipelines, implementing advanced analytics, and creating impactful dashboards using
Tableau and Python. Strong collaborator with a track record of translating business requirements into actionable insights
and driving data-driven strategies for stakeholders.
TECHNICAL SKILLS
• Databases: MySQL, PostgreSQL, Oracle, HBase, Amazon Redshift, MS SQL Server 2016/2014/2012/2008 R2/2008, Teradata, MongoDB, Snowflake
• Statistical Methods: Hypothesis Testing, ANOVA, Time Series Forecasting, Confidence Intervals, Bayes' Theorem, Principal Component Analysis (PCA), Dimensionality Reduction, Cross-Validation, Autocorrelation, A/B Testing, Experimental Design
• Machine Learning: Regression Analysis, Bayesian Methods, Decision Trees, Random Forests, Support Vector Machines, Neural Networks, Sentiment Analysis, K-Means Clustering, KNN, and Ensemble Methods
• Deep Learning & NLP: Transformers (BERT, GPT, CLIP), Hugging Face, PyTorch, TensorFlow, Embeddings, Attention Mechanisms, Tokenization, Multimodal Models, Transfer Learning, Sequence Models (RNN, LSTM, GRU)
• AI/GenAI: Generative AI, Large Language Models (LLMs), Prompt Engineering, LangChain, Semantic Kernel
• Hadoop Ecosystem: Hadoop 2.x, Spark 2.x, MapReduce, Hive, HDFS, Sqoop, Flume
• Reporting Tools: Tableau Suite 10.x/9.x/8.x (Desktop, Server, and Online), SQL Server Reporting Services (SSRS)
• Data Visualization: Tableau, Matplotlib, Seaborn, ggplot2
• Languages: Python (2.x/3.x), R, SAS, SQL, T-SQL
• Operating Systems & Scripting: Windows, Linux/UNIX, PowerShell, UNIX shell scripting (via PuTTY)
WORK HISTORY
Goldman Sachs - Dallas, TX
Data Scientist
Nov 2024 – Present
• Developed and implemented predictive and classification models to analyze customer behavior, optimize decision-making, and enhance data-driven insights.
• Collaborated with data engineers and the operations team to implement ETL processes, writing and optimizing SQL queries and using Hive to retrieve data from Hadoop clusters and Redshift to fit analytical requirements.
• Performed univariate and multivariate analysis to identify underlying patterns and associations in the data, and used F-Score, AUC/ROC, Confusion Matrix, MAE, and RMSE to evaluate the performance of different models (see the evaluation sketch below).
• Participated in feature engineering such as feature intersection generation, normalization, and label encoding with Scikit-learn preprocessing, including data cleaning and feature scaling using pandas and NumPy in Python.
• Analyzed customer purchasing behavior and quantified customer value with RFM (recency, frequency, monetary) analysis, applying customer segmentation with clustering algorithms such as K-Means and Hierarchical Clustering (see the segmentation sketch below).
• Built regression models including Lasso, Ridge, SVR, and XGBoost to predict Customer Lifetime Value, using an XGBoost classifier for categorical targets and an XGBoost regressor for continuous targets, and combining feature sets with scikit-learn's FeatureUnion and FunctionTransformer (see the CLV pipeline sketch below).
• Used Principal Component Analysis in feature engineering to analyze high-dimensional data.
• Created deep learning models using TensorFlow and Keras, combining all test results into a single normalized score to predict students' residency attainment.
• Produced graphs showing student performance by demographic group and mean scores on the different USMLE exams.
• Designed and implemented recommender systems using collaborative filtering to recommend courses to different customers, and deployed them to an AWS EMR cluster (see the ALS sketch below).
• Utilized natural language processing (NLP) techniques to improve customer satisfaction.
• Designed rich data visualizations to present data in human-readable form with Tableau and Matplotlib.
• Used generative AI techniques, including BERT, GPT, and prompt engineering, to automate insight extraction and improve customer satisfaction metrics.
• Integrated reinforcement learning techniques and agent-based models to optimize recommendation systems and personalize content delivery, contributing to real-time adaptive customer engagement strategies.
• Developed and maintained MLOps pipelines using Airflow and CI/CD, automating model retraining and monitoring, and ensuring seamless deployment of machine learning solutions into production (see the Airflow sketch below).
Environment: AWS Redshift, EC2, EMR, Hadoop Framework, S3, HDFS, Spark (PySpark, MLlib, Spark SQL), Python 3.x (Scikit-Learn/SciPy/NumPy/Pandas/NLTK/Matplotlib/Seaborn), Tableau Desktop (9.x/10.x), Tableau Server (9.x/10.x), Machine Learning (Regressions, KNN, SVM, Decision Tree, Random Forest, XGBoost, LightGBM, Collaborative Filtering, Ensemble), NLP, Teradata, Git 2.x, Agile/SCRUM
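Evaluation sketch: a minimal scikit-learn example of the metrics named above (F1, AUC/ROC, confusion matrix, MAE, RMSE); the synthetic data and the RandomForest stand-in are illustrative, not the production models.

```python
# Minimal sketch: classifier evaluation with F1, AUC/ROC, and a confusion
# matrix; data and model are synthetic placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (f1_score, roc_auc_score, confusion_matrix,
                             mean_absolute_error, mean_squared_error)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]   # scores for AUC/ROC
pred = clf.predict(X_test)                # hard labels for F1 / confusion matrix

print("F1:", f1_score(y_test, pred))
print("AUC/ROC:", roc_auc_score(y_test, proba))
print("Confusion matrix:\n", confusion_matrix(y_test, pred))

# For the regression models, MAE and RMSE are computed analogously:
# mae = mean_absolute_error(y_true, y_pred)
# rmse = np.sqrt(mean_squared_error(y_true, y_pred))
```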
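Segmentation sketch: one way to compute RFM features and cluster them with K-Means; the transactions DataFrame, column names, and cluster count are hypothetical.

```python
# Minimal sketch: RFM scoring per customer, then K-Means segmentation.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

transactions = pd.DataFrame({            # hypothetical order history
    "customer_id": [1, 1, 2, 3, 3, 3],
    "order_date": pd.to_datetime(["2024-01-05", "2024-03-01", "2024-02-10",
                                  "2024-01-20", "2024-02-15", "2024-03-10"]),
    "amount": [120.0, 80.0, 200.0, 40.0, 60.0, 55.0],
})

snapshot = transactions["order_date"].max() + pd.Timedelta(days=1)
rfm = transactions.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)

# Scale the RFM features so no single axis dominates, then cluster.
scaled = StandardScaler().fit_transform(rfm)
rfm["segment"] = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(scaled)
print(rfm)
```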
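CLV pipeline sketch: the FeatureUnion/FunctionTransformer pattern feeding an XGBoost regressor. The feature split, synthetic data, and hyperparameters are assumptions, and xgboost must be installed.

```python
# Minimal sketch: combine feature subsets via FeatureUnion, then regress CLV.
import numpy as np
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                      # cols 0-3 behavioral, 4-5 demographic (assumed)
y = X[:, 0] * 3 + X[:, 4] + rng.normal(size=500)   # synthetic CLV target

behavioral = FunctionTransformer(lambda a: a[:, :4])
demographic = FunctionTransformer(lambda a: a[:, 4:])

model = Pipeline([
    ("features", FeatureUnion([
        ("behavioral", Pipeline([("select", behavioral),
                                 ("scale", StandardScaler())])),
        ("demographic", demographic),
    ])),
    ("xgb", XGBRegressor(n_estimators=200, max_depth=4)),
])
model.fit(X, y)
print("R^2 on training data:", model.score(X, y))
```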
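ALS sketch: a collaborative-filtering recommender with Spark MLlib's ALS, as it might run on an EMR cluster; the ratings data and column names are illustrative.

```python
# Minimal sketch: matrix-factorization recommendations with Spark ALS.
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("course-recs").getOrCreate()
ratings = spark.createDataFrame(
    [(0, 10, 4.0), (0, 11, 2.0), (1, 10, 5.0), (1, 12, 3.0), (2, 11, 4.0)],
    ["user_id", "course_id", "rating"],
)

als = ALS(userCol="user_id", itemCol="course_id", ratingCol="rating",
          rank=8, maxIter=10, coldStartStrategy="drop")
model = als.fit(ratings)

# Top-3 course recommendations per user.
model.recommendForAllUsers(3).show(truncate=False)
```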
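Airflow sketch: the shape of a retraining DAG like the one described above. The DAG id, schedule, and task bodies are hypothetical placeholders, and the `schedule` argument assumes Airflow 2.4+.

```python
# Minimal sketch: weekly retrain-then-deploy pipeline as an Airflow DAG.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def retrain_model(**context):
    ...  # placeholder: pull fresh features, refit the model, log metrics

def validate_and_deploy(**context):
    ...  # placeholder: compare against the champion model, promote if better

with DAG(
    dag_id="clv_model_retraining",   # hypothetical name
    start_date=datetime(2024, 11, 1),
    schedule="@weekly",
    catchup=False,
) as dag:
    retrain = PythonOperator(task_id="retrain", python_callable=retrain_model)
    deploy = PythonOperator(task_id="validate_and_deploy",
                            python_callable=validate_and_deploy)
    retrain >> deploy                # deploy only runs after a successful retrain
```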
McKesson - Irving, TX
Data Scientist
May 2024 – Oct 2024
• Performed exploratory data analysis (EDA) to uncover patterns, correlations, and trends in biological and medical data.
• Developed MapReduce and Spark Python modules for predictive analytics and machine learning in Hadoop on AWS, and wrote complex Spark SQL queries for business-driven data analysis (see the Spark SQL sketch below).
• Performed data cleaning and ensured data quality, consistency, and integrity using Pandas and NumPy, and participated in feature engineering such as feature intersection generation, normalization, and label encoding with Scikit-learn preprocessing.
• Developed and deployed computer vision models for object detection and image classification tasks, leveraging deep learning frameworks such as TensorFlow and PyTorch to analyze medical images and automate quality control processes.
• Designed and implemented end-to-end pipelines for processing and annotating large-scale image datasets, optimizing data ingestion, augmentation, and preprocessing workflows for computer vision training.
• Applied transfer learning and convolutional neural networks (CNNs) to accelerate model development for anomaly detection and pattern recognition in medical imaging data (see the transfer-learning sketch below).
• Designed and optimized high-throughput vision pipelines, enabling the analysis of millions of images for real-time decision making.
• Used big data tools such as Spark (PySpark, Spark SQL, MLlib) on AWS to conduct real-time analysis of loan defaults.
• Conducted data blending and preparation using Alteryx and SQL for Tableau consumption, and published data sources to Tableau Server.
• Created multiple custom SQL queries in Teradata SQL Workbench to prepare optimized data sets for Tableau dashboards, retrieving data from multiple tables using various join conditions for efficient and actionable visualization.
• Deployed and managed machine learning and NLP models using Azure Machine Learning Studio and Azure Synapse Analytics, ensuring scalable and secure integration of AI solutions into cloud-based business workflows.
Environment: CNN, Computer Vision, MS SQL Server 2014, Teradata, ETL, SSIS, Alteryx, Tableau (Desktop 9.x/Server 9.x), Python 3.x (Scikit-Learn/SciPy/NumPy/Pandas), Machine Learning (Naïve Bayes, KNN, Regressions, Random Forest, SVM, XGBoost, Ensemble), AWS Redshift, Spark (PySpark, MLlib, Spark SQL), Hadoop 2.x, MapReduce, HDFS, SharePoint
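Spark SQL sketch: a small PySpark module running an aggregation over a registered view; the claims table and its columns are hypothetical stand-ins for the actual medical data.

```python
# Minimal sketch: Spark SQL over a temp view for business-driven analysis.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("claims-analysis").getOrCreate()
claims = spark.createDataFrame(
    [("A", "cardiology", 1200.0), ("B", "cardiology", 900.0),
     ("A", "oncology", 5400.0)],
    ["patient_id", "department", "claim_amount"],
)
claims.createOrReplaceTempView("claims")

summary = spark.sql("""
    SELECT department,
           COUNT(*)          AS n_claims,
           AVG(claim_amount) AS avg_amount
    FROM claims
    GROUP BY department
    ORDER BY avg_amount DESC
""")
summary.show()
```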
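Transfer-learning sketch: freezing a pretrained ResNet-18 backbone and training a new classification head in PyTorch. The class count and dummy batch are placeholders, and the `weights=` API assumes torchvision 0.13+.

```python
# Minimal sketch: transfer learning with a frozen pretrained backbone.
import torch
import torch.nn as nn
from torchvision import models

num_classes = 2  # placeholder: e.g., normal vs. anomalous scans

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False                               # freeze backbone
model.fc = nn.Linear(model.fc.in_features, num_classes)      # trainable new head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch.
images = torch.randn(4, 3, 224, 224)
labels = torch.tensor([0, 1, 0, 1])
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print("loss:", loss.item())
```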
Tanla Platforms – Hyderabad, Telangana, India
Data Scientist
Aug 2021 – July 2023
• Gathered, analyzed, documented, and translated application requirements into data models, supporting the standardization of documentation and adoption of best practices related to data and applications.
• Participated in data acquisition with the Data Engineering team to extract historical and real-time data using Sqoop, Pig, Flume, Hive, MapReduce, and HDFS, and wrote user-defined functions (UDFs) in Hive to manipulate strings, dates, and other data.
• Performed data cleaning, feature scaling, and feature engineering using pandas and NumPy in Python, and applied clustering algorithms such as Hierarchical and K-Means with Scikit-learn and SciPy.
• Performed complex pattern recognition on automotive time series data and forecasted demand through ARMA and ARIMA models and exponential smoothing for multivariate time series data (see the forecasting sketch below).
• Delivered and communicated research results, recommendations, and opportunities to managerial and executive teams, and implemented techniques for priority projects.
• Designed, developed, and maintained a repository of daily and monthly summary, trending, and benchmark reports in Tableau Desktop, generating complex calculated fields, parameters, toggled and global filters, dynamic sets, groups, actions, custom color palettes, and statistical analyses to meet business requirements.
• Implemented a variety of Tableau visualizations and views, including combo charts, stacked bar charts, Pareto charts, donut charts, geographic maps, sparklines, and crosstabs.
• Published workbooks and extracted data sources to Tableau Server, implemented row-level security, and scheduled automatic extract refreshes to ensure up-to-date and secure reporting.
Environment: Machine Learning (KNN, Clustering, Regressions, Random Forest, SVM, Ensemble), Linux, Python 2.x (Scikit-Learn/SciPy/NumPy/Pandas), R, Tableau (Desktop 8.x/Server 8.x), Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Flume, Oracle 11g, SQL Server 2012
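Forecasting sketch: ARIMA and exponential-smoothing forecasts with statsmodels; the demand series, model order, and trend settings are illustrative rather than the production configuration.

```python
# Minimal sketch: demand forecasting with ARIMA and exponential smoothing.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Synthetic monthly demand with a linear trend plus noise.
idx = pd.date_range("2022-01-01", periods=36, freq="MS")
demand = pd.Series(
    100 + np.arange(36) * 2 + np.random.default_rng(0).normal(0, 5, 36),
    index=idx,
)

arima = ARIMA(demand, order=(1, 1, 1)).fit()   # order chosen for illustration
print(arima.forecast(steps=6))                 # 6-month ahead forecast

ets = ExponentialSmoothing(demand, trend="add", seasonal=None).fit()
print(ets.forecast(6))
```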
Reckitt – Hyderabad, Telangana, India
Data Scientist
June 2020 – June 2021
• Involved in developing and optimizing data integration processes, creating detailed financial reports, and performing advanced statistical analyses to support business decision-making.
• Used SSIS to create ETL packages to validate, extract, transform, and load data into the Data Warehouse and Data Marts.
• Maintained and developed complex SQL queries, stored procedures, views, table-valued functions, Common Table Expressions (CTEs), joins, and complex subqueries to provide comprehensive reporting solutions in Microsoft SQL Server 2008 R2.
• Optimized the performance of queries with modification in T-SQL, removed unnecessary columns and redundant
data, normalized tables, established joins, and created indexes.
• Created SSIS packages utilizing Pivot Transformation, Fuzzy Lookup, Derived Column, Conditional Split, Aggregate, Execute SQL Task, Data Flow Task, and Execute Package Task.
• Migrated data from the SAS environment to SQL Server 2008 via SQL Server Integration Services (SSIS) and used SAS/SQL to pull data from databases and aggregate it for detailed reporting based on user requirements.
• Designed and developed new reports and maintained existing reports using SQL Server Reporting Services (SSRS) and Excel to support the firm's strategy and management, including sub-reports, drill-down reports, summary reports, parameterized reports, and ad-hoc reports.
• Used SAS for pre-processing data, SQL queries, data analysis, generating reports, graphics, and statistical analyses.
• Provided statistical research analyses and data modeling support for mortgage products, performing analyses such
as regression analysis, logistic regression, discriminant analysis, and cluster analysis using SAS programming.
Environment: SQL Server 2008 R2, DB2, Oracle, SQL Server Management Studio, SAS/BASE, SAS/SQL, SAS/Enterprise Guide, MS BI Suite (SSIS/SSRS), T-SQL, SharePoint 2010, Visual Studio 2010, Agile/SCRUM
EDUCATION
Texas Tech University - Lubbock, TX
Master's in Computer Science