Abdul Quddus

$70/hr
Sr. Data Engineer / Data Scientist / Data Analyst / Machine Learning Engineer
Availability:
Hourly ($/hour)
Age:
31 years old
Location:
Brooklyn, New York, United States
Experience:
7 years
Abdul Quddus | Senior Data Engineer | Brooklyn, New York, 11220

Summary

I am a seasoned Senior Data Engineer with 7+ years of experience designing and optimizing data pipelines across industries including manufacturing, retail, healthcare, and cybersecurity. I specialize in ETL/ELT workflows, data modeling, and real-time processing using tools such as Apache Spark, Airflow, and Hadoop. Proficient in cloud platforms such as AWS (Glue, SageMaker, Lambda, EC2, Redshift, S3), GCP, and Azure (Data Factory, SQL Data Warehouse), I work with databases including MySQL, PostgreSQL, Snowflake, and MongoDB. I am skilled in Python (Pandas, NumPy, Matplotlib, Plotly, scikit-learn), SQL, NoSQL, and TensorFlow, and in building REST APIs with Flask and FastAPI. I also have expertise in data visualization tools (Power BI, Looker, Tableau, and AWS QuickSight), enabling impactful insights through dynamic dashboards and reports, and extensive experience developing, deploying, and maintaining machine learning models and solutions across various sectors.

Technical Skills

· Programming Languages: Python, R, SQL.
· Python Frameworks: Flask, FastAPI, Jinja2, Django.
· Cloud Technologies: AWS (S3, EC2, RDS, Glue, Redshift, Lambda, SageMaker), Azure (Azure ML, Azure Data Factory).
· Databases: MS SQL, MySQL, Postgres, MongoDB, Elasticsearch, Neo4j.
· Data Visualization Tools: Power BI, Looker, Tableau, and AWS QuickSight.
· Machine Learning & AI: Advanced Machine Learning (scikit-learn, XGBoost, LightGBM), Deep Learning Frameworks (TensorFlow, PyTorch, Keras), GenAI, NLP & Conversational AI (spaCy, NLTK, transformers), Statistics & Probability, Optimization, Model Deployment (MLOps/AIOps).
· Data Science Tools: Pandas, NumPy, SciPy, Matplotlib, Seaborn, Plotly, Dask, Apache Spark.
· Other Tools: VS Code, PyCharm, Jupyter Notebook, Postman, Docker, Kubernetes.

Professional Working Experience

Senior Data Engineer
Hewlett-Packard Inc (HP), Palo Alto, CA | 01/2021 – 10/2024

· Designed and managed ETL pipelines using AWS Glue for automated data extraction, transformation, and job scheduling, ensuring seamless integration of data from various sources.
· Ingested and processed real-time streaming data using Amazon Kinesis for immediate analysis, improving decision-making speed and responsiveness.
· Used Amazon S3 for scalable, secure storage of both raw and processed data, acting as a staging area for efficient data processing workflows.
· Implemented machine learning models with Amazon SageMaker, leveraging frameworks like TensorFlow and PyTorch, and performed hyperparameter tuning to optimize model performance.
· Developed serverless real-time data processing workflows using AWS Lambda and orchestrated complex data pipelines with AWS Step Functions (a minimal sketch follows this section).
· Created interactive data visualizations and dashboards with AWS QuickSight to monitor and report on machine learning model performance and business outcomes.
· Developed and implemented deep learning models for image classification and object detection using convolutional neural networks (CNNs) with frameworks like TensorFlow and PyTorch, achieving high accuracy in identifying complex patterns within large datasets.
· Designed a computer vision pipeline for image preprocessing, feature extraction, and data augmentation, which improved model performance and reduced training time on large-scale datasets.
· Used Amazon Redshift for high-speed analytical queries and data aggregation, enabling efficient reporting and dashboarding by integrating data from multiple sources.
· Implemented MLOps practices by automating model retraining and deployment through AWS CodePipeline, ensuring continuous improvement and real-time updates for deployed models.
· Monitored infrastructure and application performance using Amazon CloudWatch, setting up automated alarms to detect and respond to anomalies, optimizing system performance and reliability.

Tech Stack: Python (Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn), AWS (Glue, S3, Redshift, Lambda, SageMaker, Kinesis, Step Functions, QuickSight), Deep Learning (TensorFlow, PyTorch), Data Mining Techniques, NLP (Natural Language Processing), Amazon CloudWatch
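To illustrate the serverless streaming pattern described above, here is a minimal sketch of an AWS Lambda handler consuming Amazon Kinesis records. The event envelope follows the standard Kinesis-to-Lambda integration; the payload fields (device_id, reading) are hypothetical, and this is not the actual HP code.

```python
import base64
import json

def handler(event, context):
    """Minimal sketch: decode each Kinesis record and apply a lightweight
    transformation before downstream storage. Payload fields are
    hypothetical examples, not the production schema."""
    results = []
    for record in event["Records"]:
        # Kinesis delivers payloads base64-encoded inside the event envelope.
        payload = base64.b64decode(record["kinesis"]["data"])
        doc = json.loads(payload)
        # Example transformation: pick out fields and tag the record.
        results.append({
            "device_id": doc.get("device_id"),
            "reading": doc.get("reading"),
            "source": "kinesis-stream",
        })
    # In a real pipeline the results would be written to S3 or Redshift here.
    return {"processed": len(results)}
```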
Data Engineer and Data Analyst
Cigna Healthcare Group, Bloomfield, CT | 12/2018 – 12/2020

· Developed ETL processes to extract, transform, and load data from various sources into a data warehouse, providing actionable insights for business users.
· Developed intuitive data visualizations and dynamic dashboards that translated complex datasets into actionable insights, empowering stakeholders to make data-driven decisions quickly and effectively.
· Automated data collection and reporting systems, reducing manual data processing time.
· Conducted performance tuning on MySQL databases, improving query execution times.
· Collected data from multiple sources, including SQL databases, Excel files, Google Analytics, and CRM tools such as Salesforce, and consolidated this information to enable more comprehensive analysis.
· Cleaned and prepared datasets using Python and Excel, eliminating duplicates, correcting inconsistencies, and aligning data with company standards for high accuracy (see the sketch after this section).
· Developed and maintained ETL processes to load data from various sources into the data warehouse.
· Designed and implemented complex logical and physical data models using Erwin Data Modeler and PowerDesigner, ensuring high data integrity and optimizing storage efficiency across the organization.
· Identified and resolved bugs in ETL processes for FCCM and ERM applications, enhancing data accuracy and system reliability.
· Created interactive dashboards and reports using Power BI to provide business insights.
· Analyzed business requirements to design data solutions that addressed critical insights, ensuring alignment with strategic goals and improving analytical outputs.
· Employed DAX (Data Analysis Expressions) in Power BI for complex calculations and data transformations, enhancing the depth and quality of visualizations.
· Designed and implemented new dashboards in OBIEE, providing stakeholders with actionable insights and facilitating data-driven decision-making.
· Created BI Publisher reports in OBIEE, presenting key metrics and performance indicators in a visually appealing and informative format.
· Built and maintained data warehouses and data marts using SQL Server and Snowflake.
· Provided support for data mapping activities on a Big Data project, ensuring alignment between source data and target data models.

Tech Stack: ETL/ELT processes, Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, MongoDB, Data Modeling (Erwin Data Modeler), Sentiment Analysis, Power BI, Tableau, Text Analytics, SQL, SQL*Plus, PL/SQL, Data Mining, Data Cleaning, Referential Integrity, Custom Visualization Tools, CI/CD, Cluster Upgrade, Snowflake Schemas.
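As a minimal illustration of the kind of Python-based cleaning described above (not the actual Cigna code; the column names customer_id, signup_date, and state are hypothetical), a pandas pass might look like:

```python
import pandas as pd

def clean_customer_extract(path: str) -> pd.DataFrame:
    """Sketch of a cleaning step: dedupe, fix inconsistencies, and
    standardize formats. Column names are hypothetical examples."""
    df = pd.read_csv(path)
    # Drop exact duplicate rows, then duplicates on the business key.
    df = df.drop_duplicates().drop_duplicates(subset=["customer_id"])
    # Normalize free-text state codes to a consistent upper-case form.
    df["state"] = df["state"].str.strip().str.upper()
    # Coerce dates; invalid values become NaT for later review.
    df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")
    return df
```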
Education

Master of Science in Data Science
University of South Carolina, Columbia, SC | 08/2016 – 05/2018

Projects

I have successfully completed over 20 projects across various industries, drawing on my expertise in data engineering, data science, machine learning, and data analytics. This includes a diverse range of freelance and professional projects, with some key highlights below:

Patient Care Optimization Analysis

● Description
- Conducted in-depth analysis of patient admission, discharge, and treatment data using SQL and Python to identify bottlenecks and inefficiencies in hospital operations.
- Designed and deployed interactive dashboards in Tableau and Power BI to visualize patient flow, readmission rates, and treatment outcomes.
- Integrated data from multiple sources, including Electronic Health Records (EHRs), using Azure Data Factory for a unified view of patient care metrics.
- Implemented statistical models and predictive analytics to forecast patient admission trends and optimize resource allocation, such as staffing and bed availability.
- Created detailed reports and presented actionable insights to healthcare administrators, improving patient wait times and reducing readmission rates.
- Ensured compliance with data privacy regulations (e.g., HIPAA) by implementing secure data handling and anonymization practices.

● Tech Stack
- Python
- SQL
- Azure Data Factory
- EHR Systems
- Tableau
- Power BI

Customer Data Integration and Insights Platform

● Description
- Developed end-to-end ETL pipelines using Python and SQL to ingest data from diverse sources, including CRM systems, transactional databases, and third-party APIs.
- Deployed the solution on AWS, using S3 for storage, AWS Glue for data transformation, and Redshift for data warehousing.
- Created optimized data models and schema designs in PostgreSQL to support efficient querying and analytics.
- Designed interactive dashboards using Power BI to visualize customer segmentation, lifetime value, and retention metrics.
- Ensured data quality and integrity through robust validation processes and automated monitoring scripts in Python.

● Tech Stack
- Python
- SQL
- AWS (S3, Glue, Redshift)
- Data Integration
- PostgreSQL
- Data Analysis
- Power BI

How Many Extension (HME)

● Description
- This project focuses on time-series analysis using ML techniques to predict key metrics for Amazon data. I implemented the ETL pipeline (with data extraction, transformation, and loading layers), then built the machine learning stage (model training, testing, evaluation, and storage) using ARIMA, SARIMA, FbProphet, LSTM, and RandomForest models to forecast Stock, Stock-outs, Offers count, MF, FBA, and Total features (a forecasting sketch follows this project). The models are trained locally and then Dockerized for portability. Predictions are visualized in a Dash (Plotly) app hosted in a separate Docker container deployed to a dedicated DigitalOcean server. Alongside this, I implemented a greedy search algorithm in Python to find the optimal states capturing the underlying relationship between the seller's data and busted-stock data. The project also incorporates predictive analytics to anticipate future trends, plus customer-intent and sentiment analysis to understand consumer behavior. It aims to help Amazon sellers with better inventory management and decision-making, and the trained model is deployed on DigitalOcean's server for easy accessibility and scalability.

● Tech Stack
- Python
- Machine Learning
- Predictive Analysis/Sentiment Analysis
- Deep Learning
- ARIMA/SARIMA
- FbProphet
- Docker
- Dash (Plotly)
- Flask
- DigitalOcean
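As a rough illustration of the SARIMA-style forecasting used in HME (a sketch under assumed data, not the project code: the monthly stock series and the (p,d,q)(P,D,Q,s) orders are hypothetical):

```python
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Hypothetical monthly stock-level series; in HME this would come from
# the ETL pipeline's loading layer.
series = pd.Series(
    [120, 132, 128, 140, 151, 147, 160, 172, 169, 181, 194, 190] * 3,
    index=pd.date_range("2021-01-01", periods=36, freq="MS"),
)

# Fit a seasonal ARIMA model; the orders are chosen for illustration only.
model = SARIMAX(series, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
fitted = model.fit(disp=False)

# Forecast the next 6 months of stock levels.
print(fitted.forecast(steps=6))
```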
Lost Sale Analysis

● Description
- Developed a machine learning system to forecast lost sales for JB Hi-Fi and The Good Guys (TGG) by integrating historical sales data and inventory stock levels.
- Utilized advanced algorithms to analyze past sales patterns and inventory data to predict potential lost-sales scenarios.
- Incorporated machine learning models to identify factors contributing to lost sales, such as stockouts or inadequate inventory levels.
- Provided actionable insights to optimize inventory management and minimize lost sales, enhancing profitability for JB Hi-Fi and TGG.
- Utilized Azure ML Studio to develop and deploy machine learning models for forecasting and analysis.

● Tech Stack
- Python
- SQL
- Machine Learning
- Data Integration
- Advanced Algorithms
- Forecasting Models
- Historical Data Analysis
- Inventory Management
- Azure ML Studio

Inventory AI for Supply Chain Management

● Description
- Developed an Inventory AI system for optimizing supply chain management through accurate demand forecasting and efficient inventory balancing.
- Created a robust demand forecasting model utilizing cross-price elasticity and NLP to identify substitutes, enhancing demand predictions.
- Developed an inventory balancer for inter-store transfers, with an objective function focused on maximizing transfers while minimizing costs.
- Implemented the model to improve inventory levels across the supply chain, ensuring the right stock is available at the right location, reducing lost sales.
- Optimized transfers to reduce overall costs and ensure efficient inventory distribution, minimizing excess stock and shortages.

● Tech Stack
- Python
- Scikit-learn
- TensorFlow
- NLP Libraries
- Cross-Price Elasticity Analysis
- Supply Chain Data
- Inventory Optimization Algorithms
- Data Preprocessing Techniques
- Transfer Cost Minimization Techniques

RXtrail

● Description
- RXtrail is an ETL data pipeline in which I collected data from AWS S3, AWS RDS, and other open-source heterogeneous sources, staged it in AWS S3, and then accessed it in AWS Glue to perform data engineering in Python, including data processing, manipulation, and wrangling/munging. The processed data was stored in AWS Redshift, where data quality was validated using SQL queries and stored procedures. After verification, the processed data was loaded into the data warehouse solution, AWS RDS, and an email notification was sent via AWS Lambda (a sketch of this notification step follows).

● Tech Stack
- Python
- SQL
- AWS (S3, Glue, Redshift, RDS, Lambda)
- Jupyter
- GitHub
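To give a flavor of RXtrail's final notification step, here is a minimal sketch of a Lambda handler emailing a load summary via Amazon SES; the addresses and event fields (table, rows_loaded) are hypothetical, and SES is one plausible mail backend rather than a detail confirmed by the project.

```python
import boto3

ses = boto3.client("ses")

def handler(event, context):
    """Sketch: notify stakeholders once the warehouse load finishes.
    Event fields and email addresses are hypothetical examples."""
    subject = f"RXtrail load complete: {event['table']}"
    body = f"{event['rows_loaded']} rows loaded into {event['table']}."
    # SES send_email requires verified sender/recipient addresses.
    ses.send_email(
        Source="pipeline@example.com",
        Destination={"ToAddresses": ["data-team@example.com"]},
        Message={
            "Subject": {"Data": subject},
            "Body": {"Text": {"Data": body}},
        },
    )
    return {"status": "notified"}
```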
Custom CABOT Using Generative AI for a Home Appliances Manufacturer

● Description
- Developed a custom conversational AI bot (CABOT) for a home appliances manufacturer, utilizing generative AI and multiple modalities to handle a variety of customer interactions. CABOT was designed to process audio calls from the customer care center, FAQs, and product usage videos, and to manage the entire lifecycle of a product from purchase to complaint registration and escalation, ensuring seamless end-to-end automation.
- Integrated with the customer care center to handle audio calls, converting speech to text and generating appropriate responses using large language models (LLMs); a transcription sketch follows this project.
- Leveraged generative AI to provide accurate answers to frequently asked questions and interpret product usage videos for enhanced customer support.
- Enabled customers to register complaints automatically, with the system escalating issues to relevant personnel as needed.
- Fine-tuned the Whisper Large V3 model to support Turkish and Urdu languages, catering to a diverse customer base.

● Tech Stack
- Large Language Models (LLMs)
- Vector Databases
- Azure Vision Services
- Azure Functions
- ETL of Streaming Data
- Azure Blob Storage and Delta Lake
- Spark
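As a minimal sketch of the speech-to-text front end (assuming the open-source openai-whisper package; the audio path is hypothetical, and the fine-tuned checkpoint itself is not shown):

```python
import whisper

# Load the multilingual large-v3 checkpoint (the project fine-tuned a
# variant of this model for Turkish and Urdu; the base model is shown here).
model = whisper.load_model("large-v3")

# Transcribe a hypothetical customer-care call recording in Turkish.
result = model.transcribe("customer_call.wav", language="tr")
transcript = result["text"]

# The transcript would then be passed to an LLM to draft a response.
print(transcript)
```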