Syed Waqas Faheem
House # 1-C, Block Market, Model Town, Lahore, Pakistan
PROFESSIONAL SUMMARY
Experienced Data Engineer and Solution Architect with a strong background in Software Engineering,
specializing in designing and implementing scalable, optimized data pipelines and solutions in the cloud.
Proficient in leveraging AWS services such as AWS Lambda, AWS Glue, AWS Athena, AWS Redshift,
and AWS DynamoDB to manage and process complex, large-scale data. Adept at applying distributed
computing principles to build resilient and efficient data infrastructure. Strong problem-solving skills
with a focus on data as a core asset, ensuring robust and optimized data solutions. Proven ability to
transition from software development to data engineering, combining best practices from both fields
to deliver high-quality results.
SKILLS
Data Engineer:
* Apache Spark, DuckDB, Presto, Serverless, Data Mesh, Lakehouse, Warehouse
* AWS (Airflow, S3, Lambda, SNS, SQS, Glue, Athena, EMR, Redshift, CloudFormation,
Lake Formation)
* GCP (GCS, Cloud Functions, Bigtable, BigQuery, Dataproc)
* Tableau, PowerBI
* Iceberg
Software Engineer:
* Language (Java, Scala, Python, Bash, SQL)
* Database (MySQL, Postgres, Cassandra, SingleStore, MongoDB, MongoDB Atlas, Redis,
BigQuery, Redshift)
* Queue/Stream (ActiveMQ, RabbitMQ, Apache Kafka)
* DevOps (Docker, Docker Compose, Jenkins, Kubernetes, Helm Charts)
* Monitoring/Support (Elasticsearch, Logstash, Kibana, Grafana, PagerDuty)
* Tool/Framework (Spring, Spring Boot, Spring Security, Spring OAuth, Spring Cloud, Spring
JWT, Spring Data JPA, Hibernate, Apache Airflow, Apache Spark, Git Flow)
WORK EXPERIENCE
Data Engineer
2022 - Present
BeyonMoney (FinTech) • UAE
Objective: To design and implement a cost-efficient, scalable data engineering solution for a
financial application from scratch, consolidating data from all stakeholders and third parties into a
unified data warehouse where they can break silos and make data-driven decisions.
Responsibilities:
● Cost Optimization: Achieved 50x cost savings by utilizing serverless technologies and queues over traditional frameworks.
● Data Pipeline Development: Built the entire data engineering project from scratch, including batch pipelines to ingest financial application data into an AWS S3 data lake.
● Scalable Processing: Implemented scalable pipeline processing using AWS SNS, SQS, and Lambda (a sketch follows the Technologies Used list below).
● Change Data Capture (CDC): Implemented CDC for the PostgreSQL and DynamoDB instances used by financial applications (a sketch follows the Impact list below).
● Lakehouse Architecture: Developed a lakehouse architecture leveraging Apache Iceberg and AWS.
● Data Mesh: Implemented a Data Mesh architecture using AWS Lake Formation to manage decentralized data ownership and governance.
● Monitoring and Alerting: Created custom monitoring and alerting solutions using AWS CloudWatch Insights, with notifications routed to Microsoft Teams for real-time visibility.
● PowerBI Integration: Integrated PowerBI with the lakehouse architecture using AWS Glue, Apache Iceberg, and AWS Athena for advanced financial analytics and reporting.
● Stakeholder Communication: Maintained direct communication with stakeholders to continuously improve data quality and meet business requirements.
● Team Management: Led the team in end-to-end design and architecture development, ensuring alignment with fintech industry standards and regulations.
● Data Management: Developed processes for data collection, schema validation, segregation, and management across different environments (Dev, Stage, Prod).
● Collaboration: Worked closely with DevOps, Data Science, and Machine Learning teams to ensure seamless integration and data accessibility for financial modeling and analysis.
Technologies Used: AWS S3, AWS SNS, AWS SQS, AWS Lambda, PostgreSQL, Apache Iceberg,
AWS Lake Formation, AWS Glue, AWS Athena, AWS CloudWatch, PowerBI, Microsoft Teams.
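For illustration, a minimal sketch of the SQS-to-S3 ingestion step described above, assuming an SNS-to-SQS fan-out and JSON payloads; the bucket name, key layout, and event shape are hypothetical, not the actual BeyonMoney implementation:

```python
# Sketch: SQS-triggered Lambda landing events as raw JSON in an S3 data lake.
import json
import os
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = os.environ.get("DATA_LAKE_BUCKET", "example-data-lake")  # hypothetical name


def handler(event, context):
    """Consume SQS messages (fanned out via SNS) and write them to S3."""
    for record in event["Records"]:
        body = json.loads(record["body"])
        # SNS -> SQS subscriptions wrap the payload in a "Message" field.
        payload = json.loads(body["Message"]) if "Message" in body else body
        now = datetime.now(timezone.utc)
        # Date-partitioned key layout (illustrative).
        key = f"raw/dt={now:%Y-%m-%d}/{record['messageId']}.json"
        s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(payload))
```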
Impact:
● Built a scalable and cost-effective data infrastructure tailored to fintech needs.
● Improved data quality and processing efficiency, enhancing financial data insights.
● Enabled advanced analytics and real-time monitoring, critical for fintech operations.
● Fostered effective collaboration across multiple teams and stakeholders, driving innovation and data-driven decision-making.
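A hedged sketch of the DynamoDB side of the CDC work mentioned above, using DynamoDB Streams with a Lambda consumer (one common pattern for this); the bucket and key layout are hypothetical:

```python
# Sketch: DynamoDB CDC via DynamoDB Streams. A Lambda receives
# INSERT/MODIFY/REMOVE records and forwards change events to S3.
import json

import boto3
from boto3.dynamodb.types import TypeDeserializer

s3 = boto3.client("s3")
deserializer = TypeDeserializer()


def handler(event, context):
    for record in event["Records"]:
        change = record["dynamodb"]
        image = change.get("NewImage") or change.get("OldImage") or {}
        # Convert DynamoDB attribute-value format into plain Python types.
        row = {k: deserializer.deserialize(v) for k, v in image.items()}
        doc = {"op": record["eventName"], "data": row}  # INSERT / MODIFY / REMOVE
        s3.put_object(
            Bucket="example-cdc-bucket",  # hypothetical bucket
            Key=f"cdc/{record['eventID']}.json",
            Body=json.dumps(doc, default=str),  # default=str handles Decimal values
        )
```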
Data Engineer
2022 - 2024
IBM - The Weather Company • Remote
Objective: To gather and process user activity data for advertising purposes using AI/ML.
Responsibilities:
● Legacy Migration: Maintained, supported, and migrated legacy on-prem Java-based asynchronous code to the cloud (AWS).
● Data Pipeline Design: Created and optimized end-to-end data pipelines on AWS, reducing costs significantly.
● Data Flow: Ingested large volumes of data from mParticle into our data lake (MSK -> SNS -> SQS -> Lambda -> S3).
● Lakehouse Architecture: Implemented and fine-tuned raw-to-staging zone processing using Lambda, Athena, Glue, and Iceberg (a sketch follows the Technologies Used list below).
● Data Warehousing: Transformed and ingested data into Redshift for further analysis and AI/ML use (Iceberg -> Redshift).
● Data Integration: Managed over 50 data integration projects involving cross-functional teams.
● Efficiency Improvements: Improved data pipeline efficiency by 35%, resulting in faster business insights.
● Data Management: Oversaw data pipeline infrastructure handling over 100TB of data daily.
● Data Quality: Ensured data quality and integrity through automated validation processes, boosting accuracy by 20%.
● GDPR Compliance: Implemented ingestion, transformation, compaction, and GDPR compliance measures.
● Collaboration: Worked closely with data scientists, analysts, and developers to understand and meet data requirements.
Technologies Used: Apache Spark, AWS Lambda, MSK, SNS, SQS, S3, AWS Athena,
Apache Iceberg, Redshift, Glue, mParticle, Lake Formation.
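As an illustration of the raw-to-staging Iceberg processing above, a minimal PySpark sketch against a Glue-registered Iceberg catalog; the catalog, paths, table, and column names are assumptions, not the production configuration:

```python
# Sketch: raw-to-staging job appending to an Apache Iceberg table in the
# Glue catalog. Assumes the Iceberg Spark runtime jars are on the classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("raw-to-staging")
    .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue.warehouse", "s3://example-lake/warehouse")  # hypothetical
    .getOrCreate()
)

# Read raw landed JSON, apply light cleanup, append to the staging table.
raw = spark.read.json("s3://example-lake/raw/")  # hypothetical path
staged = raw.dropDuplicates(["event_id"]).filter("event_time IS NOT NULL")  # assumed columns
staged.writeTo("glue.staging.events").append()
```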
Impact:
● Streamlined data processing and optimized operations for cost-effectiveness.
● Enabled real-time analytics and data-driven decision-making for advertising strategies.
● Accelerated AI/ML and Data Science teams with fast, optimized data processing.
● Supported GDPR compliance and ensured secure and compliant data handling.
Data Engineer
Aug 2020 - Sep 2021
Vodworks • Lahore, Pakistan
As a Data Engineer and Solution Architect, I contributed to a project with the Thailand
Government in collaboration with a telecom company to analyze telecom user activities during the
COVID-19 pandemic. The objective was to track user movements and gatherings across regions
to predict potential COVID-19 outbreaks. This involved designing and deploying robust data
pipelines for real-time analytics and geospatial analysis, enabling authorities to make data-driven
decisions to mitigate the spread of the virus.
Trace Pulse:
Objective: To track telecom user activities across regions and identify potential COVID-19
outbreak hotspots.
● Responsibilities:
○ Data Collection: Integrated data from telecom systems, including user location and movement data.
○ Data Processing: Developed ETL pipelines to process and clean large datasets using BigQuery and Apache Spark.
○ Geospatial Analysis: Utilized GIS tools and GCP services to perform geospatial analysis, identifying regions with high user movements and gatherings (a sketch follows this list).
○ Real-time Analytics: Implemented real-time data streaming and processing using GCS, Google Cloud Functions, Pub/Sub, and BigQuery.
○ Reporting and Visualization: Created dashboards and reports using Google Data Studio and Tableau to present insights to government officials.
○ Collaboration: Worked closely with government stakeholders to understand requirements and deliver actionable insights.
○ Optimization: Ensured data pipelines were optimized for performance and scalability, handling large volumes of data efficiently.
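To illustrate the geospatial-analysis step above, a hedged sketch using the BigQuery Python client and BigQuery GIS functions; the dataset, table, and column names are hypothetical:

```python
# Sketch: count distinct subscribers near known hotspots with BigQuery GIS.
from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT h.region,
       COUNT(DISTINCT p.user_id) AS users_nearby
FROM `example.telecom.pings` AS p          -- hypothetical table
JOIN `example.telecom.hotspots` AS h       -- hypothetical table
  ON ST_DWITHIN(ST_GEOGPOINT(p.lon, p.lat), h.center, 500)  -- within 500 m
WHERE p.ping_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
GROUP BY h.region
ORDER BY users_nearby DESC
"""

for row in client.query(query).result():
    print(row.region, row.users_nearby)
```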
Impact:
● Reduced processing that previously took 12 hours in another warehouse to just 2 to 3 hours.
● Delivered the entire project as the sole contributor, saving the organisation human resources and cost.
● Enabled the Thailand Government to monitor and predict potential COVID-19 outbreaks by analyzing telecom user movements.
● Provided critical insights that informed decision-making for public health interventions and resource allocation.
● Contributed to the overall efforts in managing and mitigating the spread of COVID-19 within the region.
Geo Data:
Objective: To monitor and display real-time transaction data across various domains and regions.
Responsibilities:
● Data Ingestion: Collected transaction data from multiple domains and regions, processing it every second.
● Real-time Analytics: Developed a real-time dashboard to display overall transaction statistics for each region (a sketch follows the Technologies Used line below).
● Data Processing: Implemented big data solutions to handle the high-frequency, high-volume transaction data.
Technologies Used: Real-time data processing tools (e.g., Apache Kafka, Spark Streaming),
dashboarding solutions (e.g., Grafana, Tableau).
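A minimal sketch of the per-region real-time statistics described above, using Spark Structured Streaming over Kafka; the broker, topic, schema, and console sink are illustrative assumptions:

```python
# Sketch: per-region transaction stats in one-minute windows from Kafka.
# Assumes the spark-sql-kafka connector package is available.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType, StringType, StructType, TimestampType

spark = SparkSession.builder.appName("geo-data-dashboard").getOrCreate()

schema = (
    StructType()
    .add("region", StringType())
    .add("amount", DoubleType())
    .add("ts", TimestampType())
)

txns = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "transactions")               # hypothetical topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("t"))
    .select("t.*")
)

# One-minute tumbling windows per region feed the real-time dashboard.
stats = (
    txns.withWatermark("ts", "2 minutes")
    .groupBy(F.window("ts", "1 minute"), "region")
    .agg(F.count("*").alias("txn_count"), F.sum("amount").alias("volume"))
)

stats.writeStream.outputMode("update").format("console").start().awaitTermination()
```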
Impact:
● Enabled real-time visibility into regional transaction activities, enhancing monitoring and decision-making.
● Improved the ability to respond quickly to transaction trends and anomalies across regions.
Principal Software Engineer
2019 - 2020
Perception IT • Lahore, Pakistan
OpsAlerts
Objective: To create a comprehensive solution for managing end-to-end operational alerts in IT and
network infrastructure.
Responsibilities:
● Real-time Data Collection: Designed and implemented systems to collect real-time alerts and information from various sources across IT and network infrastructure.
● Automated Issue Resolution: Developed automation solutions to resolve general and recurring issues, significantly reducing the need for human intervention.
● Data Collectors and Query Builder: Created diverse data collectors to aggregate information from multiple sources, and developed a query builder to set thresholds for alerting (a sketch follows the Technologies Used list below).
● Dashboard Development: Built a real-time dashboard to monitor all alerts and activities across the infrastructure, enabling proactive issue detection and resolution.
● User Requirements and Specifications: Collected and documented users' requirements, and developed both logical and physical specifications to meet operational needs.
Technologies Used: Spring Boot, Spring Cloud, Spring Data JPA, Hibernate, Microservices,
Real-time data processing tools, automation frameworks, dashboarding tools (e.g., Grafana,
Kibana), various data collection and integration technologies.
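To illustrate the query-builder-with-thresholds idea above: the actual system was built on Spring Boot, but a short Python sketch (used for consistency with the other examples in this document) shows the rule-evaluation logic; all names and shapes are hypothetical:

```python
# Sketch: user-defined threshold rules evaluated against collected metrics.
from dataclasses import dataclass

OPERATORS = {
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
}


@dataclass
class Rule:
    metric: str       # e.g. "cpu.utilization"
    op: str           # one of OPERATORS
    threshold: float
    severity: str     # e.g. "critical"


def evaluate(rules, sample):
    """Return an alert for every rule whose threshold the sample breaches."""
    return [
        {"metric": r.metric, "value": sample[r.metric], "severity": r.severity}
        for r in rules
        if r.metric in sample and OPERATORS[r.op](sample[r.metric], r.threshold)
    ]


# Example: a single CPU rule firing on a collected sample.
alerts = evaluate(
    [Rule("cpu.utilization", ">", 90.0, "critical")],
    {"cpu.utilization": 97.2},
)
```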
Impact:
● Provided a state-of-the-art, one-stop solution for managing operational alerts, enhancing the efficiency of IT and network operations.
● Improved the ability to detect and address operational issues promptly, minimizing downtime and service interruptions.
● Enabled automated issue resolution, reducing the workload on IT staff and increasing operational efficiency.
● Facilitated proactive monitoring and management through a real-time, comprehensive dashboard.
Principal Software Engineer
Global Engineering Services • Lahore, Pakistan
Nov 2018 - Aug 2019
Point of Sale
* Developed a Point of Sale abstraction product usable by almost any business
* Developed and sold the product to American restaurant chains, with salient features including inventory management, employee management and scheduling, and order, sales, and payment processing via our own in-house gateways
* Actively took part in the design and implementation of the entire set of requirements
* Responsible for all phases, from development through staging to production, including all CI/CD
* Migrated a legacy application to microservices with continuous integration and deployment
Senior Software Engineer
Five Rivers Technology • Lahore, Pakistan
Apr 2015 - Nov 2018
Virtual Desktop Infrastructure
* A product supporting integration of hypervisors, host agents, thin clients, and remote desktop connections for clients
* Custom solution providing ease of virtualisation by removing high-level settings and setup burdens for the end consumer, requiring just a few clicks in our application
* Supported both VMware and Xen hypervisors
Software Engineer
Eager soft, Inc • Lahore, Pakistan
Sep 2012 - Apr 2015
American Health System
* Assisted in design and implementation of a client management solution
* Released features including doctor appointments, medication alerts, health checks, customized diet plans, and online doctor/patient collaboration
* Developed both frontend and backend systems
* Maintained the system and built new features in the core layer and business logic
* Performed automation testing as well as load testing for the application before releasing to
production
* Worked on automating the American Hospital application
EDUCATION
Government College University Lahore | Bachelor's in Computer Science | 2008 – 2012