Yash Datta
SOFTWARE DEVELOPER · BUILDER MAKER
31, Fernvale Road, #08-58, High Park Residences, 797417, Singapore
saucam | ydatta
“Everything that exists emanates from the lord (Srimad Bhagvatam 1.1.1)”
Education
Columbia University
Remote
M.S. IN COMPUTER SCIENCE
Jan. 2021 - Present
• Pursuing a Master’s in Computer Science at the School of Engineering and Applied Science (SEAS), Columbia University.
NSIT, Delhi University
Delhi, India
B.E. IN INSTRUMENTATION AND CONTROL
Apr. 2005 - May. 2009
• Graduated cum laude with an overall percentage of 78.6%.
Apeejay School, Pitampura
Delhi, India
ALL INDIA SENIOR SCHOOL CERTIFICATE EXAMINATION
2003 - 2005
• Passed the All India Senior School Certificate Examination (AISSCE), CBSE, with an overall percentage of 91%, March 2005.
• Passed the All India Secondary School Certificate Examination, CBSE, with an overall percentage of 89.6%, March 2003.
Skills
DevOps: AWS, Docker, Kubernetes, Jenkins, Prometheus, Grafana
Programming: Scala, Java, C++, Python
Data: Apache Spark, Apache Parquet, Elasticsearch, Kafka, Hive, Hadoop, JanusGraph, Postgres, HBase, Cassandra
Project Management: Git, PlantUML, Markdown, Jira, Confluence
Experience
Bank of America Merrill Lynch
Singapore
SENIOR SOFTWARE ENGINEER III
Sep. 2022 - Present
• Building data transformation and consumption pipelines using the Apache Camel framework.
• Added several features to Jet, a post-trade processing and reconciliation service within the GMOT division of BofA. The service
uses an event-sourcing framework called Plasma and the AMPS messaging queue for managing events.
• Created an automated regression testing framework for testing complex flows of an event-sourcing based system. The primary challenge was
adding a module for sending and receiving AMPS messages via the Gatling DSL. The team later ported all flows to the framework, thereby
automating a long-standing manual task.
• Collated and worked on several tech debt tasks to improve the stability and reduce the run time of the Jet build pipeline. Eliminated all
flaky integration tests by fixing the nested async calls that poll for test results (messages).
Scala
Gradle
SPARQL
AMPS
Event-Sourcing
Apache-Camel
Cucumber
Gatling
JUNE 2, 2023
YASH DATTA · CURRICULUM VITAE
Standard Chartered
Singapore
DATA ENGINEER
Dec. 2019 - Sep. 2022
• Developed a small command-line utility to generate fake Indonesian national IDs (e-KTPs) for testing Nexus onboarding. The utility overlays
a passport-size photo of the user on a template and adds randomly generated ID numbers to the fake image.
• Developed a Helm chart for productionising Trino for Nexus’s ad hoc data analytics workloads. Trino was configured to use the Hive metastore service,
which was itself deployed using a Helm chart I developed for Nexus. The Spark jobs were configured to write to Hive tables. The Trino
workers could auto-scale via the horizontal pod autoscaler depending on the query workloads submitted.
• Developed a mock data service called dmok to generate mock user data for testing Nexus microservices. One of the primary use cases was
generating auth data in Keycloak for users to facilitate onboarding. Used the ZIO framework to create a purely functional codebase for the service.
• Developed several Spark jobs to compute user accounts and transactions data for Nexus. The jobs were scheduled via Airflow and run within
the k8s cluster.
• Developed a Python library of common reusable functions for generating and running the Airflow DAGs for all the different Spark jobs. The
library generates Airflow DAGs from YAML specifications that describe the DAG structure, along with the cluster resources for the jobs
that each DAG spawns.
• Led the testing automation effort within the Nexus team at SCB. Developed an automated integration and regression testing framework
for all microservices within Nexus in a k8s cluster. Also helped establish standards and practices for development and testing cycles,
evidencing and tracking, mocking / stubbing of requests, etc. Apart from functional testing, also handled load and performance testing of the
complete Nexus system.
• Established Wiremock as the framework for mocking external services within Nexus, thereby making the Nexus system easier to test.
• Created a functional and load testing framework called juggernaut, wrapping the Gatling library, used to test all the different microservices within
the Nexus ecosystem at SCB. The tool is written in Scala, uses Gradle as the build tool, and is run via Jenkins. It is highly configurable and uses
data from JSON/CSV files to fire requests to different services, then pushes the Gatling stats to Elasticsearch for easy dashboarding in Kibana.
The framework has been extended to match requests / responses against expected data, making it possible to write functional tests as well. It
also generates a summary report of all the simulations run, along with the customary pass / fail information.
• Designed a centralized logging system (Elasticsearch, Fluentd, FluentBit, S3) that gathers log data from the k8s pods running the microservices.
Apache-Spark Airflow Elasticsearch Microservices
Kubernetes Docker Hive Trino Gradle
Fluentd
Wiremock
Gatling
Cucumber
Scala
Java
Zio
HSF-CERN (Google Summer of Code)
Singapore
DEVELOPER INTERN
Jun 2020 - Aug. 2020
• Architected and developed a big data solution to load telescope alert data into JanusGraph at scale as part of a Google Summer of Code project.
The solution consists of a Spark job that reads the alert data, generates edges based on vertex classifier algorithms, and loads the vertices and
edges into JanusGraph, an open source graph database.
• Further details can be found here
Apache-Spark
JanusGraph
HBase
Elasticsearch
Scala
ZIO
Mesos
Apache Parquet
Algorithms
Rakuten Asia
Singapore
SENIOR SOFTWARE ENGINEER
Sep. 2017 - Dec. 2019
• Led a team of around 7 members to create and maintain large-scale, distributed advertisement data and delivery systems within the Rakuten Core
Platform group.
• Added several common utility libraries that serve as reusable components across different projects. These include a Git config loading and caching
utility, a Scala wrapper over the Caffeine cache, a Prometheus metrics library for Scala, etc.
• Introduced Prometheus as the tool of choice for monitoring web APIs within Rakuten MPD. Helped deploy it and created a Scala
wrapper for easily integrating Prometheus into any Scala code base.
• Contributed significantly to architecting and developing an ad tracking solution for generating analytical reports that measure ad performance.
The solution involved ingesting large amounts of data in real time from Kafka using Spark Streaming and writing the transformed data
into HDFS. As part of this project, also developed standard ETL flows for rule-based fraud/filter detection using Spark (conversion reports
are later generated using Hive queries). Avro, Parquet, and JSON data formats are used, with Schema Registry as the schema management
tool.
• Introduced a long-running “Kaizen” sprint for several system improvements, including deployment automation.
• Architected and led the development of the Behavioral Targeting Advertisement system within MPD. It was recently deployed to production, with
no major issues reported. This was a complex, large-scale system that involved interfacing with multiple external systems, communicating
with all stakeholders, breaking down steps into actionable tasks, and overcoming multiple technical challenges. As part of this project,
deployed an ML model for targeted ads for users and developed a scoring service, “Easel”, from scratch on top of this pipeline.
• Developed a large-scale ETL project, “Curator”, for processing 8 TB of data in about 1 hour 40 minutes using Apache Spark.
• Developed a project that scales with data, processing it and storing it to Elasticsearch using Spark (5X latency improvement with 2X more data
than the existing system).
Scala Microservices
Kubernetes Docker
Apache Spark Apache Parquet Kafka
Apache Zookeeper Apache Avro
Elasticsearch
Aerospike
Couchbase
Prometheus
Grafana
Agoda
Bangkok, Thailand
SENIOR SOFTWARE ENGINEER
Jun. 2016 - Sep. 2017
• Architected and created a new city search API to search all available hotels on a per-city basis. Established Elasticsearch as the ideal choice for
this use case, building a proof of concept for the same. Reduced the latency of the flow by up to 3X (250-300 ms, down from almost 900 ms).
• Created a generic framework for handling all the different filters along with any complex combinations among them using AND/OR/NOT.
• Created and maintained (including indexing) the Elasticsearch cluster for the property search API across 5 DCs.
• Developed a highly configurable, general-purpose Akka HTTP based REST client, used for calling multiple different APIs in the Agoda ecosystem.
• Developed a distributed sync service to sync data from MySQL servers to Elasticsearch using Kafka.
Scala
Microservices
Cassandra
Elasticsearch
Kafka
MySQL
Apache Zookeeper
Consul
Guavus Network Systems
Gurgaon, Haryana, India
LEAD TECHNOLOGY
Mar. 2015 - Jun. 2016
• Made several contributions to the Apache Spark project (versions 1.1, 1.2, 1.3, 1.4, 1.5, 1.6).
• Optimized very low latency Spark queries for Acume Cache, a caching layer built on top of Spark. Acume serves time series and aggregate
queries (on data indexed by subscriber id) in less than 500 ms, and was optimized for a load of 25 queries per second.
• Worked on optimizing specific queries in Spark SQL (1.2) over the Parquet format. Contributed code to both the Spark SQL and parquet-mr
projects, bringing improvements of up to 40% in certain cases.
• Integrated Impala and Shark into the Guavus platform (CentOS box). Added support for running the Shark server and connecting to it via Beeline.
• Added a custom storage handler for Infinidb (a columnar datastore) in Hive. It allows data from a Hive table to be stored in an
external table backed by Infinidb, and supports queries that combine data from native Hive tables and external tables.
• Developed a new backup and restore strategy for Infinidb.
• Added several features to Insta, the efficient data storage and retrieval service (structured live big data). These included
bin replay functionality, where data from past timestamps needs to be persisted incrementally, and optimization of large aggregation queries
with the same tuple list.
• Wrote several data generation and testing scripts for QE.
• Developed Query Engine, a forensic query and analysis service. QE uses HDFS and Infinidb as storage layers. MapReduce jobs pull records
from HDFS into Infinidb after annotation. Later restructured Query Engine to handle generic cases by adding a Java service that spawns
MapReduce jobs for record filtering based on key columns defined in a configuration XML.
• Automated Infinidb installation via the Tall Maple CLI (Tall Maple is a custom Linux kernel).
C++ Boost
Postgres
Scala
Java
Apache Spark
Apache Parquet
Hive
Hadoop
MapReduce
Infinidb
HBase
Impala
Aristocrat Technologies
Noida, Uttar Pradesh, India
SOFTWARE ENGINEER, IDC STUDIO 9
Apr. 2011 - Dec. 2011
• Worked as a Casino Games developer for multiple markets (Australia, New Zealand, Nevada). Project work included development and porting
of reel games, bug fixing, and ensuring the adherence of game behavior as a whole to international as well as market-specific standards.
C++
Boost
Gemalto Pte Ltd.
Singapore
SOFTWARE ENGINEER, R & D CENTER
Jul. 2009 - Apr. 2011
• Developed applications for the OS of native SIM cards in C, using the Samsung Calmshine16 V2 compiler. Also worked on code size and RAM optimization.
• Developed test scripts for exhaustive testing of applications running on the OS. Also ported test scripts from native to .NET platform.
• Performed hardware testing of the product as a whole, to ensure it conforms to the GSM standards for voltage, current consumption, and noise.
VB .NET
C
Microcontroller programming
Test Automation
Hardware Testing
Extracurricular Activity
PC Building
Delhi, Singapore, Thailand
HOBBY
Apr. 2011 - Present
• Created several PC builds for home/personal use.
• The latest build uses an RTX 4090 and an Intel Core i9-13900K :)
Rakuten Table Tennis Tournament 2017, 2018, 2019
Singapore
WINNER
Sep. 2017 - Oct. 2019
• Won Rakuten Table Tennis Tournament 3 years in a row.
Gemalto R&D Day
Singapore
ORGANIZING COMMITTEE
2010
• Was part of the organizing committee for Gemalto R&D Day, a fun outdoor activity/games day for the Research and Development group.
Honors & Awards
2019
Development Project Award for ‘Centralized Monitoring Platform’, Rakuten GATD Award, Rakuten Asia
Singapore
2018
Impact Beyond the Group Award, Rakuten Quarterly Recognition Awards, Rakuten Asia
Singapore
2008
Certificate of Appreciation, Centre for Electronics Development and Technology, Electronics Division, NSIT
Delhi, India
2008
Certificate of Excellence, “Game On” event, Tryst’08, All India Intercollege Technical Festival, IIT Delhi
Delhi, India
2005
Certificate of Academic Merit for Computer Science, CBSE Senior Board Examination, Apeejay School
Delhi, India
2004
Principal’s Special Award: Best in the field of Computer Science, Apeejay School
Delhi, India
Technical Workshops & Presentations
Technical Workshop at Rakuten Asia
Singapore
PRESENTER FOR “INTRODUCTION TO APACHE SPARK”
Nov. 2019
• Conducted a hands-on technical workshop introducing Apache Spark for data processing use cases, covering the Spark APIs and basic Spark
concepts such as shuffle, how data is distributed, Spark Streaming, etc.
Rakuten Tech Talks
Singapore
PRESENTER FOR “MONITORING SOLUTIONS”
Sep. 2018
• Presented a successful proposal to adopt Prometheus and related technologies to solve monitoring problems within the Rakuten ecosystem.
Rakuten Tech Talks, Rakuten Asia
Singapore
PRESENTER FOR “INTRODUCTION TO FUNCTIONAL PROGRAMMING”
Sep. 2018
• Introduced different teams to functional programming principles.
Spark Summit East
New York, USA
CONFERENCE
Mar. 2015
• It was great to exchange ideas with developers from various organizations actively involved in big data analytics.
Guavus Tech Workshop, Guavus Network Systems
Haryana, India
PRESENTER FOR “INSTA”
Oct. 2012
• Conducted a hands-on training session on Insta installation, configuration, and troubleshooting.
• Insta is the data storage and retrieval service that I helped develop at Guavus.
Advanced C and Embedded Systems, NTU
Singapore
TRAINING WORKSHOP
2010
• Attended a 3-day training session on Advanced C and Embedded Systems, organized by Nanyang Technological University as part of the Gemalto
internal training schedule.