Yash Datta

$30/hr
Senior backend engineer, Data engineering
Reply rate:
-
Availability:
Part-time (20 hrs/wk)
Age:
38 years old
Location:
Singapore
Experience:
13 years

Yash Datta
SOFTWARE DEVELOPER · BUILDER MAKER
31, Fernvale Road, #08-58, High Park Residences, 797417, Singapore
saucam | ydatta
"Everything that exists emanates from the lord (Srimad Bhagvatam 1.1.1)"

Education

Columbia University, Remote
M.S. IN COMPUTER SCIENCE, Jan. 2021 - Present
• Pursuing a Master's in Computer Science at the School of Engineering and Applied Science (SEAS), Columbia University.

NSIT, Delhi University, Delhi, India
B.E. IN INSTRUMENTATION AND CONTROL, Apr. 2005 - May 2009
• Graduated cum laude, with an overall percentage of 78.6%.

Apeejay School, Pitampura, Delhi, India
ALL INDIA SENIOR SCHOOL CERTIFICATE EXAMINATION, 2003 - 2005
• Passed the AISSCE, CBSE, with an overall percentage of 91%, March 2005.
• Passed the All India Secondary School Certificate Examination, CBSE, with an overall percentage of 89.6%, March 2003.

Skills

DevOps: AWS, Docker, Kubernetes, Jenkins, Prometheus, Grafana
Programming: Scala, Java, C++, Python
Data: Apache Spark, Apache Parquet, Elasticsearch, Kafka, Hive, Hadoop, JanusGraph, Postgres, HBase, Cassandra
Project Management: Git, PlantUML, Markdown, Jira, Confluence

Experience

Bank of America Merrill Lynch, Singapore
SENIOR SOFTWARE ENGINEER III, Sep. 2022 - Present
• Working on data transformation and consumption pipelines using the Apache Camel framework.
• Added several features to Jet, a post-trade processing and reconciliation service within the GMOT division of BofA. The service uses an event-sourcing framework called Plasma and the AMPS messaging queue to manage events.
• Created an automated regression testing framework for testing complex flows of an event-sourcing-based system. The primary challenge was adding a module for sending and receiving AMPS messages via the Gatling DSL. The team later ported all flows to the framework, automating a long-standing manual task.
• Collated and worked on several tech-debt tasks to improve the stability and duration of the Jet build pipeline. Eliminated all flaky integration tests by fixing the nested async calls that poll for test results (messages).
Technologies: Scala, Gradle, SPARQL, AMPS, Event-Sourcing, Apache-Camel, Cucumber, Gatling

JUNE 2, 2023 · YASH DATTA · CURRICULUM VITAE

Standard Chartered, Singapore
DATA ENGINEER, Dec. 2019 - Sep. 2022
• Developed a small command-line utility to generate fake Indonesian national IDs (e-KTPs) for testing Nexus onboarding. The utility overlays a passport-size photo of the user on a template and adds randomly generated ID numbers to the fake image.
• Developed a Helm chart for productionising Trino for Nexus's ad hoc data analytics workloads. Trino was configured to use the Hive metastore service, which was itself deployed using a Helm chart I developed for Nexus. The Spark jobs were configured to write to Hive tables. The Trino workers could auto-scale via the horizontal pod autoscaler depending on the query workloads submitted.
• Developed a mock data service called dmok to generate mock user data for testing Nexus microservices. One of the primary use cases was generating auth data in Keycloak for users to facilitate onboarding. Used the ZIO framework to create a purely functional codebase.
• Developed several Spark jobs to compute user accounts and transactions data for Nexus. The jobs were scheduled via Airflow and run within the k8s cluster.
• Developed a Python library of common reusable functions for generating and running Airflow DAGs for all the different Spark jobs. DAGs were generated from YAML specifications that described the DAG structure, as well as the cluster resources for the jobs each DAG would spawn.
• Led the testing automation effort within the Nexus team at SCB. Developed an automated integration and regression testing framework for all Nexus microservices in a k8s cluster. Also helped establish standards and practices for development and testing cycles, evidencing and tracking, and mocking / stubbing of requests. Apart from functional testing, also handled load and performance testing of the complete Nexus system.
• Established Wiremock as the framework for mocking external services within Nexus, making the Nexus system easier to test.
• Created a functional and load testing framework called juggernaut, wrapping the Gatling library, used to test all the different microservices within the Nexus ecosystem at SCB. The tool is written in Scala, uses Gradle as the build tool, and is run via Jenkins. It is highly configurable and uses data from JSON/CSV files to fire requests at different services, then pushes the Gatling stats to Elasticsearch for easy dashboarding in Kibana. The framework has been extended to match requests / responses against expected data, making it possible to write functional tests as well. It also generates a summary report of all the simulations run, along with the customary pass / fail information.
• Designed a centralized logging system (Elasticsearch, Fluentd, FluentBit, S3) that gathers log data from the k8s pods running the microservices.
Technologies: Apache-Spark, Airflow, Elasticsearch, Microservices, Kubernetes, Docker, Hive, Trino, Gradle, Fluentd, Wiremock, Gatling, Cucumber, Scala, Java, Zio

HSF-CERN (Google Summer of Code), Singapore
DEVELOPER INTERN, Jun. 2020 - Aug. 2020
• Architected and developed a big data solution to load telescope alert data into JanusGraph at scale as part of a Google Summer of Code project. The solution consists of a Spark job that reads the alert data, generates edges based on vertex classifier algorithms, and loads the vertices and edges into JanusGraph, an open source graph database.
• Further details can be found here
Technologies: Apache-Spark, JanusGraph, HBase, Elasticsearch, Scala, ZIO, Mesos, Apache Parquet, Algorithms

Rakuten Asia, Singapore
SENIOR SOFTWARE ENGINEER, Sep. 2017 - Dec. 2019
• Led a team of around 7 members to create and maintain large-scale, distributed advertisement data and delivery systems within the Rakuten Core Platform group.
• Added several common utility libraries that are reusable across projects, including a git config loading and caching utility, a Scala wrapper over the Caffeine cache, and a Prometheus metrics library for Scala.
• Introduced Prometheus as the tool of choice for monitoring web APIs within Rakuten MPD. Helped deploy it and created a Scala wrapper for easily integrating Prometheus into any Scala codebase.
• Contributed significantly to architecting and developing an ad tracking solution for generating analytical reports that measure ad performance. The solution ingests large amounts of data in real time from Kafka using Spark Streaming and writes the transformed data to HDFS. As part of this project, also developed standard ETL flows for rule-based fraud/filter detection using Spark (conversion reports are later generated using Hive queries). Avro, Parquet, and JSON data formats are used, with Schema Registry as the schema management tool.
• Introduced a long-running "Kaizen" sprint for several system improvements, including deployment automation.
• Architected and led the development of the Behavioral Targeting Advertisement system within MPD. It was deployed to production with no major issues reported. This was a complex, large-scale system that involved interfacing with multiple external systems, communicating with all stakeholders, breaking down steps into actionable tasks, and overcoming multiple technical challenges. As part of this project, deployed an ML model for targeted ads and developed a scoring service, "Easel", from scratch on top of this pipeline.
• Developed a large-scale ETL project, "Curator", that processes 8 TB of data in about 1 hour 40 minutes using Apache Spark.
• Developed a project that scales with data volume, processing the data and storing it in Elasticsearch using Spark (5X latency improvement with 2X more data than the existing system).
Technologies: Scala, Microservices, Kubernetes, Docker, Apache Spark, Apache Parquet, Kafka, Apache Zookeeper, Apache Avro, Elasticsearch, Aerospike, Couchbase, Prometheus, Grafana

Agoda, Bangkok, Thailand
SENIOR SOFTWARE ENGINEER, Jun. 2016 - Sep. 2017
• Architected and created a new city search API to search all available hotels on a per-city basis. Established Elasticsearch as the ideal choice for this use case by building a proof of concept. Reduced the latency of the flow by up to 3X (to 250 - 300 ms from almost 900 ms).
• Created a generic framework for handling all the different filters, along with any complex combinations of them using AND/OR/NOT.
• Created and maintained (including indexing) the Elasticsearch cluster for the property search API across 5 DCs.
• Developed a highly configurable, general-purpose Akka-Http based REST client, used for calling multiple different APIs in the Agoda ecosystem.
• Developed a distributed sync service to sync data from MySQL servers to Elasticsearch using Kafka.
Technologies: Scala, Microservices, Cassandra, Elasticsearch, Kafka, MySQL, Apache Zookeeper, Consul

Guavus Network Systems, Gurgaon, Haryana, India
LEAD TECHNOLOGY, Mar. 2015 - Jun. 2016
• Made several contributions to the Apache Spark project (versions 1.1, 1.2, 1.3, 1.4, 1.5, 1.6).
• Optimized very low latency Spark queries for Acume Cache, a caching layer built on top of Spark. Acume serves time series and aggregate queries (on data indexed by subscriber id) in less than 500 ms, and was optimized for a load of 25 queries per second.
• Worked on the optimization of specific queries in Spark SQL (1.2) over the Parquet format. Contributed code to both the Spark SQL and Parquet-MR projects, bringing improvements of up to 40% in certain cases.
• Integrated Impala and Shark into the Guavus platform (CentOS box). Added support for running the Shark server and connecting to it via beeline.
• Added a custom storage handler for Infinidb (a columnar datastore) in Hive. It allows data from a Hive table to be stored in an external table backed by Infinidb. Queries that use data from both native Hive tables and external tables can also be run.
• Developed a new backup and restore strategy for Infinidb.
• Added several features to Insta, the efficient data storage and retrieval service (structured live big data). These included bin replay functionality, where data from past timestamps needs to be persisted incrementally, and optimization of large aggregation queries with the same tuple list.
• Wrote several data generation and testing scripts for QE.
• Developed Query Engine (QE), a forensic query and analysis service. QE uses HDFS and Infinidb for storage. MR jobs are run to pull records from HDFS into Infinidb after annotation. Later restructured Query Engine to handle generic cases by adding a Java service that spawns MapReduce jobs for record filtering based on key columns defined in a configuration XML.
• Automated Infinidb installation via the Tall Maple CLI (Tall Maple is a custom Linux kernel).
Technologies: C++, Boost, Postgres, Scala, Java, Apache Spark, Apache Parquet, Hive, Hadoop, MapReduce, Infinidb, HBase, Impala

Aristocrat Technologies, Noida, Uttar Pradesh, India
SOFTWARE ENGINEER, IDC STUDIO 9, Apr. 2011 - Dec. 2011
• Worked as a casino games developer for multiple markets (Australia, New Zealand, Nevada). Project work included development and porting of reel games, bug fixing, and ensuring that overall game behavior adhered to international as well as market-specific standards.
Technologies: C++, Boost

Gemalto Pte Ltd., Singapore
SOFTWARE ENGINEER, R & D CENTER, Jul. 2009 - Apr. 2011
• Developed applications for the OS for native SIM cards, in C, using the Samsung Calmshine16 V2 compiler. Also worked on code size and RAM optimization.
• Developed test scripts for exhaustive testing of applications running on the OS. Also ported test scripts from native to the .NET platform.
• Performed hardware testing of the product as a whole to ensure it conforms to the GSM standards for voltage, current consumption, and noise.
Technologies: VB .NET, C, Microcontroller programming, Test Automation, Hardware Testing

Extracurricular Activity

PC Building, Delhi, Singapore, Thailand
HOBBY, Apr. 2011 - Present
• Created several PC builds for home/personal use.
• The latest build uses an RTX 4090 and an Intel Core i9 13900K :)

Rakuten Table Tennis Tournament 2017, 2018, 2019, Singapore
WINNER, Sep. 2017 - Oct. 2019
• Won the Rakuten Table Tennis Tournament 3 years in a row.

Gemalto R&D Day, Singapore
ORGANIZING COMMITTEE, 2010
• Was part of the organizing committee for Gemalto R&D Day, a fun outdoor activity/games day for the Research and Development group.

Honors & Awards

2019 Development Project Award for "Centralized Monitoring Platform", Rakuten GATD Award, Rakuten Asia, Singapore
2018 Impact Beyond the Group Award, Rakuten Quarterly Recognition Awards, Rakuten Asia, Singapore
2008 Certificate of Appreciation, Centre for Electronics Development and Technology, Electronics Division, NSIT, Delhi, India
2008 Certificate of Excellence, "Game On" event, Tryst'08, All India Intercollege Technical Festival, IIT Delhi, Delhi, India
2005 Certificate of Academic Merit for Computer Science, CBSE Senior Board Examination, Apeejay School, Delhi, India
2004 Principal's Special Award: Best in the field of Computer Science, Apeejay School, Delhi, India

Technical Workshops & Presentations

Technical Workshop at Rakuten Asia, Singapore
PRESENTER FOR "INTRODUCTION TO APACHE SPARK", Nov. 2019
• Conducted a hands-on technical workshop introducing Apache Spark for data processing use cases, covering the Spark APIs, basic concepts such as shuffle and how data is distributed, and Spark Streaming.

Rakuten Tech Talks, Singapore
PRESENTER FOR "MONITORING SOLUTIONS", Sep. 2018
• Presented a successful proposal to adopt Prometheus and related technologies to solve monitoring problems within the Rakuten ecosystem.

Rakuten Tech Talks, Rakuten Asia, Singapore
PRESENTER FOR "INTRODUCTION TO FUNCTIONAL PROGRAMMING", Sep. 2018
• Introduced different teams to functional programming principles.

Spark Summit East, New York, USA
CONFERENCE, Mar. 2015
• It was great to exchange ideas with developers from various organizations actively involved in big data analytics.

Guavus Tech Workshop, Guavus Network Systems, Haryana, India
PRESENTER FOR "INSTA", Oct. 2012
• Conducted hands-on training on Insta installation, configuration, and troubleshooting.
• Insta is the data storage and retrieval service that I helped develop at Guavus.

Advanced C and Embedded Systems, NTU, Singapore
TRAINING WORKSHOP, 2010
• Attended a 3-day training session on Advanced C and Embedded Systems, organized by Nanyang Technological University as part of the Gemalto internal training schedule.