Software engineering and consultancy in Java, Python, and JavaScript, with a focus on distributed systems and cloud computing.
I deploy, maintain, and develop big data processing pipelines that ingest streaming data through Kafka or batch data from distributed storage such as AWS S3 or Google Cloud Storage. The data is processed by Spark jobs (which I used to write in Scala) and then loaded into an Apache Druid distributed warehouse for querying. Druid exposes a SQL interface and caches segment data in memory via memory-mapped files, which lets it answer repeated queries over the same time interval much faster. Data is distributed across the cluster using hash partitioning, which avoids skew and keeps any single machine from being overloaded.
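
As a minimal sketch of what one of these Scala Spark batch jobs might look like (the bucket paths, the events schema, and the hourly rollup are illustrative assumptions, not the actual pipeline), note the explicit repartition by key, which is the hash-partitioning step that spreads work evenly across executors:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object EventRollupJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("event-rollup")
      .getOrCreate()

    // Read a day of raw events from distributed storage (hypothetical bucket/path).
    val events = spark.read.json("s3a://example-bucket/events/2024-01-01/")

    // Hash-partition by user_id so rows with the same key land on the same
    // executor and the load stays balanced, avoiding the skew described above.
    val rollup = events
      .repartition(col("user_id"))
      .groupBy(col("user_id"), window(col("timestamp").cast("timestamp"), "1 hour"))
      .agg(count("*").as("event_count"))

    // Write the hourly rollup back to storage; Druid can then pick these
    // files up through its batch ingestion.
    rollup.write.mode("overwrite").parquet("s3a://example-bucket/rollups/2024-01-01/")

    spark.stop()
  }
}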
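
On the query side, Druid's SQL layer is reachable over JDBC through its Avatica endpoint. A hedged sketch, assuming a broker at druid-broker:8082, an events datasource, and the Avatica JDBC client on the classpath (none of these are from the original description):

import java.sql.DriverManager

object DruidQueryExample {
  def main(args: Array[String]): Unit = {
    // Druid serves SQL over Avatica JDBC; host, port, and datasource name
    // here are placeholders, not a real cluster's.
    val url = "jdbc:avatica:remote:url=http://druid-broker:8082/druid/v2/sql/avatica/"
    val conn = DriverManager.getConnection(url)
    try {
      val stmt = conn.createStatement()
      // Query a fixed time interval; Druid answers repeated scans of the
      // same interval quickly because segments are memory-mapped and cached.
      val rs = stmt.executeQuery(
        """SELECT user_id, COUNT(*) AS events
          |FROM events
          |WHERE __time >= TIMESTAMP '2024-01-01 00:00:00'
          |  AND __time <  TIMESTAMP '2024-01-02 00:00:00'
          |GROUP BY user_id
          |ORDER BY events DESC
          |LIMIT 10""".stripMargin)
      while (rs.next()) {
        println(s"${rs.getString("user_id")}: ${rs.getLong("events")}")
      }
    } finally conn.close()
  }
}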