TAPS
Microservices based on Machine Learning/Deep Learning and Python.
Recommendation Engine
Brief
A recommendation engine is a tool that predicts what a user may or may not like from a given list of items. In our project, we built a recommendation engine for an e-commerce website that sells auto parts. Based on a user's event history, it recommends products the user may like to purchase. It produces both user-based and item-to-item recommendations, meaning the project takes into account both the user's activities and the items' attributes when generating recommendations.
Technical Aspects
Language – Scala
Framework - DeepLearning4J
Solution
To generate the recommendations, we first tracked each user's activities based on the product attributes they searched for. We aggregated all of that data in our storage engine to be provided later as training sets.
We trained a wide-and-deep learning model on the training sets generated above, which predicts ratings for given products. Using these ratings, we fill the user-to-product matrix.
Cosine similarity between the products/users is used to generate and rank the recommendations, as sketched below.
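The project itself is written in Scala with DeepLearning4J; as an illustration only, here is a minimal Python/NumPy sketch of the similarity-and-ranking step, assuming a small user-to-product matrix already filled with model-predicted ratings (the matrix values and shapes are made up):

    import numpy as np

    # User-to-product rating matrix (rows: users, columns: products),
    # assumed to be already filled with ratings predicted by the
    # wide-and-deep model. Values here are illustrative only.
    ratings = np.array([
        [5.0, 3.0, 0.5, 1.0],
        [4.0, 2.5, 1.0, 0.5],
        [0.5, 1.0, 4.5, 4.0],
    ], dtype=float)

    def cosine_similarity(m):
        """Pairwise cosine similarity between the rows of m."""
        norms = np.linalg.norm(m, axis=1, keepdims=True)
        normalized = m / np.clip(norms, 1e-12, None)
        return normalized @ normalized.T

    # Item-to-item similarity: compare columns, i.e. rows of the transpose.
    item_sim = cosine_similarity(ratings.T)

    # Rank recommendations for a product: most similar other products first.
    product = 0
    order = np.argsort(-item_sim[product])
    recommended = [p for p in order if p != product]
    print("products similar to", product, "->", recommended)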
Sales prediction based on Time Series data
Brief
The sales predictor is a tool consisting of a recurrent neural network trained on time series data to predict sales based on the number of items sold in the past. It takes time into account as its third dimension.
Technical Aspects
Language – Python
Framework - Keras
Solution
Data is generated by tracking the number of sales per day on our e-commerce portal. This data is collected into a storage engine, where it is aggregated on a daily basis.
An LSTM model is configured in Keras (Python) and trained on the collected datasets. The trained model is then stored for later use.
Finally, predictions for the next 30 days are generated with the help of the trained model, as sketched below.
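As an illustration of the Keras side, here is a minimal sketch assuming the daily sales counts arrive as a one-dimensional array; the synthetic series, the 30-day window, and the layer sizes are assumptions for the example, not the project's actual configuration:

    import numpy as np
    from tensorflow import keras

    # Daily sales counts, assumed to come pre-aggregated from the storage
    # engine; a synthetic series is used here for illustration.
    sales = np.sin(np.linspace(0, 20, 400)) * 50 + 100

    # Turn the series into supervised samples: 30 past days -> next day.
    window = 30
    X = np.array([sales[i:i + window] for i in range(len(sales) - window)])
    y = sales[window:]
    X = X[..., np.newaxis]  # shape (samples, timesteps, features)

    model = keras.Sequential([
        keras.Input(shape=(window, 1)),
        keras.layers.LSTM(32),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=5, verbose=0)

    # Roll the model forward to predict the next 30 days.
    history = list(sales[-window:])
    for _ in range(30):
        x = np.array(history[-window:])[np.newaxis, :, np.newaxis]
        history.append(float(model.predict(x, verbose=0)[0, 0]))
    print(history[-30:])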
Density based clustering
Brief
The goal was to find the region with the maximum number of products sold. The algorithm is given the geo-coordinates of the points and a radius, i.e. the maximum distance two points may have and still be included in the same cluster; it then generates clusters of varying densities based on the number of products sold.
Technical Aspects
Language – Python
Framework – Scikit-learn
Solution
The radius of the cluster is given to the DBSCAN algorithm as a hyperparameter specifying the maximum distance two points may have and still be included in a single cluster.
The algorithm then runs on a set of points provided as geo-coordinate values. The distance between two points is calculated using the haversine distance.
A number of clusters is generated when the DBSCAN algorithm is applied to the training datasets. Each cluster has a specific number of products sold, which acts as its density.
From the list of clusters, we then select the cluster with the maximum density, as sketched below.
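A minimal scikit-learn sketch of this step, with illustrative sale coordinates; scikit-learn's haversine metric expects coordinates in radians, so the radius in kilometres is converted accordingly:

    import numpy as np
    from sklearn.cluster import DBSCAN

    EARTH_RADIUS_KM = 6371.0

    # Sale locations as (latitude, longitude) in degrees; each row is one
    # sold product. The coordinates below are illustrative only.
    points_deg = np.array([
        [28.61, 77.20], [28.62, 77.21], [28.60, 77.19],   # dense region
        [19.07, 72.87], [19.08, 72.88],                    # smaller region
        [13.08, 80.27],                                    # noise
    ])

    # The haversine metric works on radians; eps is therefore the cluster
    # radius in km converted to radians on the unit sphere.
    radius_km = 5.0
    db = DBSCAN(
        eps=radius_km / EARTH_RADIUS_KM,
        min_samples=2,
        metric="haversine",
        algorithm="ball_tree",
    ).fit(np.radians(points_deg))

    labels = db.labels_  # -1 marks noise
    # Density of each cluster = number of products sold inside it.
    counts = {l: int((labels == l).sum()) for l in set(labels) if l != -1}
    densest = max(counts, key=counts.get)
    print("clusters:", counts, "-> densest cluster:", densest)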
Weather Plugin
Brief
The weather plugin is a tool that outputs weather information for a particular location, based on its latitude and longitude values, using the OpenWeatherMap REST API. The information is then cached using Apache Ignite's caching services for faster access.
Technical Aspects
Language – Scala
Tools – Apache Ignite
Solution
A plugin is created in Scala which uses the OpenWeatherMap REST API to get weather information for a particular location.
The zip code provided by the user is converted into geo-coordinates (latitude and longitude values).
Based on these values, we extract the weather information, and the result is then cached in the Apache Ignite cache, as sketched below.
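The plugin itself is written in Scala with Apache Ignite; for illustration, here is a minimal Python sketch of the same flow, with a plain dictionary standing in for the Ignite cache and the API key as a placeholder (the endpoints follow OpenWeatherMap's public documentation):

    import requests

    API_KEY = "YOUR_OPENWEATHERMAP_KEY"  # placeholder
    _cache = {}  # stand-in for the Apache Ignite cache used in the project

    def weather_for_zip(zip_code, country="us"):
        """Resolve a zip code to coordinates, then fetch current weather."""
        key = (zip_code, country)
        if key in _cache:               # serve repeated lookups from cache
            return _cache[key]

        # Step 1: zip code -> latitude/longitude via the geocoding endpoint.
        geo = requests.get(
            "https://api.openweathermap.org/geo/1.0/zip",
            params={"zip": f"{zip_code},{country}", "appid": API_KEY},
        ).json()

        # Step 2: coordinates -> current weather.
        weather = requests.get(
            "https://api.openweathermap.org/data/2.5/weather",
            params={"lat": geo["lat"], "lon": geo["lon"], "appid": API_KEY},
        ).json()

        _cache[key] = weather
        return weather

    print(weather_for_zip("94040"))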
Catalog Services
Brief
Prepared the catalog data of an e-commerce website using Apache Spark, handled missing information, and uploaded the data into DSE Graph for visualization and analytics. The data is saved to Cassandra and then indexed using Solr.
Technical Aspects
Language – Scala
Tools – Apache Spark, DSE Cassandra, DSE Graph, Solr
Solution
Raw data comes from different vendors as text files in different formats. All the raw data is imported into Spark, where it is aggregated and transformed into structured form.
The prepared data is then indexed using Solr, on which we can perform real-time queries for faster results.
The data is also uploaded into DSE Graph for visualization and analytics purposes. A sketch of the preparation step follows.
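The project uses Scala with Apache Spark and DSE; below is a minimal PySpark sketch of the preparation step, with hypothetical file paths, delimiters, and column names, and Parquet standing in for the Cassandra/Solr/DSE Graph sinks:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("catalog-prep").getOrCreate()

    # Vendor feeds arrive as delimited text files in differing formats;
    # the paths, separator, and column names here are hypothetical.
    raw = spark.read.option("header", True).option("sep", "|") \
        .csv("/data/vendors/*.txt")

    catalog = (
        raw
        .withColumn("price", F.col("price").cast("double"))
        # Handle missing information with defaults/placeholders.
        .fillna({"brand": "UNKNOWN", "category": "UNCATEGORIZED"})
        .filter(F.col("part_number").isNotNull())
        .dropDuplicates(["part_number"])
    )

    # In the project the result is written to DSE Cassandra (then indexed
    # by Solr and loaded into DSE Graph); Parquet stands in for that here.
    catalog.write.mode("overwrite").parquet("/data/catalog/prepared")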
Shipping Calculator
Brief
Create an application to find the shortest route between the user's location and a warehouse location having enough stock of the requested item. First, the list of warehouses having enough stock of that item is determined. Then the user's zip code, along with the warehouses' zip codes, is converted to latitude and longitude values. Finally, the distances between the user's location and all warehouse locations are calculated using the haversine distance, the nearest warehouse is determined, and its shipping charges are calculated.
Technical Aspects
Language – Scala
Tools – Akka, Akka-Http, Apache Solr
Solution
A Scala project is created which finds the shortest route between the user's location and the warehouse location having the appropriate stock of the items requested by the user.
Requests come into the calculator through a REST API, which then determines the warehouses that can fulfill the request. From this list, we find the nearest warehouse based on the geo-coordinates, using the haversine distance, as sketched below.
We can also find the number of days needed to deliver the products to the user's location using various providers.
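For illustration, here is a minimal Python sketch of the haversine step, with hypothetical warehouse names and coordinates (the project itself is in Scala with Akka and Solr):

    import math

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance between two (lat, lon) points in km."""
        r = 6371.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dphi = math.radians(lat2 - lat1)
        dlmb = math.radians(lon2 - lon1)
        a = math.sin(dphi / 2) ** 2 + \
            math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    # Warehouses reported as having enough stock; the coordinates
    # (already geocoded from zip codes) are illustrative only.
    warehouses = {
        "WH-EAST": (40.71, -74.00),
        "WH-WEST": (34.05, -118.24),
        "WH-SOUTH": (29.76, -95.36),
    }
    user = (41.88, -87.63)  # user's location from their zip code

    nearest = min(
        warehouses,
        key=lambda w: haversine_km(*user, *warehouses[w]),
    )
    print("ship from:", nearest)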
Natural Language Processing API using PySpark and Flask
Brief
The project aimed at creating a Natural Language Processing API for big data using PySpark and Flask. The API was to be called from the front end, and the results of NLP transformations such as stemming, lemmatization, stop-word removal, tokenization, document-term matrix construction, sentiment analysis, and metadata extraction were returned.
Technical Aspects
Language used: Python
Database: MySQL
Tools used: PySpark, Flask
Solution
A MySQL database was set up with the text data and several attributes.
The project was configured with PySpark for big data processing.
A Flask API was created to be called by the front end.
How it will work
The columns used for the project are fetched as query-string variables.
A MySQL connection is made to fetch the data from the MySQL database table into a PySpark dataframe.
The corresponding NLP transformation is carried out on the passed query string.
The PySpark dataframe is written back to the MySQL table.
The dataframe is converted to a JSON object.
The JSON is returned to the API caller, as sketched below.
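A minimal sketch of how such an endpoint could look, using Flask with PySpark's Tokenizer and StopWordsRemover for the tokenization/stop-word case; an in-memory dataframe stands in for the MySQL table, and the route name and column names are assumptions:

    from flask import Flask, request, jsonify
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import Tokenizer, StopWordsRemover

    app = Flask(__name__)
    spark = SparkSession.builder.appName("nlp-api").getOrCreate()

    @app.route("/nlp")
    def nlp():
        # The column to process arrives as a query-string variable; in the
        # project the data is fetched from MySQL into a PySpark dataframe.
        # The in-memory rows below stand in for that table.
        column = request.args.get("column", "text")
        df = spark.createDataFrame(
            [("Brake pads are selling well",), ("New shipment of filters",)],
            [column],
        )

        tokens = Tokenizer(inputCol=column, outputCol="tokens").transform(df)
        cleaned = StopWordsRemover(
            inputCol="tokens", outputCol="filtered"
        ).transform(tokens)

        # Convert the dataframe to JSON and return it to the caller
        # (the project also writes the result back to the MySQL table).
        return jsonify([row.asDict() for row in cleaned.collect()])

    if __name__ == "__main__":
        app.run()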
DBSCAN to boost e-commerce sale
Brief
This project was aimed at boosting e-commerce product sales. The e-commerce portal contains several products in various categories and subcategories. The main idea was to find the regions of importance for particular categories of sales, along with their intrinsic attributes, using an unsupervised approach.
Technical Aspects
Language used : Core Java
Dataset : e-commerce sales data
Solution
The e-commerce sales data is analysed, and the relevant attributes such as Part Number, Product Type, Category, Subcategory, address, latitude, and longitude are extracted from it.
These attributes are fed to DBSCAN to find the clusters.
Once the clusters are found, the regions of sales can be identified and more campaigns can be carried out to boost sales.
How it will work
The project involved clustering object instances over several snapshots taken at fixed timestamps.
The object groups that remained together across snapshots formed a cluster.
The MinPts and Eps parameters are used to determine a cluster.
When object groups stay clustered together across a particular set of snapshots, they are identified as object groups that remain together, as sketched below.
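The project is in Core Java; as an illustrative Python sketch of the snapshot idea, assuming object positions are tracked across aligned snapshots and using scikit-learn's DBSCAN in place of the project's own implementation:

    import numpy as np
    from sklearn.cluster import DBSCAN

    # Snapshots of object positions at fixed timestamps; each row is one
    # object, and rows align across snapshots. Positions are illustrative.
    snapshots = [
        np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 5.0]]),
        np.array([[1.0, 1.0], [1.1, 1.1], [9.0, 2.0], [5.1, 5.0]]),
    ]

    def groups(points, eps=0.5, min_pts=2):
        """Cluster one snapshot with DBSCAN (Eps, MinPts) and return the
        set of object-id groups, ignoring noise (label -1)."""
        labels = DBSCAN(eps=eps, min_samples=min_pts).fit(points).labels_
        return {
            frozenset(np.flatnonzero(labels == l))
            for l in set(labels) if l != -1
        }

    # Objects that stay clustered together in every snapshot form the
    # groups of interest (a simplified take on "remaining together").
    common = set.intersection(*(groups(s) for s in snapshots))
    print("groups that remain together:", common)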