TAPS
Microservices based on Machine Learning/Deep Learning and Python.
Recommendation Engine
Brief
A recommendation engine is a tool that predicts what a user may or may not like from a given list of items. In our project, we built a recommendation engine for an e-commerce website that sells auto parts. Based on a user's event history, it recommends products the user may like to purchase. It produces both user-based and item-to-item recommendations, meaning the project takes into account both the user's activities and the items' attributes when generating recommendations.
Technical Aspects
Language – Scala
Framework - DeepLearning4J
Solution
To generate the recommendations, we first tracked each user's activities based on the product attributes they searched for. We aggregated all of that data in our storage engine to be provided later as training sets.
We trained a wide-and-deep learning model on the training sets generated above, which predicts ratings for given products. Using these ratings, we fill the user-to-product matrix.
Cosine similarity between the products/users is used to generate and rank the recommendations, as sketched below.
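The project itself is written in Scala with DeepLearning4J; as an illustration only, here is a minimal Python/NumPy sketch of the similarity-and-ranking step, assuming a small user-to-product matrix already filled with model-predicted ratings (the matrix values and shapes are made up):

    import numpy as np

    # User-to-product rating matrix (rows: users, columns: products),
    # assumed to be already filled with ratings predicted by the
    # wide-and-deep model. Values here are illustrative only.
    ratings = np.array([
        [5.0, 3.0, 0.5, 1.0],
        [4.0, 2.5, 1.0, 0.5],
        [0.5, 1.0, 4.5, 4.0],
    ], dtype=float)

    def cosine_similarity(m):
        """Pairwise cosine similarity between the rows of m."""
        norms = np.linalg.norm(m, axis=1, keepdims=True)
        normalized = m / np.clip(norms, 1e-12, None)
        return normalized @ normalized.T

    # Item-to-item similarity: compare columns, i.e. rows of the transpose.
    item_sim = cosine_similarity(ratings.T)

    # Rank recommendations for a product: most similar other products first.
    product = 0
    order = np.argsort(-item_sim[product])
    recommended = [p for p in order if p != product]
    print("products similar to", product, "->", recommended)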
Sales prediction based on Time Series data
Brief
The sales predictor is a tool consisting of a recurrent neural network trained on time series data to predict sales based on the number of items sold in the past. It takes time into account as its third dimension.
Technical Aspects
Language – Python
Framework - Keras
Solution
Data is generated by tracking the number of sales per day on our e-commerce portal. This data is collected into a storage engine, where it is aggregated on a daily basis.
An LSTM model is configured in Keras (Python) and trained on the collected datasets. The trained model is then stored for later use.
Finally, predictions for the next 30 days are generated with the help of the trained model, as sketched below.
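As an illustration of the Keras side, here is a minimal sketch assuming the daily sales counts arrive as a one-dimensional array; the synthetic series, the 30-day window, and the layer sizes are assumptions for the example, not the project's actual configuration:

    import numpy as np
    from tensorflow import keras

    # Daily sales counts, assumed to come pre-aggregated from the storage
    # engine; a synthetic series is used here for illustration.
    sales = np.sin(np.linspace(0, 20, 400)) * 50 + 100

    # Turn the series into supervised samples: 30 past days -> next day.
    window = 30
    X = np.array([sales[i:i + window] for i in range(len(sales) - window)])
    y = sales[window:]
    X = X[..., np.newaxis]  # shape (samples, timesteps, features)

    model = keras.Sequential([
        keras.Input(shape=(window, 1)),
        keras.layers.LSTM(32),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=5, verbose=0)

    # Roll the model forward to predict the next 30 days.
    history = list(sales[-window:])
    for _ in range(30):
        x = np.array(history[-window:])[np.newaxis, :, np.newaxis]
        history.append(float(model.predict(x, verbose=0)[0, 0]))
    print(history[-30:])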
Density based clustering
Brief
The goal was to find the region with the maximum number of products sold. The algorithm is given the geo-coordinates of the points and a radius, i.e. the maximum distance two points may have and still be included in the same cluster; it then generates clusters of varying densities based on the number of products sold.
Technical Aspects
Language – Python
Framework – Scikit-learn
Solution
The radius of the cluster is given to the DBSCAN algorithm as a hyperparameter specifying the maximum distance two points may have and still be included in a single cluster.
The algorithm then runs on a set of points provided as geo-coordinate values. The distance between two points is calculated using the haversine distance.
A number of clusters is generated when the DBSCAN algorithm is applied to the training datasets. Each cluster has a specific number of products sold, which acts as its density.
From the list of clusters, we then select the cluster with the maximum density, as sketched below.
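A minimal scikit-learn sketch of this step, with illustrative sale coordinates; scikit-learn's haversine metric expects coordinates in radians, so the radius in kilometres is converted accordingly:

    import numpy as np
    from sklearn.cluster import DBSCAN

    EARTH_RADIUS_KM = 6371.0

    # Sale locations as (latitude, longitude) in degrees; each row is one
    # sold product. The coordinates below are illustrative only.
    points_deg = np.array([
        [28.61, 77.20], [28.62, 77.21], [28.60, 77.19],   # dense region
        [19.07, 72.87], [19.08, 72.88],                    # smaller region
        [13.08, 80.27],                                    # noise
    ])

    # The haversine metric works on radians; eps is therefore the cluster
    # radius in km converted to radians on the unit sphere.
    radius_km = 5.0
    db = DBSCAN(
        eps=radius_km / EARTH_RADIUS_KM,
        min_samples=2,
        metric="haversine",
        algorithm="ball_tree",
    ).fit(np.radians(points_deg))

    labels = db.labels_  # -1 marks noise
    # Density of each cluster = number of products sold inside it.
    counts = {l: int((labels == l).sum()) for l in set(labels) if l != -1}
    densest = max(counts, key=counts.get)
    print("clusters:", counts, "-> densest cluster:", densest)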
Weather Plugin
Brief
The weather plugin is a tool that outputs weather information for a particular location, based on its latitude and longitude values, using the OpenWeatherMap REST API. The information is then cached using Apache Ignite's caching services for faster access.
Technical Aspects
Language – Scala
Tools – Apache Ignite
Solution
A plugin is created in Scala which uses the OpenWeatherMap REST API to get weather information for a particular location.
The zip code provided by the user is converted into geo-coordinates (latitude and longitude values).
Based on these values, we extract the weather information, and the result is then cached in the Apache Ignite cache, as sketched below.
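The plugin itself is written in Scala with Apache Ignite; for illustration, here is a minimal Python sketch of the same flow, with a plain dictionary standing in for the Ignite cache and the API key as a placeholder (the endpoints follow OpenWeatherMap's public documentation):

    import requests

    API_KEY = "YOUR_OPENWEATHERMAP_KEY"  # placeholder
    _cache = {}  # stand-in for the Apache Ignite cache used in the project

    def weather_for_zip(zip_code, country="us"):
        """Resolve a zip code to coordinates, then fetch current weather."""
        key = (zip_code, country)
        if key in _cache:               # serve repeated lookups from cache
            return _cache[key]

        # Step 1: zip code -> latitude/longitude via the geocoding endpoint.
        geo = requests.get(
            "https://api.openweathermap.org/geo/1.0/zip",
            params={"zip": f"{zip_code},{country}", "appid": API_KEY},
        ).json()

        # Step 2: coordinates -> current weather.
        weather = requests.get(
            "https://api.openweathermap.org/data/2.5/weather",
            params={"lat": geo["lat"], "lon": geo["lon"], "appid": API_KEY},
        ).json()

        _cache[key] = weather
        return weather

    print(weather_for_zip("94040"))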
Catalog Services
Brief
Prepared the catalog data of an e-commerce website using Apache Spark, handled missing information, and uploaded the data into DSE Graph for visualization and analytics. The data is saved to Cassandra and then indexed using Solr.
Technical Aspects
Language – Scala
Tools – Apache Spark, DSE Cassandra, DSE Graph, Solr
Solution
Raw data comes from different vendors as text files in different formats. All the raw data is imported into Spark, where it is aggregated and transformed into structured form.
The prepared data is then indexed using Solr, on which we can perform real-time queries for faster results.
The data is also uploaded into DSE Graph for visualization and analytics purposes. A sketch of the preparation step follows.
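The project uses Scala with Apache Spark and DSE; below is a minimal PySpark sketch of the preparation step, with hypothetical file paths, delimiters, and column names, and Parquet standing in for the Cassandra/Solr/DSE Graph sinks:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("catalog-prep").getOrCreate()

    # Vendor feeds arrive as delimited text files in differing formats;
    # the paths, separator, and column names here are hypothetical.
    raw = spark.read.option("header", True).option("sep", "|") \
        .csv("/data/vendors/*.txt")

    catalog = (
        raw
        .withColumn("price", F.col("price").cast("double"))
        # Handle missing information with defaults/placeholders.
        .fillna({"brand": "UNKNOWN", "category": "UNCATEGORIZED"})
        .filter(F.col("part_number").isNotNull())
        .dropDuplicates(["part_number"])
    )

    # In the project the result is written to DSE Cassandra (then indexed
    # by Solr and loaded into DSE Graph); Parquet stands in for that here.
    catalog.write.mode("overwrite").parquet("/data/catalog/prepared")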
Shipping Calculator
Brief
Create an application to find the shortest route between the user's location and a warehouse location having enough stock of the requested item. First, the list of warehouses having enough stock of that item is determined. Then the user's zip code, along with the warehouses' zip codes, is converted to latitude and longitude values. Finally, the distances between the user's location and all warehouse locations are calculated using the haversine distance, the nearest warehouse is determined, and its shipping charges are calculated.
Technical Aspects
Language – Scala
Tools – Akka, Akka-Http, Apache Solr
Solution
A Scala project is created which finds the shortest route between the user's location and the warehouse location having the appropriate stock of the items requested by the user.
Requests come into the calculator through a REST API, which then determines the warehouses that can fulfill the request. From this list, we find the nearest warehouse based on the geo-coordinates, using the haversine distance, as sketched below.
We can also find the number of days needed to deliver the products to the user's location using various providers.
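For illustration, here is a minimal Python sketch of the haversine step, with hypothetical warehouse names and coordinates (the project itself is in Scala with Akka and Solr):

    import math

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance between two (lat, lon) points in km."""
        r = 6371.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dphi = math.radians(lat2 - lat1)
        dlmb = math.radians(lon2 - lon1)
        a = math.sin(dphi / 2) ** 2 + \
            math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    # Warehouses reported as having enough stock; the coordinates
    # (already geocoded from zip codes) are illustrative only.
    warehouses = {
        "WH-EAST": (40.71, -74.00),
        "WH-WEST": (34.05, -118.24),
        "WH-SOUTH": (29.76, -95.36),
    }
    user = (41.88, -87.63)  # user's location from their zip code

    nearest = min(
        warehouses,
        key=lambda w: haversine_km(*user, *warehouses[w]),
    )
    print("ship from:", nearest)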
Natural Language Processing API using PySpark and Flask
Brief
The project aimed at creating a Natural Language Processing API for big data using PySpark and Flask. The API was to be called from the front end, and the results of NLP transformations such as stemming, lemmatization, stop-word removal, tokenization, document-term matrix construction, sentiment analysis, and metadata extraction were returned.
Technical Aspects
Language used: Python
Database: MySQL
Tools used: PySpark, Flask
Solution
A MySQL database was set up with the text data and several attributes.
The project was configured with PySpark for big data processing.
A Flask API was created to be called by the front end.
How it will work
The columns used for the project are fetched as query-string variables.
A MySQL connection is made to fetch the data from the MySQL database table into a PySpark dataframe.
The corresponding NLP transformation is carried out on the passed query string.
The PySpark dataframe is written back to the MySQL table.
The dataframe is converted to a JSON object.
The JSON is returned to the API caller, as sketched below.
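A minimal sketch of how such an endpoint could look, using Flask with PySpark's Tokenizer and StopWordsRemover for the tokenization/stop-word case; an in-memory dataframe stands in for the MySQL table, and the route name and column names are assumptions:

    from flask import Flask, request, jsonify
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import Tokenizer, StopWordsRemover

    app = Flask(__name__)
    spark = SparkSession.builder.appName("nlp-api").getOrCreate()

    @app.route("/nlp")
    def nlp():
        # The column to process arrives as a query-string variable; in the
        # project the data is fetched from MySQL into a PySpark dataframe.
        # The in-memory rows below stand in for that table.
        column = request.args.get("column", "text")
        df = spark.createDataFrame(
            [("Brake pads are selling well",), ("New shipment of filters",)],
            [column],
        )

        tokens = Tokenizer(inputCol=column, outputCol="tokens").transform(df)
        cleaned = StopWordsRemover(
            inputCol="tokens", outputCol="filtered"
        ).transform(tokens)

        # Convert the dataframe to JSON and return it to the caller
        # (the project also writes the result back to the MySQL table).
        return jsonify([row.asDict() for row in cleaned.collect()])

    if __name__ == "__main__":
        app.run()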
DBSCAN to boost e-commerce sale
Brief
This project was aimed at boosting e-commerce product sales. The e-commerce portal contains several products in various categories and subcategories. The main idea was to find the regions of importance for particular categories of sales, along with their intrinsic attributes, using an unsupervised approach.
Technical Aspects
Language used : Core Java
Dataset : e-commerce sales data
Solution
The e-commerce sales data is analysed, and the relevant attributes such as Part Number, Product Type, Category, Subcategory, address, latitude, and longitude are extracted from it.
These attributes are fed to DBSCAN to find the clusters.
Once the clusters are found, the regions of sales can be identified and more campaigns can be carried out to boost sales.
How it will work
The project involved clustering object instances over several snapshots taken at fixed timestamps.
The object groups that remained together across snapshots formed a cluster.
The MinPts and Eps parameters are used to determine a cluster.
When object groups stay clustered together across a particular set of snapshots, they are identified as object groups that remain together, as sketched below.
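The project is in Core Java; as an illustrative Python sketch of the snapshot idea, assuming object positions are tracked across aligned snapshots and using scikit-learn's DBSCAN in place of the project's own implementation:

    import numpy as np
    from sklearn.cluster import DBSCAN

    # Snapshots of object positions at fixed timestamps; each row is one
    # object, and rows align across snapshots. Positions are illustrative.
    snapshots = [
        np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 5.0]]),
        np.array([[1.0, 1.0], [1.1, 1.1], [9.0, 2.0], [5.1, 5.0]]),
    ]

    def groups(points, eps=0.5, min_pts=2):
        """Cluster one snapshot with DBSCAN (Eps, MinPts) and return the
        set of object-id groups, ignoring noise (label -1)."""
        labels = DBSCAN(eps=eps, min_samples=min_pts).fit(points).labels_
        return {
            frozenset(np.flatnonzero(labels == l))
            for l in set(labels) if l != -1
        }

    # Objects that stay clustered together in every snapshot form the
    # groups of interest (a simplified take on "remaining together").
    common = set.intersection(*(groups(s) for s in snapshots))
    print("groups that remain together:", common)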