Priyanka Kumari
Curriculum Vitae
Basic Info.
Name: Priyanka Kumari
E-Mail: -
Research Interest:
(1) Rare Category Analysis
(2) Video Analytics
(3) Active Learning
(4) Semi-supervised Learning
(5) Transfer Learning
(6) Spam Filtering
(7) Big Data Analytics
Education
2016 - 2019 Master of Data Science and Analytics | Massachusetts Institute of Technology
2005 - 2007 Master of Computer Science (MSCS), Computer Science | Carnegie Mellon University
2000 - 2004 Bachelor of Technology (B.Tech.), Computer Engineering | Indian Institute of Technology (IIT) Bombay
Work Experience
August 2019 - Present Senior Software Engineer | Google
August 2014 - June 2016 Project Manager | Microsoft
February 2010 - July 2014 Senior Software Engineer | Amazon.com
September 2007 - December 2009 Software Engineer | IBM
Research Experience
◆ Machine Learning
1) Develop a new method for detecting instances from the minority classes via an unsupervised
local-density-differential sampling strategy. Essentially, a variable-scale nearest-neighbor
process is used to optimize the probability of sampling tightly grouped minority classes,
subject to a local smoothness assumption on the majority class. The effectiveness of the
proposed method is demonstrated both theoretically and in preliminary experiments.
2) Design a prior-free rare category detection method named SEDER. It implicitly performs
semiparametric density estimation using specially designed exponential families, and then
picks the examples for labeling where the neighborhood density changes the most.
Experimental results show that its performance is comparable to state-of-the-art techniques
where much more prior information about the data set is needed.
3) Propose graph-based rare category detection methods named GRADE and GRADE-LI for
detecting minority classes on graphs. They first calculate the global similarity between two
nodes on the graph, and then implicitly map the nodes to the feature space according to the
global similarity. By sampling in the regions with high density, they have a high probability of
finding examples from the minority class with a few label requests. Given the same amount of
information, GRADE performs much better than state-of-the-art techniques. On the other
hand, given much less information, GRADE-LI performs as well as state-of-the-art
techniques.
4) Propose a new graph-based transfer learning method. It is based on an objective function that
takes into account the label smoothness on an example-feature-example tripartite graph,
example-example bipartite graph and the consistency with the label information. Furthermore,
to address the computation issue, we propose an iterative algorithm, which is shown to
converge to the optimal value. Experimental results on several data sets demonstrate the
superiority of the proposed method over state-of-the-art techniques.
5) In the field of spam filtering, propose a new asymmetric boosting method, Boosting with
Different Costs. Compared with traditional boosting methods, which assume the same cost for
misclassified instances from different classes, our method is more generic, and is designed to
be more suitable for problems where the major concern is a low false positive (or negative)
rate. Experimental results on a large-scale email spam data set demonstrate the superiority of
our method over state-of-the-art techniques.
6) Propose a new graph-based semi-supervised learning method. It differs from existing
graph-based methods in that it estimates both the class-conditional probabilities and the class
priors; it is therefore generative in nature, whereas existing methods are all
discriminative. Experimental results on three data sets show the superiority of the proposed
method over existing methods, especially when the class proportions in the labeled set differ
from the class priors.
7) Propose a new variant of the boosting algorithm, named W-Boost, which addresses, to a
certain extent, the problem of over-fitting when training data is insufficient. It is based on a novel
weight update scheme and uses a variable number of bins to estimate marginal distributions in
weak learner design.
8) Study and compare existing active learning methods used in Content-based Image Retrieval,
and propose a novel method named mean version space active learning. The criterion of the
proposed method incorporates both posterior probabilities and the size of the version space,
while existing methods are only based on one of them.
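As an illustration of the density-differential idea in item 1) above, the following is a minimal sketch rather than the method itself. It assumes two fixed radii (`count_radius` and `compare_radius`, both hypothetical parameters) where the description calls for a variable-scale nearest-neighbor process, and the toy data is made up for the example.

```python
import numpy as np


def density_differential_scores(X, count_radius, compare_radius):
    """Score each point by how much denser it is than its sparsest nearby point.

    Simplifying assumption: both radii are fixed, whereas a variable-scale
    process would adapt them to the data.
    """
    n = len(X)
    # Pairwise Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Local density proxy: neighbours within count_radius, excluding the point itself.
    counts = (dist <= count_radius).sum(axis=1) - 1
    scores = np.zeros(n)
    for i in range(n):
        near = (dist[i] <= compare_radius) & (np.arange(n) != i)
        if near.any():
            # Largest drop in density from point i to any nearby point.
            scores[i] = (counts[i] - counts[near]).max()
    return scores


# Toy data (made up): a 10 x 10 unit grid stands in for the smooth majority class ...
grid = np.array([[x, y] for x in range(10) for y in range(10)], dtype=float)
# ... plus a tight five-point minority cluster placed between grid nodes.
offsets = np.array([[0, 0], [0.01, 0], [-0.01, 0], [0, 0.01], [0, -0.01]])
cluster = np.array([4.5, 4.5]) + offsets
X = np.vstack([grid, cluster])

scores = density_differential_scores(X, count_radius=0.3, compare_radius=1.0)
candidate = int(np.argmax(scores))  # index of the point proposed for labeling
print(candidate, scores[candidate])
```

On this toy data the proposed point falls inside the tight cluster, because only there does the neighbor count drop sharply within the comparison radius.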
◆ Image Related Topics
1) Propose a novel transductive learning framework named manifold-ranking based image
retrieval (MRBIR). Several schemes for incorporating negative feedback images and for
selecting images in each round of relevance feedback are incorporated into the framework. In
systematic experiments, MRBIR outperforms state-of-the-art techniques.
2) Evaluate the performance of different classification algorithms (e.g. SVM, AdaBoost,
Real-AdaBoost) in an image classification task (photo vs. graphic), and incorporate the best one
(Real-AdaBoost) into a web image search engine developed by Microsoft Research Asia.
3) Propose an optimization-based approach for automatic peak number detection in repeated
pattern analysis. Apply the theory of wallpaper groups to natural images and extract a novel
feature to depict the symmetry property of natural images. The proposed symmetry feature
outperforms several other texture features in image retrieval.
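For readers unfamiliar with manifold ranking, which item 1) above builds on, here is a minimal sketch of the standard closed-form ranking step. It is not the MRBIR framework: negative feedback and image selection schemes are omitted, and the Gaussian affinity, `sigma`, `alpha` and the one-dimensional toy features are assumptions made for illustration.

```python
import numpy as np


def manifold_ranking(X, query_index, sigma=0.5, alpha=0.9):
    """Rank all rows of X by relevance to the query row (closed-form solution)."""
    n = len(X)
    # Gaussian affinity between feature vectors, with no self-loops.
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalisation: S = D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Initial scores: 1 for the query, 0 elsewhere.
    y = np.zeros(n)
    y[query_index] = 1.0
    # Solve (I - alpha * S) f = y rather than inverting the matrix explicitly.
    return np.linalg.solve(np.eye(n) - alpha * S, y)


# Toy features (made up): two well-separated groups of "images".
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
ranking_scores = manifold_ranking(X, query_index=0)
ranking = np.argsort(-ranking_scores)  # most relevant first
print(ranking)
```

The query's score spreads along the affinity graph, so items in the same group as the query rank ahead of the distant group.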
Selected Project Experience
1. Human Detection in Video Stream
Technologies used:
• Faster R-CNN, Mask R-CNN
• Deep Learning, Machine Learning, Convolutional Neural Network
• Python, PyTorch, JavaScript, ReactJS, NodeJS
• Classical machine learning methods:
- an SVM classifier with an RBF kernel was trained on hand-crafted LBP (local binary pattern) features;
- a convolutional neural network was used for one of the attributes.
• Key application features:
- face recognition and tracking;
- people aggregation;
- storing all the information in the database;
- quick and advanced view modes.
The product may be used in different video surveillance settings, such as:
- streets;
- shopping and business centers;
- sporting events;
- concerts.
Real-time human detection and tracking in a live video stream from a fisheye camera. The
system can also flag illegal activities, such as molestation or carrying weapons, and report
them to the police.
The solution was architected with the following features in mind:
- Background and foreground segmentation.
- Resolving ambiguities of crossing tracks.
- Re-identification of re-entering humans.
- Server-side solution with HTTP API.
- Integration with Age and Gender classifier from side-view camera.
This project had many modules. One complex module was the assessment of a person’s
features:
- gender;
- age;
- emotions;
- race.
The system determines a match percentage for each feature and shows it to
the user.
2. Protective Clothing Detection for the Engineering Industry
Technologies used:
• Faster R-CNN, Mask R-CNN
• Deep Learning, Machine Learning, Convolutional Neural Network
• Python, PyTorch, JavaScript, ReactJS, NodeJS
The project assesses protective clothing and detects the presence of:
- a safety helmet;
- safety glasses;
- a reflective vest.
Intuitive YES/NO indicators show the presence or absence of each clothing item on the
person.
The product may be used in fields such as construction, engineering, renovation and
manufacturing.
3. Address Parsing Using libpostal (retraining libpostal on custom data)
Technologies used:
• Naïve Classifier, Recurrent Neural Network, Unsupervised Learning
• Deep Learning, Machine Learning, RBF
• Python, C++, JavaScript, ReactJS, NodeJS, Microservices, API
This project parsed address components such as country, zip code, city, building
number and street name from an address string.
4. Candidate Resume Parsing and Matching
Technologies used:
• Natural Language Processing, Machine Learning, Data Science
• Deep Learning, Machine Learning, Convolutional Neural Network
• Python, PyTorch, JavaScript, ReactJS, NodeJS
This project built a system that parses resumes and jobs and matches them, finding
resumes that fit a specific job or jobs that fit a specific resume. The system takes into
account gender, age and relevant background, including past jobs and educational
experience. From a technical point of view, parsing resumes and jobs relies on a
combination of Apache Tika, ontologies (for skills, cities, universities and so on) and
NLP-based techniques.
For matching, the system uses machine-learning algorithms that work on a set of
different information extracted from resumes and jobs. The trained model ranks
resumes/jobs, finds the top N jobs/resumes with the most matches and provides a
score for each. The system provides an interface for retraining the matching models on
new or updated resumes and jobs, as well as a RESTful API that exposes its
functionality to external systems. The results of this project are:
- a sub-system built into the HR system;
- an innovative way to parse resumes and CVs based on Natural Language Processing
and Machine Learning;
- +25% customer retention.
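The top-N ranking step described in this project can be pictured with a minimal sketch. It deliberately swaps the trained matching model for plain cosine similarity over skill sets, so it illustrates only the rank-and-score interface. The function names and sample data are hypothetical.

```python
from math import sqrt


def skill_similarity(a, b):
    """Cosine similarity between two skill sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def top_n_resumes(job_skills, resumes, n):
    """Return the n best (resume id, score) pairs, highest score first."""
    scored = [(rid, skill_similarity(job_skills, skills)) for rid, skills in resumes.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical data for illustration only.
job = {"python", "pytorch", "nlp"}
resumes = {
    "A": {"python", "pytorch", "nlp"},
    "B": {"python", "java"},
    "C": {"excel"},
}
best = top_n_resumes(job, resumes, n=2)
print(best)
```

Swapping the arguments (one resume against many jobs) gives the reverse direction, jobs matching a specific resume.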
5. Retail Shop Security Analytics
Technologies used:
• Faster R-CNN, Mask R-CNN, RBF
• Deep Learning, Machine Learning, Convolutional Neural Network
• Python, PyTorch, JavaScript, ReactJS, NodeJS
The goal of this project was to build a retail analytics system. The project covered the
following tasks:
1) characterizing the general visitor traffic, including the volume and the
entrance/exit times of visitors on the trading floor;
2) determining personal characteristics of visitors: the maximum, minimum and
average length of stay on the trading floor;
3) determining individual characteristics of visitors: identifying regular
customers (using unique features, for example, a person) and their preferences (client
traffic maps through the trading floor area).
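Tasks 1) and 2) above reduce to simple aggregations once entrance and exit times are known. The following minimal sketch assumes a hypothetical visit log of `(visitor id, entrance time, exit time)` records. The record layout, function names and sample values are illustrative and not taken from the project.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

# Hypothetical visit log: (visitor id, entrance time, exit time).
visits = [
    ("v1", datetime(2020, 1, 6, 10, 0), datetime(2020, 1, 6, 10, 10)),
    ("v2", datetime(2020, 1, 6, 10, 30), datetime(2020, 1, 6, 10, 50)),
    ("v3", datetime(2020, 1, 6, 11, 5), datetime(2020, 1, 6, 11, 35)),
]


def stay_statistics(visits):
    """Return (max, min, mean) length of stay in minutes."""
    minutes = [(left - entered).total_seconds() / 60 for _, entered, left in visits]
    return max(minutes), min(minutes), mean(minutes)


def entrances_per_hour(visits):
    """Count how many visitors entered in each hour of the day."""
    return Counter(entered.hour for _, entered, _ in visits)


longest, shortest, average = stay_statistics(visits)
hourly = entrances_per_hour(visits)
print(longest, shortest, average, dict(hourly))
```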
From a technical point of view, we use two video cameras installed inside
the store, whose view angle covers the entrance/exit zones and whose
resolution is sufficient for personal identification (Full HD or higher), and a server
equipped with a GeForce GTX 1080, which is fast enough to process the
video stream in real time. We use Ubuntu as the main OS and Python with
frameworks such as Caffe, PyTorch, NumPy, scikit-learn, SciPy and OpenCV.
All information about visitors generated during processing is stored in the
specified formats (.mp4, .csv, .txt); the values of the main parameters and the input and
output paths can be configured through the console user interface. As a result of the
project, our customer gained a better understanding of its marketing and increased its
revenue by 16%. The customer also plans to extend this solution to its whole network of
shops.
6. Drone-Based Agricultural Intelligence
Technologies used:
• Faster R-CNN, Mask R-CNN, RBF
• Deep Learning, Machine Learning, Convolutional Neural Network
• Python, PyTorch, JavaScript, ReactJS, NodeJS
Our client provides grape growers with specialized aerial data solutions developed specifically
for the complexities of vineyards. The business goal of this project was therefore to provide
the company's existing and prospective customers with an easy-to-use, AI-powered
application, so that customers could fetch and analyze all necessary information
about their vineyards on their own computers using Computer Vision methods in a
cross-platform app.
The goal of the project was to analyze drone images of grape fields in order to detect
grape rows and estimate the start and end points, length and width of each row.
The application also supports further image processing functions, such as color,
brightness and contrast manipulation, drawing primitives (lines, polygons) and zoom, as well
as saving results as shapefiles and handling geolocation information.
From a technical point of view, the application is a cross-platform design written in
C++/Qt for Mac/Win/Linux platforms. Its backend relies on Computer Vision
algorithms to detect the rows. As a result of this project, our client increased its revenue
by 5% and was able to demonstrate the latest Computer Vision advancements in the field of
agriculture.
7. Android Face App for Predicting Aged Face Photos
Technologies used:
• CNN, GAN, Faster R-CNN
• Deep Learning, Machine Learning, Convolutional Neural Network
• Java, Android Application Development
This was an Android app, built with the TensorFlow Lite framework, that predicts the
user's future face using GAN techniques.
Honors and Awards (Selected)
2009 IBM Fellowship
2008 IBM Fellowship
2004 IIT Samsung Fellowship for excellent student (top 1%)
2004 Best Presentation Award in WSM Group, Microsoft Research Asia
2002 Microsoft bitwise challenge winner
2001 Three Good student (top 1%), IIT, Mumbai