James Li
Data Scientist & Machine Learning Engineer
https://www.linkedin.com/in/james-li-6a7b95c | Vancouver, BC
PROFESSIONAL SUMMARY
Seasoned Data Science Professional with 6+ years of strong experience in machine learning engineering, data engineering, and operations. Proficient
in predictive modeling, data processing, data mining, and statistical modeling to address complex business challenges.
Strong expertise in NLP and implementation of deep learning models, such as BERT, GPT, and T5. Familiar with state-of-the-art NLP techniques,
including Transformer architectures, and libraries like spaCy and Hugging Face.
Proficient in integrating vector stores such as ChromaDB and Pinecone into NLP or machine learning pipelines, enhancing tasks like document
clustering, recommendation systems, and content-based retrieval. Familiar with the APIs and functionality provided by vector stores.
Excellent knowledge of SQL and RDBMS implementation including data analysis techniques, complex SQL profiling on SQL Server and Teradata, and
optimization of PL/SQL stored procedures and queries. Experience in data extraction, data modeling, data wrangling, and data analysis using IDEs
such as Jupyter Notebook and PyCharm.
Familiar with data warehousing principles and normalization techniques for OLAP systems. Expertise in data mining, text mining, data cleansing,
transformation, and integration using SQL on MySQL and SQL Server.
Skilled in designing and building batch and stream-based Spark pipelines, semantic search systems, and question-answering systems, deploying to
production environments with high efficiency.
Experienced in data visualization, adept at creating dynamic dashboards, reports, and data stories using Tableau and Power BI, and proficient in
numerous Python and R packages like Pandas, NumPy, SciPy, Matplotlib, Seaborn, ggplot2, TensorFlow, and Scikit-learn.
Solid understanding of the mathematical foundations behind machine learning algorithms, including probability theory, random processes,
statistics, and optimization techniques. Adept in linear algebra and convex optimization.
Proficient across the full data science project life cycle, actively involved in all phases including data acquisition, data cleaning, data engineering, feature
scaling/engineering, statistical modeling, testing and validation, and data visualization.
Highly experienced in working with cloud-based infrastructure, containerization, and CI/CD pipelines within a DevOps environment. Proven
capability to deliver enterprise-grade, scalable machine learning and deep learning-based applications and services.
Strong proficiency in Agile methodology and the Scrum process, with experience tracking defects and changes using tools like Jira and Git.
Excellent at collaborating with cross-functional teams, stakeholders, application engineering, quality engineers, and product management.
Outstanding communication, interpersonal, analytical, and leadership skills.
WORKING EXPERIENCE
Data Scientist
Sisense
07/2020 - 07/2023
New York, United States
Sisense is a global business intelligence software company, headquartered in New York City, that offers Sisense Fusion, a cloud-based BI platform enabling users to
connect, analyze, and visualize data from various sources.
• Designed, implemented, and validated over 40 machine learning models across diverse projects, employing real-world program data to predict future trends
accurately.
• Spearheaded Extract, Transform, Load (ETL) processes to streamline data cleaning, modeling, and mining initiatives, enabling report generation through Power BI,
resulting in a 90% reduction in turnaround time to adhere to SLA standards.
• Utilized SQL to detect and minimize data duplication across various team projects, enhancing coordination, accountability, and streamlining project management.
• Leveraged SAS and Excel Macros to clean and prepare client data, assisting the marketing team in constructing effective marketing mix models. This strategy
yielded an ROI lift of 25 basis points.
• Collaborated with the product team to design and deploy a Python-based product recommendation engine, boosting on-page user engagement and driving
$150K in incremental annual revenue.
• Partnered with product and marketing teams to identify pre-client interactions positively correlated with conversion rates, facilitating strategies that led to a 25%
uplift in conversions.
• Developed operational reports in Tableau to optimize contractor scheduling, leading to an annual budget saving of $85,000.
Data Scientist
Ada Support
10/2017 - 05/2020
Toronto, Canada
Ada Support is a customer service automation platform that leverages LLMs to assist businesses in addressing customer inquiries across various languages and
channels.
• Leveraged Google Cloud Natural Language API and Python for efficient entity extraction, content classification, and sentiment analysis.
• Developed predictive models using a combination of machine learning, natural language processing, and statistical analysis methods.
• Utilized Spark for distributed data processing on large streaming data sets, improving data ingestion speed by 40%.
• Crafted an automated linear regression model refinement program in SAS, reducing manual work by 20 hours per month, targeted at specific customer base
segments.
• Built ETL infrastructures for data delivery to Redshift, enhancing stakeholder decision-making capabilities by 36%.
• Designed a SQL Server Integration Services (SSIS) package for seamless transfer and loading of files from 150 diverse sources into numerous SQL database tables.
• Extracted valuable datasets from Salesforce CRM and loaded them into a SQL relational database for analytic processing, using SQL to retrieve,
manipulate, and extract significant results.
• Served in an advisory role in crafting marketing strategies based on insights into efficient marketing media mix channels, resulting in reduced marketing expenses, a higher click-through rate, and a higher customer acquisition rate.
Data Analyst
GeoViz
06/2016 - 09/2017
Oakville, Canada
GeoViz Inc. is a digital commerce services company helping businesses transform digitally and improve operations for growth and cost reduction.
• Interpreted and analyzed business requirements and processes for clients, leveraging interviews, document analysis, and workflow assessment strategies,
increasing business process understanding by 40%.
• Initiated and executed data extraction from Oracle, SQL, and Teradata for over 20 projects, converting data into SAS data sets using Proc SQL. This process
improvement saved 20% time in data management.
• Participated in the evaluation of dataset complexities for 60 datasets using SAS, leading to a 30% improvement in data quality.
• Skillfully applied strategic logic in formatting and extracting necessary information for over 50 projects, adhering to regulatory standards and reducing data-related discrepancies by 20%.
• Created impactful visualizations using Tableau for over 100 presentations, improving stakeholder understanding by 50%.
• Demonstrated deep understanding of digital commerce services, focusing on customer requirements to develop over 100 solutions, resulting in a satisfaction
increase of 50%.
EDUCATION
Bachelor of Computer Science
York University
GPA: 3.86 / 4.0
04/2012 - 04/2016
Toronto, Canada
SKILLS
Programming & Tools:
Python
Seaborn
R
Scala
Plotly
SQL
ggplot2
JavaScript
TensorFlow
Microsoft Excel
Tableau
PyTorch
Keras
Power BI
Git
Pandas
Jira
spaCy
Scikit-learn
Jupyter Notebook
Matplotlib
PyCharm
Machine Learning and AI:
Predictive Modeling
Random Forests
Deep Learning
Natural Language Processing
Naive Bayes Classifier
Discriminant Analysis
Support Vector Machine
AWS Architecture
OpenAI models
Semantic Search
Ensemble Models
Regression Models
PCA
Decision Trees
Factor Analysis
Cluster Analysis
Hugging Face models
Data Engineering:
Data Acquisition
Data Cleaning
Data Warehousing
Data Engineering
Agile Methodology
Feature Scaling
Data Visualization
Data Mining
Statistical Modeling
Scrum Process
Cloud and DevOps:
CI/CD Pipelines
Containerization
AWS
Azure
Google Cloud Platform
Firebase
Docker
Database and Big Data Technologies:
SQL Server
HBase
Microsoft Access
MapReduce
MySQL
PostgreSQL
Data Warehousing Principles
MongoDB
Teradata
Apache Spark
OLAP Systems
Hadoop
HDFS
Hive
Business and Soft Skills:
Cross-Functional Teamwork
Communication
Interpersonal Skills
Leadership
Problem-Solving
LANGUAGES
English: Proficient
Chinese: Native