Mahmoud TRIGUI « Data Scientist »
Street El Ain Km 6, Environment Boulevard
3042 Sfax – TUNISIA
Mobile phone TN : (-
E-mail-Self-tape https://youtu.be/OUljBuMxoXU
LinkedIn www.linkedin.com/in/mahmoudtrigui
Skill IQ https://app.pluralsight.com/profile/mahmoud-trigui
Acclaim www.youracclaim.com/users/mahmoud-trigui/badges
Zindi https://zindi.africa/users/Mahmoud_Trigui/competitions
Personality https://www.16personalities.com/profiles/46e44008a0245
I hold an engineering degree in statistics and data analysis, with solid problem-solving
skills thanks to my mathematics and physics background and more than 9 years of
experience. I am passionate about anything related to machine learning: I enjoy working
with structured (tabular) data, mining it for insights into everyday problems,
finding and creating new features through detailed analysis, and approaching projects
from different perspectives. I primarily use R, but I'm always open to adding more.
Key Words
Analysis, ANOVA Analysis, Business, Clustering Relationship, Construction, Churn, CVM,
CRISP-DM, Data storytelling, Dataiku, Decision rules, Dimensionality Reduction,
Embedding, Feature engineering, Forecasting, Geospatial Analysis, GPT-4, Hypothesis, IBM
Watson, Image conversion, LLM, Methodologies, MLForecast, Network analysis, NLP,
Oracle DB, Presentation, Profiling, Pydantic, R Markdown, SAS-GUIDE, SAS E-MINER,
Sampling Techniques, Shiny App, Segmentation, Spatial Data, Transformations, Time
Series Decomposition, Tutoring, VBA
Contents
Data Scientist @ Sofrecom Tunisia (part of Orange Group)
PDF Info Extraction with LLM
Time Series Forecasting for Business Demand
CoFinancing Prediction for FTTH Deployment
Data Scientist @ Kiota Intelligence
Shiny App
R Markdown Report
Exploratory Data Analysis
Matching Script
Clustering Relationship
Data Scientist/Analyst @ Tunisie Telecom
Client Segmentation
Multi-Sim Detection
Data-Mining Process
Family Communities
Ad-hoc Analysis / Reports
Data Science Projects @ Freelance
Facebook Page Analysis
Vehicle Weight Estimation
Data Science Tutor
Research Statistician @ LAAS-CNRS
Alarm Automatic management
Machine Learning Competitor @ Zindi
Spatial Data
Various
Assessments, Courses & Skills
Pluralsight
LinkedIn
Acclaim
DataCamp
Coursera
edX
Stanford Online
MIT ProX, Udacity & DataQuest
+ Personality
Data Scientist @ Sofrecom Tunisia (part of Orange Group)
PDF Info Extraction with LLM
We developed a PDF audit information extraction system using GPT-4 that automatically processes multi-page audit reports. The system converts PDFs to images rather than extracting text, because some of the information appears only in logos and graphics. It also searches for specific items listed in a predefined reference dictionary.
Built with a modular Python architecture, it features custom Pydantic validation models, optimized
image conversion, and Base64 encoding for LLM compatibility.
The solution generates structured JSON outputs while maintaining data integrity through comprehensive
validation mechanisms, significantly reducing manual processing time for audit professionals.
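A minimal sketch of the encoding and validation steps described above, under stated assumptions. It uses only the Python standard library, with a dataclass standing in for the Pydantic models; the field names (`company_name`, `audit_date`, `items`), the contents of the reference dictionary, and all helper names are hypothetical and do not reflect the actual project schema.

```python
import base64
import json
from dataclasses import asdict, dataclass, field
from datetime import date

# Hypothetical reference dictionary: canonical item -> accepted spellings.
REFERENCE_ITEMS = {
    "fire_safety": {"fire safety", "fire-safety"},
    "electrical": {"electrical", "electrical installation"},
}


def encode_page(image_bytes: bytes) -> str:
    """Base64-encode one page image so it can be embedded in an LLM request."""
    return base64.b64encode(image_bytes).decode("ascii")


def normalize_item(raw: str):
    """Map a raw label to its canonical reference item, or None if unknown."""
    cleaned = raw.strip().lower()
    for canonical, variants in REFERENCE_ITEMS.items():
        if cleaned in variants:
            return canonical
    return None


@dataclass
class AuditRecord:
    """Stand-in for a Pydantic model: fields are validated on construction."""
    company_name: str
    audit_date: str  # expected in ISO format, YYYY-MM-DD
    items: list = field(default_factory=list)

    def __post_init__(self):
        if not self.company_name.strip():
            raise ValueError("company_name must not be empty")
        # Raises ValueError if the string is not a valid ISO date.
        date.fromisoformat(self.audit_date)
        # Keep only the items that exist in the reference dictionary.
        normalized = (normalize_item(item) for item in self.items)
        self.items = [item for item in normalized if item is not None]


def parse_llm_output(raw_json: str) -> AuditRecord:
    """Parse the model's JSON answer and validate it into a record."""
    return AuditRecord(**json.loads(raw_json))


# Example with a made-up model answer.
raw = ('{"company_name": "ACME", "audit_date": "2024-01-31", '
       '"items": ["Fire Safety", "Plumbing"]}')
record = parse_llm_output(raw)
print(json.dumps(asdict(record)))
```

Validating at construction time means a malformed answer from the model fails loudly instead of silently reaching the JSON output.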
Time Series Forecasting for Business Demand
The goal is to generate 12-month & 53-week demand predictions at multiple business levels (department &
offer). The system addresses the challenge of limited historical data through advanced feature engineering,
incorporating holiday calendars, weekend patterns, and seasonal decomposition techniques.
My implementation leverages R's specialized forecasting packages (feasts, fable) to create an ensemble
of models including ARIMA, ETS, Holt-Winters, TBATS, Prophet, and deep learning. The solution achieves
an error of approximately 5%.
Key technical components include time series decomposition (STL, AutoSTR) to separate trend and
seasonality patterns.
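To make the ensemble and error figures above concrete, here is a minimal sketch in Python (the project itself was implemented in R). It assumes an equal-weight average of the individual model forecasts and MAPE as the error metric; neither choice is stated in the project description, and the model names and numbers are purely illustrative.

```python
from statistics import fmean


def ensemble_forecast(forecasts):
    """Average several models' forecasts point by point (equal weights)."""
    horizons = {len(values) for values in forecasts.values()}
    if len(horizons) != 1:
        raise ValueError("all models must forecast the same horizon")
    return [fmean(step) for step in zip(*forecasts.values())]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actuals must be non-zero)."""
    return 100 * fmean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


# Illustrative numbers only: two models, three forecast steps.
forecasts = {"arima": [100, 110, 120], "ets": [104, 106, 124]}
combined = ensemble_forecast(forecasts)
print(combined, mape([100, 108, 125], combined))
```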
CoFinancing Prediction for FTTH Deployment
The goal is to predict co-financing rates for FTTH. The project involved migrating Python scripts to
production-grade Dataiku workflows, prioritizing SQL and native components for performance
optimization, and designing an efficient Analytics Base Table creation process.
For model enhancement, we validated MLForecast function effectiveness, tested additional predictive
variables, and evaluated alternative modeling approaches including other time series models and
transforming the problem into a regression task.
Feature engineering improved the model, notably by creating interaction terms and applying the
transformations needed to obtain similar distributions across time periods.
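Transforming a forecasting problem into a regression task usually means building lagged features. The sketch below shows the idea in plain Python, without MLForecast or Dataiku; the function name and the sample values are hypothetical.

```python
def make_lagged_table(series, n_lags):
    """Turn a univariate series into (features, target) rows for regression.

    Each row uses the previous `n_lags` values as features and the
    current value as the target.
    """
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    rows = []
    for t in range(n_lags, len(series)):
        features = list(series[t - n_lags:t])
        rows.append((features, series[t]))
    return rows


# Illustrative series: each row can be fed to any regression model.
for features, target in make_lagged_table([10, 20, 30, 40, 50], n_lags=2):
    print(features, "->", target)
```

Once the series is in this tabular form, additional predictive variables and interaction terms can be appended to each feature row like in any other regression problem.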
Data Scientist @ Kiota Intelligence
My main job was to produce the outputs served by our API endpoints with plumber, covering all kinds of data
science tasks in R. All the data related to investments and funding rounds.
Shiny App
The goal was to provide a tool that allows clients to view statistics and graphs about their data entries, based on
many filters.
R Markdown Report
The goal was to generate a complete psychometric report on entrepreneurs and teams.
Exploratory Data Analysis
The goal was to produce descriptive statistics and to test hypotheses about differences in proportions between stages.
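One common way to compare proportions between two stages is a two-proportion z-test. The sketch below is an assumption about the kind of test involved, not a record of the one actually used (the original work was done in R), and the counts are made up.

```python
from math import erf, sqrt


def two_proportion_ztest(success_a, total_a, success_b, total_b):
    """Two-sided z-test for equality of two proportions (pooled variance)."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        raise ValueError("standard error is zero; proportions are degenerate")
    z = (p_a - p_b) / se
    # Standard normal CDF evaluated at |z|, via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical counts: 60 of 100 in one stage versus 40 of 100 in another.
z, p_value = two_proportion_ztest(60, 100, 40, 100)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```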
Matching Script
The goal of this project was to connect the right investors to the right entrepreneurs based on many filters.
Clustering Relationship
The goal of this project was to find any potential relation between different variables and to connect their
centroids to see the evolution over time.
Data Scientist/Analyst @ Tunisie Telecom Group
I had the opportunity to work with large amounts of data: millions of rows and hundreds of variables
describing clients, drawn from different sources and tables.
A large share of the time on the data-mining project was dedicated to building the Analytics Base Table (ABT) via:
Data reliability-check
Data gathering
Data cleaning
To accomplish these tasks, I needed proficient SQL coding for Oracle (Toad), in addition
to basic SAS programming and its features in SAS Guide.
Client Segmentation
The business needed a new segmentation of our largest segment, the prepaid client base, for use in its
targeting campaigns, in CVM, and in various report requests.
The initial phase was the data preparation along with the schedule and the specification & mapping
document.
Phase 2 was the important one: trying different segmentation methods, different methodologies
(one client per row or one client per period). Work done with SAS E-MINER.
An essential step in validating the clustering results is detailed profiling, which is shaped by the
department's business strategies and goals.
Phase 3 was for the supervised task: once the clusters are finally reviewed and validated, we build a model to
assign every customer of each new month to its segment.
Here we took a different approach to make the result easier for non-technical audiences and for the
directors: rather than simply applying a classification model, we set simple decision rules extracted
from a decision-tree-based model, and then fit the final groups with the help of Tableau or Excel.
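The decision-rule idea can be sketched as an ordered list of readable conditions. The segment names, variables, and thresholds below are hypothetical; in practice they would be read off the fitted decision tree and reviewed with the business.

```python
# Hypothetical rules, checked in order; the first match wins.
RULES = [
    ("heavy_data", lambda c: c["data_mb"] >= 2000),
    ("voice_focused", lambda c: c["voice_min"] >= 300 and c["data_mb"] < 500),
    ("low_activity", lambda c: c["recharge_amount"] < 5),
]
DEFAULT_SEGMENT = "standard"


def assign_segment(customer):
    """Return the first segment whose rule matches, else the default one."""
    for segment, condition in RULES:
        if condition(customer):
            return segment
    return DEFAULT_SEGMENT


# Example: one made-up customer for the new month.
customer = {"data_mb": 100, "voice_min": 400, "recharge_amount": 20}
print(assign_segment(customer))
```

Rules of this form are easy to explain to a non-technical audience and can be reproduced directly as filters in Excel or Tableau.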
Multi-Sim Detection
We had to run several analyses to build initial hypotheses for our model, since we had no record
of multi-SIM users in our databases. The idea was to proceed with clustering, but with a different approach:
the goal was to identify even small groups, or those with a very particular behavior.
With detailed profiling, we were able to single out the most suspicious group, validate it through a call-center campaign, and then build a predictive model on top of that data (with sampling techniques).
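Data-Mining Process
I was partially involved in the churn project, among others. Every data-mining process follows
the CRISP-DM cycle: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.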
Family Communities
Our Community Link Analysis (CLA) system provides only the groups and their roles, with no further
information about the relationships between the clients in each community. The idea was therefore to find
which communities represent a single family living under the same roof. The key data here were the Cell_ID
tables, along with extensive feature engineering in Toad (reusing features generated by the CLA, such as
betweenness and hub score), and finally the appropriate network analysis package in R for the
final touch.
Ad-hoc Analysis / Reports
Various requests came to us from different departments to solve, to investigate, and to find patterns in our
sources, with the final goal of data storytelling.
The work involved understanding the business need behind each request, writing the right SQL query, organizing
the extracted data in Excel, and sometimes going further with PowerPoint presentations.
VBA
Data Science Projects @ Freelance
I had the opportunity to work on various topics and connect to different teams with the agile methodology.
Facebook Page Analysis
The main idea was to run an overall analysis of one particular page for which we had the credentials, looking at its
Insights figures and doing some text mining for sentiment analysis.
Vehicle Weight Estimation
To optimize its logistics chain, a company wants to predict the mass of vehicles on a connected weighing system
with a maximum error of 5% or less.
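A minimal sketch of how the 5% requirement could be checked, assuming it means the relative error on each individual vehicle (the project description does not define the metric). The masses are illustrative values in kilograms.

```python
def max_relative_error(actual, predicted):
    """Largest absolute error relative to the true mass, as a fraction."""
    return max(abs(p - a) / a for a, p in zip(actual, predicted))


def meets_tolerance(actual, predicted, tolerance=0.05):
    """True if every prediction is within `tolerance` of the true mass."""
    return max_relative_error(actual, predicted) <= tolerance


# Illustrative masses: true values versus model predictions.
actual = [1000, 2000, 1500]
predicted = [1030, 1950, 1500]
print(max_relative_error(actual, predicted), meets_tolerance(actual, predicted))
```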
Data Science Tutor
I had the chance to be a tutor and to teach an introductory course on machine learning (algorithms and examples) to
a class of 5 people.
I also led another training session on the use of machine learning with IBM
Watson for a class of 20 people.
Research Statistician @ LAAS-CNRS
Alarm Automatic management
The idea is to develop a model that learns the owner's habits in order to determine whether the person trying to
enter is the owner or an intruder, and then automatically decides whether to trigger the alarm or
deactivate it.
Machine Learning Competitor @ Zindi
I have been competing for over 18 months, in more than 14 challenges, and I don't intend to stop, for a
simple reason: I keep learning new things every day, whether a new algorithm, a new way of thinking, a new approach,
a new trick, a new package, or a new optimization.
I gained a great deal of knowledge during this period, because you always need to keep up with the latest news in
the data science community; otherwise you find yourself left behind, merely repeating what you already know.
Spatial Data
OSM & Roads
Shape & Raster files, Projections
Geospatial Analysis
Various
Clustering
Dimensionality Reduction
ANOVA Analysis
Assessments, Courses & Skills
Pluralsight
LinkedIn
Acclaim
DataCamp
Coursera
edX
Stanford Online
MIT ProX, Udacity & DataQuest
English Certificate
+ Personality