Curriculum Vitae
Personal information
Name: Tomasz Bujlow
Address: Leipzig, Germany
Mobile: -
E-mail: -
Citizenship: European Union, Polish
Date of birth: August 05, 1984
Home page: http://tomasz.bujlow.com
LinkedIn: http://www.linkedin.com/in/tomaszbujlow
Google Scholar: http://scholar.google.com/citations?user=WvFturoAAAAJ
Summary
Currently, I am working on the development of Network Intelligence (NI) software solutions, which involve traffic classification, analysis, and complete decoding of detected protocols and applications. These solutions deliver high performance on core network links with speeds of 100 Gbit/s and beyond. They use various technologies (e.g., Deep Packet Inspection and behavioral, heuristic, and statistical analysis) to reliably detect network protocols, applications, and services, and to extract metadata in real time. I use G Suite, Atlassian software (JIRA, Confluence), and Git daily. Since taking this position, I have been working with Agile development methodologies, including Scrum, and I have identified many aspects of Scrum that are critical to quality and development productivity. I have also served as a developer channel for customer support, which helped me better understand how customers see and use our software and what their priorities are for product development and maintenance. As sharing technical knowledge is my passion, I have organized multiple training workshops on computer networks and network traffic analysis.
I obtained my PhD in Classification and Analysis of Computer Network Traffic from Aalborg University in Denmark on June 6, 2014. My PhD project was co-financed and co-supervised by Bredbånd Nord A/S, a regional electricity and Internet provider. Through this industrial collaboration, I learned how to collect and understand customer requirements, present high-level concepts and results to company management, and structure the work to reach both the scientific and the industrial goals on time.
I was the founder and developer of nDPIng, a next-generation open-source computer network traffic classification tool that aims at consistent real-time traffic identification on multiple levels: transport-layer protocol, all application-level protocols, content type, service provider, and content provider. I was also the principal investigator in the Volunteer-Based System for Research on the Internet project, which focused on designing and developing a system able to provide detailed data about the applications used on the Internet. This information can be used to learn which applications are most frequently used in the network, to provide users with basic statistics about their Internet connection usage (for example, which kinds of applications use their connection the most), and to create scientific profiles of the traffic generated by different applications or groups of applications.
I have worked independently, covering the entire development process from architecture, design, and implementation through customer feedback to bug fixing. Apart from the nDPIng and Volunteer-Based System for Research on the Internet projects, I was the sole author of 2 industrial projects. Web-Based Client for InDesign Server uses web-based techniques and tools together with a headless version of InDesign Server, controlled by scripts produced by the web interface, to render InDesign documents in real time. The Efficient Invoicing Solution with Offline Synchronization Capabilities project focused on creating an invoicing system for a mining company, with a significant share of features not found in the systems already on the market. The system was in use in around 30 departments of OPA-LABOR for 4 years and satisfied all the requirements set for the project.
I learn new technologies quickly (e.g., programming languages, development platforms, and frameworks) and put the new knowledge and skills into practice, which allows me to switch easily between different IT fields. I value the ability to solve problems with the help of the Internet, books, or other people above encyclopedic knowledge (e.g., knowing by heart the syntax of a particular programming language or an existing, documented algorithm).
During my PhD, I was a visiting researcher at Universitat Politècnica de Catalunya (UPC) in Barcelona, Spain, where I worked with the Broadband Communications Research Group on comparing Deep Packet Inspection tools for traffic classification. I also visited ntop in Pisa, Italy (collaboration on nDPI) and TELECOM Sudparis in Evry, France (collaboration on traffic classification in 802.11 networks). I am the author of 4 journal articles, 8 conference papers, and 3 technical reports on traffic monitoring and analysis. Two of my papers received awards for being among the top 7 % and the top 5 % of papers, respectively.
Since 2011, I have given 11 presentations at seminars and guest lectures at Aalborg University in Denmark, TELECOM Sudparis in France, University of Pisa in Italy, Polytechnic University of Turin in Italy, RWTH University in Germany, Universitat Politècnica de Catalunya in Spain, IDA House of Engineers in Denmark, and Albena Resort in Bulgaria. I have reviewed 15 articles submitted to various journals and conferences.
During my postdoctoral research, I investigated techniques used to track users' activity online. Many content providers and online retailers collect large amounts of personal information from their users as they browse the web. The large-scale collection and analysis of personal information constitutes the core business of most of these companies, which use this information for lucrative purposes, such as online advertising and price discrimination. However, most mechanisms used to track users and collect personal information are not well known or are intentionally obfuscated. The main objective was to uncover these mechanisms and understand how these companies collect, analyze, store, and (possibly) sell this information.
I also hold 2 language certificates: TOEFL iBT (98/120) and Prøve i Dansk 3 (9/12).
Does that sound interesting? If so, you are welcome to contact me, as I am currently looking for new job opportunities worldwide! I am open to almost any form of employment: I can work as a full-time company employee or as a contracted, project-based consultant. However, I would like to work remotely from home at least half of the time (arranged to suit the needs of both the company and me).
Work experience
Position | Employer | Start date | End date
Senior Developer DPI | ipoque GmbH | 06/2015 | present
Founder and Developer of nDPIng | Own open-source project | 03/2014 | 04/2015
Postdoctoral Investigator | Universitat Politècnica de Catalunya | 10/2014 | 03/2015
PhD Student | Aalborg University | 12/2010 | 12/2013
Visiting PhD Student | Universitat Politècnica de Catalunya | 01/2013 | 04/2013
PHP Software Developer, Project Leader | cbit / Imento | 08/2008 | 09/2010
Home Delivery Assistant | Morgendistribution Danmark | 11/2007 | 01/2008
Wireless Network Specialist (internship) | Proximetry Poland | 07/2007 | 08/2007
C++ Software Developer (internship) | OPA-LABOR | 04/2007 | 08/2007

Period: June 2015 – Present
Occupation or position held: Senior Developer DPI
Activities and responsibilities: Development of Deep Packet Inspection (DPI) / Network Intelligence (NI) solutions: R&S® Protocol and Application Decoding Engine (PADE), R&S® PACE 2 (learn more: https://ipoque.com/products/pace).
Name of employer: ipoque GmbH
Address of employer: Augustusplatz 9, 04109 Leipzig, Germany
Phone: -
Fax: -
E-mail: -
Details: R&S® PACE 2 is a next-generation software library that identifies thousands of protocols, applications, and services, and provides deeper insight into application attributes (e.g., real-time performance metrics). R&S® PACE 2 combines the power of the Protocol and Application Classification Engine (PACE) and the decoding engine (PADE), and is also capable of advanced metadata extraction. This solution delivers high performance on core network links with speeds of 100 Gbit/s and beyond. It uses various technologies (e.g., Deep Packet Inspection and behavioral, heuristic, and statistical analysis) to reliably detect network protocols, applications, and services, and to extract metadata in real time. Key performance indicators are calculated for deeper insight.
The decoding results of R&S® PACE 2 provide the deepest information about the current connection. R&S® PACE 2 extracts all important and relevant metadata from a number of network classification results, with a configurable level of detail to suit different use cases. For example, it is possible to decompress HTTP payloads and reconstruct all images or videos from websites. The depth of information can be flexibly adjusted to provide just the data actually needed. Internal aggregators gather decoding information from certain decoders and bundle it into classes. For example, even if an e-mail connection lasts a long time, the full-session decoding information still provides all of the data in a single place. The decoding feature of R&S® PACE 2 is especially useful in network security applications, e.g., the playback of VoIP calls, websites, and chat sessions, or gathering upload and download statistics for various documents.
Period: March 2014 – April 2015
Occupation or position held: Founder and Developer of nDPIng
Activities and responsibilities: Development of the next-generation computer network traffic classification tool
Type of activity: Own open-source project
Initial development location: Computer Science Department, University of Pisa, Pisa, Italy
Status: Development stage
Accessible (SVN): https://svn.ntop.org/svn/ntop/trunk/nDPIng/
Details: The aim of this unique project is to bring a new quality to the field of traffic classification by providing results on many levels. Clear, unambiguous identification of network flows is meant to be ensured by combining various classification techniques into a single tool. For each flow inspected by the classifier, the following information is intended to be given: transport-layer protocol, all application-layer protocols, content type, service provider, and content provider. See the Projects section for a detailed description.

Period: October 2014 – March 2015
Occupation or position held: Postdoctoral Investigator
Activities and responsibilities: Research on online users' privacy
Name of employer: Universitat Politècnica de Catalunya, Department of Computer Architecture, Broadband Communications Research Group
Address of employer: Jordi Girona, 1–3, 08034 Barcelona, Spain
Phone: -
Fax: -
E-mail: -
Details: It is widely known that content providers and online retailers (e.g., Google, Facebook, and Amazon) collect large amounts of personal information from their users as they browse the web. The large-scale collection and analysis of personal information constitutes the core business of most of these companies, which use this information for lucrative purposes, such as online advertising and price discrimination. However, most mechanisms used to track users and collect personal information are still unknown. Our main objective is to uncover these mechanisms and understand how these companies collect, analyze, store, and (possibly) sell this information.
Personal information on the web can be given voluntarily by the user (e.g., by filling in web forms) or collected indirectly, without the user's explicit knowledge, through the analysis of IP headers, HTTP requests, and search engine queries, or even by using JavaScript and Flash programs embedded in web pages. The collected data include information of a technical nature (e.g., the browser in use) as well as more sensitive information (e.g., the geographical location or the visited web pages). Webmail services are also known for scanning and processing users' e-mails, even those received from senders who did not allow any kind of message inspection. Online services use various methods to track their users. The most popular techniques are different kinds of browser cookies, fingerprinting the user in the background, and suggesting (or requiring) that the user fill in a profile, so that the web identity can be further extended by associating it with the user's real identity.
We investigate whether services use other, unexpected mechanisms to track user activity, for example whether a user's network of contacts and their interests are used to build the user's profile, and what impact this has on their privacy. We also analyze whether online services collect information while users are logged out of a service, using cookies or user fingerprints, and later combine this information with their online profiles when they log in. We investigate the ability of web services to follow users' activity in private browsing mode and analyze special privacy-focused search engines, testing their capabilities and comparing them with standard search engines. On another front, we investigate the impact of user tracking on price discrimination. Product pricing can be based on the user's geographical location but also on user profiles sold by online services.
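The background fingerprinting mentioned above can be illustrated with a minimal sketch, assuming a hypothetical set of four browser attributes. It is a generic illustration of the principle, not the mechanism of any particular service, and real trackers combine many more signals: the attributes are put into a canonical order and hashed, so a returning browser can be recognized without any cookie being stored.

```python
import hashlib


def browser_fingerprint(attributes: dict[str, str]) -> str:
    """Derive a cookie-less identifier from browser attributes (illustrative only).

    The attribute names used below are hypothetical examples; a tracker could
    use any properties that the browser exposes to a web page.
    """
    # Sort the keys so that the same attributes always produce the same string,
    # regardless of the order in which they were collected.
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    # Hash the canonical string and keep a short prefix as the identifier.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


first_visit = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "screen": "1920x1080",
    "timezone": "Europe/Madrid",
    "language": "es-ES",
}
# The same browser returns later; the attributes arrive in a different order.
second_visit = dict(reversed(list(first_visit.items())))

print(browser_fingerprint(first_visit) == browser_fingerprint(second_visit))  # True
```

Because nothing is stored on the client in this sketch, clearing cookies does not change the identifier; only a change in the observed attributes does.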
Period: December 2010 – December 2013
Occupation or position held: PhD Student
Activities and responsibilities: Classification and analysis of computer network traffic
Name of employer: Aalborg University, Department of Electronic Systems, Networking and Security Section
Address of employer: Fredrik Bajers Vej 7, 9220 Aalborg Øst, Denmark
Phone: -
Fax: -
E-mail: -
Details: Our objective was to evaluate the performance of various applications in a high-speed Internet infrastructure.
1. We performed substantial testing of widely used DPI classifiers (PACE, OpenDPI, L7-filter, nDPI, Libprotoident, and NBAR) and assessed their usefulness in generating ground truth, which can be used as training data for Machine Learning Algorithms (MLAs).
2. Because the existing methods (DPI, port-based, statistical) were shown to be insufficient, we built our own host-based system (VBS) for collecting and labeling network data. The packets are grouped into flows, which are labeled with the process name obtained from the system sockets. See the Projects section for a detailed description.
3. We assessed the usefulness of the C5.0 MLA for classifying computer network traffic. We showed that the application-layer payload is not needed to train the C5.0 classifier, defined sets of classification attributes, and tested various classification modes.
4. We showed how to use our VBS tool to obtain per-flow, per-application, and per-content statistics of traffic in computer networks. Furthermore, we created two datasets composed of various applications, which can be used to assess the accuracy of different traffic classification tools. The datasets contain full packet payloads and are available to the research community as a set of PCAP files with per-flow descriptions in the corresponding text files.
5. We designed and implemented our own system for multilevel traffic classification, which provides consistent results on all 6 levels: Ethernet, IP protocol, application, behavior, content, and service provider. The system can deal with unknown traffic by leaving it unclassified on all levels instead of assigning it to the best-fitting class. Our system was implemented in Java and released as an open-source project.
6. Finally, we created a method for assessing the Quality of Service in computer networks.

Period: January 2013 – April 2013
Occupation or position held: Visiting PhD Student
Activities and responsibilities: Comparison of Deep Packet Inspection tools for traffic classification
Name of employer: Universitat Politècnica de Catalunya, Department of Computer Architecture, Broadband Communications Research Group
Address of employer: Jordi Girona, 1–3, 08034 Barcelona, Spain
Phone: -
Fax: -
E-mail: -
Details: The outcomes were thoroughly described in the technical report Comparison of Deep Packet Inspection (DPI) Tools for Traffic Classification, which is listed below in the Publications section.
1. We created a dataset of 10 different applications (eDonkey, BitTorrent, FTP, DNS, NTP, RDP, NETBIOS, SSH, HTTP, RTMP), which is available to the research community. It contains - flows captured over 66 days. The dataset is available as a set of PCAP files containing full flows, including the packet payload, together with corresponding text files that describe the flows by providing all the necessary details, including the application name and the start and end timestamps, based on the system sockets.
2. We tested the accuracy of several Deep Packet Inspection tools (PACE, OpenDPI, L7-filter, nDPI, Libprotoident, and NBAR) on our dataset. To test NBAR, we needed to replay the packets to a Cisco router and process the Flexible NetFlow logs. The other tools were tested directly as libraries by dedicated software that read packets from the PCAP files and passed them to the classifiers.
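The per-application accuracy used in such a comparison can be sketched as follows. The sketch assumes that the ground truth (application names from the system sockets) and each tool's output have already been mapped to a common set of labels, keyed by a flow identifier; the function name and the flow identifiers are hypothetical. Flows that a tool leaves unclassified count as misclassified.

```python
from collections import Counter


def per_application_accuracy(ground_truth: dict[str, str],
                             predictions: dict[str, str]) -> dict[str, float]:
    """Fraction of each application's flows that a classifier labeled correctly.

    ground_truth maps a flow identifier to the application name taken from the
    system sockets; predictions maps the same identifier to the tool's label.
    Flows missing from predictions are treated as unclassified (wrong).
    """
    total = Counter(ground_truth.values())
    correct = Counter(
        app for flow_id, app in ground_truth.items()
        if predictions.get(flow_id) == app
    )
    # Counter returns 0 for applications with no correctly labeled flows.
    return {app: correct[app] / total[app] for app in total}


truth = {"f1": "DNS", "f2": "DNS", "f3": "HTTP", "f4": "BitTorrent"}
tool_output = {"f1": "DNS", "f2": "NTP", "f3": "HTTP"}  # f4 left unclassified
print(per_application_accuracy(truth, tool_output))
# {'DNS': 0.5, 'HTTP': 1.0, 'BitTorrent': 0.0}
```

Reporting accuracy per application rather than over all flows keeps a tool that handles one dominant application well from hiding its failures on the others.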
Period: August 2008 – September 2010
Occupation or position held: PHP Software Developer, Project Leader
Activities and responsibilities: Development of the Imento product. My own project: Web-Based Client for InDesign Server (see the Projects section for a detailed description)
Name of employer: cbit / Imento
Address of employer: Cikorievej 20A, 5220 Odense SØ, Denmark
Phone: -
E-mail: -
Period: November 2007 – January 2008
Occupation or position held: Home Delivery Assistant
Activities and responsibilities: Delivering products dispatched from the Central Delivery Depot to customers or designated locations
Name of employer: Morgendistribution Danmark
Address of employer: Fjordsgade 11, 1. sal., 5000 Odense C, Denmark
Contact: none known (bankruptcy)

Period: July 2007 – August 2007
Occupation or position held: Wireless Network Specialist (internship)
Activities and responsibilities: Designing and developing Quality of Service measurement software for Wireless Local Area Networks; WiMAX testing
Name of employer: Proximetry Poland
Address of employer: Roździeńskiego 91, 40-203 Katowice, Poland
Phone: -
Fax: -
E-mail: -

Period: April 2007 – August 2007
Occupation or position held: C++ Software Developer (internship)
Activities and responsibilities: Development of an application for creating, managing, and printing invoices. This program was in use in around 30 departments of OPA-LABOR for 4 years. See the Projects section for a detailed description.
Name of employer: OPA-LABOR
Address of employer: Wyzwolenia 22, 41-103 Siemianowice Śląskie, Poland
Phone: - - 102
Fax: -
E-mail: -
Certificates
Certified: October 2014
Validity: October 2016
Title: TOEFL iBT | certificate
Scores: Total: 98/120 (82 %), Reading: 28/30, Listening: 22/30, Speaking: 23/30, Writing: 25/30
Subjects/skills covered: English language knowledge
Issuing institution: Educational Testing Service, USA

Certified: December 2012
Validity: unlimited
Title: Bevis for Prøve i Dansk 3 | certificate
Subjects/skills covered: Danish language knowledge
Issuing institution: Ministry of Education (Undervisningsministeriet), Denmark

Certified: November 2010
Re-certified: October 2013
Validity: October 2016
Title: Cisco Certified Network Professional (CCNP) | certificate
Subjects/skills covered: Administration of LAN, WLAN, and WAN computer networks
Issuing institution: Cisco Systems, USA

Certified: September 2006
Re-certified: October 2013
Validity: October 2016
Title: Cisco Certified Network Associate (CCNA) | certificate
Subjects/skills covered: Administration of LAN, WLAN, and WAN computer networks
Issuing institution: Cisco Systems, USA

Certified: March 2007
Validity: unlimited
Title: English Language Certificate for Applicants for The International Association for the Exchange of Students for Technical Experience (IAESTE) Training
Subjects/skills covered: English language knowledge
Issuing institution: Silesian University of Technology (Politechnika Śląska), Poland
Education
Period: December 2010 – June 2014
Degree: Doctor of Philosophy (PhD) | diploma
Thesis title: Classification and Analysis of Computer Network Traffic
Main supervisor: Jens Myrup Pedersen, Aalborg University, Aalborg, Denmark
Co-supervisor: Tahir Riaz, Aalborg University, Aalborg, Denmark
Co-supervisor: Pere Barlet-Ros, Universitat Politècnica de Catalunya (UPC), Barcelona, Spain
University: Aalborg University, Aalborg, Denmark

Period: September 2007 – June 2009
Degree: Bachelor of Computer Engineering | diploma
Field of study: Computer Engineering, Faculty of Engineering
University: University of Southern Denmark (Syddansk Universitet), Odense, Denmark

Period: October 2003 – October 2008
Degree: Master of Science in Engineering | diploma
Field of study: Computer Engineering, specialty: Databases, Computer Networks and Computer Systems
University: Silesian University of Technology (Politechnika Śląska), Gliwice, Poland
Professional training
Period: October 2009 – June 2010
Title:
CCNP 1 Building Scalable Cisco Internetworks (BSCI), v. 5.0
CCNP 2 Implementing Secure Converged Wide Area Networks (ISCW), v. 5.0
CCNP 3 Building Cisco Multilayer Switched Networks (BCMSN), v. 5.0
CCNP 4 Optimizing Converged Cisco Networks (ONT), v. 5.0
Name of organization: Cisco Networking Academy

Period: October 2005 – February 2007
Title:
MS SQL Server
Managing and Maintaining a Microsoft Windows Server 2003 Environment
Implementing and Supporting Microsoft Windows XP Professional
Name of organization: Silesian University of Technology (Politechnika Śląska) & Microsoft Corporation

Period: October 2005 – September 2006
Title:
CCNA 1 Networking Basics, v. 3.1
CCNA 2 Routers and Routing Basics, v. 3.1
CCNA 3 Switching Basics and Intermediate Routing, v. 3.1
CCNA 4 WAN Technologies, v. 3.1
Name of organization: Cisco Networking Academy
Languages
Language | Reading | Writing | Speaking
Polish | Native | Native | Native
English | Advanced | Advanced | Advanced
Danish | Intermediate | Elementary | Elementary
German | Elementary | Elementary | Elementary
Driver’s licenses
Category | Valid for | Valid from
European B | Motor vehicles | December 2002
European AM | Mopeds | December 2002
Other skills and competences
Academic skills
Research, experimentation, supervision, teaching, scientific writing, LaTeX, typesetting
Computer networks
Network monitoring, traffic analysis and classification, Deep Packet Inspection (DPI)
Routing protocols (RIP, OSPF, and BGP) & switching
TCP/IP stack, HTTP, SSL, DNS
Databases
Planning, designing, implementing, troubleshooting and securing databases
SQLite, MSSQL, MySQL, and PostgreSQL database servers, SQL programming
Software development
Scrum agile framework
C/C++, Python, Java, SQL, PHP, JavaScript, AJAX, and InDesign Server programming
Deep Packet Inspection, DNS inspection, BGP analysis, Autonomous Systems matching, client-server applications, raw sockets,
system sockets monitoring, network protocol decoding and classification
Atlassian software (JIRA, Confluence), Git, Gerrit, Jenkins
Operating systems
Windows and Linux operating systems
Implementing, securing and troubleshooting Linux routers, including wireless routers
Internet and Web Services
HTML, JavaScript, DHTML, PHP languages, and AJAX technology
Managing and troubleshooting WWW servers and websites, Internet portals, databases, and control panels (e.g., cPanel)
E-mail, WWW, DNS, instant messaging, P2P technology, Windows and Linux firewalls, and proxies
Grants and scholarships
Period: December 2010 – December 2013
Description: PhD scholarship, grant no. 8–10100
Providers: Aalborg University, Denmark; Bredbånd Nord A/S, Denmark; European Regional Development Fund (ERDF)

Period: January 2013 – April 2013
Description: Research grant for a 3-month stay at Universitat Politècnica de Catalunya (UPC) in Barcelona, Spain
Provider: Aalborg University, Denmark

Period: September 2007 – June 2008
Description: ERASMUS (European Region Action Scheme for the Mobility of University Students) student grant. Destination: Syddansk Universitet (University of Southern Denmark), Odense, Denmark
Provider: European Union
Memberships
Period: February 2017 – Present
Organization: A Million Happy Cats Association (Stowarzyszenie Milion Szczęśliwych Kotów)
Description: Non-Governmental Organization (NGO) in Szczecin, Poland | web

Period: April 2012 – December 2015
Organization: Institute of Electrical and Electronics Engineers (IEEE)
Description: Member #-

Period: November 2012 – December 2013
Organization: PhD Network at Aalborg University (PAU)
Description: Board Member of the official association of PhD Students at Aalborg University
Distinctions and awards
Date: February 2012
Description: Certificate of Outstanding Paper Award; top 7% of 597 submissions to the ICACT 2012 conference
Awarder: Global IT Research Institute, Republic of Korea

Date: February 2012
Description: Distinguished group of the 5% best papers presented at TELFOR 2011
Awarder: TELFOR Journal Editor, Serbia
Projects
Scientific projects
Period: October 2014 – March 2015
Title: Architecture with Knowledge of the Environment for the Future Internet (Arquitectura con Conocimiento del Entorno de la Futura Internet)
Role: Project Investigator
Project code: K00530
Funding entity: Ministry of Economy and Competitiveness (Ministerio de Economía y Competitividad), Spain
Funding entity code: EUIN-
Budget: EUR-
Scientific coordinator: Josep Solé Pareta, Universitat Politècnica de Catalunya, Spain

Period: March 2014 – April 2015
Title: nDPIng – Next Generation Traffic Classification Library
Role: Principal Investigator
Project code: nDPIng
Funding entity: None
Start date: -
End date: Undefined
Accessible (SVN): https://svn.ntop.org/svn/ntop/trunk/nDPIng/
Details: The aim of this unique project is to bring a new quality to the field of traffic classification by providing results on many levels. The results obtained from nDPIng are well suited for traffic accounting and are given as: protocol (starting from TCP/UDP and continuing to higher levels), content type, service provider (the well-known name of the remote host, e.g., Facebook for web browser flows from Facebook), and content provider (the content delivery network, cdn, e.g., Akamai or Google). Examples of the results provided in the non-verbose mode:
- proto: TCP->SSL_with_certificate->POP3S, service: Google – an encrypted POP3 session with a Google mail server.
- proto: TCP->SSL_with_certificate, service: Twitter – an encrypted connection to a Twitter server.
- proto: TCP->FTP_Data, content: JPG – an FTP file-transfer session carrying a JPG image.
- proto: TCP->SSL_with_certificate->Dropbox, cdn: Dropbox – an encrypted Dropbox session (the application is Dropbox) with a Dropbox server.
- proto: TCP->SSL_with_certificate, cdn: Dropbox – an encrypted session with a Dropbox server, while the application is unknown (it can be a web browser connection).
- proto: TCP->HTTP, content: WebM, service: YouTube, cdn: Google – a YouTube flow coming from a Google server and transporting a WebM video.
- proto: TCP->HTTP, service: Google, cdn: Google – an HTTP flow from Google, obtained from a Google server.
The domain names associated with the service and content providers can also be obtained (see the example application attached to the project).
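The non-verbose results above follow a regular pattern, so they can be split into their levels programmatically. The sketch below assumes only the format visible in the examples (comma-separated `key: value` pairs, with the protocol stack joined by `->`, and without the explanatory text after the dash); `parse_result` is a hypothetical helper, not part of the nDPIng API.

```python
def parse_result(line: str) -> dict[str, object]:
    """Split a non-verbose result string into its classification levels."""
    result: dict[str, object] = {}
    for item in line.split(", "):
        key, _, value = item.partition(": ")
        # The protocol level is a stack, from the transport protocol upwards.
        result[key] = value.split("->") if key == "proto" else value
    return result


parsed = parse_result("proto: TCP->HTTP, content: WebM, service: YouTube, cdn: Google")
print(parsed)
# {'proto': ['TCP', 'HTTP'], 'content': 'WebM', 'service': 'YouTube', 'cdn': 'Google'}
print(parsed["proto"][0], parsed["proto"][-1])  # TCP HTTP
```

Keeping the protocol stack as a list preserves the order of the levels, so the transport protocol is always the first element and the most specific application protocol is the last.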

Period: January 2011 – December 2013
Title: Volunteer-Based System for Research on The Internet
Role: Principal Investigator
Project code: VBS
Funding entities: Aalborg University, Denmark; Bredbånd Nord, Denmark; European Regional Development Fund (ERDF)
Start date: -
End date: Undefined
Accessible: http://vbsi.sourceforge.net
Details: This project focuses on designing and developing a system able to provide detailed data about the applications used on the Internet. This information can be used to learn which applications are most frequently used in the network, to provide users with basic statistics about their Internet connection usage (for example, which kinds of applications use their connection the most), to create scientific profiles of the traffic generated by different applications or groups of applications, etc.
The Volunteer-Based System has a client-server architecture. Clients are installed on machines belonging to volunteers, while the server is installed on a computer on the premises of the data-collecting entity. Each client registers information about the data passing through the computer's network interfaces. Captured packets are grouped into flows. A flow is defined as a group of packets that share the same local and remote IP addresses, local and remote ports, and transport-layer protocol. For every flow, the client registers: an anonymized identifier of the client, the start timestamp of the flow, anonymized local and remote IP addresses, local and remote ports, the transport protocol, the anonymized global IP address of the client, and the name of the application associated with the flow. The name of the application is taken from the system sockets. For every packet, the client additionally registers: direction, size, the state of all TCP flags (for TCP connections only), the time in microseconds elapsed since the previous packet in the flow, and the type of transmitted HTTP content. We do not inspect the payload: the type of the HTTP content is obtained from the HTTP header, which is present in the first packet carrying that specific content. One HTTP flow (for example, a connection to a web server) can carry multiple files: HTML documents, JPEG images, CSS stylesheets, etc. Because VBS records the content type, we are able to split the flow and separate the individual HTTP contents. The data collected by VBS are stored in a local file and periodically sent to the server. The task of the server is to receive the data from the clients and store them in a MySQL database.
This open-source tool is released under the GNU General Public License v3.0 and published as a SourceForge project. Both Windows and Linux versions are available. VBS is designed to collect traffic from numerous volunteers spread around the world; therefore, with a sufficient number of volunteers, the collected data can provide a good statistical base.
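The flow definition above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the VBS client code: packets are assumed to be already parsed and expressed from the local host's point of view, the `app_of` callback is a hypothetical stand-in for the system-socket lookup, and anonymization, TCP flags, HTTP content types, and flow expiry are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, NamedTuple


class FlowKey(NamedTuple):
    """A flow: same local/remote IP addresses, ports, and transport protocol."""
    local_ip: str
    remote_ip: str
    local_port: int
    remote_port: int
    transport: str  # e.g., "TCP" or "UDP"


@dataclass
class Packet:
    key: FlowKey
    timestamp_us: int  # capture time in microseconds
    direction: str     # "in" or "out", seen from the local host
    size: int          # packet size in bytes


@dataclass
class Flow:
    key: FlowKey
    start_us: int
    application: str  # process name obtained from the system sockets
    # Per-packet records: (direction, size, microseconds since previous packet).
    packets: list[tuple[str, int, int]] = field(default_factory=list)
    last_seen_us: int = 0


def group_into_flows(packets: list[Packet],
                     app_of: Callable[[FlowKey], str]) -> list[Flow]:
    """Group packets into flows and record per-packet attributes."""
    flows: dict[FlowKey, Flow] = {}
    for pkt in sorted(packets, key=lambda p: p.timestamp_us):
        flow = flows.get(pkt.key)
        if flow is None:
            # First packet of a new flow: look up the owning application once.
            flow = Flow(pkt.key, pkt.timestamp_us, app_of(pkt.key),
                        last_seen_us=pkt.timestamp_us)
            flows[pkt.key] = flow
        gap_us = pkt.timestamp_us - flow.last_seen_us  # 0 for the first packet
        flow.packets.append((pkt.direction, pkt.size, gap_us))
        flow.last_seen_us = pkt.timestamp_us
    return list(flows.values())


web = FlowKey("10.0.0.2", "192.0.2.10", 51000, 80, "TCP")
captured = [Packet(web, 1_250, "in", 1500), Packet(web, 1_000, "out", 60)]
flows = group_into_flows(captured, app_of=lambda key: "firefox")
print(flows[0].application, flows[0].packets)
# firefox [('out', 60, 0), ('in', 1500, 250)]
```

Keying a dictionary on the full 5-tuple assigns each packet to its flow in constant time, and the application name only has to be looked up once per flow.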

Period: December 2010 – March 2013
Title: Collaborating Living Labs
Role: Project Member
Funding entity: NordForsk, Norway
Start date: -
End date: -
Scientific coordinator: Mari Linn Larsen, University of Stavanger, Norway
Accessible: http://www.coll-livinglab.org
Details: Compare Testlab in Karlstad, NettOp at the University of Stavanger, and CNP at Aalborg University are three living labs for the development of new ICT services, infrastructure, and media that involve users (i.e., both end users and companies). The industrial partners Ipark (Stavanger Innovation Park), ICTNORCOM, and the Greater Stavanger Development will present real cases, and users will be invited to co-create and test ICT services for them. The aim of this project is to build on and improve the work of existing Living Labs and to generate knowledge on how to innovate new services, media, and infrastructure in Living Labs in three different Nordic countries.
Other projects
Period: November 2014 – April 2015
Title: Deep Packet Inspection API Standardization
Role: Project Member
Funding entity: None (collaborative open-source project)
Start date: -
End date: Undefined
Industrial coordinator: Franck Baudin, Qosmos, France

Period: February 2009 – September 2010
Title: Web-Based Client for InDesign Server
Role: Project Leader, Principal Software Developer
Funding entity: Imento, Denmark
Start date: -
End date: -
Industrial coordinator: Claus Bolund Pedersen, Imento, Denmark
Details: The goal of this project was to design and implement a new module for Imento, a web-based system for creating fliers and advertisements that is used by many well-known companies in Denmark, e.g., 727, Cosmographic, Lidl, Spar, Bong, Nordal, Intersport, Bygma, and Tempur. The system consists of a media bank and a product database, which store all the product information needed by the customers. The module developed in this project allowed easy production of real advertisements in the InDesign and PDF formats using the web-based Imento interface.
The solution uses web-based techniques and tools (e.g., HTML, JavaScript, jQuery, and AJAX) together with a headless version of InDesign Server, controlled by scripts produced by the web interface. First, the user chooses a template for building the advertisement. The website then turns into an environment familiar from drawing and painting applications, where the user can build multi-page, multi-layer documents by dragging and dropping existing snippets (per-product graphical templates). Product information (e.g., images, prices, and descriptions) is automatically imported from the database and rendered in the document in real time. The user can save the document and return to it later. The document can be saved in the InDesign format or exported to PDF.
1
Period
Title
Role
Project code
Funding entity
Budget
Start date
End date
Industrial coordinator
April 2007 – August 2009
An Efficient Invoicing Solution with Offline Synchronization Capabilities
Project Leader, Principal Software Developer
Faktury2007
OPA-LABOR, Poland
4 000.00 EUR-
Tadeusz Gruszka, OPA-LABOR, Poland
http://groups.google.com/d/forum/dpi-api-standardization-group
This project aims at defining a standard Deep Packet Inspection API that most DPI implementations will support. In order to achieve this goal, the API will be released under an open
license. This will promote DPI libraries interchange, so that it will be possible to plug/unplug
implementations as needed. The standardization group consists of developers of both the
commercial and open-source DPI software.
11
Details
The project was concentrated on creating an invoicing system for a mining company, which
will be characterized by a significant fraction of features differing from other systems already
existing in the market. These requirements are imposed due to a very specific way how the
company works and makes its revenue. The company consists of main headquarters and
more than 30 departments in different geographical locations. The tariffs used by the particular departments are different and should be able to be created and entered into the system
only in the main headquarters, while both the main headquarters and the departments should
be able to use the tariffs for invoicing purposes. Additionally, the departments are allowed to
create custom invoices, which are not based on tariffs, but they must be properly marked to
be checked into the headquarters. The departments cannot directly print any invoices; this
ability is reserved for the headquarters. The departments had only dial-up Internet connection and, therefore, the tariffs and generated invoices needed to be synchronized between the
headquarters and departments using small files distributed by e-mails. Additionally, the headquarters needed to have abilities to edit any invoice or to create a memo. The designed and
implemented system was in use in around 30 departments of OPA-LABOR during 4 years,
successfully satisfying all the requirements set in this project.
Publications
Books
1
Tomasz Bujlow
Classification and Analysis of Computer Network Traffic
1–262
Networking & Security, Department of Electronic Systems, Aalborg University
June-
Publisher’s version (DOI: none) | Author’s version (free of charge)
Traffic monitoring and analysis can serve many purposes: to investigate the usage of network resources, adjust Quality of Service (QoS) policies in the network, log the traffic to comply with the law, or create realistic models of traffic for academic purposes. The core activity in this area is traffic classification, which is the main topic of this thesis.
We presented the existing methods for traffic classification (e.g., using transport-layer port numbers, Deep Packet Inspection (DPI), and statistical classification) and assessed their usefulness in particular areas. Statistical classifiers based on Machine Learning Algorithms (MLAs) were shown to be accurate while consuming few resources and raising no privacy concerns. However, they require good-quality training data. We performed substantial testing of widely used DPI classifiers and assessed their usefulness in generating ground-truth, which can be used as training data for MLAs. Because the existing methods proved incapable of generating proper training data, we built our own host-based system for collecting and labeling network data, which depends on volunteers. Afterwards, we designed and implemented our own system for traffic classification based on various statistical methods, which provides consistent results on all 6 levels: Ethernet, IP protocol, application, behavior, content, and service provider. Finally, we contributed to the open-source community by improving the accuracy of the nDPI traffic classifier. The thesis also evaluates the possibility of using various traffic classifiers to assess the per-application QoS level.
Articles in journals
4
Tomasz Bujlow, Valentín Carela-Español, Josep Solé-Pareta, and Pere Barlet-Ros
A Survey on Web Tracking: Mechanisms, Implications, and Defenses
Proceedings of the IEEE- (print),- (electronic-
IEEE
March 2017
Publisher’s version (DOI: 10.1109/JPROC-) | Author’s version (free of charge)
Privacy seems to be the Achilles’ heel of today’s web. Most web services make continuous efforts to
track their users and to obtain as much personal information as they can from the things they search, the
sites they visit, the people they contact, and the products they buy. This information is mostly used for
commercial purposes, which go far beyond targeted advertising. Although many users are already aware
of the privacy risks involved in the use of internet services, the particular methods and technologies used
for tracking them are much less known. In this survey, we review the existing literature on the methods
used by web services to track users online, as well as their purposes, implications, and possible user defenses. We present five main groups of methods used for user tracking, which are based on sessions, client storage, client cache, fingerprinting, and other approaches. Special focus is placed on mechanisms that use web caches, operational caches, and fingerprinting, as they often rely on particularly creative methods. We also show how users can be identified on the web and associated with their real names, e-mail addresses, phone numbers, or even street addresses. We show why tracking is used and its possible implications for the users. For each of the tracking methods, we present possible defenses. Some of them are specific to a particular tracking approach, while others are more universal (they block more than one threat). Finally, we present the future trends in user
tracking and show that they can potentially pose significant threats to the users’ privacy.
3
Tomasz Bujlow, Valentín Carela-Español, and Pere Barlet-Ros
Independent Comparison of Popular DPI Tools for Traffic Classification
Computer Networks-
Elsevier B.V.
January 2015
Publisher’s version (DOI: 10.1016/j.comnet-) | Author’s version (free of charge)
Deep Packet Inspection (DPI) is the state-of-the-art technology for traffic classification. According to the
conventional wisdom, DPI is the most accurate classification technique. Consequently, most popular
products, either commercial or open-source, rely on some sort of DPI for traffic classification. However,
the actual performance of DPI is still unclear to the research community, since the lack of public datasets
prevents the comparison and reproducibility of their results. This paper presents a comprehensive comparison of 6 well-known DPI tools, which are commonly used in the traffic classification literature. Our
study includes 2 commercial products (PACE and NBAR) and 4 open-source tools (OpenDPI, L7-filter,
nDPI, and Libprotoident). We studied their performance in various scenarios (including packet and flow
truncation) and at different classification levels (application protocol, application and web service). We
carefully built a labeled dataset with more than 750 K flows, which contains traffic from popular applications. We used the Volunteer-Based System (VBS), developed at Aalborg University, to guarantee the
correct labeling of the dataset. We released this dataset, including full packet payloads, to the research
community. We believe this dataset could become a common benchmark for the comparison and validation of network traffic classifiers. Our results present PACE, a commercial tool, as the most accurate
solution. Surprisingly, we find that some open-source tools, such as nDPI and Libprotoident, also achieve
very high accuracy.
2
Tomasz Bujlow, Sara Ligaard Nørgaard Hald, Tahir Riaz, and Jens Myrup Pedersen
A Method for Evaluation of Quality of Service in Computer Networks
ICACT Transactions on the Advanced Communications Technology (ICACT-TACT- (Online)
1
2
17–25
Global IT Research Institute (GiRI)
July 2012
Publisher’s version (DOI: none) | Author’s version (free of charge)
Monitoring of the Quality of Service (QoS) in high-speed Internet infrastructures is a challenging task. Moreover, precise assessments must take into account the fact that the requirements for a given quality level are service-dependent. Backbone QoS monitoring and analysis requires processing large amounts of data and knowing which kinds of applications generate the traffic. To overcome the drawbacks of existing methods for traffic classification, we proposed and evaluated a centralized solution based on the C5.0 Machine Learning Algorithm (MLA) and decision rules. The first task was to collect high-quality training data for C5.0, divided into groups that correspond to different types of applications. It was found that the currently existing means of collecting data (classification by ports, Deep Packet Inspection, statistical classification, public data sources) are not sufficient and do not comply with the required standards. We developed a new system to collect the training data, in which the major role is performed by volunteers. Client applications installed on volunteers’ computers collect detailed data about each flow passing through the network interface, together with the application name taken from the description of system sockets. This paper proposes a new method for measuring the level of Quality of Service in broadband networks. It is based on our Volunteer-Based System for collecting the training data, Machine Learning Algorithms for generating the classification rules, and application-specific rules for assessing the QoS level. We combine both passive and active monitoring technologies. The paper evaluates different implementation possibilities and presents the current implementation of the particular parts of the system, their initial runs, and the obtained results, highlighting the parts relevant from the QoS point of view.
1
Tomasz Bujlow, Kartheepan Balachandran, Sara Ligaard Nørgaard Hald, Tahir Riaz, and Jens Myrup Pedersen
Volunteer-Based System for Research on the Internet Traffic
TELFOR Journal- (Print),- (Online)
4
1
2–7
TELFOR
September 2012
Publisher’s version (DOI: none) | Author’s version (free of charge)
To overcome the drawbacks of the existing methods for traffic classification (by ports, Deep Packet Inspection, statistical classification), a new system was developed, in which the data are collected and classified directly by clients installed on machines belonging to volunteers. Our approach combines the information obtained from the system sockets, the HTTP content types, and the data transmitted through network interfaces. It allows packets to be grouped into flows and associated with particular applications or types of service. This paper presents the design and implementation of our system, the testing phase, and the obtained results. The performed threat assessment highlights potential security issues and proposes solutions to mitigate the risks. Furthermore, the paper demonstrates that the system is feasible in terms of uptime and resource usage, assesses its performance, and proposes future enhancements. We released the system under the GNU General Public License v3.0 and published it as a SourceForge project called Volunteer-Based System for Research on the Internet.
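The grouping step mentioned in this abstract can be illustrated with a short Python sketch. It is not the VBS code: it assumes packets have already been parsed into a hypothetical `Packet` record, identifies a flow by a direction-independent 5-tuple (protocol plus the two endpoints), ignores flow timeouts, and models the socket information as a hypothetical dictionary mapping (protocol, local IP, local port) to a process name.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Packet(NamedTuple):
    # Hypothetical parsed-packet record; real capture code would fill this in.
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str  # "TCP" or "UDP"
    size: int   # bytes on the wire


FlowKey = Tuple[str, Tuple[str, int], Tuple[str, int]]


def flow_key(p: Packet) -> FlowKey:
    """Direction-independent key: both directions of a connection map to
    the same flow. Endpoints are ordered lexicographically; the order only
    has to be consistent, not numerically meaningful."""
    a, b = (p.src_ip, p.src_port), (p.dst_ip, p.dst_port)
    return (p.proto, min(a, b), max(a, b))


def group_into_flows(
    packets: List[Packet],
    socket_table: Dict[Tuple[str, str, int], str],
) -> Dict[FlowKey, Tuple[str, List[Packet]]]:
    """Group packets into bidirectional flows and label each flow with the
    application name found in the (hypothetical) socket table, keyed by
    (protocol, local IP, local port). Unmatched flows get "unknown"."""
    flows: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for p in packets:
        flows[flow_key(p)].append(p)

    labelled: Dict[FlowKey, Tuple[str, List[Packet]]] = {}
    for key, pkts in flows.items():
        proto, end_a, end_b = key
        # Either endpoint may be the local one, so try both.
        app = (socket_table.get((proto, *end_a))
               or socket_table.get((proto, *end_b))
               or "unknown")
        labelled[key] = (app, pkts)
    return labelled
```

Because both directions of a connection map to the same key, a request and its response end up in the same flow, which is what makes a single application label per flow possible.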
Conference papers
8
Luca Deri, Maurizio Martinelli, Tomasz Bujlow, and Alfredo Cardigliano
nDPI: Open-Source High-Speed Deep Packet Inspection
Proceedings of the 10th International Wireless Communications & Mobile Computing Conference 2014
(IWCMC-
IEEE
Nicosia, Cyprus
August 2014
Publisher’s version (DOI: 10.1109/IWCMC-) | Author’s version (free of charge)
Network traffic analysis was traditionally limited to packet headers, because the transport protocol and application ports were usually sufficient to identify the application protocol. With the advent of port-independent, peer-to-peer, and encrypted protocols, identifying application protocols became increasingly challenging, which motivated the development of tools and libraries for network protocol classification. This paper covers the design and implementation of nDPI, an open-source library for protocol classification using both packet header and payload. nDPI was extensively validated in various monitoring projects ranging from Linux kernel protocol classification to analysis of 10 Gbit traffic, reporting both high protocol detection accuracy and efficiency.
7
Valentín Carela-Español, Tomasz Bujlow, and Pere Barlet-Ros
Is our Ground-Truth for Traffic Classification Reliable?
Proceedings of the 15th Passive and Active Measurement Conference (PAM 2014), Proceedings Series:
Lecture Notes in Computer Science-
Springer International Publishing Switzerland
Los Angeles, USA
March 2014
Publisher’s version (DOI: 10.1007/-_10) | Author’s version (free of charge)
The validation of the different proposals in the traffic classification literature is a controversial issue.
Usually, these works base their results on a ground-truth built from private datasets and labeled by
techniques of unknown reliability. This makes the validation and comparison with other solutions an
extremely difficult task.
This paper aims to be a first step towards addressing the validation and trustworthiness problem of
network traffic classifiers. We perform a comparison between 6 well-known DPI-based techniques,
which are frequently used in the literature for ground-truth generation. In order to evaluate these tools
we have carefully built a labeled dataset of more than 500 000 flows, which contains traffic from popular
applications. Our results present PACE, a commercial tool, as the most reliable solution for ground-truth generation. However, among the available open-source tools, nDPI and especially Libprotoident also achieve very high precision, while other, more frequently used tools (e.g., L7-filter) are not reliable enough and should not be used for ground-truth generation in their current form.
6
Tomasz Bujlow and Jens Myrup Pedersen
Obtaining Application-Based and Content-Based Internet Traffic Statistics
Proceedings of the 6th International Conference on Signal Processing and Communication Systems (ICSPCS’12)
1–10
IEEE
Gold Coast, Queensland, Australia
December 2012
Publisher’s version (DOI: 10.1109/ICSPCS-) | Author’s version (free of charge)
Understanding Internet traffic is crucial for academic research and practical network engineering, e.g., for traffic classification, traffic prioritization, and creating realistic scenarios and models of Internet traffic development. In this paper, we demonstrate how the Volunteer-Based System for Research on the Internet, developed at Aalborg University, is capable of providing detailed statistics of Internet usage. Since an increasing amount of HTTP traffic has been observed during the last few years, the system also supports creating statistics of different kinds of HTTP traffic, such as audio, video, and file transfers. All statistics can be obtained for individual users of the system, for groups of users, or for all users altogether. This paper presents results with real data collected from a limited number of real users over six months. We demonstrate that the system can be useful for studying the characteristics of computer network traffic in an application-oriented or content-type-oriented way, and that it is now ready for a larger-scale implementation. The paper concludes with a discussion of various applications of the system and the possibilities of further enhancements.
5
Jens Myrup Pedersen and Tomasz Bujlow
Obtaining Internet Flow Statistics by Volunteer-Based System
Proceedings of the Fourth International Conference on Image Processing & Communications (IP&C
2012), Image Processing & Communications Challenges 4, AISC-
Springer Berlin Heidelberg
Bydgoszcz, Poland
September 2012
Publisher’s version (DOI: 10.1007/-_32) | Author’s version (free of charge)
In this paper, we demonstrate how the Volunteer-Based System for Research on the Internet, developed at Aalborg University, can be used for creating statistics of Internet usage. Since the data are collected on individual machines, the statistics can be made on the basis of both individual users and groups of users, and as such can also be useful for segmenting users into groups. We present results with data collected from real users over several months; in particular, we demonstrate how the system can be used for studying flow characteristics: the number of TCP and UDP flows, average flow lengths, and average flow durations. The paper concludes with a discussion of what further statistics can be made and of the further development of the system.
4
Tomasz Bujlow, Tahir Riaz, and Jens Myrup Pedersen
Classification of HTTP Traffic Based on C5.0 Machine Learning Algorithm
Proceedings of the Fourth IEEE International Workshop on Performance Evaluation of Communications
in Distributed Systems and Web-based Service Architectures (PEDISWESA-
IEEE
Cappadocia, Turkey
July 2012
Publisher’s version (DOI: 10.1109/ISCC-) | Author’s version (free of charge)
Our previous work demonstrated the possibility of distinguishing several kinds of applications with an accuracy of over 99 %. Today, most of the traffic is generated by web browsers, which provide different kinds of services based on the HTTP protocol: web browsing, file downloads, audio and voice streaming through third-party plugins, etc. This paper proposes and evaluates two approaches to distinguishing various types of HTTP content: one distributed among volunteers’ machines and one centralized, running in the core of the network. We also assess the accuracy of the global classifier for both HTTP and non-HTTP traffic. We achieved an accuracy of 94 %, which is expected to be even higher in real-life usage. Finally, we present graphical characteristics of different kinds of HTTP traffic.
3
Tomasz Bujlow, Tahir Riaz, and Jens Myrup Pedersen
A Method for Assessing Quality of Service in Broadband Networks
Proceedings of the 14th International Conference on Advanced Communication Technology (ICACT)
826–831
IEEE
Phoenix Park, PyeongChang, Korea
February 2012
Publisher’s version (DOI: none) | Author’s version (free of charge)
Monitoring of Quality of Service (QoS) in high-speed Internet infrastructure is a challenging task. Moreover, precise assessments must take into account the fact that the requirements for a given quality level are service-dependent. Backbone QoS monitoring and analysis requires processing large amounts of data and knowing which application the traffic belongs to. To overcome the drawbacks of existing methods for traffic classification, we proposed and evaluated a centralized solution based on the C5.0 Machine Learning Algorithm (MLA) and decision rules. The first task was to collect high-quality training data for C5.0, divided into groups corresponding to different types of applications. It was found that currently existing means of collecting data (classification by ports, Deep Packet Inspection, statistical classification, public data sources) are not sufficient and do not comply with the required standards. To collect training data, a new system was developed, in which the major role is performed by volunteers. Client applications installed on their computers collect detailed data about each flow passing through the network interface, together with the application name taken from the description of system sockets. This paper proposes a new method for measuring the Quality of Service (QoS) level in broadband networks, based on our Volunteer-Based System for collecting the training data, Machine Learning Algorithms for generating the classification rules, and application-specific rules for assessing the QoS level. We combine both passive and active monitoring technologies. The paper evaluates different implementation possibilities and presents the current implementation of particular parts of the system, their initial runs, and the obtained results, highlighting the parts relevant from the QoS point of view.
2
Tomasz Bujlow, Tahir Riaz, and Jens Myrup Pedersen
A Method for Classification of Network Traffic Based on C5.0 Machine Learning Algorithm
Proceedings of ICNC’12: 2012 International Conference on Computing, Networking and Communications (ICNC): Workshop on Computing, Networking and Communications
244–248
IEEE
Maui, Hawaii, USA
February 2012
Publisher’s version (DOI: 10.1109/ICCNC-) | Author’s version (free of charge)
Monitoring of network performance in a high-speed Internet infrastructure is a challenging task, as the requirements for a given quality level are service-dependent. Therefore, backbone QoS monitoring and analysis in Multi-hop Networks requires knowledge about the types of applications forming the current network traffic. To overcome the drawbacks of existing methods for traffic classification, the use of the C5.0 Machine Learning Algorithm (MLA) was proposed. On the basis of the statistical traffic information received from volunteers and the C5.0 algorithm, we constructed a boosted classifier, which was shown to distinguish between 7 different applications in test sets of 76,632–1,622,710 unknown cases with an average accuracy of 99.3–99.9 %. This high accuracy was achieved by using high-quality training data collected by our system, a unique set of parameters used for both training and classification, an algorithm for recognizing flow direction, and C5.0 itself. The classified applications include Skype, FTP, torrent, web browser traffic, web radio, interactive gaming, and SSH. We performed successive runs using different sets of parameters and different training and classification options. This paper shows how we collected accurate traffic data, presents the parameters used in the classification process, introduces the C5.0 classifier and its options, and finally evaluates and compares the obtained results.
1
Tomasz Bujlow, Kartheepan Balachandran, Tahir Riaz, and Jens Myrup Pedersen
Volunteer-Based System for Classification of Traffic in Computer Networks
Proceedings of the 19th Telecommunications Forum TELFOR-
IEEE
Belgrade, Serbia
November 2011
Publisher’s version (DOI: 10.1109/TELFOR-) | Author’s version (free of charge)
To overcome the drawbacks of existing methods for traffic classification (by ports, Deep Packet Inspection, statistical classification), a new system was developed, in which the data are collected from client machines. This paper presents the design of the system, its implementation, initial runs, and the obtained results. Furthermore, it demonstrates that the system is feasible in terms of uptime and resource usage, assesses its performance, and proposes future enhancements.
Technical reports
4
Tomasz Bujlow, Valentín Carela-Español, Josep Solé Pareta, and Pere Barlet-Ros
Web Tracking: Mechanisms, Implications, and Defenses
1–29
arXiv.org: Computer Science – Computers and Society
July 2015
Publisher’s version (DOI: none) | Author’s version (free of charge)
This article surveys the existing literature on the methods currently used by web services to track users online, as well as their purposes, implications, and possible user defenses. A significant majority of the reviewed articles and web resources are from the years 2012–2014. Privacy seems to be the Achilles’ heel of today’s web. Web services make continuous efforts to obtain as much information as they can about the things we search, the sites we visit, the people we contact, and the products we buy. Tracking is usually performed for commercial purposes. We present 5 main groups of methods used for user tracking, which are based on sessions, client storage, client cache, fingerprinting, or other approaches. Special focus is placed on mechanisms that use web caches, operational caches, and fingerprinting, as they often rely on particularly creative methods. We also show how users can be identified on the web and associated with their real names, e-mail addresses, phone numbers, or even street addresses. We show why tracking is used and its possible implications for the users. For example, we describe recent cases of price discrimination, assessing financial credibility, determining insurance coverage, government surveillance, and identity theft. For each of the tracking methods, we present possible defenses. Some of them are specific to a particular tracking approach, while others are more universal (they block more than one threat) and are discussed separately. Apart from describing the methods and tools used for protecting personal data from being tracked, we also present several tools that were used for research purposes; their main goal is to discover how and by which entity users are being tracked on their desktop computers or smartphones, provide this information to the users, and visualize it in an accessible and easy-to-follow way. Finally, we present the currently proposed future approaches to tracking users and show that they can potentially pose significant threats to the users’ privacy.
3
Tomasz Bujlow and Jens Myrup Pedersen
A Practical Method for Multilevel Classification and Accounting of Traffic in Computer Networks
1–56
Department of Electronic Systems, Aalborg University
February 2014
Publisher’s version (DOI: none) | Author’s version (free of charge)
Existing tools for traffic classification are shown to be incapable of identifying the traffic in a consistent manner. For some flows only the application is identified, for others only the content, and for yet others only the service provider. Furthermore, Deep Packet Inspection is characterized by extensive resource requirements and by privacy or legal concerns. Techniques based on Machine Learning Algorithms require good-quality training data, which are difficult to obtain. They usually cannot properly deal with types of traffic other than those they were trained on, and they are unable to detect the content carried by the flow or the service provider. To overcome the drawbacks of already existing methods, we developed a novel hybrid method that provides accurate identification of computer network traffic on six levels: Ethernet, IP protocol, application, behavior, content, and service provider. The system built on this method also provides traffic accounting, and it was tested on 2 datasets. We have shown that our system gives a consistent, accurate output on all the levels. We also showed that the results provided by our system on the application level outperformed the results obtained from the most commonly used DPI tools.
2
Tomasz Bujlow, Valentín Carela-Español, and Pere Barlet-Ros
Extended Independent Comparison of Popular Deep Packet Inspection (DPI) Tools for Traffic Classification
1–440
Department of Computer Architecture (DAC), Universitat Politècnica de Catalunya (UPC)
January 2014
Publisher’s version (DOI: none) | Author’s version (free of charge)
Network traffic classification has become an essential input for many network-related tasks. However, the continuous evolution of Internet applications and of their techniques to avoid detection (such as dynamic port numbers, encryption, or protocol obfuscation) has considerably complicated their classification. We start the report by introducing and briefly describing several well-known DPI tools, which are later evaluated: PACE, OpenDPI, L7-filter, nDPI, Libprotoident, and NBAR.
This report has several major contributions. First, by using VBS, we created 3 datasets of 17 application protocols, 19 applications (including various configurations of the same application), and 34 web services, which are available to the research community. The first dataset contains full flows with entire packets, the second dataset contains truncated packets (the Ethernet frames were overwritten with 0s after the 70th byte), and the third dataset contains truncated flows (only the first 10 packets of each flow were kept). The datasets contain 767 690 flows labeled on a multidimensional level. These datasets are available as a set of PCAP files containing full flows including the packet payload, together with corresponding text files, which describe the flows in the order in which they were originally captured and stored in the PCAP files.
Second, we developed a method for labeling non-HTTP flows that belong to web services (such as YouTube). Labeling based on the corresponding domain names taken from the HTTP header can identify only the HTTP flows, while other flows (such as encrypted SSL/HTTPS flows and RTMP flows) are left unlabeled. Therefore, we implemented a heuristic method for detecting non-HTTP flows that belong to specific services. Then, we examined the ability of the DPI tools to accurately label the flows included in our datasets.
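The relation between the three dataset variants described above can be sketched in Python as follows. This is a hypothetical illustration, not the tooling used to produce the released datasets: it assumes each packet is available as raw Ethernet-frame bytes and each flow as an ordered list of such frames, and it leaves out reading and writing PCAP files. The function names and default values (70 bytes, 10 packets) are chosen here to mirror the description.

```python
from typing import Dict, List


def truncate_packet(frame: bytes, keep: int = 70) -> bytes:
    """Overwrite every byte after the first `keep` bytes with 0.
    The frame length is preserved; only the content is zeroed."""
    if len(frame) <= keep:
        return frame
    return frame[:keep] + bytes(len(frame) - keep)


def truncate_flow(packets: List[bytes], max_packets: int = 10) -> List[bytes]:
    """Keep only the first `max_packets` packets of a flow."""
    return packets[:max_packets]


def build_variants(flow: List[bytes]) -> Dict[str, List[bytes]]:
    """Produce the three variants of one flow: full flow with entire
    packets, truncated packets, and truncated flow."""
    return {
        "full": list(flow),
        "truncated_packets": [truncate_packet(f) for f in flow],
        "truncated_flows": truncate_flow(flow),
    }
```

Overwriting with zeros, rather than cutting the frames short, keeps the original packet lengths intact, so size-based statistics computed on the truncated-packets variant match those of the full dataset.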
1
Tomasz Bujlow, Valentín Carela-Español, and Pere Barlet-Ros
Comparison of Deep Packet Inspection (DPI) Tools for Traffic Classification
1–108
Department of Computer Architecture (DAC), Universitat Politècnica de Catalunya (UPC)
June 2013
Publisher’s version (DOI: none) | Author’s version (free of charge)
Nowadays, there are many tools that are able to classify traffic in computer networks. Each of these tools
claims a certain accuracy, but it is hard to assess which tool is better, because they are tested on different
datasets. Therefore, we set out to create a dataset that can be used to test all traffic classifiers. To do so,
we used our system to collect complete packets from the network interfaces. The packets are grouped into flows,
and each flow is collected together with the process name taken from Windows / Linux sockets, so researchers not
only have the full payloads but are also provided with information about which application created the flow.
Therefore, the dataset is useful for testing Deep Packet Inspection (DPI) tools, as well as statistical and
port-based classifiers. The dataset was created fully manually, which ensures that all the time-related
parameters inside the dataset are comparable with those of ordinary network data of the same type. The system
for collecting the data, as well as the dataset, is publicly available. Afterwards, we used our dataset to
compare the classification accuracy of PACE, OpenDPI, NDPI, Libprotoident, NBAR, four different variants of
L7-filter, and a statistics-based tool developed at UPC. We performed a comprehensive evaluation of the
classifiers on different levels of granularity: application level, content level, and service provider level.
We found that the best-performing classifier on our dataset was PACE. Among the non-commercial tools, NDPI and
Libprotoident provided the most accurate results, while all 4 versions of L7-filter yielded the worst accuracy.
Other scientific contributions
Presentations in seminars
12
Role: Co-author and Participant
Topic: User Tracking Uncovered (Tracking Catalog: Uncovering and analyzing user tracking on the Internet)
Event: Data Transparency Lab (DTL) Launch Workshop
Place: Telefonica, Barcelona, Spain
Date: November 2014
Accessible: http://www.datatransparencylab.org
11
Role: Author and Presenter
Topic: Consistency, Accuracy, and Usefulness of Techniques and Tools for Network Traffic Identification
Event: Seminar organized by the Networks, Systems, Services, and Security (R3S) research team from the Distributed Services, Architectures, Modelling, Validation, and Network Administration (SAMOVAR) research unit
Place: TELECOM SudParis, Evry, France
Date: May 2014
Accessible: http://samovar.telecom-sudparis.eu/spip.php?article779
10
Role: Author and Presenter
Topic: Obtaining Useful Classification Results by Deep Packet Inspection (DPI)
Event: Complements of Network Management (SGR) course for the 8th semester student group from the specialty of Computer Science
Place: Computer Science Department, University of Pisa, Pisa, Italy
Date: April 2014
9
Role: Author and Presenter
Topic: Usefulness of the Results – a Forgotten Evaluation Metric of Traffic Identification Tools
Event: Seminar organized by the Telecommunication Networks Group
Place: Department of Electronics and Telecommunications, Polytechnic University of Turin, Turin, Italy
Date: April 2014
8
Role: Author and Presenter
Topic: Advanced Network Traffic Monitoring & Analysis
Event: Communication Networks and Ambient Intelligence course for the 7th semester student group from the specialty of Network and Distributed Systems
Place: Department of Electronic Systems, Aalborg University, Aalborg, Denmark
Date: September 2013
7
Role: Author and Presenter
Topic: Quality of Service (QoS) Assessment in Computer Networks
Event: Second IntelliCIS Training School on Simulation-based design of Complex Infrastructure Systems
Organizer: COST Action IC0806: Intelligent Monitoring, Control and Security of Critical Infrastructure Systems (IntelliCIS)
Place: RWTH University, Aachen, Germany
Date: March 2013
Accessible: http://www.intellicis.eu/Pages/Training_Schools.php
6
Role: Author and Presenter
Topic: Traffic Monitoring and Analysis – Advanced Techniques Based on Machine Learning
Event: Seminar on Traffic Monitoring and Analysis
Place: Department of Computer Architecture, Universitat Politècnica de Catalunya, Barcelona, Spain
Date: November 2012
5
Role: Author and Presenter
Topic: Classification of Traffic Using Machine Learning Techniques
Event: Communication Networks and Ambient Intelligence course for the 7th semester student group from the specialty of Network and Distributed Systems
Place: Department of Electronic Systems, Aalborg University, Aalborg, Denmark
Date: October 2012
4
Role: Author and Presenter
Topic: Advanced Network Traffic Analysis
Event: Life Long Learning course for external participants
Place: Aalborg University, Aalborg, Denmark
Date: August 2012
3
Role: Author and Presenter
Topic: Advanced End-User Traffic Monitoring
Event: Internet Quality – More Than Bandwidth, an international industrial conference
Organizer: Collaborating Living Labs (COLL) project: Compare Testlab – Karlstad University, NettOp – University of Stavanger, and CNP – Aalborg University
Place: IDA House of Engineers, Copenhagen, Denmark
Date: June 2012
Accessible: https://mit.ida.dk/IDAforum/u0631a/Documents/Internet%20kvalitet%20-%-/Tomasz%20Bujlow.pdf
2
Role: Author and Presenter
Topic: Volunteer-based System for Classification of Traffic in Computer Networks
Event: First IntelliCIS Training School on Intelligent Monitoring of Critical Infrastructures
Organizer: COST Action IC0806: Intelligent Monitoring, Control and Security of Critical Infrastructure Systems (IntelliCIS)
Place: Albena Resort, Bulgaria
Date: October 2011
Accessible: http://www.intellicis.eu/Pages/Training_Schools.php
1
Role: Author and Presenter
Topic: Classification of Traffic in Integrated Computer Networks
Event: Life Long Learning course for external participants
Place: Aalborg University, Aalborg, Denmark
Date: August 2011
Reviews of journal articles and conference papers
15
Publication: Journal of Cyber Security Technology
Publisher: Taylor & Francis Group
Type: Article in a journal
Date: February 2018
14
Publication: Wireless Communications and Mobile Computing
Publisher: Hindawi
Type: Article in a journal
Date: February 2018
13
Publication: IEEE Communications Letters (IEEE COMML)
Publisher: IEEE
Type: Article in a journal
Date: February 2017
12
Publication: IEEE Communications Letters (IEEE COMML)
Publisher: IEEE
Type: Article in a journal
Date: July 2016
11
Publication: SoftwareX
Publisher: Elsevier
Type: Article in a journal
Date: May 2016
10
Publication: Computer Communications (COMCOM)
Publisher: Elsevier
Type: Article in a journal
Date: March 2016
9
Publication: IEEE Transactions on Network and Service Management (TNSM)
Publisher: IEEE
Type: Article in a journal
Date: May 2015
8
Publication: IEEE Transactions on Network and Service Management (TNSM)
Publisher: IEEE
Type: Article in a journal
Date: October 2014
7
Publication: IEEE Transactions on Network and Service Management (TNSM)
Publisher: IEEE
Type: Article in a journal
Date: April 2014
6
Publication: Scientia Iranica
Publisher: Sharif University of Technology
Type: Article in a journal
Date: December 2013
5
Publication: Proceedings of the 21st Telecommunications Forum (TELFOR 2013)
Organization: Telecommunications Society, Belgrade; School of Electrical Engineering, University of Belgrade; IEEE Serbia & Montenegro COM Chapter
Type: Conference paper
Date: October 2013
4
Publication: Proceedings of the 20th Telecommunications Forum (TELFOR 2012)
Organization: Telecommunications Society, Belgrade; School of Electrical Engineering, University of Belgrade; IEEE Serbia & Montenegro COM Chapter
Type: Conference paper
Date: October 2012
3
Publication: Proceedings of the 2012 International Conference on Computing, Networking and Communications (ICNC’12)
Type: Conference paper
Date: September 2011
2
Publication: Zeszyty Naukowe. Telekomunikacja i Elektronika
Publisher: University of Technology and Life Sciences in Bydgoszcz
Type: Article in a journal
Date: May 2011
1
Publication: Computer Standards & Interfaces
Publisher: Elsevier B.V.
Type: Article in a journal
Date: March 2011
I declare that I agree to have my personal data processed, if necessary, for the recruitment process.