Log Analysis Basics
Part 1: Command Practice
In the next steps, we will introduce and practice with a few basic commands that will help to
profile logs (i.e., learn about log files so that we know what we're about to analyze).
1. On the dock, click the Terminal icon to launch a new Terminal window.
2. At the command prompt, execute cd /home/cybrary/Documents/Practice to change
your current directory.
◇ The log files that we will use for these activities are stored in this directory.
◇ First, we will explore the file command. In Linux systems, the file command is used to
verify the type of file that you're working with.
◇ This can help you determine whether you have a text file that you may be able to review
using simple text processing tools (or more easily ingest into a SIEM) versus a different file type
that requires a proprietary viewer.
◇ For example, if a log file is some variant of plain text, you can view the content using the
Linux commands presented in the last module.
3. At the command prompt, execute file web.log.1 to display the file type for the web.log.1 file.
◇ In the output, you should see the file name listed, as well as the assessed type.
◇ Many commands in Linux can be applied to multiple files or directories at a time using an
asterisk character (*) - referred to in this context as a wildcard character.
◇ In the next step, you will use the file * command to display the file type of all files in your
current working directory.
4. At the command prompt, execute file * to display the file type of all files in the current
directory.
Note: You will see some of the files identified as "CSV". I know, I know. We just told you to
focus on ASCII. For the moment, consider "CSV" (Comma Separated Values) another way to just
say "ASCII with commas between strings of characters".
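As a quick illustration that "CSV" is still just text, here is a minimal sketch using a made-up sample.csv (not one of the exercise files): the file command identifies its type, and ordinary text tools like head and cut can work with it directly.

```shell
# Create a tiny, hypothetical CSV log (a header row plus one record).
printf 'timestamp,user,src_ip\n2024-01-01T00:00:00Z,alice,203.0.113.5\n' > sample.csv

# file reports a text-based type -- no proprietary viewer needed.
file sample.csv

# The header row is just the first line; cut splits fields on the commas.
head -n 1 sample.csv | cut -d',' -f2
```

Here cut -d',' -f2 prints just the second header field (user), showing that comma-separated content remains easy to process with standard command-line tools.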
You will often need to simply display a log file to the screen. This can be done using a
variety of commands in Linux, including the cat command. The cat command - short for
concatenate - will read the data from the specified file and display that data as output on-screen.
In the next step, you will read the data from the vpn.log.1 file using the cat command.
5. At the command prompt, execute cat vpn.log.1 to display the contents of the vpn.log.1 file.
• From the output of this command, you should be able to answer the following questions:
> How many lines does it contain?
1
> Does it appear to have one or more lines per log record?
One
• While the cat command is useful for displaying short log files, if you try to display a
longer log file with cat, the content can scroll past so quickly that you miss the header
row.
• To make sure you see the header row (usually the first row) when displaying content from
the command line, you can use the head command, which displays a specified number of
lines from the top of a file.
• In the next step, you will display the first and second lines of the web.log.1 file using the
head command.
6. At the command prompt, execute head -n 1 web.log.1 to display the first line of the file
web.log.1, then execute head -n 2 web.log.1 to display the first and second lines of the file
web.log.1.
• Why do it this way? So that you can first get a clear view of the header row without
any other content in your view, and then add a log record to the display to see how its
fields line up with the header row.
• That said, it's personal preference, and it's also notable that some logs will contain
multiple header rows.
◇ Now that we know a little bit about the most common elements of a log, let's talk about
time coverage.
7. At the command prompt, execute cat web.log.2 to display the file web.log.2, then execute
head -n 2 web.log.2 to view the first two lines of the web.log.2 file.
Based on what you see:
> What is the date and time of the earliest record?
▪-
> Were you able to see the earliest record when you used the cat command on this file?
▪ Yes
Understanding the time coverage or duration of a log is critical to understanding whether it is
applicable to your work. For example, if you are doing log analysis to investigate alerts that
occurred on a specific day and at a specific time, you need to analyze log records that cover that
same date and time.
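A quick way to bracket a log's time coverage is to pair head and tail. The sketch below uses a made-up sample_vpn.log with one record per line and a leading ISO timestamp; that layout is an assumption for illustration, not the format of the exercise files.

```shell
# Build a tiny, hypothetical log: one record per line, timestamp first.
printf '%s\n' \
  '2024-03-01T08:00:00Z login alice' \
  '2024-03-01T12:30:00Z login bob' \
  '2024-03-02T17:45:00Z logout alice' > sample_vpn.log

# Earliest record: the first line of the file.
head -n 1 sample_vpn.log

# Latest record: the last line of the file.
tail -n 1 sample_vpn.log
```

If the two timestamps bracket your time frame of interest, the log is at least a candidate for analysis.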
Part 2: Profile a Single Log File
Sooooo…. You’ve got a log to analyze. So far, we know how to view a file and we learned the
rudiments of search. But what exactly are you supposed to do at the beginning? What do you
search for? Should you be searching? Do you even have the right log?
Approaching a new log can be daunting when you’re new to this skill. To help get yourself
grounded, we'll spend the remainder of this exercise talking about some simple methods to
profile a single log file, so that you have a better idea of what you're dealing with, and so that
you'll be ready to tackle an analysis objective.
We did many of these things in Part 1 above. Now we're going to turn them into a more explicit
process. Here is a simple sequence of steps to go through when you are approaching a new log
analysis task.
• Step 1: Identify your analysis objective.
• Step 2: Get the right log.
• Step 3: Verify the log's time coverage.
• Step 4: Identify the log’s size.
• Step 5: Preview log contents.
• Step 6: Select an analysis technique likely to resolve your objective.
Following the sequence of steps outlined above, we will walk through the process of profiling a
single log file using the commands you learned in Part 1.
1. Identify your analysis objective.
As we mentioned, for many log analysis tasks the objective will be assigned to you. For this
exercise, our objectives are as follows:
Objective A: Identify any "test" user accounts that have successfully authenticated via
VPN as can be observed in the appropriate log file(s).
Objective B: Identify the source IP address from which each "test" account was used to
make the VPN connections identified in Objective A.
Objective C: Identify any other user accounts used for VPN authentications with the same
source IP address as the "test" account(s).
You don't have to do anything with this information for this practice exercise other than keep it in
mind as we proceed, but you may choose to keep track of it somewhere.
In the future, you will complete analytical tasks that require reporting of your conclusions. In
those situations, having your objective documented in your report is a best practice, if not a
requirement.
2. Get the right log.
You can find the logs for this exercise in /home/cybrary/Documents/Practice. There are two
files in this location, named practice.log.1 and practice.log.2.
Imagine someone handed you these log files and asked you to review them for successful VPN
authentication events. Examine the contents of each file using the cat / head commands.
In a log containing VPN authentication events, we would expect to see indicators of success or
failure, as well as usernames, dates and times, and source IP addresses.
Here are some example commands to show the first 10 lines of each log (The head command
shows the first 10 lines by default).
◇ $ head practice.log.1
◇ $ head practice.log.2
Based on what you see, which file most likely contains VPN logs?
3. Verify the log's time coverage.
I’d like to tell you that I’ve never analyzed a log file that covered the wrong time period, but I’d
be lying to you.
The last thing you want to do in log analysis is analyze the logs from the wrong date.
The timezone of the date/time stamps in the log may be local or may be UTC, and the first and
last dates and times may or may not overlap with your time frame of interest.
So when you first start looking at a log, it’s worth checking the following:
--Any timezone settings in the log file by using the head command to look for a header
row.
$ head -n 2 practice.log.2
--The date/time of the earliest record in the log (visible in the command you just ran).
--The date/time of the latest record in the log using the tail command to look at the last
few lines of the file and checking the latest date/time stamp.
$ tail practice.log.2
Based upon the results of those commands, you should be able to answer the following
questions:
> Is a time zone expressed in the header row? If so, what is it?
> What is the earliest date/time in the log?
> What is the latest date/time in the log?
4. Identify the log's size.
Identify the "size" of the log in terms of file size and number of lines using the following
commands.
Check the file size using the ls command with the -lah options.
While we won't use this information in this exercise, it could be important if the log is very
large and you are planning to feed it into a SIEM or another tool with licensing limits based
on the total amount of data ingested or indexed.
$ ls -lah practice.log.2
Check the number of lines in the log using the wc command with the -l option.
The number of lines will tell you whether you could get away with just reading the log manually
versus using more automated processing and search techniques to find what you're looking for.
$ wc -l practice.log.2
Note: The result of the wc -l command will show you the total number of lines inclusive of
header rows and blank lines, if any.
Try using the wc command to count the number of lines in the other files in the Practice folder.
You will need this information to answer the questions on the Tasks tab.
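Like file, wc accepts a wildcard, so you can count every matching file in one command. Here is a minimal sketch with two made-up files (the exercise files in the Practice folder work the same way):

```shell
# Two small, hypothetical log files of different lengths.
printf 'a\nb\nc\n' > demo.log.1
printf 'x\ny\n' > demo.log.2

# With multiple files, wc -l prints a per-file count plus a final "total" line.
wc -l demo.log.*
```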
5. Preview log contents.
Analyzing a log file generally requires some understanding of the contents.
There are several ways to get an idea of what you'll be looking at, but the most direct method is
to preview a few lines of the log to see what the content looks like, and to check for a header
row that will tell you the names of the fields within.
You already did this with the head command earlier. You could run the same command again,
but without specifying a number of lines:
$ head practice.log.2
Alternatively, you could use the more or less command in Linux to scroll through the log (by
pressing the space bar after the command) if the log file is longer and you'd like to look through
more than the first few lines.
Use the more command to display practice.log.2.
$ more practice.log.2
Notice the text --More--(27%) at the bottom of the screen. That indicates that there is
more content in this file to be displayed, and you can press the space bar to scroll down
to see that additional content.
6. Select an analysis technique likely to resolve your objective.
At this point we should have the correct file, know the time zone assigned to recorded data, and
have a sense of the file size and number of lines.
Given that information, we have to determine the best technique for obtaining the intended
results.
We haven't covered many different log analysis techniques yet, so in this case we're going to
choose "search" as our primary technique.
Searching the file will enable us to quickly find instances of successful logins versus other data in
the log file.
Given the small log size in this example, you could elect to simply read the log manually, but
since we're learning new skills here, we will practice with search.
Part 3: Search a Log File
Now we're going to search our log file in an attempt to fulfill our objectives.
Recall from the previous steps, our objectives are as follows:
Objective A: Identify any "test" user accounts that have successfully authenticated via
VPN as can be observed in the appropriate log file(s).
Objective B: Identify the source IP address from which each "test" account was used to
make the VPN connections identified in Objective A.
Objective C: Identify any other user accounts used for VPN authentications with the same
source IP address as the "test" account(s).
1. Select search terms.
Given that our objective is to identify specific, successful authentications, we now need to get a
little more specific and select our search term. When we previewed the log file practice.log.2 in
the previous part, we saw both successful and failed authentication records. Our search criteria
are slightly more specific. Objective A asks us to begin specifically with "test" accounts, however
it does not tell us what those accounts are actually named. A possible first search term could be
the term test (case insensitive).
2. Execute search.
From the command line, search for the term "test" (case insensitive) in the practice.log.2 file
using the command egrep -i "test" practice.log.2.
Note the -i option, which makes the search case insensitive: any version of the word "test",
regardless of capitalization, will return a result. Give it a try.
$ egrep -i "test" practice.log.2
From your results, you can answer the following questions.
> What is the name or names of the user accounts observed in the results?
◇ vpn_tester
> What is the source IP address or address(es) observed in the results?
◇ -
3. Repeat as necessary to fulfill your objectives.
To fulfill all three objectives, you may need to cycle back and select a new search term and
search again. This is the cyclical or repetitive nature of log analysis where you have to repeat
searches using different criteria to find all of the content that you require to answer the
questions at hand.
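The cycle described above can also be compressed into a pipeline. Below is a sketch against a made-up, space-delimited log (timestamp, username, result, source IP); the awk field number is an assumption about this invented layout, not about practice.log.2.

```shell
# Hypothetical space-delimited log: timestamp, username, result, source IP.
printf '%s\n' \
  '2024-03-01T08:00:00Z vpn_tester SUCCESS 198.51.100.7' \
  '2024-03-01T08:05:00Z vpn_tester FAILURE 198.51.100.7' \
  '2024-03-01T09:00:00Z alice SUCCESS 198.51.100.7' > demo_auth.log

# Find every record for one source IP, then list the distinct usernames
# seen in those records (field 2 in this made-up layout).
egrep '198\.51\.100\.7' demo_auth.log | awk '{ print $2 }' | sort -u
```

sort -u collapses duplicates, so each username appears once no matter how many records matched.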
Selecting a new search term: In this case, you will need to search again for the source IP
address that you discovered in the search under Step 2 above. The IP address observable in your
previous search results was "-".
Running another search: From the command line run the command egrep "-"
practice.log.2 to find records associated with the IP address that you discovered in your search in
Step 2.
$ egrep "-" practice.log.2
From your results, consider the following questions.
> Was the IP address "-" the source of any attempted VPN authentications for different
usernames?
◇ Yes
> If so, what were the usernames and were any of those authentication attempts successful?
◇ vpn_tester
◇ micheal.davies
Note: If you are already familiar with regular expressions you may note that a "." (period)
character can be used to represent "any character". We will cover regular expressions in a future
lesson. For now, treating this term as a literal will work with this data set.
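Here is a small sketch of why that matters, using a made-up demo_ip.log: an unescaped dot matches any character, so a lookalike string slips through, while escaping the dots (or using grep -F for a fully literal match) keeps the search exact.

```shell
# One real IP and one lookalike string that differs only at the dots.
printf '%s\n' '198.51.100.7 ok' '198x51y100z7 spoof' > demo_ip.log

# Unescaped dots match ANY character, so both lines match:
egrep '198.51.100.7' demo_ip.log

# Escaped dots (or grep -F, which treats the pattern as a literal string)
# match only the real IP address:
egrep '198\.51\.100\.7' demo_ip.log
grep -F '198.51.100.7' demo_ip.log
```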
Summary
In this lesson, you practiced some basic log analysis functions using the command line.
Be sure to answer the questions on the Tasks tab, then continue to the optional Challenge
Exercise, where you will have the opportunity to profile and search a similar log for similar
criteria.