Extract values from a log file

In this article I will explain how you can use bash and other common Unix tools to extract data from files such as log files. First, a quick example of what the data looks like in our case. I created it with generatedata. The first column contains the iteration label, the second column normally distributed data, and the third column some additional data.

Log Data

Data 1|Data 2|Data 3
Iter 1|9.57008|16460317 1044
Iter 2|9.988|16770723 8791
Iter 3|9.97427|16331126 4422
Iter 4|9.97884|16630208 9401
Iter 5|10.65432|16090822 8083
Iter 6|10.28093|16550713 8674
Iter 7|10.06096|16050104 8912
Iter 8|10.14419|16391021 6534
Iter 9|10.25904|16780715 3957
Iter 10|9.75479|16970729 1945

Tools

To extract data from this kind of file and manipulate it, you need the following tools:

  • awk
  • bc
  • the pipe operator (more a syntax tool 🤓)

You can also think of this list as a pipeline: awk -> bc, connected by the pipe operator. With awk you search for the lines in your file that you are interested in and extract certain parts of them. With bc you perform calculations.

Example

I created a simple bash script to explain the commands with an example.

#!/usr/bin/env bash
# Extract the second column of iteration 8
log_data_1=$(awk '/Iter 8/{ split($0,data,"|"); print data[2] }' log_data.txt)
echo "${log_data_1}"
# Extract the second column of iteration 10
log_data_2=$(awk '/Iter 10/{ split($0,data,"|"); print data[2] }' log_data.txt)
echo "${log_data_2}"
# Divide the two values with bc, keeping four decimal places
metric=$(echo "scale=4; $log_data_1/$log_data_2" | bc)
echo "${metric}"
# Append the results to results.txt
echo "Writing Output to File"
echo "Results" >> results.txt
echo "${log_data_1};${log_data_2};${metric}" >> results.txt

The goal of this bash script is to extract the data of the 8th and 10th iterations and calculate one metric from them; in the HPC world this might be a scaling performance. The first command is

log_data_1=$(awk '/Iter 8/{ split($0,data,"|"); print data[2] }' log_data.txt)

Let’s break this down. The outer part $() assigns the output of the command executed within the parentheses to a variable, in this case log_data_1. The actual command starts with awk. The slashes frame the pattern we are searching for, here Iter 8, which appears in our log file. Next we have to split the matching line, since it is just one long string with no notion of our columns. This is done with split($0,data,"|"): $0 is the whole matching line (Iter 8|10.14419|16391021 6534), data is the array that receives the pieces, and "|" is the delimiter. With print data[2] we grab the second element of the data array. log_data.txt is the file we are searching in. With log_data_2=$(awk '/Iter 10/{ split($0,data,"|"); print data[2] }' log_data.txt) we do exactly the same for the 10th iteration.
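One thing to keep in mind with the slash pattern: it matches anywhere in the line, so /Iter 1/ would also hit Iter 10. The sketch below shows one possible alternative, assuming the same "|" layout: awk's -F option sets the field separator, and comparing the first field with == selects exactly one row. The temporary file and variable names are made up for this example.

```shell
#!/usr/bin/env bash
# Recreate a few rows of the sample log in a temporary file
log_file=$(mktemp)
cat > "$log_file" <<'EOF'
Data 1|Data 2|Data 3
Iter 1|9.57008|16460317 1044
Iter 8|10.14419|16391021 6534
Iter 10|9.75479|16970729 1945
EOF

# The pattern /Iter 1/ matches both "Iter 1" and "Iter 10"
loose=$(awk '/Iter 1/{ split($0,data,"|"); print data[2] }' "$log_file")

# -F'|' splits every line into fields; $1 == "Iter 1" compares the whole first field
exact=$(awk -F'|' '$1 == "Iter 1" { print $2 }' "$log_file")

echo "loose: ${loose}"
echo "exact: ${exact}"
rm -f "$log_file"
```

For Iter 8 and Iter 10 in this data set the original pattern is unambiguous, so the script above works as written; the exact comparison only matters once a label is a prefix of another one.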

In the next step we would like to calculate a metric. Bash cannot perform floating-point arithmetic on its own, but you can use the tool bc for that 💪. With scale=4 you define the number of digits after the decimal point. With

$log_data_1 / $log_data_2

you define the calculation, and with | you pipe the expression into bc, which returns the result. Finally, we would like to store our results in a separate file for later visualization, so we echo our variables into a new file, results.txt. This last step finishes the bash script. The output you will see in the terminal is the following.

10.14419
9.75479
1.0399
Writing Output to File

and the output file looks as follows:

Results
10.14419;9.75479;1.0399