Our company uses multiple log file formats.
We would like to develop a series of tools to parse them; often the same core functionality is needed for several log file formats.
A classic example is generating Message Sequence Charts from the log files (other candidates are memory usage, stack size, and time measurement between events).
To stick with Message Sequence Charts, we basically need to identify:
- the title of the columns (i.e. the processes involved)
- the title of the messages
- perhaps also message parameters
We intend to determine these by parsing the log files.
However, for different formats, we might say that the information is
- delimited by certain strings (e.g. <some text> <process name> SEND <message> TO <process name>)
- in fixed column positions
- in the Nth word of lines beginning with a certain string
- and so on
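As a rough illustration of the extraction styles listed above, the per-format differences could be written as data rather than code. The Python sketch below is only an assumption about how that might look: the rule names, the field names (`src`, `msg`, `dst`), the column positions and the `MSG` prefix are all hypothetical, not taken from any real log format.

```python
import re

FIELDS = ("src", "msg", "dst")

# Hypothetical rules, one per extraction style; every value here is illustrative.
RULES = {
    # Information delimited by certain strings.
    "delimited": {
        "type": "regex",
        "pattern": r"(?P<src>\S+) SEND (?P<msg>\S+) TO (?P<dst>\S+)",
    },
    # Information in fixed column positions, given as (start, end) slices.
    "columns": {"type": "fixed", "src": (0, 8), "msg": (9, 21), "dst": (22, 30)},
    # Information in the Nth word of lines beginning with a certain string.
    "nth_word": {"type": "words", "prefix": "MSG", "src": 1, "msg": 2, "dst": 3},
}


def extract(line, rule):
    """Return a dict with src/msg/dst for a matching line, else None."""
    kind = rule["type"]
    if kind == "regex":
        match = re.search(rule["pattern"], line)
        return match.groupdict() if match else None
    if kind == "fixed":
        return {f: line[rule[f][0]:rule[f][1]].strip() for f in FIELDS}
    if kind == "words":
        if not line.startswith(rule["prefix"]):
            return None
        words = line.split()
        if max(rule[f] for f in FIELDS) >= len(words):
            return None  # line is too short for the configured word indices
        return {f: words[rule[f]] for f in FIELDS}
    raise ValueError(f"unknown rule type: {kind}")


if __name__ == "__main__":
    print(extract("12:00:01 procA SEND HELLO_REQ TO procB", RULES["delimited"]))
    print(extract("MSG procA HELLO_REQ procB", RULES["nth_word"]))
```

With this shape, the common core only ever sees `src`/`msg`/`dst` dictionaries, and supporting a new format means adding a rule rather than new parsing code.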
This could turn into a spaghetti of command line arguments, so we are thinking of an external configuration file.
What sort of file (plain text, .INI, XML, or something else) should we use, and how should we structure it?
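To make the .INI option concrete, here is a minimal sketch of one possible layout, read with Python's standard `configparser` module. It is an assumption for discussion, not a recommendation: the section names (`router_log`, `legacy_log`), the keys (`type`, `pattern`, `src`, `msg`, `dst`) and the `start-end` column notation are all hypothetical.

```python
import configparser
import re

# Hypothetical configuration: one section per log format.
SAMPLE_INI = r"""
[router_log]
type = regex
pattern = (?P<src>\S+) SEND (?P<msg>\S+) TO (?P<dst>\S+)

[legacy_log]
type = fixed
src = 0-8
msg = 9-21
dst = 22-30
"""


def parse_span(text):
    """Turn a 'start-end' string such as '9-21' into an (int, int) tuple."""
    start, end = text.split("-")
    return int(start), int(end)


def load_formats(ini_text):
    """Read every section and convert its values into ready-to-use objects."""
    # interpolation=None keeps '%' characters in patterns literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    formats = {}
    for name in parser.sections():
        section = dict(parser[name])
        if section["type"] == "regex":
            section["pattern"] = re.compile(section["pattern"])
        elif section["type"] == "fixed":
            for field in ("src", "msg", "dst"):
                section[field] = parse_span(section[field])
        formats[name] = section
    return formats


if __name__ == "__main__":
    for name, spec in load_formats(SAMPLE_INI).items():
        print(name, spec["type"])
```

A flat format like this seems to cope with one rule per log format; whether it stays readable once a format needs several rules or nested options is exactly the kind of trade-off the question is about.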
I would hope that about 90% of our scripts can be common core code, with only 10% or so needed to adapt the parsing to each input format.
Any advice, or references?