csgrep: --add-input-lines option for --mode=json#242
csgrep: --add-input-lines option for --mode=json#242Jany26 wants to merge 1 commit intocsutils:mainfrom
Conversation
If --add-input-lines option is used, DefEvents output as json will have "input_line" referencing the line number of the input file. Signed-off-by: Jan Matufka <jmatufka@redhat.com>
| std::string fileName; | ||
| int line = 0; | ||
| int column = 0; | ||
| int inputLine = 0; |
There was a problem hiding this comment.
We should also (optionally) record inputFile. Otherwise the number is ambiguous in case csgrep reads multiple input files.
| class ErrFileLexer { | ||
| public: | ||
| ErrFileLexer(std::istream &input): | ||
| ErrFileLexer(std::istream &input, bool addInputLines = false): |
There was a problem hiding this comment.
It would be easier if the ErrFileLexer constructor took InStream &. This class is internal to the parser-cov module, so the constructor can be changed easily.
| class Tokenizer: public ITokenizer { | ||
| public: | ||
| Tokenizer(std::istream &input): | ||
| Tokenizer(std::istream &input, bool addInputLines = false): |
There was a problem hiding this comment.
Same here, the Tokenizer constructor can take InStream & to simplify the code.
|
|
||
| // known per-event subnodes | ||
| // 'input_line' is whitelisted, but currently there is no use-case for it | ||
| // so it is not re-read |
There was a problem hiding this comment.
Why cannot we preserve non-zero input_line values already recorded in JSON files?
The use-case would be:
csgrep --mode=json file1.json file2.json > all.json
| if (0 < evt.vSize) | ||
| evtNode["v_size"] = evt.vSize; | ||
| if (0 < evt.inputLine) | ||
| evtNode["input_line"] = evt.inputLine; |
There was a problem hiding this comment.
We need to record inputFile as well. Otherwise the number is meaningless when multiple text files are read.
| ("file-glob", "expand glob patterns in the names of input files") | ||
| ("ignore-case,i", "ignore case when matching regular expressions") | ||
| ("ignore-parser-warnings", "if enabled, parser warnings about the input files do not affect exit code") | ||
| ("add-input-lines", "if enabled, events in mode=json will also contain input_line numbers from the original input file/stream") |
There was a problem hiding this comment.
The --add-input-lines option name does not sound intuitive for what it does. Without the context I would understand it as if some input text was being added rather than line numbers. As mentioned above, we should record the file names, too. What about naming it --record-input-locations instead?
There was a problem hiding this comment.
Should I also rename the JSON fields to input_line_location and input_file_location?
There was a problem hiding this comment.
I would use input_file and input_line. I see location as an abstraction over file, line, column. Something like: https://github.com/gcc-mirror/gcc/blob/5cd3889135d77bf951e4ffe169868b453c36257d/libcpp/include/line-map.h#L1293
|
ad pending s390x builds: fedora-copr/copr#4219 (comment) |
I think we can simply disable the |
This PR adds a new CLI option
--add-input-linesthat provides an extrainput_linefield in DefEvents in the JSON ouptut. The tokenizer captures the line number but only emits it during JSON writing. Change is backwards compatible (csgrep/0134-0135 tests should cover that). If needed, I can add additional tests, but the line number information is not utilized anywhere else.Context for the proposed change (our use case).
In Log Detective, we utilize csgrep to extract compiler errors from failed RPM log files to provide additional context for LLM analysis. The extracted log snippets should have a line number pointing to the log file for easy lookup. Currently there is no easy way for us to extract input line numbers from CSGrep output.