csgrep: --add-input-lines option for --mode=json by Jany26 · Pull Request #242 · csutils/csdiff

Jany26 · 2026-03-24T11:04:38Z

This PR adds a new CLI option --add-input-lines that provides an extra input_line field in DefEvents in the JSON output. The tokenizer captures the line number, but it is emitted only during JSON writing. The change is backwards compatible (the csgrep/0134-0135 tests should cover that). If needed, I can add additional tests, but the line number information is not utilized anywhere else.

Context for the proposed change (our use case).
In Log Detective, we use csgrep to extract compiler errors from failed RPM build logs to provide additional context for LLM analysis. Each extracted log snippet should carry a line number pointing into the log file for easy lookup. Currently there is no easy way for us to extract input line numbers from csgrep output.

If --add-input-lines option is used, DefEvents output as json will have "input_line" referencing the line number of the input file.

Signed-off-by: Jan Matufka <jmatufka@redhat.com>

kdudka

@Jany26 I like the proposed feature. A few technical comments inline...

kdudka · 2026-03-24T13:22:37Z

src/lib/defect.hh

    std::string         fileName;
    int                 line            = 0;
    int                 column          = 0;
+    int                 inputLine       = 0;


We should also (optionally) record inputFile. Otherwise the number is ambiguous in case csgrep reads multiple input files.

kdudka · 2026-03-24T13:33:50Z

src/lib/parser-cov.cc

 class ErrFileLexer {
    public:
-        ErrFileLexer(std::istream &input):
+        ErrFileLexer(std::istream &input, bool addInputLines = false):


It would be easier if the ErrFileLexer constructor took InStream &. This class is internal to the parser-cov module, so the constructor can be changed easily.
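
A minimal sketch of what this suggestion could look like, not the actual csdiff code. `InStreamSketch` is a hypothetical stand-in for the real `InStream` class (its interface is not shown in this thread), and `LineLexerSketch` is a simplified placeholder for `ErrFileLexer`. The only point is that a stream wrapper can carry the file name together with the stream, so the lexer has both the name and the line counter without an extra boolean constructor argument.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for InStream: bundles a stream with its file name.
class InStreamSketch {
    public:
        InStreamSketch(std::string fileName, std::istream &str):
            fileName_(std::move(fileName)),
            str_(str)
        {
        }

        const std::string& fileName() const { return fileName_; }
        std::istream& str() { return str_; }

    private:
        std::string     fileName_;
        std::istream   &str_;
};

// Simplified placeholder for ErrFileLexer: reads lines and counts them.
class LineLexerSketch {
    public:
        explicit LineLexerSketch(InStreamSketch &input):
            input_(input)
        {
        }

        // read the next line into *pLine; returns false at end of input
        bool readNext(std::string *pLine)
        {
            assert(pLine);
            if (!std::getline(input_.str(), *pLine))
                return false;

            ++lineNo_;
            return true;
        }

        // 1-based number of the line most recently read (0 before any read)
        int lineNo() const { return lineNo_; }

        const std::string& fileName() const { return input_.fileName(); }

    private:
        InStreamSketch &input_;
        int             lineNo_ = 0;
};
```

With this shape, whether the location ends up in the output can be decided by the writer alone, and the lexer constructor signature stays the same regardless of the CLI option.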

kdudka · 2026-03-24T13:35:22Z

src/lib/parser-gcc.cc

 class Tokenizer: public ITokenizer {
    public:
-        Tokenizer(std::istream &input):
+        Tokenizer(std::istream &input, bool addInputLines = false):


Same here, the Tokenizer constructor can take InStream & to simplify the code.

kdudka · 2026-03-24T13:37:54Z

src/lib/parser-json-simple.cc


    // known per-event subnodes
+    // 'input_line' is whitelisted, but currently there is no use-case for it
+    // so it is not re-read


Why cannot we preserve non-zero input_line values already recorded in JSON files?
The use-case would be:

csgrep --mode=json file1.json file2.json > all.json
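
To illustrate the round trip asked for here, the following is a rough sketch under simplifying assumptions: `NumFieldsSketch` is a hypothetical stand-in for the numeric fields of one JSON event node (the real parser and writer use a JSON library, not a `std::map`), and `EvtSketch`, `readEvt` and `writeEvt` are illustrative names only. The reader treats a missing `input_line` as 0, and the writer emits the field only when it is positive, so a value already recorded in an input JSON file survives re-serialization.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the numeric fields of a JSON event node.
using NumFieldsSketch = std::map<std::string, int>;

// Illustrative subset of an event: only the fields relevant here.
struct EvtSketch {
    int line        = 0;
    int inputLine   = 0;
};

// return the value stored under key, or 0 if the key is absent
int valueOrZero(const NumFieldsSketch &node, const std::string &key)
{
    const auto it = node.find(key);
    return (node.end() == it) ? 0 : it->second;
}

// reader side: pick up input_line if the JSON file already contains it
EvtSketch readEvt(const NumFieldsSketch &node)
{
    EvtSketch evt;
    evt.line      = valueOrZero(node, "line");
    evt.inputLine = valueOrZero(node, "input_line");
    return evt;
}

// writer side: emit input_line only if a non-zero value was recorded
void writeEvt(const EvtSketch &evt, NumFieldsSketch *pNode)
{
    assert(pNode);
    (*pNode)["line"] = evt.line;
    if (0 < evt.inputLine)
        (*pNode)["input_line"] = evt.inputLine;
}
```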

kdudka · 2026-03-24T13:39:15Z

src/lib/writer-json-simple.cc

        if (0 < evt.vSize)
            evtNode["v_size"] = evt.vSize;
+        if (0 < evt.inputLine)
+            evtNode["input_line"] = evt.inputLine;


We need to record inputFile as well. Otherwise the number is meaningless when multiple text files are read.
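
The ambiguity can be shown with a small sketch. Everything here is hypothetical: `EvtLocSketch`, `escapeSketch` and `inputLocFields` are illustrative names, the `input_file` key is the name proposed later in this review, and the real writer builds JSON nodes rather than strings. Two events read from line 1 of different logs produce identical output if only `input_line` is written, and distinct output once the file name is written next to it.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Illustrative pair of fields describing where an event was read from.
struct EvtLocSketch {
    std::string inputFile;
    int         inputLine = 0;
};

// escape double quotes and backslashes for use inside a JSON string
// (control characters are not handled in this sketch)
std::string escapeSketch(const std::string &str)
{
    std::string out;
    for (const char c : str) {
        if ('"' == c || '\\' == c)
            out += '\\';
        out += c;
    }
    return out;
}

// return the JSON key/value fragment for the input location,
// or an empty string if no input line was recorded
std::string inputLocFields(const EvtLocSketch &evt)
{
    assert(0 <= evt.inputLine);
    if (0 == evt.inputLine)
        return "";

    std::ostringstream str;
    if (!evt.inputFile.empty())
        str << "\"input_file\": \"" << escapeSketch(evt.inputFile) << "\", ";

    str << "\"input_line\": " << evt.inputLine;
    return str.str();
}
```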

kdudka · 2026-03-24T13:42:45Z

src/csgrep.cc

            ("file-glob",                                       "expand glob patterns in the names of input files")
            ("ignore-case,i",                                   "ignore case when matching regular expressions")
            ("ignore-parser-warnings",                          "if enabled, parser warnings about the input files do not affect exit code")
+            ("add-input-lines",                                 "if enabled, events in mode=json will also contain input_line numbers from the original input file/stream")


The --add-input-lines option name does not sound intuitive for what it does. Without the context I would understand it as if some input text was being added rather than line numbers. As mentioned above, we should record the file names, too. What about naming it --record-input-locations instead?

Should I also rename the JSON fields to input_line_location and input_file_location?

I would use input_file and input_line. I see location as an abstraction over file, line, column. Something like: https://github.com/gcc-mirror/gcc/blob/5cd3889135d77bf951e4ffe169868b453c36257d/libcpp/include/line-map.h#L1293

Jany26 · 2026-03-24T13:47:47Z

ad pending s390x builds: fedora-copr/copr#4219 (comment)

kdudka · 2026-03-24T19:50:59Z

ad pending s390x builds: fedora-copr/copr#4219 (comment)

I think we can simply disable the s390x CI jobs. I do not remember them ever catching a bug that the other jobs missed. I already did this for csmock: csutils/csmock@62de9a5

csgrep: --add-input-lines option for --mode=json

5122daf

If --add-input-lines option is used, DefEvents output as json will have "input_line" referencing the line number of the input file.

Signed-off-by: Jan Matufka <jmatufka@redhat.com>

kdudka self-assigned this Mar 24, 2026

kdudka self-requested a review March 24, 2026 11:20

kdudka requested changes Mar 24, 2026

Jany26 wants to merge 1 commit into csutils:main from Jany26:input-lineno-in-json
