Skip to content

Improve logging and context management in the Filesystem source#4809

Open
camgunz wants to merge 7 commits intomainfrom
fs-source-improve-logging-and-context-management
Open

Improve logging and context management in the Filesystem source#4809
camgunz wants to merge 7 commits intomainfrom
fs-source-improve-logging-and-context-management

Conversation

@camgunz
Copy link
Contributor

@camgunz camgunz commented Mar 10, 2026

Description:

When trying to tidy up logging and context management in the Filesystem source, we realized that we'd often be logging messages with duplicate keys (this is an issue when using Uber's zap logger w/ Stackdriver) and that in some cases we'd log the paths of things we scan, which we'd rather not do. So, beyond unifying the patterns, using common names for everything, making logging levels for different situations consistent (V(2) for skipping, V(5) for tracing, etc), and adding trace logging, this avoids logging paths of things we scan and introduces a new pattern for avoiding duplicate keys: lctx (local context). In lieu of pulling the logger out of a context (which we avoid because modifying them modifies them for everyone, which quickly becomes confusing) this creates local contexts that are never passed to other functions/methods. I just made this up, feel free to thumbs down. But this is blocking some stuff, so if we don't like it I'm just gonna pull this PR entirely.

@camgunz camgunz requested a review from a team as a code owner March 10, 2026 17:07
@camgunz camgunz changed the title Fs source improve logging and context management Improve logging and context management in the Filesystem source Mar 10, 2026
path string,
) error {
ctx = trContext.WithValues(ctx,
"root_path", rootPath,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

root_path is already added at line#114

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This freaked me out a little so I wrote some test code to confirm what I'm about to write)

I was worried that this would be similar to forgetting to shadow the ctx variable in a loop, but it doesn't accumulate in the parent context--the only way that can happen is if you have access to the parent scope, which you don't inside a function/method. So, all that said I like the pattern of adding the arguments up top for trace logging, even though they're sometimes redundant. It's less mental work to think about all the ways we could end up here and thus what might be in the context.

Copy link
Contributor

@rosecodym rosecodym Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to emit the key/value pair twice, though. It's genuinely pretty annoying to see in the logs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I was wrong about this; mea culpa. In my defense I only went 2 (3?) layers deep into logr's zapr code, not 3 (4?) layers deep to zap's code. Oh and look, there's an issue about precisely this, because they're relying on the JSON decoder to dedupe, and Stackdriver's doesn't 🤦🏻.

But, even if Stackdriver did, it's bananas to accumulate these things. They're still there accumulating, even if a decoder overwrites them as it's decoding.

Anyway, alright I'll rejigger. "Wrap zap's JSONEncoder with something that dedupes, even slowly" goes on my "would love to, but will never, fix list".

@camgunz camgunz force-pushed the fs-source-simplify-workerpools branch from 5c1131e to 84bad12 Compare March 11, 2026 10:19
@camgunz camgunz force-pushed the fs-source-improve-logging-and-context-management branch from 21c14e8 to 9fc1b51 Compare March 11, 2026 10:21
@camgunz camgunz force-pushed the fs-source-simplify-workerpools branch from 84bad12 to f0fc20f Compare March 11, 2026 10:28
@camgunz camgunz force-pushed the fs-source-improve-logging-and-context-management branch from 9fc1b51 to 18b3af6 Compare March 11, 2026 10:33
resumeAfter := s.GetEncodedResumeInfoFor(resumptionKey)

ctx = trContext.WithValues(ctx,
"resume_after", resumeAfter,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this local path information? I think we consider that sensitive.

Base automatically changed from fs-source-simplify-workerpools to fs-source-confine-symlink-state-to-scansymlink March 17, 2026 10:12
Base automatically changed from fs-source-confine-symlink-state-to-scansymlink to main March 17, 2026 10:13
@camgunz camgunz requested a review from a team March 17, 2026 10:13
@camgunz camgunz force-pushed the fs-source-improve-logging-and-context-management branch from 725d8c8 to 6c9c766 Compare March 17, 2026 13:34
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

}

return handlers.HandleFile(fileCtx, inputFile, chunkSkel, sources.ChanReporter{Ch: chunksChan})
return handlers.HandleFile(ctx, inputFile, chunkSkel, sources.ChanReporter{Ch: chunksChan})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scanFile lost path from logging context and HandleFile

Medium Severity

scanFile removed the old fileCtx := trContext.WithValues(ctx, "path", path) but didn't replace it, so the file path is no longer in the logging context. Additionally, all callers pass bare ctx (not lctx) to scanFile, so no parent context is inherited either. This means scanFile's own log messages (e.g., "File is binary; skipping") and all downstream logging inside HandleFile (e.g., "empty reader, skipping file", "non-critical error processing chunk") have no file path information, making production debugging very difficult.

Additional Locations (1)
Fix in Cursor Fix in Web

lctx := trContext.WithValues(ctx,
"root_path", rootPath,
"depth", depth,
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Entry path lost from scanDir log messages

Medium Severity

scanDir creates lctx with root_path and depth but never includes the path parameter (the directory being scanned) or entryPath (the specific entry being processed). The old code included "path", entryPath inline with each log call. Now, error messages like "error scanning file", "error scanning symlink", and "error scanning directory" provide no indication of which entry failed — only the root path and depth are logged, which is insufficient for debugging in directories with many entries.

Additional Locations (1)
Fix in Cursor Fix in Web

@mcastorina
Copy link
Contributor

In lieu of pulling the logger out of a context (which we avoid because modifying them modifies them for everyone, which quickly becomes confusing)

I find this surprising, but maybe I'm confused! What do you mean by this?

@camgunz
Copy link
Contributor Author

camgunz commented Mar 17, 2026

In lieu of pulling the logger out of a context (which we avoid because modifying them modifies them for everyone, which quickly becomes confusing)

I find this surprising, but maybe I'm confused! What do you mean by this?

@mcastorina Oh I misremembered the exact opposite thing; I was trying to remember this Scanning Team Hygiene thing. IDC if you think it's more straightforward to use loggers (I... probably agree) then I can do that.

@mcastorina
Copy link
Contributor

In lieu of pulling the logger out of a context (which we avoid because modifying them modifies them for everyone, which quickly becomes confusing)

I find this surprising, but maybe I'm confused! What do you mean by this?

@mcastorina Oh I misremembered the exact opposite thing; I was trying to remember this Scanning Team Hygiene thing. IDC if you think it's more straightforward to use loggers (I... probably agree) then I can do that.

Okay good! I don't have a preference between contexts / loggers, though the checklist item makes sense to me. I just wanted to make sure our context / log packages aren't giving people footguns.

Copy link
Contributor

@rosecodym rosecodym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the cursor stuff real?

func (s *Source) Chunks(ctx trContext.Context, chunksChan chan *sources.Chunk, _ ...sources.ChunkingTarget) error {
for i, rootPath := range s.paths {
logger := ctx.Logger().WithValues("path", rootPath)
lctx := trContext.WithValues(ctx,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to not shadow ctx? We do it elsewhere and it would reduce diff noise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants