Improve logging and context management in the Filesystem source #4809
Conversation
```go
	path string,
) error {
	ctx = trContext.WithValues(ctx,
		"root_path", rootPath,
```
`root_path` is already added at line #114
(This freaked me out a little so I wrote some test code to confirm what I'm about to write)
I was worried that this would be similar to forgetting to shadow the ctx variable in a loop, but it doesn't accumulate in the parent context; the only way that can happen is if you have access to the parent scope, which you don't inside a function/method. So, all that said, I like the pattern of adding the arguments up top for trace logging, even though they're sometimes redundant. It's less mental work to think about all the ways we could end up here and thus what might be in the context.
This is going to emit the key/value pair twice, though. It's genuinely pretty annoying to see in the logs.
OK I was wrong about this; mea culpa. In my defense I only went 2 (3?) layers deep into logr's zapr code, not 3 (4?) layers deep to zap's code. Oh and look, there's an issue about precisely this, because they're relying on the JSON decoder to dedupe, and Stackdriver's doesn't 🤦🏻.
But, even if Stackdriver did, it's bananas to accumulate these things. They're still there accumulating, even if a decoder overwrites them as it's decoding.
Anyway, alright I'll rejigger. "Wrap zap's JSONEncoder with something that dedupes, even slowly" goes on my "would love to, but will never, fix list".
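For illustration only, a standard-library sketch of the "dedupe before encoding" idea mentioned above, applied to a flat key/value slice rather than wrapping zap's `JSONEncoder` (which is the harder, real fix): last write wins, mirroring what a deduping JSON decoder would do implicitly.

```go
package main

import "fmt"

// dedupeKVs collapses a flat key/value slice so each key appears once,
// keeping the last value. This makes explicit, before encoding, the
// dedupe that some JSON decoders perform implicitly (and Stackdriver's
// doesn't).
func dedupeKVs(kvs []any) []any {
	idx := make(map[any]int) // key -> position of its value in out
	out := make([]any, 0, len(kvs))
	for i := 0; i+1 < len(kvs); i += 2 {
		k, v := kvs[i], kvs[i+1]
		if pos, seen := idx[k]; seen {
			out[pos] = v // overwrite the earlier value: last wins
			continue
		}
		out = append(out, k, v)
		idx[k] = len(out) - 1
	}
	return out
}

func main() {
	// "root_path" added twice, as in the review comment above.
	kvs := []any{"root_path", "/a", "depth", 1, "root_path", "/a"}
	fmt.Println(dedupeKVs(kvs)) // [root_path /a depth 1]
}
```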
pkg/sources/filesystem/filesystem.go
```go
	resumeAfter := s.GetEncodedResumeInfoFor(resumptionKey)

	ctx = trContext.WithValues(ctx,
		"resume_after", resumeAfter,
```
Isn't this local path information? I think we consider that sensitive.
…gging keys in our logs and avoid logging paths beyond configured root paths
Cursor Bugbot has reviewed your changes and found 2 potential issues.
```diff
 	}
 
-	return handlers.HandleFile(fileCtx, inputFile, chunkSkel, sources.ChanReporter{Ch: chunksChan})
+	return handlers.HandleFile(ctx, inputFile, chunkSkel, sources.ChanReporter{Ch: chunksChan})
```
scanFile lost path from logging context and HandleFile
Medium Severity
scanFile removed the old fileCtx := trContext.WithValues(ctx, "path", path) but didn't replace it, so the file path is no longer in the logging context. Additionally, all callers pass bare ctx (not lctx) to scanFile, so no parent context is inherited either. This means scanFile's own log messages (e.g., "File is binary; skipping") and all downstream logging inside HandleFile (e.g., "empty reader, skipping file", "non-critical error processing chunk") have no file path information, making production debugging very difficult.
Additional Locations (1)
```go
	lctx := trContext.WithValues(ctx,
		"root_path", rootPath,
		"depth", depth,
	)
```
Entry path lost from scanDir log messages
Medium Severity
scanDir creates lctx with root_path and depth but never includes the path parameter (the directory being scanned) or entryPath (the specific entry being processed). The old code included "path", entryPath inline with each log call. Now, error messages like "error scanning file", "error scanning symlink", and "error scanning directory" provide no indication of which entry failed — only the root path and depth are logged, which is insufficient for debugging in directories with many entries.
Additional Locations (1)
I find this surprising, but maybe I'm confused! What do you mean by this?
@mcastorina Oh I misremembered the exact opposite thing; I was trying to remember this Scanning Team Hygiene thing. IDC if you think it's more straightforward to use loggers (I... probably agree), then I can do that.
Okay good! I don't have a preference between contexts / loggers, though the checklist item makes sense to me. I just wanted to make sure our context / log packages aren't giving people footguns.
rosecodym left a comment
is the cursor stuff real?
```diff
 func (s *Source) Chunks(ctx trContext.Context, chunksChan chan *sources.Chunk, _ ...sources.ChunkingTarget) error {
 	for i, rootPath := range s.paths {
-		logger := ctx.Logger().WithValues("path", rootPath)
+		lctx := trContext.WithValues(ctx,
```
Any reason to not shadow ctx? We do it elsewhere and it would reduce diff noise.


Description:
When tidying up logging and context management in the Filesystem source, we realized that we'd often log messages with duplicate keys (an issue when using Uber's zap logger with Stackdriver) and that in some cases we'd log the paths of things we scan, which we'd rather not do. So, beyond unifying the patterns, using common names for everything, making logging levels for different situations consistent (V(2) for skipping, V(5) for tracing, etc.), and adding trace logging, this avoids logging paths of things we scan and introduces a new pattern for avoiding duplicate keys:
lctx (local context). In lieu of pulling the logger out of a context (which we avoid because modifying the logger modifies it for everyone holding that context, which quickly becomes confusing), this creates local contexts that are never passed to other functions/methods. I just made this up, feel free to thumbs down. But this is blocking some stuff, so if we don't like it I'm just gonna pull this PR entirely.