
Add troubleshooting guidance for 'shard group has no shards' write failures #6913

@jstirnaman


Affected pages

InfluxDB v2:

InfluxDB v1 OSS:

InfluxDB v1 Enterprise:

Change

Write troubleshooting (v2) — add a new section:

Currently, there is no documentation for the error "shard group N covering <start> to <end> has no shards". Add a troubleshooting entry covering:

  • Error message: failure writing points to database: shard group N covering <start> UTC to <end> UTC has no shards
  • Cause: A shard group exists in the metadata (BoltDB) but contains zero shards. Reports are associated with changes in write precision or backup/restore operations.
  • Affected versions: In 1.x versions prior to 1.12 and 2.x versions prior to 2.7.12, this condition causes a panic (divide by zero). In 1.12+ and 2.7.12+, the server returns an error instead of panicking.
  • Behavior: Writes to the affected time range fail until the shard group duration expires and a new shard group is created for the next time window. Writes to other time ranges are unaffected.
  • Workaround: Wait for the affected shard group's time range to pass, or copy data from the affected bucket into a new bucket.
  • Tracking: GitHub issue #25715
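
The panic-versus-error distinction in the "Affected versions" bullet can be illustrated with a minimal Python sketch. It assumes, for illustration only, that a shard is chosen by taking a series hash modulo the number of shards in the group, which is one way a divide by zero could arise. The names `ShardGroup`, `pick_shard_unguarded`, and `pick_shard_guarded` are hypothetical; this is not InfluxDB source code (InfluxDB is written in Go, where the unguarded case panics rather than raising an exception).

```python
from dataclasses import dataclass, field


@dataclass
class ShardGroup:
    """Hypothetical stand-in for a shard group record in the metadata."""
    group_id: int
    start: str
    end: str
    shards: list = field(default_factory=list)


def pick_shard_unguarded(group: ShardGroup, series_hash: int):
    # Assumed selection scheme: hash modulo shard count.
    # With zero shards, the modulo divides by zero.
    return group.shards[series_hash % len(group.shards)]


def pick_shard_guarded(group: ShardGroup, series_hash: int):
    # Check for the empty group first and report it as an ordinary error,
    # using the message wording quoted in this issue.
    if not group.shards:
        raise ValueError(
            f"shard group {group.group_id} covering {group.start} "
            f"to {group.end} has no shards"
        )
    return group.shards[series_hash % len(group.shards)]
```

In this sketch, the unguarded function raises `ZeroDivisionError` for an empty group, while the guarded one raises a `ValueError` that a caller can surface as a write error for that time range only.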

Shards internals page (v2) — add a note under "Shard groups":

Add a brief note that a shard group with zero shards is an abnormal state that causes write failures for that time range. Link to the troubleshooting entry.

Restore pages (v2, v1, enterprise) — add a caution:

Under the restore instructions, add a note that in rare cases, backup/restore operations may produce shard groups with zero shards, which causes write failures. Link to the troubleshooting entry (or describe inline for v1 pages).

v1 write_data page — add a troubleshooting note:

Document the error message and link to the relevant GitHub issue for v1 users encountering this after backup/restore.
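
For readers who want to recognize this error in client code or logs, the following is a minimal sketch of matching the message template quoted in this issue. The function name `parse_no_shards_error` is hypothetical, and because the exact timestamp formatting is not specified here, the start and end values are captured as opaque strings rather than parsed into dates.

```python
import re

# Built from the message template in this issue. The " UTC" suffix is treated
# as optional because the template is quoted both with and without it.
NO_SHARDS_RE = re.compile(
    r"shard group (?P<group_id>\d+) covering (?P<start>.+?)(?: UTC)? "
    r"to (?P<end>.+?)(?: UTC)? has no shards"
)


def parse_no_shards_error(message: str):
    """Return the group ID and time range if the message matches, else None."""
    match = NO_SHARDS_RE.search(message)
    if match is None:
        return None
    return {
        "group_id": int(match.group("group_id")),
        "start": match.group("start"),
        "end": match.group("end"),
    }
```

A caller could use the returned range to decide whether to retry a write later or route the affected points elsewhere; any such handling is left to the application.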

Constraints

  • Applies to InfluxDB v1 (1.x) and InfluxDB v2 (2.x) — both use the TSM storage engine with shard groups
  • Does not apply to InfluxDB 3 (different storage engine, no shard groups)
  • Do not state a definitive root cause. Use language like "reports are associated with" rather than "caused by"
  • Internal metadata repair tools exist but are not shipped and should not be documented

Source analysis

  • PR #25717 (merged Dec 2024) prevents the divide-by-zero panic when a shard group has no shards. It was ported to the 2.7 branch via #26389 and to 1.x via #25718.
  • Issue #25715 tracks the root cause investigation.
  • PR #27035 (open) adds logging to detect shard groups with no shards during backup/restore.
  • Engineering confirmed: fix is in 1.12+ and 2.7.12+. Writes fail until shard group duration expires.
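
The version thresholds above can be expressed as a small lookup, sketched here in Python. The function name `no_shards_failure_mode` is hypothetical, and the sketch assumes plain `major.minor.patch` version strings; whether Enterprise builds follow the same numbering is not confirmed in this issue.

```python
import re


def no_shards_failure_mode(version: str) -> str:
    """Classify how a server version reacts to a shard group with no shards.

    Thresholds come from this issue: 1.12 for the 1.x line and 2.7.12 for
    the 2.x line. Returns "panic", "error", or "not applicable".
    """
    match = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    patch = int(match.group(3) or 0)

    if major == 1:
        return "error" if (minor, patch) >= (12, 0) else "panic"
    if major == 2:
        return "error" if (minor, patch) >= (7, 12) else "panic"
    if major >= 3:
        # InfluxDB 3 has no shard groups, so the condition does not arise.
        return "not applicable"
    raise ValueError(f"version outside the scope of this issue: {version!r}")
```

In either failure mode, writes to the affected time range do not succeed; the difference is only whether the server panics or returns the error.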

Source: DAR issue (internal), source code analysis
