Description
Add troubleshooting guidance for "shard group has no shards" write failures
Affected pages
InfluxDB v2:
- https://docs.influxdata.com/influxdb/v2/write-data/troubleshoot/
- https://docs.influxdata.com/influxdb/v2/reference/internals/shards/
- https://docs.influxdata.com/influxdb/v2/admin/backup-restore/restore/
InfluxDB v1 OSS:
- https://docs.influxdata.com/influxdb/v1/administration/backup_and_restore/
- https://docs.influxdata.com/influxdb/v1/guides/write_data/
InfluxDB v1 Enterprise:
Change
Write troubleshooting (v2) — add a new section:
There is currently no documentation for the error "shard group N covering <start> to <end> has no shards". Add a troubleshooting entry covering:
- Error message:
failure writing points to database: shard group N covering <start> UTC to <end> UTC has no shards
- Cause: A shard group exists in the metadata (BoltDB) but contains zero shards. Reports are associated with changes in write precision or backup/restore operations.
- Affected versions: In versions prior to 1.12 and 2.7.12, this condition causes a panic (divide by zero). In 1.12+ and 2.7.12+, the server returns an error instead of panicking.
- Behavior: Writes to the affected time range fail until the shard group duration expires and a new shard group is created for the next time window. Writes to other time ranges are unaffected.
- Workaround: Wait for the affected shard group's time range to pass, or copy data from the affected bucket into a new bucket.
- Tracking: GitHub issue #25715
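To help readers recognize the error in client-side logs, the troubleshooting entry could include a small detection helper. The following is a minimal sketch, not an official client API: the function and type names are hypothetical, and the pattern assumes only the message shape quoted above. The exact timestamp layout printed by the server is not confirmed here, so the start and end values are kept as opaque strings.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, derived from the message quoted in this issue. The
# timestamp layout is not confirmed, so start/end are captured as raw text.
_NO_SHARDS_RE = re.compile(
    r"shard group (?P<id>\d+) covering (?P<start>.+?) to (?P<end>.+?) has no shards"
)


class EmptyShardGroup(NamedTuple):
    """Details extracted from a "has no shards" write error (hypothetical type)."""

    group_id: int
    start: str
    end: str


def parse_no_shards_error(message: str) -> Optional[EmptyShardGroup]:
    """Return the shard group ID and time range if the message matches, else None."""
    match = _NO_SHARDS_RE.search(message)
    if match is None:
        return None
    return EmptyShardGroup(
        group_id=int(match.group("id")),
        start=match.group("start"),
        end=match.group("end"),
    )
```

A caller could pass the error body of a failed write to `parse_no_shards_error` and, on a match, report the affected time range so that operators know which writes will keep failing until that window passes.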
Shards internals page (v2) — add a note under "Shard groups":
Add a brief note that a shard group with zero shards is an abnormal state that causes write failures for that time range. Link to the troubleshooting entry.
Restore pages (v2, v1, enterprise) — add a caution:
Under the restore instructions, add a note that in rare cases, backup/restore operations may produce shard groups with zero shards, which causes write failures. Link to the troubleshooting entry (or describe inline for v1 pages).
v1 write_data page — add a troubleshooting note:
Document the error message and link to the relevant GitHub issue for v1 users encountering this after backup/restore.
Constraints
- Applies to InfluxDB v1 (1.x) and InfluxDB v2 (2.x) — both use the TSM storage engine with shard groups
- Does not apply to InfluxDB 3 (different storage engine, no shard groups)
- Do not state a definitive root cause. Use language like "reports are associated with" rather than "caused by"
- Internal metadata repair tools exist but are not shipped and should not be documented
Source analysis
- PR #25717 (merged Dec 2024) prevents the divide-by-zero panic when a shard group has no shards. Ported to 2.7 branch via #26389 and to 1.x via #25718.
- Issue #25715 tracks the root cause investigation.
- PR #27035 (open) adds logging to detect null-shard groups during backup/restore.
- Engineering confirmed: fix is in 1.12+ and 2.7.12+. Writes fail until shard group duration expires.
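The version boundary above can be expressed as a simple check. This is an illustrative sketch only: the function name is hypothetical, it assumes plain dotted version strings (an optional leading "v", no pre-release suffixes), and it treats any 1.x release at or above 1.12 and any 2.x release at or above 2.7.12 as containing the fix.

```python
def returns_error_instead_of_panic(version: str) -> bool:
    """Return True if this InfluxDB version reports an error for an empty
    shard group, False if it is expected to panic (divide by zero).

    Thresholds come from this issue: the fix is in 1.12+ and 2.7.12+.
    """
    # Parse up to three numeric components and pad missing ones with zeros,
    # so "1.12" is treated as (1, 12, 0).
    parts = tuple(int(p) for p in version.lstrip("v").split(".")[:3])
    parts = parts + (0,) * (3 - len(parts))

    major = parts[0]
    if major == 1:
        return parts >= (1, 12, 0)
    if major == 2:
        return parts >= (2, 7, 12)
    # InfluxDB 3 uses a different storage engine with no shard groups.
    raise ValueError(f"not applicable to InfluxDB major version {major}")
```

For example, `returns_error_instead_of_panic("2.7.11")` evaluates to `False`, which corresponds to the panic behavior described for versions before 2.7.12.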
Source: DAR issue (internal), source code analysis