btrfs root tree is corrupted but the backup roots are intact: how should I handle this?

As the title says, the root tree of my btrfs is corrupted, but the backup roots are still there.

Background: this is a USB drive converted from an SSD. The first 100G is exfat and the remaining roughly 400G is btrfs.

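Before I noticed the breakage I had done the following: on a Win10 machine (without WinBtrfs) I reformatted the broken exfat partition at the front of the disk, copied some movies onto it, and then plugged the drive into two Android TVs and one Android phone in turn to watch them. Some time later, when I wanted to use the btrfs partition at the back, I found it would no longer mount... I don't know what happened; the btrfs partition should not have been touched at all.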

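Can this be recovered from a backup root? Also, according to the documentation, rescue=all is supposed to include rescue=usebackuproot, but it does not seem to take effect here?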

My Telegram account has been blocked from posting public messages for some reason, so I'm posting on the forum instead.

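Based on this description, I strongly suspect that the range of the exfat partition at the front overlaps the range of the btrfs partition. Before doing anything else, check that the partition sizes and start offsets in the partition table are sane.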

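Next, looking at your output: with btrfs check --backup, the free space tree check reports two block groups that cannot be found (or the free space tree itself is damaged), 4M and 8M in size. Block groups this small are probably system block groups. Possibly something went wrong (a sudden power loss, for example) while the system bg was being relocated and automatically enlarged, which left things inconsistent.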
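To put numbers on those two sizes, here is a small sketch using the (logical, length) pairs that btrfs check prints in its "Space key logical ... length ..." lines elsewhere in this thread. The helper name describe is just for illustration.

```python
# Free-space-tree keys reported by `btrfs check` in this thread: (logical, length).
SPACE_KEYS = [(1048576, 4194304), (5242880, 8388608)]

MIB = 1024 * 1024


def describe(keys):
    """Return (start, end_exclusive, size_in_MiB) for each (logical, length) pair."""
    return [(start, start + length, length // MIB) for start, length in keys]


ranges = describe(SPACE_KEYS)
for start, end, mib in ranges:
    print(f"[{start}, {end}) is {mib} MiB")
```

The two ranges are contiguous (the first ends where the second begins), and they match the ranges that the later --repair run reports as deleted orphan entries.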

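That is indeed the case. However, the kernel's btrfs mount code and the btrfs check (btrfs-progs) code are two separate implementations, and the problem here may require not only a backup root but also reading the correct system bg location. The superblock carries the system bg location (the sys_chunk_array entries shown by dump-super --full), the root tree carries the location of the chunk tree, and the system bg can also be found from the chunk tree. I suspect that mounting uses the information from the superblock, while btrfs check uses the root tree found via the backup root.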

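As for the repair: since btrfs-progs can now read the filesystem through the backup roots, I suggest first trying btrfs check --readonly -s 1 and -s 2 to see whether the two backup superblock copies are healthy. If a backup superblock is fine, use btrfs rescue super-recover to repair superblock 0 from copy 1 or 2.
If fixing the superblock does not solve the problem, first use btrfs restore with whatever options you need to copy the data to another disk, then try btrfs check --repair --backup and see whether it can fix things.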

I skimmed the code, and it looks like btrfs-progs also uses the kernel code (kernel-shared), so perhaps the logic is the same. Either way, I found out why usebackuproot was not applied:

As you can see, USEBACKUPROOT is not set when rescue=all is specified.

And the retry logic that walks the backup root slots is here:

It tests USEBACKUPROOT, so read_backup_root is never called. Whether this is a bug or a feature remains to be investigated.
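The interaction described here can be illustrated with a toy model. This is not the kernel code: the class, the function names, and the exact set of flags that rescue=all enables are assumptions made for illustration. The only points taken from this thread are that rescue=all does not set USEBACKUPROOT and that the retry path is gated on that flag. The slot order is also simplified.

```python
from enum import Flag, auto


class MountOpt(Flag):
    NONE = 0
    USEBACKUPROOT = auto()
    NOLOGREPLAY = auto()       # assumed member of rescue=all
    IGNOREBADROOTS = auto()    # assumed member of rescue=all
    IGNOREDATACSUMS = auto()   # assumed member of rescue=all


def parse_rescue(value):
    """Toy option parser: 'all' deliberately omits USEBACKUPROOT, as observed in the thread."""
    if value == "usebackuproot":
        return MountOpt.USEBACKUPROOT
    if value == "all":
        return MountOpt.NOLOGREPLAY | MountOpt.IGNOREBADROOTS | MountOpt.IGNOREDATACSUMS
    raise ValueError(f"unknown rescue option: {value}")


def load_tree_root(opts, primary_ok, backup_ok):
    """Return which root would be used, or None if opening fails."""
    if primary_ok:
        return "primary"
    if not (opts & MountOpt.USEBACKUPROOT):
        return None  # the retry path is never entered without the flag
    for slot, ok in enumerate(backup_ok):
        if ok:
            return f"backup {slot}"
    return None


print(load_tree_root(parse_rescue("all"), False, [True] * 4))
print(load_tree_root(parse_rescue("usebackuproot"), False, [False, True, True, True]))
```

In this model a broken primary root with rescue=all fails outright, while rescue=usebackuproot falls through to a backup slot, which matches what was observed when mounting.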

FYI, the call chain from the kernel mounting btrfs down to the part of open_ctree that reports the error:

  1  1 btrfs_super_ops    66  fs/btrfs/super.c
  2  1 btrfs_fs_type      67  fs/btrfs/super.c
  3  1 btrfs_init_fs_context  2189  fs/btrfs/super.c
  4  1 btrfs_fs_context_ops  2167  fs/btrfs/super.c
  5  1 btrfs_get_tree   2152  fs/btrfs/super.c
  6  1 btrfs_get_tree_super  2105  fs/btrfs/super.c
  7  1 btrfs_fill_super  1904  fs/btrfs/super.c
  8  1 open_ctree        976  fs/btrfs/super.c
  9  1 init_tree_roots  3406  ret = init_tree_roots(fs_info);
 10  1 load_important_roots  2658  ret = load_important_roots(fs_info);

sudo fdisk -l /dev/sda

Disk /dev/sda: 465.76 GiB, 500107862016 bytes, 976773168 sectors
Disk model: DISK            
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 33553920 bytes
Disklabel type: gpt
Disk identifier: 1F3024F4-FA48-41AE-99B0-0069A99FE531

Device         Start       End   Sectors   Size Type
/dev/sda1       2048 209717247 209715200   100G Microsoft basic data
/dev/sda2  209717248 976773119 767055872 365.8G Linux filesystem

The partitions are adjacent and do not overlap.
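The same conclusion can be double-checked mechanically from the fdisk numbers above. A minimal sketch, assuming 512-byte sectors as reported by fdisk; the Partition type and the overlaps helper are names made up for this example.

```python
from typing import NamedTuple


class Partition(NamedTuple):
    name: str
    start: int  # first sector
    end: int    # last sector, inclusive (as printed by fdisk)


def overlaps(a, b):
    """True if the two inclusive sector ranges share at least one sector."""
    return a.start <= b.end and b.start <= a.end


SECTOR = 512
sda1 = Partition("/dev/sda1", 2048, 209717247)
sda2 = Partition("/dev/sda2", 209717248, 976773119)

gap = sda2.start - sda1.end - 1            # sectors between the two partitions
sda2_bytes = (sda2.end - sda2.start + 1) * SECTOR

print(overlaps(sda1, sda2), gap, sda2_bytes)
```

The byte size of /dev/sda2 computed this way equals the total_bytes value (392732606464) shown in the dump-super output in this thread, so the partition size and the filesystem size agree.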

btrfs check --readonly -s {0,1,2} all fail with the same error:

sudo btrfs check --readonly -s 0 /dev/sda2

using SB copy 0, bytenr 65536
Opening filesystem to check...
checksum verify failed on 32276480 wanted 0x00000000 found 0xb6bde3e4
checksum verify failed on 32276480 wanted 0x9ed7104b found 0x5a4d29a7
checksum verify failed on 32276480 wanted 0x9ed7104b found 0x5a4d29a7
bad tree block 32276480, bytenr mismatch, want=32276480, have=65536
Couldn't read tree root
ERROR: cannot open file system

sudo btrfs check --readonly -s 1 /dev/sda2

using SB copy 1, bytenr 67108864
Opening filesystem to check...
checksum verify failed on 32276480 wanted 0x00000000 found 0xb6bde3e4
checksum verify failed on 32276480 wanted 0x9ed7104b found 0x5a4d29a7
checksum verify failed on 32276480 wanted 0x9ed7104b found 0x5a4d29a7
bad tree block 32276480, bytenr mismatch, want=32276480, have=65536
Couldn't read tree root
ERROR: cannot open file system

sudo btrfs check --readonly -s 2 /dev/sda2

using SB copy 2, bytenr 274877906944
Opening filesystem to check...
checksum verify failed on 32276480 wanted 0x00000000 found 0xb6bde3e4
checksum verify failed on 32276480 wanted 0x9ed7104b found 0x5a4d29a7
checksum verify failed on 32276480 wanted 0x9ed7104b found 0x5a4d29a7
bad tree block 32276480, bytenr mismatch, want=32276480, have=65536
Couldn't read tree root
ERROR: cannot open file system
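For reference, the three bytenr values printed as "using SB copy N" follow from the fixed superblock mirror layout of btrfs: 64 KiB for the primary copy, then 16 KiB shifted left by 12 bits per mirror index. A short sketch of that arithmetic (the Python function name is mine, and the constants are written from memory of the on-disk format, so treat them as assumptions to verify):

```python
BTRFS_SUPER_MIRROR_MAX = 3
BTRFS_SUPER_MIRROR_SHIFT = 12


def sb_offset(mirror):
    """Byte offset of superblock copy `mirror` (0, 1 or 2) on the device."""
    if mirror == 0:
        return 64 * 1024
    return (16 * 1024) << (BTRFS_SUPER_MIRROR_SHIFT * mirror)


offsets = [sb_offset(m) for m in range(BTRFS_SUPER_MIRROR_MAX)]
print(offsets)
```

These are 64 KiB, 64 MiB and 256 GiB, which is why the third copy only exists on devices larger than 256 GiB, as is the case for this partition.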

The filesystem can only be opened when backup roots are used:

sudo btrfs check --readonly -s 0 --backup /dev/sda2

using SB copy 0, bytenr 65536
Opening filesystem to check...
Checking filesystem on /dev/sda2
UUID: d8d3bc1e-23f5-40e3-a059-5d7302c0df51
[1/8] checking log skipped (none written)
[2/8] checking root items
[3/8] checking extents
[4/8] checking free space tree
Space key logical 1048576 length 4194304 has no corresponding block group
Space key logical 5242880 length 8388608 has no corresponding block group
[5/8] checking fs roots
[6/8] checking only csums items (without verifying data)
[7/8] checking root refs
[8/8] checking quota groups skipped (not enabled on this FS)
found 360357588992 bytes used, error(s) found
total csum bytes: 206054124
total tree bytes: 517357568
total fs tree bytes: 273727488
total extent tree bytes: 27656192
btree space waste bytes: 50386636
file data blocks allocated: 359840231424
 referenced 368491847680

sudo btrfs check --readonly -s 1 --backup /dev/sda2

using SB copy 1, bytenr 67108864
Opening filesystem to check...
Checking filesystem on /dev/sda2
UUID: d8d3bc1e-23f5-40e3-a059-5d7302c0df51
[1/8] checking log skipped (none written)
[2/8] checking root items
[3/8] checking extents
[4/8] checking free space tree
Space key logical 1048576 length 4194304 has no corresponding block group
Space key logical 5242880 length 8388608 has no corresponding block group
[5/8] checking fs roots
[6/8] checking only csums items (without verifying data)
[7/8] checking root refs
[8/8] checking quota groups skipped (not enabled on this FS)
found 360357588992 bytes used, error(s) found
total csum bytes: 206054124
total tree bytes: 517357568
total fs tree bytes: 273727488
total extent tree bytes: 27656192
btree space waste bytes: 50386636
file data blocks allocated: 359840231424
 referenced 368491847680

sudo btrfs check --readonly -s 2 --backup /dev/sda2

using SB copy 2, bytenr 274877906944
Opening filesystem to check...
Checking filesystem on /dev/sda2
UUID: d8d3bc1e-23f5-40e3-a059-5d7302c0df51
[1/8] checking log skipped (none written)
[2/8] checking root items
[3/8] checking extents
[4/8] checking free space tree
Space key logical 1048576 length 4194304 has no corresponding block group
Space key logical 5242880 length 8388608 has no corresponding block group
[5/8] checking fs roots
[6/8] checking only csums items (without verifying data)
[7/8] checking root refs
[8/8] checking quota groups skipped (not enabled on this FS)
found 360357588992 bytes used, error(s) found
total csum bytes: 206054124
total tree bytes: 517357568
total fs tree bytes: 273727488
total extent tree bytes: 27656192
btree space waste bytes: 50386636
file data blocks allocated: 359840231424
 referenced 368491847680

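Also, I dd'ed out the superblocks at the three locations and diffed them. Apart from the logical address at 0x30 and the checksum at 0x0, the three superblocks are identical, so this does not look like an accidental overwrite.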
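A comparison like this can be scripted by blanking the two fields that are expected to differ between copies. The sketch below assumes a 4096-byte superblock with a 32-byte checksum field at offset 0x0 and an 8-byte bytenr field at offset 0x30; the function names are illustrative, and the demo data is synthetic rather than taken from the real disk.

```python
SUPER_SIZE = 4096
# (offset, length) of fields that legitimately differ between superblock copies.
IGNORED = [(0x00, 32), (0x30, 8)]  # checksum, bytenr


def masked(sb):
    """Return the superblock bytes with the per-copy fields zeroed out."""
    buf = bytearray(sb[:SUPER_SIZE])
    for off, length in IGNORED:
        buf[off:off + length] = bytes(length)
    return bytes(buf)


def same_superblock(a, b):
    return masked(a) == masked(b)


# Synthetic demo: two copies differing only in checksum and bytenr.
copy0 = bytes(range(256)) * 16
copy1 = bytearray(copy0)
copy1[0x00:0x04] = b"\xde\xad\xbe\xef"
copy1[0x30:0x38] = (67108864).to_bytes(8, "little")
copy1 = bytes(copy1)

# A third buffer with a real difference outside the ignored fields.
damaged = bytearray(copy0)
damaged[0x40] ^= 0xFF
damaged = bytes(damaged)

print(same_superblock(copy0, copy1), same_superblock(copy0, damaged))
```

With real data, each buffer would be the 4096 bytes read at the corresponding superblock offset of the device.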

These are the backup roots shown by sudo btrfs inspect-internal dump-super -f /dev/sda2:

backup_roots[4]:
	backup 0:
		backup_tree_root:	32276480	gen: 5921	level: 0
		backup_chunk_root:	26443776	gen: 5911	level: 1
		backup_extent_root:	32194560	gen: 5921	level: 2
		backup_fs_root:		30408704	gen: 237	level: 0
		backup_dev_root:	31490048	gen: 5918	level: 1
		csum_root:	751534080	gen: 5846	level: 2
		backup_total_bytes:	392732606464
		backup_bytes_used:	360357588992
		backup_num_devices:	1

	backup 1:
		backup_tree_root:	31424512	gen: 5918	level: 0
		backup_chunk_root:	26443776	gen: 5911	level: 1
		backup_extent_root:	31244288	gen: 5918	level: 2
		backup_fs_root:		30408704	gen: 237	level: 0
		backup_dev_root:	31490048	gen: 5918	level: 1
		csum_root:	751534080	gen: 5846	level: 2
		backup_total_bytes:	392732606464
		backup_bytes_used:	360357588992
		backup_num_devices:	1

	backup 2:
		backup_tree_root:	31866880	gen: 5919	level: 0
		backup_chunk_root:	26443776	gen: 5911	level: 1
		backup_extent_root:	31735808	gen: 5919	level: 2
		backup_fs_root:		30408704	gen: 237	level: 0
		backup_dev_root:	31490048	gen: 5918	level: 1
		csum_root:	751534080	gen: 5846	level: 2
		backup_total_bytes:	392732606464
		backup_bytes_used:	360357588992
		backup_num_devices:	1

	backup 3:
		backup_tree_root:	32112640	gen: 5920	level: 0
		backup_chunk_root:	26443776	gen: 5911	level: 1
		backup_extent_root:	31981568	gen: 5920	level: 2
		backup_fs_root:		30408704	gen: 237	level: 0
		backup_dev_root:	31490048	gen: 5918	level: 1
		csum_root:	751534080	gen: 5846	level: 2
		backup_total_bytes:	392732606464
		backup_bytes_used:	360357588992
		backup_num_devices:	1
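The four slots are easier to compare when sorted by generation. A minimal sketch that parses only the backup_tree_root lines from the listing above; the regular expression and function name are mine, and the text is abbreviated to the lines that matter.

```python
import re

DUMP = """
backup 0:
    backup_tree_root:   32276480    gen: 5921   level: 0
backup 1:
    backup_tree_root:   31424512    gen: 5918   level: 0
backup 2:
    backup_tree_root:   31866880    gen: 5919   level: 0
backup 3:
    backup_tree_root:   32112640    gen: 5920   level: 0
"""

PATTERN = re.compile(
    r"backup (\d+):\s*\n\s*backup_tree_root:\s+(\d+)\s+gen:\s+(\d+)"
)


def tree_roots_by_generation(text):
    """Return (slot, bytenr, generation) tuples, newest generation first."""
    found = [tuple(int(x) for x in m.groups()) for m in PATTERN.finditer(text)]
    return sorted(found, key=lambda item: item[2], reverse=True)


roots = tree_roots_by_generation(DUMP)
for slot, bytenr, gen in roots:
    print(f"slot {slot}: tree root {bytenr} (gen {gen})")
```

Slot 0 holds generation 5921 at 32276480, the same tree block that the failing btrfs check runs in this thread report as unreadable, so the usable candidates are the older generations in slots 3, 2 and 1.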

Using the backup_tree_root of backup 1, I can salvage some files:

sudo btrfs --verbose restore --dry-run -t 31424512 -r 256 -d --ignore-errors --path-regex '^/(|path-regex(|/.*))$' /dev/sda2 /tmp/

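The net loss should be limited to a small number of files I had not backed up yet, so the damage is actually minor. (That said, all my backup disks are btrfs on spinning drives...)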

I found that mount -o rescue=usebackuproot does mount the filesystem:

Mar 02 19:53:00  kernel: BTRFS: device fsid d8d3bc1e-23f5-40e3-a059-5d7302c0df51 devid 1 transid 5921 /dev/sda2 (8:2) scanned by mount (93222)
Mar 02 19:53:00  kernel: BTRFS info (device sda2): first mount of filesystem d8d3bc1e-23f5-40e3-a059-5d7302c0df51
Mar 02 19:53:00  kernel: BTRFS info (device sda2): using crc32c (crc32c-intel) checksum algorithm
Mar 02 19:53:00  kernel: BTRFS error (device sda2): bad tree block start, mirror 1 want 32276480 have 0
Mar 02 19:53:00  kernel: BTRFS error (device sda2): bad tree block start, mirror 2 want 32276480 have 65536
Mar 02 19:53:00  kernel: BTRFS warning (device sda2): couldn't read tree root
Mar 02 19:53:00  kernel: BTRFS warning (device sda2): try to load backup roots slot 1
Mar 02 19:53:00  kernel: BTRFS info (device sda2): enabling ssd optimizations
Mar 02 19:53:00  kernel: BTRFS info (device sda2): enabling free space tree
Mar 02 19:53:00  kernel: BTRFS info (device sda2): trying to use backup root at mount time

In that case, what should I do next? Still check --repair --backup?

Compare https://github.com/torvalds/linux/blob/master/fs/btrfs/disk-io.c#L3283 with https://github.com/kdave/btrfs-progs/blob/devel/kernel-shared/disk-io.c#L1741 and you will see that the two implementations have diverged considerably.
btrfs check actually uses open_ctree_fs_info.

Also take a look at the chunk_root and sys_chunk_array recorded in the superblock, and compare them with the backup_chunk_root addresses in the backup roots.

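The root used by btrfs check --repair --backup is the one picked by find_best_backup_root (code). Since you have already checked these roots and know which one looks reliable, you can pass the address of the root tree you want directly with btrfs check --repair --tree-root (code). Once the repair finishes, the specified root should be written into the superblock, after which a normal rw mount should work.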

sudo btrfs inspect-internal dump-super -f /dev/sda2

superblock: bytenr=65536, device=/dev/sda2
---------------------------------------------------------
csum_type		0 (crc32c)
csum_size		4
csum			0x9ed7104b [match]
bytenr			65536
flags			0x1
			( WRITTEN )
magic			_BHRfS_M [match]
fsid			d8d3bc1e-23f5-40e3-a059-5d7302c0df51
metadata_uuid		00000000-0000-0000-0000-000000000000
label			
generation		5921
root			32276480
sys_array_size		129
chunk_root_generation	5911
root_level		0
chunk_root		26443776
chunk_root_level	1
log_root		0
log_root_transid (deprecated)	0
log_root_level		0
total_bytes		392732606464
bytes_used		360357588992
sectorsize		4096
nodesize		16384
leafsize (deprecated)	16384
stripesize		4096
root_dir		6
num_devices		1
compat_flags		0x0
compat_ro_flags		0x3
			( FREE_SPACE_TREE |
			  FREE_SPACE_TREE_VALID )
incompat_flags		0x371
			( MIXED_BACKREF |
			  COMPRESS_ZSTD |
			  BIG_METADATA |
			  EXTENDED_IREF |
			  SKINNY_METADATA |
			  NO_HOLES )
cache_generation	0
uuid_tree_generation	5921
dev_item.uuid		594cbcb7-4ab5-47d5-b952-db38b5728c06
dev_item.fsid		d8d3bc1e-23f5-40e3-a059-5d7302c0df51 [match]
dev_item.type		0
dev_item.total_bytes	392732606464
dev_item.bytes_used	392731557888
dev_item.io_align	4096
dev_item.io_width	4096
dev_item.sector_size	4096
dev_item.devid		1
dev_item.dev_group	0
dev_item.seek_speed	0
dev_item.bandwidth	0
dev_item.generation	0
sys_chunk_array[2048]:
	item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 22020096)
		length 8388608 owner 2 stripe_len 65536 type SYSTEM|DUP
		io_align 65536 io_width 65536 sector_size 4096
		num_stripes 2 sub_stripes 1
			stripe 0 devid 1 offset 22020096
			dev_uuid 594cbcb7-4ab5-47d5-b952-db38b5728c06
			stripe 1 devid 1 offset 30408704
			dev_uuid 594cbcb7-4ab5-47d5-b952-db38b5728c06
backup_roots[4]:
	backup 0:
		backup_tree_root:	32276480	gen: 5921	level: 0
		backup_chunk_root:	26443776	gen: 5911	level: 1
		backup_extent_root:	32194560	gen: 5921	level: 2
		backup_fs_root:		30408704	gen: 237	level: 0
		backup_dev_root:	31490048	gen: 5918	level: 1
		csum_root:	751534080	gen: 5846	level: 2
		backup_total_bytes:	392732606464
		backup_bytes_used:	360357588992
		backup_num_devices:	1

	backup 1:
		backup_tree_root:	31424512	gen: 5918	level: 0
		backup_chunk_root:	26443776	gen: 5911	level: 1
		backup_extent_root:	31244288	gen: 5918	level: 2
		backup_fs_root:		30408704	gen: 237	level: 0
		backup_dev_root:	31490048	gen: 5918	level: 1
		csum_root:	751534080	gen: 5846	level: 2
		backup_total_bytes:	392732606464
		backup_bytes_used:	360357588992
		backup_num_devices:	1

	backup 2:
		backup_tree_root:	31866880	gen: 5919	level: 0
		backup_chunk_root:	26443776	gen: 5911	level: 1
		backup_extent_root:	31735808	gen: 5919	level: 2
		backup_fs_root:		30408704	gen: 237	level: 0
		backup_dev_root:	31490048	gen: 5918	level: 1
		csum_root:	751534080	gen: 5846	level: 2
		backup_total_bytes:	392732606464
		backup_bytes_used:	360357588992
		backup_num_devices:	1

	backup 3:
		backup_tree_root:	32112640	gen: 5920	level: 0
		backup_chunk_root:	26443776	gen: 5911	level: 1
		backup_extent_root:	31981568	gen: 5920	level: 2
		backup_fs_root:		30408704	gen: 237	level: 0
		backup_dev_root:	31490048	gen: 5918	level: 1
		csum_root:	751534080	gen: 5846	level: 2
		backup_total_bytes:	392732606464
		backup_bytes_used:	360357588992
		backup_num_devices:	1

sys_chunk_array contains only one item. None of the addresses in it seems to equal backup_chunk_root, but the backup_fs_root address 30408704 does appear.
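One way to read that item: a sys_chunk_array entry describes a logical range and the device offsets of its copies, so chunk_root does not have to appear in it literally; it only has to fall inside the range. A sketch of the lookup for a DUP chunk, using the values from the dump above (the type and function names are made up for this example, and the mapping shown is the simple one for fully mirrored profiles):

```python
from typing import NamedTuple, Tuple


class SysChunk(NamedTuple):
    logical: int                     # key offset of the CHUNK_ITEM
    length: int
    stripe_offsets: Tuple[int, ...]  # device offset of each DUP copy


def physical_copies(chunk, logical):
    """Device offsets of `logical` in each copy, or None if outside the chunk."""
    if not chunk.logical <= logical < chunk.logical + chunk.length:
        return None
    delta = logical - chunk.logical
    return [offset + delta for offset in chunk.stripe_offsets]


system_chunk = SysChunk(22020096, 8388608, (22020096, 30408704))

chunk_root = 26443776   # chunk_root / backup_chunk_root from dump-super
tree_root = 32276480    # root from dump-super

print(physical_copies(system_chunk, chunk_root))
print(physical_copies(system_chunk, tree_root))
```

Under this reading, chunk_root lies inside the system chunk's logical range [22020096, 30408704), while the tree root does not and would have to be resolved through the chunk tree. The value 30408704 in the item is a device offset of the second copy, whereas backup_fs_root 30408704 is a logical address, so the match may be a coincidence rather than a reference.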

Trying a repair with the backup root tree that restore was able to salvage files from:

sudo btrfs check --repair --tree-root 31424512 /dev/sda2
enabling repair mode
WARNING:

	Do not use --repair unless you are advised to do so by a developer
	or an experienced user, and then only after having accepted that no
	fsck can successfully repair all types of filesystem corruption. E.g.
	some software or hardware bugs can fatally damage a volume.
	The operation will start in 10 seconds.
	Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting repair.
Opening filesystem to check...
parent transid verify failed on 31424512 wanted 5921 found 5918
parent transid verify failed on 31424512 wanted 5921 found 5918
parent transid verify failed on 31424512 wanted 5921 found 5918
Ignoring transid failure
Checking filesystem on /dev/sda2
UUID: d8d3bc1e-23f5-40e3-a059-5d7302c0df51
[1/8] checking log skipped (none written)
[2/8] checking root items
Fixed 0 roots.
[3/8] checking extents
No device size related problem found
[4/8] checking free space tree
Space key logical 1048576 length 4194304 has no corresponding block group
deleted orphan fst entries for range [1048576, 5242880)
Space key logical 5242880 length 8388608 has no corresponding block group
deleted orphan fst entries for range [5242880, 13631488)
[5/8] checking fs roots
[6/8] checking only csums items (without verifying data)
[7/8] checking root refs
[8/8] checking quota groups skipped (not enabled on this FS)
found 360357588992 bytes used, no error found
total csum bytes: 206054124
total tree bytes: 517357568
total fs tree bytes: 273727488
total extent tree bytes: 27656192
btree space waste bytes: 50386636
file data blocks allocated: 359840231424
 referenced 368491847680

It now mounts rw, scrub found no errors, and the filesystem reads and writes normally.

UUID:             d8d3bc1e-23f5-40e3-a059-5d7302c0df51
Scrub started:    Tue Mar  3 11:20:10 2026
Status:           finished
Duration:         0:13:37
Total to scrub:   336.09GiB
Rate:             421.24MiB/s
Error summary:    no errors found

Thanks, Professor fc :grinning_face: