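Hi everyone. I'm on Arch Linux + KDE and haven't updated recently. Today, after booting, I entered my password but couldn't log in. After a forced reboot I got in (I think the system crashed because I didn't log in right after boot). Everything looked normal at first, but in under two minutes the screen went black while the mouse could still move. Switching to a TTY showed a btrfs error. I've rebooted several times and it's the same every time.
I'm using a ZhiTai (致钛) drive over external USB. When it was installed internally it also used to freeze at random, but it was fine after I moved it to an external enclosure. What's strange is the timing: services crash almost immediately after boot.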
How about checking the SMART info?
sudo smartctl -a /dev/sda
It's sda3 that's reporting the errors.
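I bought a drive of the same size and tried to copy everything over with dd, but there were too many errors and the copy was unusable afterwards. So I installed a fresh system. Does anyone have suggestions?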
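Plain dd aborts at the first read error. The usual workaround is to keep going past unreadable blocks and zero-fill them, which is what `dd conv=noerror,sync` does (GNU ddrescue does the same job more carefully, with retries and a map file). Below is a minimal Python sketch of that idea, assuming `src` and `dst` are seekable binary file objects; `salvage_copy` and `FlakyReader` are hypothetical names, and `FlakyReader` only simulates a disk with a bad block.

```python
import io


def salvage_copy(src, dst, size, block_size=4096):
    """Copy `size` bytes from src to dst block by block.

    A block that raises OSError on read is written as zeros instead of
    aborting the copy (roughly `dd conv=noerror,sync`). Returns the byte
    offsets of the blocks that could not be read.
    """
    bad_offsets = []
    offset = 0
    while offset < size:
        length = min(block_size, size - offset)
        try:
            src.seek(offset)
            chunk = src.read(length)
        except OSError:
            chunk = b""
            bad_offsets.append(offset)
        # Pad failed or short reads so every block keeps its original position.
        chunk = chunk.ljust(length, b"\0")
        dst.seek(offset)
        dst.write(chunk)
        offset += length
    return bad_offsets


class FlakyReader(io.BytesIO):
    """Hypothetical stand-in for a failing disk: reads at `bad` offsets raise EIO."""

    def __init__(self, data, bad):
        super().__init__(data)
        self.bad = set(bad)

    def read(self, n=-1):
        if self.tell() in self.bad:
            raise OSError(5, "Input/output error")
        return super().read(n)


if __name__ == "__main__":
    data = bytes(range(256)) * 64           # 16 KiB of sample "disk" content
    src = FlakyReader(data, bad=[4096])     # the second 4 KiB block is unreadable
    dst = io.BytesIO()
    print(salvage_copy(src, dst, len(data)))  # [4096]
```

Zero-filled blocks are still lost data, so a btrfs image copied this way may remain unmountable if the damaged blocks held metadata, which would be consistent with the copy being unusable here. On a real device the source would be opened read-only and unbuffered (e.g. `open("/dev/sdX", "rb", buffering=0)`, placeholder path) so that read-ahead does not pull neighbouring blocks into a failed read.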
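What drive is it? Most likely it's a drive problem. My previous one was a ZhiTai TiPlus5000. If you have another drive, migrate to it (though drives really are expensive right now).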
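I also have a TiPlus5000. Unfortunately the kernel log doesn't seem to have recorded anything useful. The Btrfs filesystem on my machine spans two drives, so I'm not sure which one is at fault.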
Here is the SMART output for both drives; both look healthy:
$ sudo smartctl -a /dev/nvme1n1
smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.12.58-1-lts] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: ZHITAI TiPlus5000 1TB
Serial Number: ZTA21T0KA232760F6L
Firmware Version: ZTA10613
PCI Vendor/Subsystem ID: 0x1e49
IEEE OUI Identifier: 0xa428b7
Total NVM Capacity: 1,024,209,543,168 [1.02 TB]
Unallocated NVM Capacity: 0
Controller ID: 0
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,024,209,543,168 [1.02 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: a428b7 02f6880082
Local Time is: Sat Nov 29 20:15:53 2025 CST
Firmware Updates (0x1a): 5 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x001f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Log Page Attributes (0x02): Cmd_Eff_Lg
Maximum Data Transfer Size: 128 Pages
Warning Comp. Temp. Threshold: 90 Celsius
Critical Comp. Temp. Threshold: 95 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 6.50W - - 0 0 0 0 0 0
1 + 5.80W - - 1 1 1 1 0 0
2 + 3.60W - - 2 2 2 2 0 0
3 - 0.0500W - - 3 3 3 3 5000 10000
4 - 0.0025W - - 4 4 4 4 8000 45000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 43 Celsius
Available Spare: 100%
Available Spare Threshold: 1%
Percentage Used: 3%
Data Units Read: 50,138,131 [25.6 TB]
Data Units Written: 37,850,118 [19.3 TB]
Host Read Commands: 669,059,704
Host Write Commands: 842,293,133
Controller Busy Time: 1,503
Power Cycles: 2,955
Power On Hours: 10,815
Unsafe Shutdowns: 274
Media and Data Integrity Errors: 0
Error Information Log Entries: 4
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 43 Celsius
Temperature Sensor 2: 43 Celsius
Thermal Temp. 1 Transition Count: 10
Thermal Temp. 1 Total Time: 14
Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged
Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
Self-test status: No self-test in progress
No Self-tests Logged
$ sudo smartctl -a /dev/nvme0n1
smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.12.58-1-lts] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: SKHynix_HFS001TEJ4X112N
Serial Number: 4YD1N045710901O10
Firmware Version: 51040C31
PCI Vendor/Subsystem ID: 0x1c5c
IEEE OUI Identifier: 0xace42e
Controller ID: 1
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,024,209,543,168 [1.02 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: ace42e 004a16cb0f
Local Time is: Sat Nov 29 20:14:51 2025 CST
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x1e): Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size: 64 Pages
Warning Comp. Temp. Threshold: 86 Celsius
Critical Comp. Temp. Threshold: 87 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 4.5000W - - 0 0 0 0 100 100
1 + 3.0000W - - 1 1 1 1 200 200
2 + 0.6000W - - 2 2 2 2 400 400
3 - 0.0150W - - 3 3 3 3 2000 2000
4 - 0.0030W - - 4 4 4 4 5000 10000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 33 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 2%
Data Units Read: 63,388,720 [32.4 TB]
Data Units Written: 83,775,476 [42.8 TB]
Host Read Commands: 2,198,068,774
Host Write Commands: 2,545,925,945
Controller Busy Time: 35,460
Power Cycles: 2,509
Power On Hours: 7,040
Unsafe Shutdowns: 40
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 37 Celsius
Temperature Sensor 2: 33 Celsius
Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged
Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
Self-test status: No self-test in progress
No Self-tests Logged
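Both reports say PASSED, so comparing the individual NVMe health counters is more informative than the overall verdict. Below is a small Python sketch that pulls a handful of those counters out of `smartctl -a` text and prints a few findings. The field selection and the unsafe-shutdowns-per-power-cycle ratio are illustrative choices, not a smartmontools feature, and the ratio is only a rough comparison, not a failure indicator. (`smartctl -j` emits JSON, which is sturdier to parse than text.)

```python
# Health counters from the "SMART/Health Information" section that are
# worth comparing across drives (illustrative selection).
FIELDS = (
    "Critical Warning",
    "Available Spare",
    "Available Spare Threshold",
    "Percentage Used",
    "Power Cycles",
    "Unsafe Shutdowns",
    "Media and Data Integrity Errors",
    "Error Information Log Entries",
)


def parse_nvme_health(text):
    """Return {field: int} for each FIELDS entry found in `smartctl -a` output."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        name = name.strip()
        parts = rest.split()
        if not sep or name not in FIELDS or not parts:
            continue
        token = parts[0].rstrip("%").replace(",", "")  # "100%", "2,955", "0x00"
        values[name] = int(token, 16) if token.startswith("0x") else int(token)
    return values


def summarize(values):
    """List findings worth a closer look, plus the unsafe-shutdown ratio."""
    notes = []
    if values.get("Critical Warning", 0):
        notes.append("critical warning bits set")
    if values.get("Media and Data Integrity Errors", 0):
        notes.append("media/data integrity errors recorded")
    if values.get("Available Spare", 100) <= values.get("Available Spare Threshold", 0):
        notes.append("available spare at or below threshold")
    cycles = values.get("Power Cycles", 0)
    if cycles:
        ratio = values.get("Unsafe Shutdowns", 0) / cycles
        notes.append(f"unsafe shutdowns per power cycle: {ratio:.1%}")
    return notes
```

Fed the two reports above, this yields about 9.3% (274 / 2,955) for the TiPlus5000 and about 1.6% (40 / 2,509) for the SK hynix drive, with no media errors or critical warnings on either.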
Look at this line for the ZhiTai drive:
Unsafe Shutdowns: 274
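That pretty much settles it. Mine started out with random reboots or freezes. I discussed it with someone on the English-language Arch forum, who said it would be fine as long as it wasn't the boot drive, and after I moved it to an external enclosure it was indeed fine. In September the drive failed outright: it froze within minutes, reproducibly, and I got it replaced under warranty.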
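Oh, 274 unsafe shutdowns. (Earlier the computer would sometimes suddenly stop responding; I waited a while, it didn't recover, so I forced it off.)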
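About the TiPlus5000: it used to be installed in another computer (split into two partitions, one mounted at /home), and it once had a fairly serious problem that nearly cost me my data. If it really is the drive, that's pretty unlucky.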
Back to the Btrfs error:
$ sudo btrfs filesystem usage /
Overall:
Device size: 468.00GiB
Device allocated: 468.00GiB
Device unallocated: 2.00MiB
Device missing: 0.00B
Device slack: 0.00B
Used: 369.53GiB
Free (estimated): 96.59GiB (min: 96.59GiB)
Free (statfs, df): 96.59GiB
Data ratio: 1.00
Metadata ratio: 2.00
Global reserve: 512.00MiB (used: 0.00B)
Multiple profiles: no
Data,single: Size:444.97GiB, Used:348.38GiB (78.29%)
/dev/nvme1n1p6 200.97GiB
/dev/nvme0n1p1 244.00GiB
Metadata,DUP: Size:11.48GiB, Used:10.58GiB (92.11%)
/dev/nvme1n1p6 10.96GiB
/dev/nvme0n1p1 12.00GiB
System,DUP: Size:32.00MiB, Used:80.00KiB (0.24%)
/dev/nvme1n1p6 64.00MiB
Unallocated:
/dev/nvme1n1p6 1.00MiB
/dev/nvme0n1p1 1.00MiB
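Device unallocated is down to 2 MiB, and Metadata usage is as high as 92%. I found these two blog posts by 依云: "btrfs 元数据满了怎么办" (What to do when btrfs metadata is full) and "btrfs 翻车记" (My btrfs crash).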
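The diagnosis rests on two numbers from `btrfs filesystem usage`: almost no unallocated space left, and metadata chunks that are nearly full, so btrfs has no room to allocate another metadata chunk (with the DUP profile that chunk needs space for two copies). Below is a minimal Python sketch of that check, assuming the default human-readable units shown above; `parse_usage`, `metadata_at_risk` and both thresholds are hypothetical and illustrative, not btrfs defaults.

```python
import re

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def to_bytes(text):
    """Convert a size such as "11.48GiB" to a number of bytes."""
    m = re.fullmatch(r"([\d.]+)(B|KiB|MiB|GiB|TiB)", text)
    if not m:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(m.group(1)) * UNITS[m.group(2)]


def parse_usage(text):
    """Extract overall unallocated space and metadata size/used."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Device unallocated:"):
            info["unallocated"] = to_bytes(line.split(":", 1)[1].strip())
        # e.g. "Metadata,DUP: Size:11.48GiB, Used:10.58GiB (92.11%)"
        m = re.match(r"Metadata,\w+: Size:(\S+), Used:(\S+)", line)
        if m:
            info["meta_size"] = to_bytes(m.group(1))
            info["meta_used"] = to_bytes(m.group(2))
    return info


def metadata_at_risk(info, used_ratio=0.90, min_unallocated=2**30):
    """True when metadata is nearly full AND there is little room for a new chunk."""
    ratio = info["meta_used"] / info["meta_size"]
    return ratio >= used_ratio and info["unallocated"] < min_unallocated
```

With the first output above (2.00MiB unallocated, 10.58 of 11.48 GiB metadata used) this returns `True`; with the post-cleanup output further down (7.00GiB unallocated, 8.34 GiB used) it returns `False`.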
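So it's very likely that the error was triggered by there being no unallocated space left for Metadata. Also, I recently switched from Timeshift to Snapper without configuring automatic snapshot cleanup (I checked and there were nearly 100 snapshots). Both errors happened after the switch, and both exactly on the hour, when the hourly snapshot was being created.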
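Without cleanup, an hourly timeline keeps growing by 24 snapshots per day, and every snapshot pins metadata. In Snapper this is governed by the config (on Arch typically `/etc/snapper/configs/root`) through keys such as `TIMELINE_CLEANUP`, `TIMELINE_LIMIT_HOURLY` and `TIMELINE_LIMIT_DAILY`, with the cleanup run by `snapper-cleanup.timer`. The Python sketch below is a simplified model of that kind of retention (keep the newest snapshot of each of the last N hours and of each of the last M days); it is not Snapper's actual algorithm, and `timeline_keep` plus its default limits are hypothetical.

```python
from datetime import datetime, timedelta


def _hour(ts):
    return ts.replace(minute=0, second=0, microsecond=0)


def _day(ts):
    return ts.date()


def timeline_keep(snapshots, hourly=10, daily=10):
    """Return the snapshot timestamps a simple timeline policy would keep.

    Simplified model: for each bucket type, walk snapshots newest first and
    keep the newest snapshot in each of the `limit` most recent buckets.
    Everything not returned would be deleted by the cleanup.
    """
    keep = set()
    for limit, bucket in ((hourly, _hour), (daily, _day)):
        seen = set()
        for ts in sorted(snapshots, reverse=True):
            key = bucket(ts)
            if key in seen:
                continue            # an newer snapshot already covers this bucket
            if len(seen) == limit:
                break               # all `limit` buckets are filled
            seen.add(key)
            keep.add(ts)
    return keep


if __name__ == "__main__":
    start = datetime(2025, 11, 25)
    snaps = [start + timedelta(hours=h) for h in range(24 * 5)]  # 5 days, hourly
    print(len(snaps), len(timeline_keep(snaps)))  # 120 14
```

Under this model, five days of hourly snapshots (120 in total) shrink to 14 kept: the last 10 hours plus one per day for the four earlier days. Without any cleanup all 120 would stay on disk.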
I've since cleaned up the snapshots and some junk files; I'll use the system for a while and see whether the problem comes back.
$ sudo btrfs filesystem usage /
Overall:
Device size: 468.00GiB
Device allocated: 461.00GiB
Device unallocated: 7.00GiB
Device missing: 0.00B
Device slack: 0.00B
Used: 274.86GiB
Free (estimated): 186.80GiB (min: 183.30GiB)
Free (statfs, df): 186.80GiB
Data ratio: 1.00
Metadata ratio: 2.00
Global reserve: 512.00MiB (used: 0.00B)
Multiple profiles: no
Data,single: Size:437.97GiB, Used:258.17GiB (58.95%)
/dev/nvme0n1p6 198.97GiB
/dev/nvme1n1p1 239.00GiB
Metadata,DUP: Size:11.48GiB, Used:8.34GiB (72.66%)
/dev/nvme0n1p6 10.96GiB
/dev/nvme1n1p1 12.00GiB
System,DUP: Size:32.00MiB, Used:80.00KiB (0.24%)
/dev/nvme0n1p6 64.00MiB
Unallocated:
/dev/nvme0n1p6 2.00GiB
/dev/nvme1n1p1 5.00GiB
Nice.
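A high unsafe-shutdown count doesn't necessarily mean something is wrong: on some drives, sleep or hibernation also increments it. One of my RC20 drives is already past 2000, purely because I put the machine to sleep instead of shutting it down.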