System goes black and freezes shortly after boot

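Hi everyone, I run Arch Linux with KDE and have not updated recently. Today, after booting, I typed my password and could not log in. After a forced reboot I got in (my guess is that the system had already crashed because I did not log in right away after boot). Everything looked normal at first, but in less than two minutes the screen went black, although the mouse cursor still moved. I switched to a TTY and it showed a btrfs error. I have rebooted many times and it is the same every time.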

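The disk is a ZHITAI drive attached externally over USB. When it was installed internally it also froze at random, and moving it to an external enclosure fixed that. What is odd is the frequency now: services crash almost immediately after boot.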

Can you check the SMART info?

sudo smartctl -a /dev/sda

For some reason I can no longer get online, so I can only post photos, sorry. Timeshift will not load either; it just hangs.

The errors are being reported on sda3.
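If the photos are hard to read, the affected device can also be pulled out of the kernel log as text. The following is only a sketch: the function name `btrfs_error_devices` is made up for illustration, and it assumes the kernel messages follow the usual `BTRFS error (device X)` or `BTRFS critical (device X ...)` shape.

```shell
# Read kernel log text on stdin and print, with a count, each device
# named in a "BTRFS error (device X ...)" or "BTRFS critical (device X ...)" line.
# The function name is hypothetical; the message shape is an assumption.
btrfs_error_devices() {
    sed -nE 's/.*BTRFS (error|critical) \(device ([^ )]+).*/\2/p' | sort | uniq -c
}

# Possible use, reading the previous boot's kernel messages:
#   journalctl -k -b -1 | btrfs_error_devices
```

A device that shows up here many times is the one worth checking first with `smartctl`.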

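I bought a disk of the same size and planned to clone the old one with dd, but there were too many errors and the copy was unusable. I ended up installing a fresh system. Does anyone have suggestions?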

sda3

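I have run into something similar twice. Neither time did it appear suddenly during use; both times it seemed to happen while the system was suspended after the screen turned off. Did you find the cause?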

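What disk do you have? It is most likely a disk problem. My previous one was a ZHITAI TiPlus5000. If you have another disk, you could migrate to it (disks really are expensive right now).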

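One of my disks is also a TiPlus5000. Unfortunately the kernel log does not seem to have recorded anything useful. My Btrfs filesystem spans two disks, and I am not sure which one is at fault.
Below is the SMART output for both disks; both appear to be healthy.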

$ sudo smartctl -a /dev/nvme1n1 
smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.12.58-1-lts] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       ZHITAI TiPlus5000 1TB
Serial Number:                      ZTA21T0KA232760F6L
Firmware Version:                   ZTA10613
PCI Vendor/Subsystem ID:            0x1e49
IEEE OUI Identifier:                0xa428b7
Total NVM Capacity:                 1,024,209,543,168 [1.02 TB]
Unallocated NVM Capacity:           0
Controller ID:                      0
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,024,209,543,168 [1.02 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            a428b7 02f6880082
Local Time is:                      Sat Nov 29 20:15:53 2025 CST
Firmware Updates (0x1a):            5 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Log Page Attributes (0x02):         Cmd_Eff_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     90 Celsius
Critical Comp. Temp. Threshold:     95 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.50W       -        -    0  0  0  0        0       0
 1 +     5.80W       -        -    1  1  1  1        0       0
 2 +     3.60W       -        -    2  2  2  2        0       0
 3 -   0.0500W       -        -    3  3  3  3     5000   10000
 4 -   0.0025W       -        -    4  4  4  4     8000   45000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                   0x00
Temperature:                        43 Celsius
Available Spare:                    100%
Available Spare Threshold:          1%
Percentage Used:                    3%
Data Units Read:                    50,138,131 [25.6 TB]
Data Units Written:                 37,850,118 [19.3 TB]
Host Read Commands:                 669,059,704
Host Write Commands:                842,293,133
Controller Busy Time:               1,503
Power Cycles:                       2,955
Power On Hours:                     10,815
Unsafe Shutdowns:                   274
Media and Data Integrity Errors:    0
Error Information Log Entries:      4
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               43 Celsius
Temperature Sensor 2:               43 Celsius
Thermal Temp. 1 Transition Count:   10
Thermal Temp. 1 Total Time:         14

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
Self-test status: No self-test in progress
No Self-tests Logged
$ sudo smartctl -a /dev/nvme0n1
smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.12.58-1-lts] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       SKHynix_HFS001TEJ4X112N
Serial Number:                      4YD1N045710901O10
Firmware Version:                   51040C31
PCI Vendor/Subsystem ID:            0x1c5c
IEEE OUI Identifier:                0xace42e
Controller ID:                      1
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,024,209,543,168 [1.02 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            ace42e 004a16cb0f
Local Time is:                      Sat Nov 29 20:14:51 2025 CST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x1e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         64 Pages
Warning  Comp. Temp. Threshold:     86 Celsius
Critical Comp. Temp. Threshold:     87 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +   4.5000W       -        -    0  0  0  0      100     100
 1 +   3.0000W       -        -    1  1  1  1      200     200
 2 +   0.6000W       -        -    2  2  2  2      400     400
 3 -   0.0150W       -        -    3  3  3  3     2000    2000
 4 -   0.0030W       -        -    4  4  4  4     5000   10000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                   0x00
Temperature:                        33 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    2%
Data Units Read:                    63,388,720 [32.4 TB]
Data Units Written:                 83,775,476 [42.8 TB]
Host Read Commands:                 2,198,068,774
Host Write Commands:                2,545,925,945
Controller Busy Time:               35,460
Power Cycles:                       2,509
Power On Hours:                     7,040
Unsafe Shutdowns:                   40
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               37 Celsius
Temperature Sensor 2:               33 Celsius

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
Self-test status: No self-test in progress
No Self-tests Logged

Look at this line for the ZHITAI drive:


Unsafe Shutdowns:                   274

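That pretty much settles it. My drive started out with random reboots and freezes. I discussed it with a user on the Arch forums, who said it was fine as long as it was not used as the boot disk, and it did behave once I attached it externally. In September the drive failed: it froze within minutes, reproducibly, and I had it replaced under warranty.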

Original thread: https://bbs.archlinux.org/viewtopic.php?id=294761

Oh, 274 unsafe shutdowns (the computer has suddenly become unresponsive before; I waited a while, it did not recover, and I forced it off).

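About the TiPlus5000: it used to be in another computer (split into two partitions, one mounted at /home), where it had one fairly serious problem that nearly cost me my data. If it really is the drive, that is very unfortunate :grimacing: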

Back to the Btrfs error

$ sudo btrfs filesystem usage /
Overall:
    Device size:                 468.00GiB
    Device allocated:            468.00GiB
    Device unallocated:            2.00MiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                        369.53GiB
    Free (estimated):             96.59GiB      (min: 96.59GiB)
    Free (statfs, df):            96.59GiB
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,single: Size:444.97GiB, Used:348.38GiB (78.29%)
   /dev/nvme1n1p6        200.97GiB
   /dev/nvme0n1p1        244.00GiB

Metadata,DUP: Size:11.48GiB, Used:10.58GiB (92.11%)
   /dev/nvme1n1p6         10.96GiB
   /dev/nvme0n1p1         12.00GiB

System,DUP: Size:32.00MiB, Used:80.00KiB (0.24%)
   /dev/nvme1n1p6         64.00MiB

Unallocated:
   /dev/nvme1n1p6          1.00MiB
   /dev/nvme0n1p1          1.00MiB

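Device unallocated is down to just 2 MiB, and Metadata usage is already at 92%. I found these two blog posts by 依云 :star_struck:: "btrfs 元数据满了怎么办" (what to do when btrfs metadata is full) and "btrfs 翻车记" (a btrfs crash story).
I am now fairly sure the error was triggered because there was no unallocated space left to give to Metadata. I also recently switched from Timeshift to Snapper and never configured Snapper to delete snapshots automatically (when I looked, there were nearly 100). Both errors happened after the switch to Snapper :upside_down_face:, and both happened right on the hour, when a snapshot was being created.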
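The two numbers singled out above can be checked automatically. This is a minimal sketch, not an established tool: the function name `check_btrfs_metadata` and the default 90% threshold are arbitrary choices, and it assumes the `btrfs filesystem usage` output has the same layout as the paste above.

```shell
# Read `btrfs filesystem usage` output on stdin.
# Prints the "Device unallocated" value as-is, then the metadata usage
# percentage, flagged WARN when it exceeds the limit (default 90, arbitrary).
check_btrfs_metadata() {
    awk -v limit="${1:-90}" '
        /Device unallocated:/ { printf "unallocated %s\n", $NF }
        /^Metadata,/ {
            # The line ends with e.g. "(92.11%)"; keep only the number.
            pct = $NF
            gsub(/[()%]/, "", pct)
            status = (pct + 0 > limit + 0) ? "WARN" : "ok"
            printf "%s metadata %s%%\n", status, pct
        }'
}

# Possible use:
#   sudo btrfs filesystem usage / | check_btrfs_metadata 85
```

When unallocated space is this low, freeing space and then running a filtered balance such as `btrfs balance start -dusage=10 /` is one way to return mostly empty data chunks to the unallocated pool; whether that is needed here is a guess.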

I have since cleaned up the snapshots and some junk. I will use the system for a while and see whether the problem comes back.

$ sudo btrfs filesystem usage /
Overall:
    Device size:                 468.00GiB
    Device allocated:            461.00GiB
    Device unallocated:            7.00GiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                        274.86GiB
    Free (estimated):            186.80GiB      (min: 183.30GiB)
    Free (statfs, df):           186.80GiB
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,single: Size:437.97GiB, Used:258.17GiB (58.95%)
   /dev/nvme0n1p6        198.97GiB
   /dev/nvme1n1p1        239.00GiB

Metadata,DUP: Size:11.48GiB, Used:8.34GiB (72.66%)
   /dev/nvme0n1p6         10.96GiB
   /dev/nvme1n1p1         12.00GiB

System,DUP: Size:32.00MiB, Used:80.00KiB (0.24%)
   /dev/nvme0n1p6         64.00MiB

Unallocated:
   /dev/nvme0n1p6          2.00GiB
   /dev/nvme1n1p1          5.00GiB

:ok_hand:
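To keep the snapshot count from climbing back toward 100, Snapper's own cleanup can be switched on. The excerpt below is a sketch of what that might look like: the config name `root` and every limit value are assumptions to adjust, not settings taken from this machine.

```shell
# /etc/snapper/configs/root (excerpt)
# The config name "root" and all limits below are assumed example values.
TIMELINE_CREATE="yes"
TIMELINE_CLEANUP="yes"        # let the timeline cleanup algorithm prune hourly snapshots
TIMELINE_LIMIT_HOURLY="5"
TIMELINE_LIMIT_DAILY="7"
TIMELINE_LIMIT_WEEKLY="0"
TIMELINE_LIMIT_MONTHLY="0"
TIMELINE_LIMIT_YEARLY="0"
NUMBER_CLEANUP="yes"          # also prune numbered (pre/post) snapshots
NUMBER_LIMIT="10"
```

On a systemd setup the limits only take effect if the cleanup job actually runs, for example via `systemctl enable --now snapper-cleanup.timer`.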

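An unsafe shutdown count does not necessarily indicate a problem. On some drives, suspend and hibernate also increment it. One of my RC20 drives is already past 2000, purely from suspending instead of shutting down.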
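One way to put the raw count in context is to compare it with the number of power cycles from the same `smartctl -a` output. A rough sketch, with a made-up function name `unsafe_shutdown_ratio`; it assumes the NVMe field labels shown in the pastes above, and a high ratio is only a hint, not proof of a failing drive.

```shell
# Read `smartctl -a` output for an NVMe drive on stdin and report how many
# power cycles ended in an unsafe shutdown. Function name is hypothetical.
unsafe_shutdown_ratio() {
    awk -F: '
        /^Power Cycles:/     { gsub(/[ ,]/, "", $2); cycles = $2 + 0 }
        /^Unsafe Shutdowns:/ { gsub(/[ ,]/, "", $2); unsafe = $2 + 0 }
        END {
            if (cycles > 0)
                printf "%d of %d power cycles were unsafe (%.1f%%)\n",
                       unsafe, cycles, 100 * unsafe / cycles
        }'
}

# Possible use:
#   sudo smartctl -a /dev/nvme1n1 | unsafe_shutdown_ratio
```

With the numbers posted earlier, the TiPlus5000 comes out at 274 of 2955 and the SK Hynix drive at 40 of 2509.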