Question: Faulty disk, smartctl and warranty claim

26.5.2007 17:50 David
Faulty disk, smartctl and warranty claim
I have a 320 GB Hitachi disk and I don't know whether I should return it under warranty.

Using the command

smartctl -t long /dev/hda

I ran the extended self-test and then read out the disk's SMART information. The result was that the disk passed the test:

SMART overall-health self-assessment test result: PASSED.
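Note that `smartctl -t long` only starts the extended self-test. The drive runs it in the background (this drive recommends checking back after about 94 minutes), and the results are read afterwards, for example with `smartctl -a`, which prints all SMART information including the self-test log. A minimal Python sketch of that two-step sequence, assuming smartctl is installed and run with sufficient privileges; the helper names `selftest_commands` and `run` are hypothetical:

```python
import subprocess


def selftest_commands(device: str) -> list[list[str]]:
    """Return the two smartctl invocations for one extended self-test cycle."""
    return [
        ["smartctl", "-t", "long", device],  # step 1: only starts the test on the drive
        ["smartctl", "-a", device],          # step 2 (after it finishes): print all SMART data
    ]


def run(cmd: list[str]) -> str:
    """Run one smartctl command and return its text output."""
    # check=False on purpose: smartctl's exit status is a bit mask,
    # so a non-zero value does not necessarily mean the command failed.
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout
```

Run the first command, wait for the test to finish, then pass the second command to `run()` and inspect the self-test log in its output.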

I understand what the values Reallocated_Event_Count, Current_Pending_Sector and Offline_Uncorrectable indicate. What I don't know is what UDMA_CRC_Error_Count means and how serious it is.
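Each attribute row has the columns ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED and RAW_VALUE. VALUE, WORST and THRESH are normalized numbers on a vendor-defined scale, while RAW_VALUE for these four attributes is a plain count; for UDMA_CRC_Error_Count it counts transfers over the disk interface whose CRC check failed. A small Python sketch (the helper `parse_attribute` is hypothetical) that splits rows copied from the output below into named fields:

```python
def parse_attribute(row: str) -> dict:
    """Split one row of smartctl's attribute table into named fields."""
    parts = row.split()
    return {
        "id": int(parts[0]),
        "name": parts[1],
        "value": int(parts[3]),      # normalized value, higher is better
        "worst": int(parts[4]),      # lowest normalized value recorded so far
        "thresh": int(parts[5]),     # failure threshold for the normalized value
        "type": parts[6],            # Pre-fail or Old_age
        "raw": " ".join(parts[9:]),  # vendor-specific raw value, kept as text
    }


ROWS = """\
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       1425"""

attrs = [parse_attribute(r) for r in ROWS.splitlines()]
for a in attrs:
    # For these four attributes the first token of the raw value is a count.
    print(f'{a["name"]}: raw count {int(a["raw"].split()[0])}')
```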

I also don't understand what the errors recorded in the log mean (9 in total, see the lower part of the output), given that the disk passed the test.
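The PASSED line is the drive's own verdict, returned by the ATA SMART RETURN STATUS command, and the smartctl man page describes an attribute as failed only when its normalized VALUE is less than or equal to its THRESH. Neither check looks at the ATA error log or at raw counters, which is how a disk can log nine interface errors and still pass. A hedged Python sketch of the per-attribute rule, using normalized values copied from the output below (the `Attr` type and the `failing` helper are hypothetical):

```python
from typing import NamedTuple


class Attr(NamedTuple):
    name: str
    value: int   # normalized VALUE column
    thresh: int  # THRESH column


def failing(attrs: list[Attr]) -> list[str]:
    """Names of attributes whose normalized value has dropped to the threshold or below."""
    return [a.name for a in attrs if a.value <= a.thresh]


# Normalized values from this disk's attribute table
DISK = [
    Attr("Raw_Read_Error_Rate", 100, 16),
    Attr("Reallocated_Sector_Ct", 100, 5),
    Attr("Seek_Error_Rate", 100, 67),
    Attr("Spin_Retry_Count", 100, 60),
    Attr("UDMA_CRC_Error_Count", 200, 0),  # raw count is 1425, but raw values play no part here
]

print(failing(DISK))  # []
```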

In the smartctl man page I found what the abbreviations used in the output stand for, but I don't really understand what they mean or how they affect the disk's operation:
ABRT:  Command ABoRTed
ICRC:  Interface Cyclic Redundancy Code (CRC) error
Will the disk keep working properly, or is this a reason to return it under warranty? Thanks for any answers.
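In each error entry, the seven bytes after "registers were:" are, in smartctl's order, ER ST SC SN CL CH DH. The error register (ER) is a bit field, the status register (ST) has its ERR bit (0x01) set when the command ended in error, and for 28-bit addressing the LBA is assembled from SN, CL, CH and the low four bits of DH. A minimal Python decoder, assuming the ATA-7 bit assignments for READ DMA EXT given in the comments (the names `ERROR_BITS` and `decode_row` are hypothetical):

```python
# Error-register (ER) bits as assigned for READ DMA EXT in ATA-7.
# Assumption: other commands reuse some of these bits with different meanings.
ERROR_BITS = {
    0x80: "ICRC",  # interface CRC error during an Ultra DMA transfer
    0x40: "UNC",   # uncorrectable data error on the medium
    0x10: "IDNF",  # requested address not found
    0x04: "ABRT",  # command aborted
}
STATUS_ERR = 0x01  # ERR bit of the status register (ST)


def decode_row(row: str) -> tuple:
    """Decode 'ER ST SC SN CL CH DH' hex bytes into (error names, ERR set?, 28-bit LBA)."""
    er, st, _sc, sn, cl, ch, dh = (int(b, 16) for b in row.split()[:7])
    errors = [name for bit, name in ERROR_BITS.items() if er & bit]
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return errors, bool(st & STATUS_ERR), lba


print(decode_row("84 51 00 3e 03 00 e0"))  # (['ICRC', 'ABRT'], True, 830)
print(decode_row("84 51 00 5e 00 00 e0"))  # (['ICRC', 'ABRT'], True, 94)
```

0x84 is ICRC (0x80) plus ABRT (0x04): the drive aborted the READ DMA EXT because the CRC over the Ultra DMA transfer did not match, not because the sector itself was unreadable, which would set UNC (0x40) instead.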

smartctl output:
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device Model:     Hitachi HDT725032VLAT80
Serial Number:    VF1200R2CRL2SA
Firmware Version: V54OA42A
User Capacity:    320 072 933 376 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 1
Local Time is:    Sat May 26 15:47:57 2007 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		 (5601) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 (  94) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   117   117   024    Pre-fail  Always       -       336 (Average 303)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       204
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       1912
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       198
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       304
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       304
194 Temperature_Celsius     0x0002   136   136   000    Old_age   Always       -       44 (Lifetime Min/Max 21/48)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       1425

SMART Error Log Version: 1
ATA Error Count: 9 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 9 occurred at disk power-on lifetime: 1908 hours (79 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 3e 03 00 e0  Error: ICRC, ABRT at LBA = 0x0000033e = 830

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 3f 02 00 e0 00      00:02:28.700  READ DMA EXT
  25 00 00 3f 02 00 e0 00      00:02:28.500  READ DMA EXT
  10 00 3f 00 00 00 e0 00      00:02:28.500  RECALIBRATE [OBS-4]
  25 00 00 3f 02 00 e0 00      00:02:28.500  READ DMA EXT
  25 00 00 3f 02 00 e0 00      00:02:28.500  READ DMA EXT

Error 8 occurred at disk power-on lifetime: 1908 hours (79 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 3e 03 00 e0  Error: ICRC, ABRT at LBA = 0x0000033e = 830

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 3f 02 00 e0 00      00:02:28.500  READ DMA EXT
  10 00 3f 00 00 00 e0 00      00:02:28.500  RECALIBRATE [OBS-4]
  25 00 00 3f 02 00 e0 00      00:02:28.500  READ DMA EXT
  25 00 00 3f 02 00 e0 00      00:02:28.500  READ DMA EXT
  25 00 00 3f 00 00 e0 00      00:02:28.500  READ DMA EXT

Error 7 occurred at disk power-on lifetime: 1908 hours (79 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 3e 03 00 e0  Error: ICRC, ABRT at LBA = 0x0000033e = 830

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 3f 02 00 e0 00      00:02:28.500  READ DMA EXT
  25 00 00 3f 02 00 e0 00      00:02:28.500  READ DMA EXT
  25 00 00 3f 00 00 e0 00      00:02:28.500  READ DMA EXT
  ea 00 00 00 00 00 e0 00      00:02:28.500  FLUSH CACHE EXIT
  25 00 20 3f 00 00 e0 00      00:02:28.300  READ DMA EXT

Error 6 occurred at disk power-on lifetime: 1908 hours (79 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 3e 03 00 e0  Error: ICRC, ABRT at LBA = 0x0000033e = 830

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 00 3f 02 00 e0 00      00:02:28.500  READ DMA EXT
  25 00 00 3f 00 00 e0 00      00:02:28.500  READ DMA EXT
  ea 00 00 00 00 00 e0 00      00:02:28.500  FLUSH CACHE EXIT
  25 00 20 3f 00 00 e0 00      00:02:28.300  READ DMA EXT
  10 00 3f 00 00 00 e0 00      00:02:28.300  RECALIBRATE [OBS-4]

Error 5 occurred at disk power-on lifetime: 1908 hours (79 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 5e 00 00 e0  Error: ICRC, ABRT at LBA = 0x0000005e = 94

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 20 3f 00 00 e0 00      00:02:28.300  READ DMA EXT
  25 00 20 3f 00 00 e0 00      00:02:28.300  READ DMA EXT
  ea 00 00 00 00 00 e0 00      00:02:26.300  FLUSH CACHE EXIT
  25 00 08 a8 ea 42 e0 00      00:02:26.300  READ DMA EXT
  ea 00 00 00 00 00 e0 00      00:02:26.300  FLUSH CACHE EXIT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1912         -
# 2  Extended offline    Completed without error       00%      1910         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
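The Powered_Up_Time stamps in the error log are milliseconds since that particular power-on, held in a 32-bit counter, which is why they wrap after 2^32 ms, about 49.710 days. Converting them shows that the commands leading up to all five logged errors were issued within about 2.4 seconds, roughly two and a half minutes after the disk was powered on, at a lifetime of 1908 hours. A small Python sketch (`parse_powered_up` is a hypothetical helper) that parses the `[DDd+]hh:mm:SS.sss` format described in the output, with the day part optional as in these log entries:

```python
import re

WRAP_MS = 2**32  # the timestamp is a 32-bit millisecond counter


def parse_powered_up(ts: str) -> int:
    """Convert a Powered_Up_Time such as '00:02:28.700' or '3d+04:05:06.007' to milliseconds."""
    m = re.fullmatch(r"(?:(\d+)d\+)?(\d+):(\d{2}):(\d{2})\.(\d{3})", ts)
    if m is None:
        raise ValueError(f"unexpected timestamp: {ts!r}")
    days, hours, minutes, secs, ms = (int(g) if g else 0 for g in m.groups())
    return (((days * 24 + hours) * 60 + minutes) * 60 + secs) * 1000 + ms


stamps = ["00:02:26.300", "00:02:28.300", "00:02:28.500", "00:02:28.700"]
times = [parse_powered_up(s) for s in stamps]
print(max(times) - min(times))         # 2400 ms between the earliest and latest logged command
print(round(WRAP_MS / 86_400_000, 3))  # 49.71 days until the counter wraps
```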

26.5.2007 20:47 ikarlos
Re: Faulty disk, smartctl and warranty claim
I'd also like to know how it really works with smartctl. I have about five computers with various disks, mostly Seagate, and on all of them smartctl reports completely nonsensical values for various errors, in the hundreds of thousands, and they keep changing, surprisingly even downward. The tests always came back PASSED. SMART is enabled in the BIOS and also configured as a daemon. None of that prevented one of the disks from dying in the middle of operation; after some mouth-to-mouth resuscitation it partially came back a few times, only to die completely in the end and take all its data to hell with it (fortunately backed up the evening before). And the famous SMART didn't make a peep. It reported nothing, neither before nor during the agony. So I definitely wouldn't treat the nonsense from SMART as grounds for a warranty claim.
26.5.2007 22:09 Ritchie | score: 27 | blog: Ritchie's | Berlin
Re: Faulty disk, smartctl and warranty claim
If SMART reports nothing, that doesn't mean everything is fine. If SMART does report errors, that is a clear sign of problems and a reason for a warranty claim. Simply put, the implication "SMART reports errors => there is a problem" holds in one direction only.

Seagate disks fill the Raw_Read_Error_Rate and Seek_Error_Rate fields with nonsensical values; other disks behave sensibly.

I'm not sure, but couldn't UDMA CRC errors be caused not only by the disk but also by the cabling or the controller? I'd be glad if someone could confirm or refute this.
27.5.2007 18:23 Olsen
Re: Faulty disk, smartctl and warranty claim
It's a bit different. Seagate disks don't reset some of these fields at startup, hence the huge values. Other disks show these values only since the computer was started, so they are naturally small. SMART data from Seagates simply has to be interpreted a little differently.

In my view, a UDMA CRC error is a transmission error detected by a cyclic redundancy check. I have a feeling such things just happen. I have 0 Ultra ATA CRC errors myself, but who knows. Maybe interference is to blame.
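As I understand the ATA specification, that is essentially how Ultra DMA works: every burst is protected by a 16-bit CRC with the generator polynomial x^16 + x^12 + x^5 + 1, both ends compute it over the transferred data, and at the end of the burst the device compares the two values; a mismatch is reported as ICRC together with ABRT, and the host driver typically retries the command. The Python sketch below only illustrates the principle: it processes bytes MSB first, whereas the real interface works on 16-bit words, and the default seed 0x4ABA is my recollection of the ATA specification, so treat both as assumptions. The function name `crc16` is hypothetical.

```python
def crc16(data: bytes, crc: int = 0x4ABA) -> int:
    """Bitwise CRC-16, generator x^16 + x^12 + x^5 + 1 (0x1021), MSB first, no final XOR."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


sent = bytes(range(256)) * 2  # a 512-byte "sector" of sample data
received = bytearray(sent)
received[100] ^= 0x08         # one bit flipped on the cable, e.g. by interference

print(crc16(sent) == crc16(bytes(received)))  # False: the mismatch exposes the corrupted transfer
```

A CRC of this kind is guaranteed to catch any single flipped bit, which is why a noisy cable shows up as a growing UDMA_CRC_Error_Count rather than as silently corrupted data.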
27.5.2007 23:46 Ritchie | score: 27 | blog: Ritchie's | Berlin
Re: Faulty disk, smartctl and warranty claim
The claim in your first paragraph isn't true: on other disks these values stay at zero during error-free operation, whereas Seagate disks show alarming values even when they are fine. So how should these two raw values be interpreted on Seagate disks?
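There is no official Seagate documentation for this, but a commonly reported interpretation is that on many Seagate drives the 48-bit raw value of Raw_Read_Error_Rate (ID 1) and Seek_Error_Rate (ID 7) packs two counters: the upper 16 bits count errors and the lower 32 bits count operations, so a huge decimal number can still mean zero errors. Treat the following Python sketch as an unverified assumption; the helper `split_seagate_raw` and the example raw values are hypothetical:

```python
def split_seagate_raw(raw: int) -> tuple:
    """Split a 48-bit raw value into (upper 16 bits, lower 32 bits).

    ASSUMPTION: community-reported, unofficial layout for Seagate attributes 1 and 7,
    where the upper part is an error count and the lower part an operation count.
    """
    if not 0 <= raw < 2**48:
        raise ValueError("raw value does not fit in 48 bits")
    return raw >> 32, raw & 0xFFFFFFFF


print(split_seagate_raw(123456789))            # (0, 123456789): nothing in the upper 16 bits
print(split_seagate_raw((3 << 32) | 1000000))  # (3, 1000000)
```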
27.5.2007 11:26 8an | score: 30
Re: Faulty disk, smartctl and warranty claim
These are errors in the communication between the disk and the controller, so the cause will be a faulty cable, a problem with the controller (a Silicon Image controller did this to me, but that was SATA), or possibly overheating.
If you build an operating system that even an idiot can use, only idiots will use it.
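If the cable or controller is the suspect, a practical check is to note the raw value of UDMA_CRC_Error_Count (ID 199) before replacing the cable or moving the disk to another port, then compare it with a reading taken after some heavy disk activity. On most drives the counter is cumulative and never resets, so what matters is whether it keeps growing rather than its absolute value. A Python sketch, assuming the snapshots are smartctl attribute tables saved as text; the helper names and the second snapshot's value are hypothetical:

```python
from typing import Optional


def attr_raw(table: str, attr_id: int) -> Optional[int]:
    """Leading integer of the raw value for one attribute ID in a smartctl attribute table."""
    for line in table.splitlines():
        parts = line.split()
        if len(parts) >= 10 and parts[0] == str(attr_id):
            return int(parts[9])
    return None


def crc_errors_growing(before: str, after: str) -> bool:
    """True if UDMA_CRC_Error_Count (ID 199) increased between two snapshots."""
    old, new = attr_raw(before, 199), attr_raw(after, 199)
    if old is None or new is None:
        raise ValueError("attribute 199 missing from a snapshot")
    return new > old


before = "199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       1425"
after = "199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       1425"  # hypothetical
print(crc_errors_growing(before, after))  # False: no new CRC errors since the cable swap
```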

