AbcLinuxu portal, 4 May 2025 23:12
mce: [Hardware Error]: Machine check events logged
EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
EDAC sbridge MC0: CPU 12: Machine Check Event: 0 Bank 9: 8c000050000800c0
EDAC sbridge MC0: TSC 0
EDAC sbridge MC0: ADDR 1485247000
EDAC sbridge MC0: MISC 90000010001208c
EDAC sbridge MC0: PROCESSOR 0:306f2 TIME 1497174376 SOCKET 1 APIC 20
EDAC MC0: 1 CE memory scrubbing error on CPU_SrcID#1_Ha#0_Chan#1_DIMM#0 (channel:1 slot:0 page:0x1485247 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0008:00c0 socket:1 ha:0 channel_mask:2 rank:0)

mce: [Hardware Error]: Machine check events logged
EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
EDAC sbridge MC0: CPU 12: Machine Check Event: 0 Bank 7: 8c00004000010090
EDAC sbridge MC0: TSC 0
EDAC sbridge MC0: ADDR 1485247540
EDAC sbridge MC0: MISC 1527afa86
EDAC sbridge MC0: PROCESSOR 0:306f2 TIME 1495475795 SOCKET 1 APIC 20
EDAC MC0: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x1485247 offset:0x540 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:1 ha:0 channel_mask:1 rank:0)
# edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow0: 0 Uncorrected Errors
mc0: csrow0: CPU_SrcID#1_Ha#0_Chan#0_DIMM#0: 26 Corrected Errors
mc0: csrow0: CPU_SrcID#1_Ha#0_Chan#1_DIMM#0: 66 Corrected Errors
mc0: csrow0: CPU_SrcID#1_Ha#1_Chan#0_DIMM#0: 0 Corrected Errors
mc0: csrow0: CPU_SrcID#1_Ha#1_Chan#1_DIMM#0: 0 Corrected Errors
mc1: 0 Uncorrected Errors with no DIMM info
mc1: 0 Corrected Errors with no DIMM info
mc1: csrow0: 0 Uncorrected Errors
mc1: csrow0: CPU_SrcID#0_Ha#0_Chan#0_DIMM#0: 0 Corrected Errors
mc1: csrow0: CPU_SrcID#0_Ha#0_Chan#1_DIMM#0: 0 Corrected Errors
mc1: csrow0: CPU_SrcID#0_Ha#1_Chan#0_DIMM#0: 0 Corrected Errors
mc1: csrow0: CPU_SrcID#0_Ha#1_Chan#1_DIMM#0: 0 Corrected Errors

I read off the address it is complaining about (0x1485247XXX) and found the following in dmidecode:
Handle 0x006F, DMI type 16, 23 bytes
Physical Memory Array
    Location: System Board Or Motherboard
    Use: System Memory
    Error Correction Type: Multi-bit ECC
    Maximum Capacity: 128 GB
    Error Information Handle: Not Provided
    Number Of Devices: 2

Handle 0x0070, DMI type 19, 31 bytes
Memory Array Mapped Address
    Starting Address: 0x01000000000
    Ending Address: 0x017FFFFFFFF
    Range Size: 32 GB
    Physical Array Handle: 0x006F
    Partition Width: 2

Handle 0x0071, DMI type 17, 34 bytes
Memory Device
    Array Handle: 0x006F
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: 16384 MB
    Form Factor: DIMM
    Set: None
    Locator: DIMM_E1
    Bank Locator: NODE 3
    Type: Other
    Type Detail: Synchronous
    Speed: 2133 MHz
    Manufacturer: Micron
    Serial Number: 112718E3
    Asset Tag: DIMM_E1_AssetTag
    Part Number: 36ASF2G72PZ-2G1A2
    Rank: 2
    Configured Clock Speed: 2133 MHz

Handle 0x0072, DMI type 20, 35 bytes
Memory Device Mapped Address
    Starting Address: 0x01000000000
    Ending Address: 0x013FFFFFFFF
    Range Size: 16 GB
    Physical Device Handle: 0x0071
    Memory Array Mapped Address Handle: 0x0070
    Partition Row Position: 1

Handle 0x0073, DMI type 17, 34 bytes
Memory Device
    Array Handle: 0x006F
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: 16384 MB
    Form Factor: DIMM
    Set: None
    Locator: DIMM_F1
    Bank Locator: NODE 3
    Type: Other
    Type Detail: Synchronous
    Speed: 2133 MHz
    Manufacturer: Micron
    Serial Number: 11271993
    Asset Tag: DIMM_F1_AssetTag
    Part Number: 36ASF2G72PZ-2G1A2
    Rank: 2
    Configured Clock Speed: 2133 MHz

Handle 0x0074, DMI type 20, 35 bytes
Memory Device Mapped Address
    Starting Address: 0x01400000000
    Ending Address: 0x017FFFFFFFF
    Range Size: 16 GB
    Physical Device Handle: 0x0073
    Memory Array Mapped Address Handle: 0x0070
    Partition Row Position: 1

Handle 0x0075, DMI type 16, 23 bytes
Physical Memory Array
    Location: System Board Or Motherboard
    Use: System Memory
    Error Correction Type: Multi-bit ECC
    Maximum Capacity: 128 GB
    Error Information Handle: Not Provided
    Number Of Devices: 2

The slots on the board are physically labelled DIMM_A1 through H1, and the address where the error occurred falls into DIMM_F1. What I don't understand is how dual channel works here: I naively assumed that memory is interleaved in small blocks (bus width or cache line), yet the dmidecode output shows the memory modules mapped as contiguous 16 GiB ranges. Couldn't the fault also be in, say, module DIMM_E1?
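
The lookup described above can be sketched in a few lines of Python. The range table is copied from the "Memory Device Mapped Address" records in the dmidecode output, and the page/offset pairs come from the two EDAC messages. The 4 KiB page size and the function names are assumptions made for illustration. The sketch only reproduces what DMI reports; it says nothing about how the memory controller actually interleaves the two channels, so on its own it cannot rule out DIMM_E1.

```python
# Mapped address ranges taken from the dmidecode output (DMI type 20 records).
# Both the start and the end address are inclusive.
DIMM_RANGES = {
    "DIMM_E1": (0x01000000000, 0x013FFFFFFFF),
    "DIMM_F1": (0x01400000000, 0x017FFFFFFFF),
}

PAGE_SHIFT = 12  # assumption: 4 KiB pages, as is usual on x86-64


def physical_address(page, offset):
    """Combine the EDAC 'page' and 'offset' fields into a physical address."""
    return (page << PAGE_SHIFT) | offset


def dimm_for_address(addr, ranges=DIMM_RANGES):
    """Return the locator whose DMI-reported range contains addr, or None."""
    for locator, (start, end) in ranges.items():
        if start <= addr <= end:
            return locator
    return None


if __name__ == "__main__":
    # (page, offset) pairs from the scrubbing error and the read error.
    for page, offset in ((0x1485247, 0x0), (0x1485247, 0x540)):
        addr = physical_address(page, offset)
        print(hex(addr), dimm_for_address(addr))
```

Both addresses from the log (0x1485247000 and 0x1485247540) fall into the range that DMI assigns to DIMM_F1, even though EDAC attributes one error to Chan#1 and the other to Chan#0.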
ISSN 1214-1267, (c) 1999-2007 Stickfish s.r.o.