openSUSE 11 loses Windows partitions after about 1 hour

Goldfisch1980 · 5 Nov. 2008

Hello everyone,

I'm not sure whether this is the right subforum, but I'm posting here because this is about mounted NTFS partitions that disappear from Linux after about 1 hour...

First, some background: my machine has three ATA hard disks, and Windows XP runs on them alongside Linux without problems. I find it easier to set up the individual Samba permissions under Linux, so I use Linux rather than Windows XP whenever the machine acts as a server.
All partitions except swap and the Linux root (ext3) are formatted as NTFS. I access them through the newer ntfs-3g driver (for reading and writing NTFS), which I set up a few months ago. Unfortunately I don't know which version is configured (how do you find that out?). In any case, the hardware is now about half a year old, and I did a fresh install of 32-bit SuSE 11 on it (I had nothing but problems with 64-bit SuSE). The installed kernel is "Linux amdx64 2.6.25.18-0.2-pae #1 SMP 2008-10-21 16:30:26 +0200 i686 athlon i386 GNU/Linux". So apparently not the standard vanilla kernel. It must be some patched SuSE kernel!? :???:
For the first hour the system runs great. There are no signs of errors or any other problems! Under Windows everything runs for hours without trouble! So I'm now ruling out the hardware, since I bought everything new specifically because of this problem (it was about time anyway) :schockiert: To me this has to be a Linux software problem, because the hardware is new, the problem persists, and Windows shows no such problems.

Now the problem in detail.
I have shared my NTFS drives via Samba. When I stream the MP3s stored on them, for example, or even just leave the machine running for a while, then after about 1h or 1.5h ALL NTFS drives can disappear. Everything then hangs until eventually nothing works at all (KDE in particular). Even shutdown -r now no longer works (it simply hangs)!
In principle the Linux partition still works fine, though.

You'll probably want a few logs now... so here they are.

This is what shows up in /var/log/messages:

Code:

Nov  5 08:54:12 amdx64 kernel: CPU0 attaching NULL sched-domain.
Nov  5 08:54:12 amdx64 kernel: CPU1 attaching NULL sched-domain.
Nov  5 08:54:12 amdx64 kernel: CPU0 attaching sched-domain:
Nov  5 08:54:12 amdx64 kernel:  domain 0: span 00000000,00000000,00000000,00000003
Nov  5 08:54:12 amdx64 kernel:   groups: 00000000,00000000,00000000,00000001 00000000,00000000,00000000,00000002
Nov  5 08:54:12 amdx64 kernel: CPU1 attaching sched-domain:
Nov  5 08:54:12 amdx64 kernel:  domain 0: span 00000000,00000000,00000000,00000003
Nov  5 08:54:12 amdx64 kernel:   groups: 00000000,00000000,00000000,00000002 00000000,00000000,00000000,00000001
Nov  5 08:59:08 amdx64 su: (to root) lars on /dev/pts/2
Nov  5 09:01:25 amdx64 su: (to root) lars on /dev/pts/1
Nov  5 09:14:21 amdx64 gconfd (lars-4008): (Version 2.22.0) wird gestartet, Prozesskennung 4008, Benutzer »lars«
Nov  5 09:14:21 amdx64 gconfd (lars-4008): Die Adresse »xml:readonly:/etc/gconf/gconf.xml.mandatory« wurde an der Position 0 zu einer nur lesbaren Konfigurationsquelle aufgelöst
Nov  5 09:14:21 amdx64 gconfd (lars-4008): Die Adresse »xml:readwrite:/home/lars/.gconf« wurde an der Position 1 zu einer schreibbaren Konfigurationsquelle aufgelöst
Nov  5 09:14:21 amdx64 gconfd (lars-4008): Die Adresse »xml:readonly:/etc/gconf/gconf.xml.defaults« wurde an der Position 2 zu einer nur lesbaren Konfigurationsquelle aufgelöst
Nov  5 09:14:21 amdx64 gconfd (lars-4008): Die Adresse »xml:readonly:/etc/gconf/gconf.xml.vendor« wurde an der Position 3 zu einer nur lesbaren Konfigurationsquelle aufgelöst
Nov  5 09:14:21 amdx64 gconfd (lars-4008): Die Adresse »xml:readonly:/etc/gconf/gconf.xml.schemas« wurde an der Position 4 zu einer nur lesbaren Konfigurationsquelle aufgelöst
Nov  5 09:14:51 amdx64 gconfd (lars-4008): Der GConf-Server wird nicht verwendet und daher beendet.
Nov  5 09:14:51 amdx64 gconfd (lars-4008): Beenden
Nov  5 09:23:48 amdx64 smartd[3336]: Device: /dev/sda, SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 62 to 61
Nov  5 09:23:48 amdx64 smartd[3336]: Device: /dev/sda, SMART Usage Attribute: 194 Temperature_Celsius changed from 38 to 39
Nov  5 09:23:48 amdx64 smartd[3336]: Device: /dev/sda, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 71 to 70
Nov  5 09:23:48 amdx64 smartd[3336]: Device: /dev/sdb, SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 55 to 54
Nov  5 09:23:48 amdx64 smartd[3336]: Device: /dev/sdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 45 to 46
Nov  5 09:23:48 amdx64 smartd[3336]: Device: /dev/sdb, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 73 to 72
Nov  5 09:23:48 amdx64 smartd[3336]: Device: /dev/sdc, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 252 to 253
Nov  5 09:53:41 amdx64 syslog-ng[1872]: STATS: dropped 0
Nov  5 10:07:19 amdx64 kernel: irq 20: nobody cared (try booting with the "irqpoll" option)
Nov  5 10:07:19 amdx64 kernel: Pid: 0, comm: swapper Tainted: G        N 2.6.25.18-0.2-pae #1
Nov  5 10:07:19 amdx64 kernel:  [<c01071d9>] dump_trace+0x63/0x227
Nov  5 10:07:19 amdx64 kernel:  [<c0107c8a>] show_trace+0x15/0x29
Nov  5 10:07:19 amdx64 kernel:  [<c02e2e65>] dump_stack+0x5b/0x65
Nov  5 10:07:19 amdx64 kernel:  [<c01559ac>] __report_bad_irq+0x2e/0x6f
Nov  5 10:07:19 amdx64 kernel:  [<c0155bac>] note_interrupt+0x1bf/0x217
Nov  5 10:07:19 amdx64 kernel:  [<c015615c>] handle_fasteoi_irq+0x8f/0xaf
Nov  5 10:07:19 amdx64 kernel:  [<c010830c>] do_IRQ+0x71/0x89
Nov  5 10:07:19 amdx64 kernel:  [<c0106a33>] common_interrupt+0x23/0x30
Nov  5 10:07:19 amdx64 kernel:  [<c0121a19>] finish_task_switch+0x2a/0xa6
Nov  5 10:07:19 amdx64 kernel:  [<c02e3761>] schedule+0x690/0x6ef
Nov  5 10:07:19 amdx64 kernel:  [<c0104a54>] cpu_idle+0xbb/0xc0
Nov  5 10:07:19 amdx64 kernel:  [<c02e1121>] start_secondary+0x153/0x158
Nov  5 10:07:19 amdx64 kernel:  =======================
Nov  5 10:07:19 amdx64 kernel: handlers:
Nov  5 10:07:19 amdx64 kernel: [<f905655c>] (ata_interrupt+0x0/0x1df [libata])
Nov  5 10:07:19 amdx64 kernel: [<f90edfd1>] (usb_hcd_irq+0x0/0x7d [usbcore])
Nov  5 10:07:19 amdx64 kernel: [<f92763ca>] (ohci_irq_handler+0x0/0x680 [ohci1394])
Nov  5 10:07:19 amdx64 kernel: Disabling IRQ #20
Nov  5 10:07:50 amdx64 kernel: ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Nov  5 10:07:50 amdx64 kernel: ata1.01: cmd a0/00:00:00:00:00/00:00:00:00:00/b0 tag 0
Nov  5 10:07:50 amdx64 kernel:          cdb 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
Nov  5 10:07:50 amdx64 kernel:          res 40/00:02:00:08:00/00:00:00:00:00/b0 Emask 0x4 (timeout)
Nov  5 10:07:50 amdx64 kernel: ata1.01: status: { DRDY }
Nov  5 10:07:50 amdx64 kernel: ata1: soft resetting link
Nov  5 10:07:50 amdx64 kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Nov  5 10:07:50 amdx64 kernel: ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Nov  5 10:07:50 amdx64 kernel:          cdb 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
Nov  5 10:07:50 amdx64 kernel:          res 40/00:02:00:08:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Nov  5 10:07:50 amdx64 kernel: ata2.00: status: { DRDY }
Nov  5 10:07:50 amdx64 kernel: ata1: soft resetting link
Nov  5 10:07:50 amdx64 kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Nov  5 10:07:50 amdx64 kernel: ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Nov  5 10:07:50 amdx64 kernel:          cdb 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
Nov  5 10:07:50 amdx64 kernel:          res 40/00:02:00:08:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Nov  5 10:07:50 amdx64 kernel: ata2.00: status: { DRDY }
Nov  5 10:07:50 amdx64 kernel: ata2: soft resetting link
Nov  5 10:07:50 amdx64 kernel: ata2.00: configured for UDMA/33
Nov  5 10:07:50 amdx64 kernel: ata2: EH complete
Nov  5 10:08:20 amdx64 kernel: ata1.00: qc timeout (cmd 0x27)
Nov  5 10:08:20 amdx64 kernel: ata1.00: failed to read native max address (err_mask=0x4)
Nov  5 10:08:20 amdx64 kernel: ata1.00: revalidation failed (errno=-5)
Nov  5 10:08:20 amdx64 kernel: ata1: failed to recover some devices, retrying in 5 secs
Nov  5 10:08:20 amdx64 kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Nov  5 10:08:20 amdx64 kernel: ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Nov  5 10:08:20 amdx64 kernel:          cdb 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
Nov  5 10:08:20 amdx64 kernel:          res 40/00:02:00:08:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Nov  5 10:08:20 amdx64 kernel: ata2.00: status: { DRDY }
Nov  5 10:08:20 amdx64 kernel: ata2: soft resetting link
Nov  5 10:08:20 amdx64 kernel: ata2.00: configured for UDMA/33
Nov  5 10:08:20 amdx64 kernel: ata2: EH complete

At that point all NTFS partitions are gone. Running umount -a followed by mount -a doesn't help at all. mount -a then prints the following:

Code:

Error reading bootsector: Eingabe-/Ausgabefehler
Failed to mount '/dev/sda1': Eingabe-/Ausgabefehler
NTFS is either inconsistent, or you have hardware faults, or you have a
SoftRAID/FakeRAID hardware. In the first case run chkdsk /f on Windows
then reboot into Windows TWICE. The usage of the /f parameter is very
important! If you have SoftRAID/FakeRAID then first you must activate
it and mount a different device under the /dev/mapper/ directory, (e.g.
/dev/mapper/nvidia_eahaabcc1). Please see the 'dmraid' documentation
for the details.
Error reading bootsector: Eingabe-/Ausgabefehler
Failed to mount '/dev/sda5': Eingabe-/Ausgabefehler
NTFS is either inconsistent, or you have hardware faults, or you have a
SoftRAID/FakeRAID hardware. In the first case run chkdsk /f on Windows
then reboot into Windows TWICE. The usage of the /f parameter is very
important! If you have SoftRAID/FakeRAID then first you must activate
it and mount a different device under the /dev/mapper/ directory, (e.g.
/dev/mapper/nvidia_eahaabcc1). Please see the 'dmraid' documentation
for the details.
Error reading bootsector: Eingabe-/Ausgabefehler
Failed to mount '/dev/sda6': Eingabe-/Ausgabefehler
NTFS is either inconsistent, or you have hardware faults, or you have a
SoftRAID/FakeRAID hardware. In the first case run chkdsk /f on Windows
then reboot into Windows TWICE. The usage of the /f parameter is very
important! If you have SoftRAID/FakeRAID then first you must activate
it and mount a different device under the /dev/mapper/ directory, (e.g.
/dev/mapper/nvidia_eahaabcc1). Please see the 'dmraid' documentation
for the details.
Error reading bootsector: Eingabe-/Ausgabefehler
Failed to sync device /dev/sda7: Eingabe-/Ausgabefehler
Failed to mount '/dev/sda7': Eingabe-/Ausgabefehler
NTFS is either inconsistent, or you have hardware faults, or you have a
SoftRAID/FakeRAID hardware. In the first case run chkdsk /f on Windows
then reboot into Windows TWICE. The usage of the /f parameter is very
important! If you have SoftRAID/FakeRAID then first you must activate
it and mount a different device under the /dev/mapper/ directory, (e.g.
/dev/mapper/nvidia_eahaabcc1). Please see the 'dmraid' documentation
for the details.
Error reading bootsector: Eingabe-/Ausgabefehler
Failed to mount '/dev/sda8': Eingabe-/Ausgabefehler
NTFS is either inconsistent, or you have hardware faults, or you have a
SoftRAID/FakeRAID hardware. In the first case run chkdsk /f on Windows
then reboot into Windows TWICE. The usage of the /f parameter is very
important! If you have SoftRAID/FakeRAID then first you must activate
it and mount a different device under the /dev/mapper/ directory, (e.g.
/dev/mapper/nvidia_eahaabcc1). Please see the 'dmraid' documentation
for the details.

My guess is that ntfs-3g has a bug, but I'm not sure... At first I thought it was my self-compiled Realtek driver for the Gigabit Ethernet. But the network card still works after the partitions drop out (and before that too, obviously).
KDE crashed at the same moment as well. The KDE panel can no longer be clicked. But maybe that's just a timeout it is waiting on, because I noticed that the panel shows the mounted drives somewhere...

And here's the kicker: after a reboot everything is fine again!!!
So the kernel seems to be completely messed up after this event!

Who can help me with this?
I can gladly send more logs if that helps you help me.

Many thanks,
Phil

spoensche · 5 Nov. 2008

Goldfisch1980 wrote:

Code:

Nov  5 09:23:48 amdx64 smartd[3336]: Device: /dev/sda, SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 62 to 61
Nov  5 09:23:48 amdx64 smartd[3336]: Device: /dev/sda, SMART Usage Attribute: 194 Temperature_Celsius changed from 38 to 39
Nov  5 09:23:48 amdx64 smartd[3336]: Device: /dev/sda, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 71 to 70
Nov  5 09:23:48 amdx64 smartd[3336]: Device: /dev/sdb, SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 55 to 54
Nov  5 09:23:48 amdx64 smartd[3336]: Device: /dev/sdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 45 to 46
Nov  5 09:23:48 amdx64 smartd[3336]: Device: /dev/sdb, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 73 to 72
Nov  5 09:23:48 amdx64 smartd[3336]: Device: /dev/sdc, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 252 to 253

The hard disks aren't new, though, are they?

The message

Code:

Hardware_ECC_Recovered

points to a hard disk problem. Test your disk with the disk manufacturer's tools.
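To look at these smartd messages across the whole log rather than a few quoted lines, a short script can list every attribute change per device. This is only a sketch in Python: the function name smart_changes and the regular expression are assumptions derived from the line format quoted above, not part of smartmontools. Keep in mind that smartd logs normalized attribute values by default, so the numbers are not necessarily raw error counts or degrees.

```python
import re

# Pattern for smartd "attribute changed" lines, modelled on the lines quoted
# above (an assumption: other smartd versions may format them differently).
SMART_CHANGE = re.compile(
    r"smartd\[\d+\]: Device: (?P<dev>[^,\s]+), "
    r"SMART (?:Usage|Prefailure) Attribute: "
    r"(?P<id>\d+) (?P<name>\S+) changed from (?P<old>\d+) to (?P<new>\d+)"
)


def smart_changes(lines):
    """Return (device, attribute id, name, old, new) for each change line."""
    changes = []
    for line in lines:
        m = SMART_CHANGE.search(line)
        if m:
            changes.append(
                (m["dev"], int(m["id"]), m["name"], int(m["old"]), int(m["new"]))
            )
    return changes


# Two smartd lines and one unrelated line, copied from the log in this thread.
sample = [
    "Nov  5 09:23:48 amdx64 smartd[3336]: Device: /dev/sda, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 71 to 70",
    "Nov  5 09:23:48 amdx64 smartd[3336]: Device: /dev/sdc, SMART Prefailure Attribute: 8 Seek_Time_Performance changed from 252 to 253",
    "Nov  5 09:53:41 amdx64 syslog-ng[1872]: STATS: dropped 0",
]

for dev, attr_id, name, old, new in smart_changes(sample):
    print(f"{dev}  {attr_id:>3} {name:<24} {old} -> {new}")
```

To run it on the real log, pass the lines of /var/log/messages to smart_changes instead of the sample list.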

Goldfisch1980 wrote:

Code:

Nov  5 10:07:19 amdx64 kernel: irq 20: nobody cared (try booting with the "irqpoll" option)
Nov  5 10:07:19 amdx64 kernel: Pid: 0, comm: swapper Tainted: G        N 2.6.25.18-0.2-pae #1
Nov  5 10:07:19 amdx64 kernel:  [<c01071d9>] dump_trace+0x63/0x227
Nov  5 10:07:19 amdx64 kernel:  [<c0107c8a>] show_trace+0x15/0x29
Nov  5 10:07:19 amdx64 kernel:  [<c02e2e65>] dump_stack+0x5b/0x65
Nov  5 10:07:19 amdx64 kernel:  [<c01559ac>] __report_bad_irq+0x2e/0x6f
Nov  5 10:07:19 amdx64 kernel:  [<c0155bac>] note_interrupt+0x1bf/0x217
Nov  5 10:07:19 amdx64 kernel:  [<c015615c>] handle_fasteoi_irq+0x8f/0xaf

Try booting with the option

Code:

irqpoll

and test whether the problem still occurs.
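The "nobody cared" message in the original log is followed by a "handlers:" list naming the drivers registered on that IRQ (libata, usbcore and ohci1394 in this case). A small script can pull that list out of a longer log. This is a rough sketch in Python; the function name shared_irq_handlers and both regular expressions are assumptions based on the lines posted in this thread and may need adjusting for other kernel versions.

```python
import re

# Patterns modelled on the kernel lines posted in this thread (assumption).
NOBODY_CARED = re.compile(r"kernel: irq (\d+): nobody cared")
HANDLER = re.compile(
    r"kernel: \[<[0-9a-f]+>\] \((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]\)"
)


def shared_irq_handlers(lines):
    """Map each 'nobody cared' IRQ to the (function, module) handlers listed for it."""
    result = {}
    current = None  # IRQ number whose handler list we are currently reading
    for line in lines:
        m = NOBODY_CARED.search(line)
        if m:
            current = int(m.group(1))
            result[current] = []
            continue
        if "Disabling IRQ" in line:
            current = None  # end of the report for this IRQ
            continue
        h = HANDLER.search(line)
        if h and current is not None:
            result[current].append((h.group(1), h.group(2)))
    return result


# Lines copied from the log in the first post.
sample = [
    'Nov  5 10:07:19 amdx64 kernel: irq 20: nobody cared (try booting with the "irqpoll" option)',
    "Nov  5 10:07:19 amdx64 kernel:  [<c01071d9>] dump_trace+0x63/0x227",
    "Nov  5 10:07:19 amdx64 kernel: handlers:",
    "Nov  5 10:07:19 amdx64 kernel: [<f905655c>] (ata_interrupt+0x0/0x1df [libata])",
    "Nov  5 10:07:19 amdx64 kernel: [<f90edfd1>] (usb_hcd_irq+0x0/0x7d [usbcore])",
    "Nov  5 10:07:19 amdx64 kernel: [<f92763ca>] (ohci_irq_handler+0x0/0x680 [ohci1394])",
    "Nov  5 10:07:19 amdx64 kernel: Disabling IRQ #20",
]

for irq, handlers in shared_irq_handlers(sample).items():
    modules = ", ".join(module for _, module in handlers)
    print(f"IRQ {irq} is shared by: {modules}")
```

Since the ATA errors in the log start about 30 seconds after "Disabling IRQ #20", knowing which devices share that interrupt may help narrow down which one triggers it.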

SUSEDJAlex · 5 Nov. 2008

@goldfisch1980:

The temperatures of your disks are far too high. You are risking a quick heat death of the hard disk.
According to SMART, the maximum temperature is about 60 degrees. Yours are well above 65 degrees...
I also see fluctuations like these.

Regards, SUSEDJAlex
