Di recente ho aggiornato il mio server principale da Ubuntu 10.04 a 12.04.1. Esegue il kernel linux-image-server, arco x86_64.
Non c'è nulla di particolarmente insolito in esecuzione, credo: un demone diluvio, apache2, firewall iptables con mascheramento IP, server DHCP, server DNS associato che ha i file di zona aggiornati automaticamente con i nomi host I client DHCP si identificano con, sshd, server nfs, una manciata di altre cose. Questa macchina è il mio router: si trova tra Internet e la rete locale.
Dall'aggiornamento si è verificato un errore intermittente. Andrà bene per un po 'dopo l'avvio e poi all'improvviso perderemo le nostre connessioni di rete sul wifi. Se inserisco un cavo di rete, non riesco a ottenere un indirizzo IP dal server DHCP. Se mi imposto un indirizzo IP statico, posso continuare ad accedere a Internet bene. Questo fa sembrare che sia il server DHCP a fallire (anzi, corro dhclient -v eth0
e nulla risponde alle scansioni dhcpdiscover), notato quando i client provano a rinnovare i loro contratti di locazione IP. Ma cablato con un IP statico riesco ancora ad accedere a Internet, quindi iptables continua a funzionare bene.
Quindi provo ad accedere alla macchina tramite SSH, ma sembra bloccarsi. Se faccio ssh verbose vedo che stabilisce una connessione al server, quindi fallisce un po 'più in basso nella linea - difficile vedere esattamente dove.
Prendo atto che se provo a catturare una pagina Web dal suo server HTTP ottengo la pagina richiesta ma tutte le richieste extra fatte (per immagini, fogli di stile, javascript) non vengono servite. Posso comunque ottenere questi file se li richiedo direttamente, ad esempio da curl.
Questo suggerisce che le cose vanno in discesa ogni volta che qualcosa tenta di biforcarsi?
Ho trascinato un monitor e una tastiera sul server (di solito è senza testa) e ho dato un'occhiata: vedo tracce dello stack.
Passo a un nuovo terminale virtuale e provo ad accedere. Ricevo una traccia dello stack (errore di protezione generale) dopo aver inserito la mia password. Ecco qui:
Jan 6 20:19:54 localhost kernel: [ 1475.178245] general protection fault: 0000 [#12] SMP
Jan 6 20:19:54 localhost kernel: [ 1475.178292] CPU 1
Jan 6 20:19:54 localhost kernel: [ 1475.178309] Modules linked in: btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs reiserfs ext2 nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc dm_crypt ppdev ipt_REJECT ipt_LOG ipt_MASQUERADE xt_state iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables joydev sp5100_tco edac_core i2c_piix4 serio_raw k8temp edac_mce_amd snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore parport_pc snd_page_alloc mac_hid shpchp lp parport radeon 8139too ttm drm_kms_helper drm pata_atiixp i2c_algo_bit usbhid hid wmi r8169
Jan 6 20:19:54 localhost kernel: [ 1475.178911]
Jan 6 20:19:54 localhost kernel: [ 1475.178927] Pid: 1305, comm: login Tainted: G B D 3.2.0-35-generic #55-Ubuntu Gigabyte Technology Co., Ltd. GA-MA785GM-US2H/GA-MA785GM-US2H
Jan 6 20:19:54 localhost kernel: [ 1475.179028] RIP: 0010:[<ffffffff8116589a>] [<ffffffff8116589a>] kmem_cache_alloc+0x5a/0x140
Jan 6 20:19:54 localhost kernel: [ 1475.179096] RSP: 0018:ffff88006b251d78 EFLAGS: 00010206
Jan 6 20:19:54 localhost kernel: [ 1475.179135] RAX: 0000000000000000 RBX: 00007f062bb91000 RCX: 000000000005b2ed
Jan 6 20:19:54 localhost kernel: [ 1475.179186] RDX: 000000000005b2ec RSI: 0000000000016da0 RDI: ffff88006d408a00
Jan 6 20:19:54 localhost kernel: [ 1475.179236] RBP: ffff88006b251dc8 R08: ffff88006fa96da0 R09: 0000000000000001
Jan 6 20:19:54 localhost kernel: [ 1475.179287] R10: 00000000000000d1 R11: ffff88006b23a8f0 R12: ffff88006d408a00
Jan 6 20:19:54 localhost kernel: [ 1475.179336] R13: 2665c4979a04b7b8 R14: ffffffff811447c5 R15: 00000000000080d0
Jan 6 20:19:54 localhost kernel: [ 1475.179387] FS: 00007f062bb81700(0000) GS:ffff88006fa80000(0000) knlGS:0000000000000000
Jan 6 20:19:54 localhost kernel: [ 1475.179445] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 6 20:19:54 localhost kernel: [ 1475.179486] CR2: 00007f9b4d79da00 CR3: 0000000059a34000 CR4: 00000000000006e0
Jan 6 20:19:54 localhost kernel: [ 1475.179536] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 6 20:19:54 localhost kernel: [ 1475.179586] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 6 20:19:54 localhost kernel: [ 1475.179637] Process login (pid: 1305, threadinfo ffff88006b250000, task ffff880036058000)
Jan 6 20:19:54 localhost kernel: [ 1475.179695] Stack:
Jan 6 20:19:54 localhost kernel: [ 1475.179711] ffff880036058000 0000000000000041 0000000000000001 ffffffff81188cec
Jan 6 20:19:54 localhost kernel: [ 1475.179777] 0000000000000282 00007f062bb91000 ffff88006822ce00 0000000000000001
Jan 6 20:19:54 localhost kernel: [ 1475.179841] 0000000000001000 0000000000000000 ffff88006b251e88 ffffffff811447c5
Jan 6 20:19:54 localhost kernel: [ 1475.179905] Call Trace:
Jan 6 20:19:54 localhost kernel: [ 1475.179928] [<ffffffff81188cec>] ? path_openat+0xfc/0x3f0
Jan 6 20:19:54 localhost kernel: [ 1475.179971] [<ffffffff811447c5>] mmap_region+0x2a5/0x4f0
Jan 6 20:19:54 localhost kernel: [ 1475.180012] [<ffffffff81144d58>] do_mmap_pgoff+0x348/0x360
Jan 6 20:19:54 localhost kernel: [ 1475.180054] [<ffffffff81144e36>] sys_mmap_pgoff+0xc6/0x230
Jan 6 20:19:54 localhost kernel: [ 1475.180098] [<ffffffff81018b12>] sys_mmap+0x22/0x30
Jan 6 20:19:54 localhost kernel: [ 1475.180136] [<ffffffff816655c2>] system_call_fastpath+0x16/0x1b
Jan 6 20:19:54 localhost kernel: [ 1475.180180] Code: 00 4d 8b 04 24 65 4c 03 04 25 50 da 00 00 49 8b 50 08 4d 8b 28 4d 85 ed 0f 84 d8 00 00 00 49 63 44 24 20 49 8b 34 24 48 8d 4a 01 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0e 0f 94 c0 84 c0 74 c2 4d
Jan 6 20:19:54 localhost kernel: [ 1475.180503] RIP [<ffffffff8116589a>] kmem_cache_alloc+0x5a/0x140
Jan 6 20:19:54 localhost kernel: [ 1475.180552] RSP <ffff88006b251d78>
Jan 6 20:19:54 localhost kernel: [ 1475.180603] ---[ end trace 766ef1ef52f774b9 ]---
Se guardo abbastanza a lungo, vedo errori di protezione più generali. Li ho visti per login
, apache2
, deluge-web
, head
, powerbtn.sh
finora.
Devo ripristinare la macchina per riportarla a uno stato di funzionamento (ricevo anche un errore di protezione generale per powerbtn.sh
quando premo il pulsante di accensione), ma non passa molto tempo prima che ritorni in questo modo.
Non ho ancora capito come riprodurlo su richiesta - sembra accadere a caso.
Nel caso sia utile, ho esaminato kern.log e ho trovato il primo errore. C'è una tonnellata di loro tutti in fila a partire zsh
, poi deluged
, apache2
, cron
, head
, console-kit-dae
, irqbalance
, nmbd
... Ecco l' zsh
uno e l'errore di cattivo stato pagina che viene subito dopo:
Jan 6 20:13:35 localhost kernel: [ 1096.184250] general protection fault: 0000 [#1] SMP
Jan 6 20:13:35 localhost kernel: [ 1096.186339] CPU 1
Jan 6 20:13:35 localhost kernel: [ 1096.186355] Modules linked in: btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs reiserfs ext2 nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc dm_crypt ppdev ipt_REJECT ipt_LOG ipt_MASQUERADE xt_state iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables joydev sp5100_tco edac_core i2c_piix4 serio_raw k8temp edac_mce_amd snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore parport_pc snd_page_alloc mac_hid shpchp lp parport radeon 8139too ttm drm_kms_helper drm pata_atiixp i2c_algo_bit usbhid hid wmi r8169
Jan 6 20:13:35 localhost kernel: [ 1096.188008]
Jan 6 20:13:35 localhost kernel: [ 1096.188008] Pid: 2564, comm: zsh Not tainted 3.2.0-35-generic #55-Ubuntu Gigabyte Technology Co., Ltd. GA-MA785GM-US2H/GA-MA785GM-US2H
Jan 6 20:13:35 localhost kernel: [ 1096.188008] RIP: 0010:[<ffffffff8116589a>] [<ffffffff8116589a>] kmem_cache_alloc+0x5a/0x140
Jan 6 20:13:35 localhost kernel: [ 1096.188008] RSP: 0018:ffff880059877d78 EFLAGS: 00010206
Jan 6 20:13:35 localhost kernel: [ 1096.188008] RAX: 0000000000000000 RBX: 00007f202c59d000 RCX: 000000000005b2ed
Jan 6 20:13:35 localhost kernel: [ 1096.188008] RDX: 000000000005b2ec RSI: 0000000000016da0 RDI: ffff88006d408a00
Jan 6 20:13:35 localhost kernel: [ 1096.188008] RBP: ffff880059877dc8 R08: ffff88006fa96da0 R09: 0000000000000001
Jan 6 20:13:35 localhost kernel: [ 1096.188008] R10: 0000000000100073 R11: ffff880059dbb2c0 R12: ffff88006d408a00
Jan 6 20:13:35 localhost kernel: [ 1096.188008] R13: 2665c4979a04b7b8 R14: ffffffff811447c5 R15: 00000000000080d0
Jan 6 20:13:35 localhost kernel: [ 1096.188008] FS: 00007f202c5ac700(0000) GS:ffff88006fa80000(0000) knlGS:0000000000000000
Jan 6 20:13:35 localhost kernel: [ 1096.188008] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 6 20:13:35 localhost kernel: [ 1096.188008] CR2: 00000000025991f0 CR3: 0000000059dbc000 CR4: 00000000000006e0
Jan 6 20:13:35 localhost kernel: [ 1096.188008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 6 20:13:35 localhost kernel: [ 1096.188008] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 6 20:13:35 localhost kernel: [ 1096.188008] Process zsh (pid: 2564, threadinfo ffff880059876000, task ffff88006b6b5c00)
Jan 6 20:13:35 localhost kernel: [ 1096.188008] Stack:
Jan 6 20:13:35 localhost kernel: [ 1096.188008] 0000000000000001 0000000000001000 0000000000000001 ffffffff8129e2e0
Jan 6 20:13:35 localhost kernel: [ 1096.188008] 0000000000000001 00007f202c59d000 ffff88006822f480 0000000000000001
Jan 6 20:13:35 localhost kernel: [ 1096.188008] 0000000000001000 0000000000000000 ffff880059877e88 ffffffff811447c5
Jan 6 20:13:35 localhost kernel: [ 1096.188008] Call Trace:
Jan 6 20:13:35 localhost kernel: [ 1096.188008] [<ffffffff8129e2e0>] ? cap_vm_enough_memory+0x50/0x60
Jan 6 20:13:35 localhost kernel: [ 1096.188008] [<ffffffff811447c5>] mmap_region+0x2a5/0x4f0
Jan 6 20:13:35 localhost kernel: [ 1096.188008] [<ffffffff81144d58>] do_mmap_pgoff+0x348/0x360
Jan 6 20:13:35 localhost kernel: [ 1096.188008] [<ffffffff81144eb1>] sys_mmap_pgoff+0x141/0x230
Jan 6 20:13:35 localhost kernel: [ 1096.188008] [<ffffffff81018b12>] sys_mmap+0x22/0x30
Jan 6 20:13:35 localhost kernel: [ 1096.188008] [<ffffffff816655c2>] system_call_fastpath+0x16/0x1b
Jan 6 20:13:35 localhost kernel: [ 1096.188008] Code: 00 4d 8b 04 24 65 4c 03 04 25 50 da 00 00 49 8b 50 08 4d 8b 28 4d 85 ed 0f 84 d8 00 00 00 49 63 44 24 20 49 8b 34 24 48 8d 4a 01 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0e 0f 94 c0 84 c0 74 c2 4d
Jan 6 20:13:35 localhost kernel: [ 1096.188008] RIP [<ffffffff8116589a>] kmem_cache_alloc+0x5a/0x140
Jan 6 20:13:35 localhost kernel: [ 1096.188008] RSP <ffff880059877d78>
Jan 6 20:13:35 localhost kernel: [ 1096.274513] ---[ end trace 766ef1ef52f774ae ]---
Jan 6 20:13:37 localhost kernel: [ 1097.836149] BUG: Bad page state in process swapper/0 pfn:59a33
Jan 6 20:13:37 localhost kernel: [ 1097.838885] page:ffffea0001668cc0 count:0 mapcount:-1 mapping: (null) index:0xffff880059a33160
Jan 6 20:13:37 localhost kernel: [ 1097.841673] page flags: 0x100000000000000()
Jan 6 20:13:37 localhost kernel: [ 1097.844440] Modules linked in: btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs reiserfs ext2 nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc dm_crypt ppdev ipt_REJECT ipt_LOG ipt_MASQUERADE xt_state iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables joydev sp5100_tco edac_core i2c_piix4 serio_raw k8temp edac_mce_amd snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd soundcore parport_pc snd_page_alloc mac_hid shpchp lp parport radeon 8139too ttm drm_kms_helper drm pata_atiixp i2c_algo_bit usbhid hid wmi r8169
Jan 6 20:13:37 localhost kernel: [ 1097.856881] Pid: 0, comm: swapper/0 Tainted: G D 3.2.0-35-generic #55-Ubuntu
Jan 6 20:13:37 localhost kernel: [ 1097.860020] Call Trace:
Jan 6 20:13:37 localhost kernel: [ 1097.863063] <IRQ> [<ffffffff8111fe8f>] bad_page.part.61+0x9f/0xf0
Jan 6 20:13:37 localhost kernel: [ 1097.866119] [<ffffffff8111fef8>] bad_page+0x18/0x30
Jan 6 20:13:37 localhost kernel: [ 1097.869158] [<ffffffff8112098e>] free_pages_prepare+0x10e/0x120
Jan 6 20:13:37 localhost kernel: [ 1097.872178] [<ffffffff81120af9>] free_hot_cold_page+0x49/0x1a0
Jan 6 20:13:37 localhost kernel: [ 1097.875183] [<ffffffff81120c7d>] __free_pages+0x2d/0x40
Jan 6 20:13:37 localhost kernel: [ 1097.878163] [<ffffffff8159a8fb>] tcp_v4_destroy_sock+0x25b/0x2c0
Jan 6 20:13:37 localhost kernel: [ 1097.881105] [<ffffffff81582695>] inet_csk_destroy_sock+0x55/0x140
Jan 6 20:13:37 localhost kernel: [ 1097.883970] [<ffffffff815849b0>] tcp_done+0x50/0x90
Jan 6 20:13:37 localhost kernel: [ 1097.886853] [<ffffffff81591d92>] tcp_rcv_state_process+0x422/0x5f0
Jan 6 20:13:37 localhost kernel: [ 1097.889724] [<ffffffff8159a597>] tcp_v4_do_rcv+0xc7/0x1d0
Jan 6 20:13:37 localhost kernel: [ 1097.892513] [<ffffffff8159c1f1>] tcp_v4_rcv+0x581/0x820
Jan 6 20:13:37 localhost kernel: [ 1097.895301] [<ffffffff81577b60>] ? ip_rcv_finish+0x370/0x370
Jan 6 20:13:37 localhost kernel: [ 1097.898110] [<ffffffff81577b60>] ? ip_rcv_finish+0x370/0x370
Jan 6 20:13:37 localhost kernel: [ 1097.900915] [<ffffffff81577c3d>] ip_local_deliver_finish+0xdd/0x280
Jan 6 20:13:37 localhost kernel: [ 1097.903716] [<ffffffff81577fa8>] ip_local_deliver+0x88/0x90
Jan 6 20:13:37 localhost kernel: [ 1097.906502] [<ffffffff815778fd>] ip_rcv_finish+0x10d/0x370
Jan 6 20:13:37 localhost kernel: [ 1097.909279] [<ffffffff815781e5>] ip_rcv+0x235/0x300
Jan 6 20:13:37 localhost kernel: [ 1097.912067] [<ffffffff81613dc7>] ? packet_rcv_spkt+0x47/0x190
Jan 6 20:13:37 localhost kernel: [ 1097.914831] [<ffffffff81543446>] __netif_receive_skb+0x4d6/0x550
Jan 6 20:13:37 localhost kernel: [ 1097.917624] [<ffffffff81544230>] netif_receive_skb+0x80/0x90
Jan 6 20:13:37 localhost kernel: [ 1097.920415] [<ffffffff81536474>] ? __netdev_alloc_skb+0x24/0x50
Jan 6 20:13:37 localhost kernel: [ 1097.923124] [<ffffffffa00d6e90>] rtl8139_rx+0x150/0x2b0 [8139too]
Jan 6 20:13:37 localhost kernel: [ 1097.925754] [<ffffffffa00d704a>] rtl8139_poll+0x5a/0xd0 [8139too]
Jan 6 20:13:37 localhost kernel: [ 1097.928274] [<ffffffff81544bd4>] net_rx_action+0x134/0x290
Jan 6 20:13:37 localhost kernel: [ 1097.930698] [<ffffffff8103df8b>] ? native_safe_halt+0xb/0x10
Jan 6 20:13:37 localhost kernel: [ 1097.933115] [<ffffffff8106f6e8>] __do_softirq+0xa8/0x210
Jan 6 20:13:37 localhost kernel: [ 1097.935495] [<ffffffff810967f5>] ? do_timer+0x25/0x30
Jan 6 20:13:37 localhost kernel: [ 1097.937836] [<ffffffff81035dc2>] ? ack_apic_level+0x72/0x190
Jan 6 20:13:37 localhost kernel: [ 1097.940163] [<ffffffff8166782c>] call_softirq+0x1c/0x30
Jan 6 20:13:37 localhost kernel: [ 1097.942464] [<ffffffff81016305>] do_softirq+0x65/0xa0
Jan 6 20:13:37 localhost kernel: [ 1097.944778] [<ffffffff8106face>] irq_exit+0x8e/0xb0
Jan 6 20:13:37 localhost kernel: [ 1097.947068] [<ffffffff816680e3>] do_IRQ+0x63/0xe0
Jan 6 20:13:37 localhost kernel: [ 1097.949327] [<ffffffff8165d46e>] common_interrupt+0x6e/0x6e
Jan 6 20:13:37 localhost kernel: [ 1097.951597] <EOI> [<ffffffff8103df8b>] ? native_safe_halt+0xb/0x10
Jan 6 20:13:37 localhost kernel: [ 1097.953891] [<ffffffff810900a8>] ? hrtimer_start+0x18/0x20
Jan 6 20:13:37 localhost kernel: [ 1097.956171] [<ffffffff8101c983>] default_idle+0x53/0x1d0
Jan 6 20:13:37 localhost kernel: [ 1097.958426] [<ffffffff8101cb5d>] amd_e400_idle+0x5d/0x120
Jan 6 20:13:37 localhost kernel: [ 1097.960704] [<ffffffff81013236>] cpu_idle+0xd6/0x120
Jan 6 20:13:37 localhost kernel: [ 1097.962970] [<ffffffff816235ee>] rest_init+0x72/0x74
Jan 6 20:13:37 localhost kernel: [ 1097.965195] [<ffffffff81cfbc03>] start_kernel+0x3b0/0x3bd
Jan 6 20:13:37 localhost kernel: [ 1097.967421] [<ffffffff81cfb388>] x86_64_start_reservations+0x132/0x136
Jan 6 20:13:37 localhost kernel: [ 1097.969660] [<ffffffff81cfb140>] ? early_idt_handlers+0x140/0x140
Jan 6 20:13:37 localhost kernel: [ 1097.971888] [<ffffffff81cfb459>] x86_64_start_kernel+0xcd/0xdc
Cosa sta succedendo qui? Cosa posso fare?
memtest
. Ma poiché le tue tracce appaiono molto presto, dubito che sia un ricordo. Quale hardware ha il tuo server? Hai fatto qualche modifica / overclocking?