Friday, October 14, 2016

Unclean shutdown detected. for mongod process in Arbiter server

Following messages were received in “/var/log/mongo/mongod.log” of primary server  and other servers of replicatset

Sat Jul 30 02:48:08.631 [rsHealthPoll] couldn't connect to xxx.local:27017: couldn't connect to server xxx.local:27017

Same message was appearing on other arbiter server

when checked on arbiter server no mongod process was running.

[root@xxx ~]# ps -ef| grep mongo
root     10581 10533  0 09:43 pts/0    00:00:00 grep mongo


Starting Mongod process:

When starting mongod process with command “service mongod start”, it encountered following error:

# service mongod start
Starting mongod: about to fork child process, waiting until server is ready for connections.
forked process: 10345
all output going to: /var/log/mongodb/mongod.log
ERROR: child process failed, exited with error number 100


When checking /var/log/mongodb/mongod.log, following was logged.

Fri Oct 14 09:29:41.540 [initandlisten] MongoDB starting : pid=10345 port=27017 dbpath=/data/arb 64-bit host=controller-02
Fri Oct 14 09:29:41.540 [initandlisten] db version v2.4.14
Fri Oct 14 09:29:41.540 [initandlisten] git version: 05bebf9ab15511a71bfbded684bb226014c0a553
Fri Oct 14 09:29:41.540 [initandlisten] build info: Linux ip-10-154-253-119 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49
Fri Oct 14 09:29:41.540 [initandlisten] allocator: tcmalloc
Fri Oct 14 09:29:41.540 [initandlisten] options: { auth: "true", config: "/etc/mongod.conf", dbpath: "/data/arb", fork: "true", keyFile: "/opt/mongo/key/keyfile1", logappend: "true", logpath: "/var/log/mongodb/mongod.log", nojournal: "true", pidfilepath: "/var/run/mongodb/mongod.pid", port: 27017, replSet: "sitMongoSet", smallfiles: "true" }
**************
Unclean shutdown detected.
Please visit http://dochub.mongodb.org/core/repair for recovery instructions.
*************
Fri Oct 14 09:29:41.695 [initandlisten] exception in initAndListen: 12596 old lock file, terminating
Fri Oct 14 09:29:41.695 dbexit:
Fri Oct 14 09:29:41.695 [initandlisten] shutdown: going to close listening sockets...
Fri Oct 14 09:29:41.695 [initandlisten] shutdown: going to flush diaglog...
Fri Oct 14 09:29:41.695 [initandlisten] shutdown: going to close sockets...
Fri Oct 14 09:29:41.695 [initandlisten] shutdown: waiting for fs preallocator...
Fri Oct 14 09:29:41.695 [initandlisten] shutdown: closing all files...
Fri Oct 14 09:29:41.696 [initandlisten] closeAllFiles() finished
Fri Oct 14 09:29:41.696 dbexit: really exiting now


It appears mongod process was not cleanly shutdown or server might have rebooted leaving mongodb in unclean state.

Journal was disabled on this Arbiter, so I ran following command to recover the state.
mongod --dbpath /data/arb –repair


After running above command, I again received following errors:

Fri Oct 14 10:09:59.059 [initandlisten] MongoDB starting : pid=11079 port=27017 dbpath=/data/arb 64-bit host=controller-02
Fri Oct 14 10:09:59.060 [initandlisten] db version v2.4.14
Fri Oct 14 10:09:59.060 [initandlisten] git version: 05bebf9ab15511a71bfbded684bb226014c0a553
Fri Oct 14 10:09:59.060 [initandlisten] build info: Linux ip-10-154-253-119 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49
Fri Oct 14 10:09:59.060 [initandlisten] allocator: tcmalloc
Fri Oct 14 10:09:59.060 [initandlisten] options: { auth: "true", config: "/etc/mongod.conf", dbpath: "/data/arb", fork: "true", keyFile: "/opt/mongo/key/keyfile1", logappend: "true", logpath: "/var/log/mongodb/mongod.log", nojournal: "true", pidfilepath: "/var/lib/mongo/mongod.pid", port: 27017, replSet: "sitMongoSet", smallfiles: "true" }
Fri Oct 14 10:09:59.108 [initandlisten] couldn't open /data/arb/local.ns errno:13 Permission denied
Fri Oct 14 10:09:59.108 [initandlisten] error couldn't open file /data/arb/local.ns terminating
Fri Oct 14 10:09:59.108 dbexit:
Fri Oct 14 10:09:59.108 [initandlisten] shutdown: going to close listening sockets...
Fri Oct 14 10:09:59.108 [initandlisten] shutdown: going to flush diaglog...
Fri Oct 14 10:09:59.108 [initandlisten] shutdown: going to close sockets...
Fri Oct 14 10:09:59.108 [initandlisten] shutdown: waiting for fs preallocator...
Fri Oct 14 10:09:59.108 [initandlisten] shutdown: closing all files...
Fri Oct 14 10:09:59.108 [initandlisten] closeAllFiles() finished
Fri Oct 14 10:09:59.109 [initandlisten] shutdown: removing fs lock...
Fri Oct 14 10:09:59.109 dbexit: really exiting now

It was identified “/data/arb/local.ns” permission was not correct. So I ran following command to correct it (root was the owner)
chown mongod:mongod /data/arb/local.ns

Once above was fixed, mongod process started running fine.

[rsHealthPoll] couldn't connect to xxxxx.local:27017: couldn't connect to server xxxxxx.local:27017

I received following error on mongod.log file of every server of replicaset

[rsHealthPoll] couldn't connect to xxxxx.local:27017: couldn't connect to server xxxxxx.local:27017

When I checked, I discovered one of the arbiter server was not running i.e. mongod process was not running on arbiter server.

Solution: Start the mongod process with either of following command

service mongod start

or


/etc/init.d/mongod start

Once mongod process was started, every server in the replicaset were connected successfully to this arbiter server.