Age of Valor




[Oct 28 2017] - Unexpected Server Outage - [restored]
Posted: Sat Oct 28, 2017 3:47 pm
Red Squirrel
AoV Owner
Joined: 13 Dec 2006
Posts: 8122
Location: Ontario, Canada




I got called in to work but as it happened the power went out. For some reason the shard is now offline and I can't connect to the network.

Unfortunately I won't be able to get to it till after 7pm today.

I will monitor on the off chance it comes back up on its own, as last time it was the ONT that was acting up. I have a feeling, though, that a server took a power hit from the initial brownout despite there being a UPS. The UPS is not dual conversion. I do have plans to upgrade to one eventually.


UPDATE 13:40ET: I went home real quick just to get an idea of the damage and whether it's something that just needs rebooting. At a quick glance it looks like we lost the VM server that hosts pretty much everything. Also, for some reason, despite having a backup DNS on a physical server, DNS is also not working.

Good news is the file server is ok. There may still be OS level corruption from the VMs having been improperly shut down, but worst case scenario I need to re-image each one. There should be no actual loss of data. If there is, there are backups.

Shard will remain down for the rest of the day till I get home and can figure out what is going on in more detail.


UPDATE 20:29ET: I am now home and investigating this issue. I am hoping it's nothing bad, as this affects all my own stuff too, not just the shard. Pretty much dead in the water here.


UPDATE 21:01ET: This is going to require more coffee and possibly an all-nighter. I don't even have heat because I can't access the environmental control server. This is very bad, and I'm still trying to wrap my head around this outage. I just hope there's no corruption once I get to the point where I can see the VMs. Right now the storage is not linking properly and there are all sorts of weird DNS issues. I set up a backup DNS server a while back, but apparently everything is still trying to connect to the main one, which is a VM. I think I should just make the physical box the primary, though part of the issue is that even that box is acting weird.
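For anyone curious about the DNS part: on a typical Linux box the resolver tries the `nameserver` lines in `/etc/resolv.conf` in the order they are listed, so if the first one is a VM that is down, every lookup waits on it before falling back. Below is a minimal Python sketch of checking that order and putting reachable servers first. The addresses, the function names, and the idea of probing TCP port 53 are all assumptions for illustration, not the actual setup here.

```python
import socket


def parse_nameservers(resolv_conf_text):
    """Return nameserver addresses in the order they are listed."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Comment lines start with '#' or ';', so they never match here.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def tcp_probe(address, timeout=2.0):
    """Hypothetical reachability check: can TCP port 53 be opened?"""
    try:
        with socket.create_connection((address, 53), timeout=timeout):
            return True
    except OSError:
        return False


def order_by_reachability(servers, probe=tcp_probe):
    """Reachable servers first, original order kept within each group."""
    up = [s for s in servers if probe(s)]
    down = [s for s in servers if s not in up]
    return up + down


# Made-up example: the primary is a VM, the backup is a physical box.
EXAMPLE = """\
# primary DNS (VM)
nameserver 10.0.0.10
# backup DNS (physical)
nameserver 10.0.0.2
"""
```

Calling `order_by_reachability(parse_nameservers(EXAMPLE))` while the VM is down would list the physical box first, which is the order the clients would ideally be using during an outage like this one.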


UPDATE 22:17ET: I may have some progress. Was able to get the storage subsystem back online. I am powering up my VMs one by one to make sure it's stable then will get to the shard VM shortly.


UPDATE 22:33ET: Looks like the database is corrupted. I will work on restoring it. If this is successful and there are no other issues, the shard should be back up very shortly. TC1 was corrupted too; oddly, dev was ok. I fear I have a lot more data corruption to deal with, though I'll probably only find it over the next few weeks, months, or even years as I need the data.


FINAL UPDATE 23:36ET: Restored a backup from Sat Oct 28 06:00:08 EDT 2017 and tested server ok.

I still have a lot of other stuff to restore, such as TC1 and my own personal stuff. I lost my VPN server completely (the VM is destroyed), and other stuff like that. But as far as the shard goes, it is back online and should be stable. I need to start looking into upgrading my UPS to 48V dual conversion. I was putting that off, but it's the second time something like this has happened, so I will have to get on it. A simple power bump should not be this disastrous. I'm also looking into simply adding a large capacitor bank that hooks into each server's PSU, but that is more invasive and complex, as I need to implement inrush current limiting etc.
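To give a rough feel for the capacitor bank idea, here is a small Python sketch of the two numbers that matter: how long a bank could carry a load while its voltage sags from a starting value to the minimum the load tolerates, and the peak charging current through a series limiting resistor. The formulas are the standard ideal ones (stored energy 0.5 * C * V^2, peak current V / R); the function names and any values plugged in are hypothetical, and real losses are ignored.

```python
def holdup_seconds(capacitance_f, v_start, v_min, load_w):
    """Ideal hold-up time of a capacitor bank feeding a constant-power load.

    Only the energy between v_start and v_min is usable:
    E = 0.5 * C * (v_start^2 - v_min^2), and t = E / P.
    """
    usable_joules = 0.5 * capacitance_f * (v_start ** 2 - v_min ** 2)
    return usable_joules / load_w


def peak_inrush_amps(v_supply, series_ohms):
    """Peak current into a fully discharged bank through a series resistor."""
    return v_supply / series_ohms
```

For example, with made-up numbers, `holdup_seconds(1.0, 12.0, 10.0, 22.0)` works out to 1 second, and `peak_inrush_amps(12.0, 2.0)` to 6 A. That is why the limiting matters: without some series resistance, the initial charging current is bounded only by wiring and internal resistance.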
All times are GMT  