HA issue that concerns me
I believe there is a serious problem with HA replication when using XenServer. I was doing extensive testing with two iStorage nodes (Node A and Node B), and I have attached the Task Manager network graph from Node B, captured after it came back up following a shutdown I triggered as a test.
I started the test with both nodes up. I spun up three VMs (two Linux and one Windows) and started OS installs on all three. I then shut down Node B (stopped the iStorage service) for a few minutes, and all traffic went to Node A and worked fine. I then restarted Node B. I saw the incoming replication traffic from Node A (in yellow), which I expected, but I did not expect Node B's data NIC (10.10.31.12) to start receiving traffic immediately (also in yellow) and then replicate it back to Node A before Node B had even caught up! All three installs were corrupted and I had to trash them. When I restarted them with both nodes up, everything worked fine.

The problem, I believe, is that the data NIC 10.10.31.12 should NOT have started receiving traffic until AFTER Node B had caught up with the replication data from Node A.

In the graph, the top picture is the dedicated replication link and the bottom picture is the data connection. Yellow is incoming data and red is data being sent to Node A. Here is how to reproduce the problem:
1) Have both nodes (A and B) up and connected to XenServer.
2) Start a few VM installs.
3) Shut down one of your iStorage nodes for at least 5 minutes while the OS installs on the VMs keep running.
4) Bring the downed iStorage node back up.
5) Try to finish your installs; they will be corrupted.
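The behavior the poster expected, where a returning node does not accept traffic on its data NIC until it has caught up, can be sketched as a small state model. This is a hypothetical illustration, not KernSafe's implementation: the names `Node`, `State`, `client_write`, `begin_resync` and `finish_resync` are invented here, and the resync is simplified to a single full copy.

```python
from enum import Enum


class State(Enum):
    OFFLINE = "offline"
    RESYNCING = "resyncing"
    ONLINE = "online"


class Node:
    """Hypothetical model of one HA storage node: a block map plus a state."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block number -> data
        self.state = State.ONLINE

    def serves_client_io(self):
        # Only a fully synchronized node should accept initiator traffic
        # on its data NIC.
        return self.state is State.ONLINE


def client_write(nodes, block, data):
    """Apply a write to every node allowed to serve I/O; return their names."""
    targets = [n for n in nodes if n.serves_client_io()]
    if not targets:
        raise RuntimeError("no synchronized node available")
    for n in targets:
        n.blocks[block] = data
    return [n.name for n in targets]


def begin_resync(stale):
    """A returning node starts catching up but does not serve clients yet."""
    stale.state = State.RESYNCING


def finish_resync(stale, source):
    """Copy everything the partner wrote, then allow client I/O again.

    For simplicity this copies the whole block map in one step; a real
    system would transfer only changed blocks and handle writes that
    arrive while the copy is in progress.
    """
    stale.blocks = dict(source.blocks)
    stale.state = State.ONLINE


if __name__ == "__main__":
    a, b = Node("A"), Node("B")
    print(client_write([a, b], 0, "boot sector"))  # ['A', 'B']
    b.state = State.OFFLINE                        # Node B stopped
    client_write([a, b], 1, "install data")        # lands on Node A only
    begin_resync(b)                                # Node B comes back
    print(client_write([a, b], 2, "more data"))    # ['A']
    finish_resync(b, a)
    print(b.blocks == a.blocks)                    # True
    print(client_write([a, b], 3, "after sync"))   # ['A', 'B']
```

In the poster's test, Node B behaved as if it skipped the `RESYNCING` gate and served client writes while still stale; the sketch keeps it out of the write path until the copy from Node A is complete.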
Thanks!
- Attachment: replication.JPG (88.89 KiB)
Re: HA issue that concerns me
We will analyze the problem and get back to you with the results as soon as possible.
KernSafe Support Team
iSCSI SAN, iSCSI Target, iSCSI initiator and related technological support.
[email protected]
Re: HA issue that concerns me
Any response on this yet?
Re: HA issue that concerns me
Madbenny,
I did get a response from the lead programmer. He said a new version that fixes this issue was in testing and should be released after QC testing. That was about a month ago.
Good question: has it been fixed yet, Olivia or Charles?
Re: HA issue that concerns me
Thanks for the quick response. I am concerned now, since I have around 50 VMs in an HA environment.
Re: HA issue that concerns me
Madbenny wrote: Thanks for the quick response. I am concerned now, since I have around 50 VMs in an HA environment.
Are you using XenServer?
Have you ever had an outage with either server or has everything stayed up and running since you set it up?
Thanks!
Re: HA issue that concerns me
I'm using Hyper-V and have been using iStorage for about two years now. I've had no power outages, but I restart the iStorage server every couple of months and have never had any corruption, even with the current version. I currently do NOT have any SQL, AD, or Exchange servers in the HA configuration.
Re: HA issue that concerns me
Sorry, forgot to add that everything worked as planned.
webguyz wrote: Madbenny wrote: Thanks for the quick response. I am concerned now, since I have around 50 VMs in an HA environment.
Are you using XenServer?
Have you ever had an outage with either server or has everything stayed up and running since you set it up?
Thanks!
Re: HA issue that concerns me
I did get a response from KernSafe about this, and they had me download a new version of the HA component. So far it looks very good.
I'm testing with two Windows Server 2008 R2 machines as my HA pair, and my test environment runs five Windows Server 2008 R2 VMs on XenServer 6.1.
I have been torturing my two iStorage HA test servers by randomly shutting down one or the other while running Windows Update and large FTP downloads on the VMs, and it just keeps on working. I also periodically run CHKDSK on a random VM to make sure the virtual disk is intact. So far so good.