HA issue that concerns me
Posted: Tue Jan 29, 2013 7:06 am
I believe there is a big problem with HA replication for XenServer. I was doing extensive testing with 2 iStorage nodes (Node A and Node B), and I have sent the task manager graph from Node B, taken after it came back up following a shutdown I did as a test.
I started my test with both nodes up. I spun up 3 VMs (2 Linux and 1 Windows) and started OS installs on all 3. I then shut down Node B (stopped the iStorage service) for a few minutes; all traffic went to Node A and worked fine. I then restarted Node B. I saw the incoming replication traffic (in yellow) coming from Node A, which I expected. What I didn't expect was for Node B to start receiving data traffic immediately (in yellow on 10.10.31.12) and then replicate it back to Node A before Node B had even caught up! All 3 of my installs were corrupted and I had to trash them. I restarted them with both nodes A and B up and everything worked fine.
I believe the problem is that incoming data should NOT have started arriving on the data NIC 10.10.31.12 until AFTER Node B had caught up with the replication data from Node A.
On the graph, the top picture is the dedicated replication link and the bottom picture is the data connection. Yellow is incoming data and red is data being sent to Node A. Here is how to reproduce the test:
1) Have both nodes A and B up and connected to XenServer
2) Start a few VM installs
3) Take one of your iStorage nodes down for at least 5 minutes and keep installing the OSes on the VMs
4) Bring the downed iStorage node back up
5) Attempt to finish your installs; they will be corrupted
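To make clear what I expected to happen at step 4, here is a minimal Python sketch of the gating I have in mind. It is only a toy model and not iStorage's actual code or API: the names Node, MirrorPair, take_down, begin_resync and finish_resync are all made up for illustration. The idea is that a node coming back up stays out of the client I/O path until it has finished catching up from its partner.

```python
from enum import Enum


class State(Enum):
    DOWN = "down"
    RESYNCING = "resyncing"   # back up, but still catching up from the partner
    IN_SYNC = "in_sync"


class Node:
    """Hypothetical model of one HA storage node (not the real iStorage API)."""

    def __init__(self, name):
        self.name = name
        self.state = State.IN_SYNC
        self.blocks = {}

    def accepts_client_io(self):
        # Only a fully synchronized node may serve traffic on its data NIC.
        return self.state is State.IN_SYNC


class MirrorPair:
    """Hypothetical mirrored pair that routes writes only to in-sync nodes."""

    def __init__(self, a, b):
        self.nodes = [a, b]

    def write(self, block, data):
        targets = [n for n in self.nodes if n.accepts_client_io()]
        if not targets:
            raise RuntimeError("no in-sync node available")
        for node in targets:
            node.blocks[block] = data
        return [n.name for n in targets]


def take_down(node):
    node.state = State.DOWN


def begin_resync(node):
    # The node is reachable again but must not take client I/O yet.
    node.state = State.RESYNCING


def finish_resync(node, source):
    # Copy everything from the surviving partner, then rejoin the data path.
    node.blocks = dict(source.blocks)
    node.state = State.IN_SYNC
```

In this model, a write issued while Node B is in the RESYNCING state goes only to Node A, and Node B starts taking client writes again only after finish_resync has run. That is the ordering I expected to see on the graph.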
Thanks!