$ pip3 install -U chaostoolkit chaostoolkit-aws
$ chaos --version
chaos, version 1.3.0
import java.io.*;
import java.net.*;
import java.lang.Thread;

public class EchoServer {
    public static void main (String[] args) {
        try {
            ServerSocket server = new ServerSocket(5566);
            System.out.println("Listing on port 5566...");
            while (true) {
                // Block until a client connects, then serve it on its own thread.
                Socket client = server.accept();
                EchoHandler handler = new EchoHandler(client);
                handler.start();
            }
        }
        catch (Exception e) {
            System.err.println("Exception caught:" + e);
        }
    }
}

class EchoHandler extends Thread {
    Socket client;
    InetAddress inetAddress;
    String configServer;

    EchoHandler (Socket client) {
        this.client = client;
        try {
            inetAddress = InetAddress.getLocalHost();
        } catch(Exception e) {}
    }

    public void run () {
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(client.getInputStream()));
            // autoflush=true so every echoed line is sent immediately
            PrintWriter writer = new PrintWriter(client.getOutputStream(), true);
            System.out.println("[connected] " + client.getInetAddress());
            while (true) {
                String line = reader.readLine();
                // readLine() returns null when the client closes the connection
                // without sending "bye"; stop here rather than relying on a
                // NullPointerException from line.trim().
                if (line == null) {
                    break;
                }
                System.out.println(client.getInetAddress() + " says " + line);
                writer.println(line);
                if (line.trim().equals("bye")) {
                    break;
                }
            }
        }
        catch (Exception e) {
            System.err.println("Exception caught: client disconnected.");
        }
        finally {
            try {
                System.out.println("[disconnected] " + client.getInetAddress());
                client.close();
            }
            catch (Exception e ){ ; }
        }
    }
}
{
    "version": "1.0.0",
    "title": "Validating High-Availability of the Echo server.",
    "description": "Ensure that there will always be an EC2 instance with the Elastic IP attached.",
    "tags": [
        "echoserver",
        "keepalived",
        "elasticip",
        "ha",
        "master",
        "backup"
    ],
    "configuration": {
        "aws_region": "us-east-1"
    },
    "steady-state-hypothesis": {
        "title": "The backup node will take the role of master when the original one is terminated.",
        "probes": [
            {
                "type": "probe",
                "name": "port-5566-is-listening-and-working-properly",
                "tolerance": true,
                "provider": {
                    "type": "python",
                    "module": "chaospythian.echo.probes",
                    "func": "echoserver",
                    "arguments": {
                        "tcp_ip": "54.71.185.10",
                        "tcp_port": 5566,
                        "message": "testing text\n"
                    }
                }
            },
            {
                "type": "probe",
                "name": "master-and-backup-instances-up-and-running",
                "tolerance": [1,2],
                "provider": {
                    "type": "python",
                    "module": "chaosaws.ec2.probes",
                    "func": "count_instances",
                    "arguments": {
                        "filters": [
                            {
                                "Name": "instance-state-name",
                                "Values": ["running"]
                            },
                            {
                                "Name": "tag:Name",
                                "Values": ["production-echoserver"]
                            }
                        ]
                    }
                }
            },
            {
                "type": "probe",
                "name": "one-ec2-instance-tagged-as-master",
                "tolerance": 1,
                "provider": {
                    "type": "python",
                    "module": "chaosaws.ec2.probes",
                    "func": "count_instances",
                    "arguments": {
                        "filters": [
                            {
                                "Name": "instance-state-name",
                                "Values": ["running"]
                            },
                            {
                                "Name": "tag:State",
                                "Values": ["MASTER"]
                            },
                            {
                                "Name": "tag:Name",
                                "Values": ["production-echoserver"]
                            }
                        ]
                    }
                }
            },
            {
                "type": "probe",
                "name": "one-ec2-instance-tagged-as-backup",
                "tolerance": [0,1],
                "provider": {
                    "type": "python",
                    "module": "chaosaws.ec2.probes",
                    "func": "count_instances",
                    "arguments": {
                        "filters": [
                            {
                                "Name": "instance-state-name",
                                "Values": ["running"]
                            },
                            {
                                "Name": "tag:State",
                                "Values": ["BACKUP"]
                            },
                            {
                                "Name": "tag:Name",
                                "Values": ["production-echoserver"]
                            }
                        ]
                    }
                }
            }
        ]
    },
    "method": [
        {
            "type": "action",
            "name": "terminate-master-node",
            "provider": {
                "type": "python",
                "module": "chaosaws.ec2.actions",
                "func": "terminate_instance",
                "arguments": {
                    "filters": [
                        {
                            "Name": "instance-state-name",
                            "Values": ["running"]
                        },
                        {
                            "Name": "tag:State",
                            "Values": ["MASTER"]
                        },
                        {
                            "Name": "tag:Name",
                            "Values": ["production-echoserver"]
                        }
                    ]
                }
            },
            "pauses": {
                "after": 10
            }
        }
    ]
}
This experiment probes whether the service is up and running, listening on port 5566, and working as expected. If a client connects and sends a string, it receives the same string back:
$ nc 54.71.185.10 5566
hello
hello
test
test
one two three
one two three
bye
bye

On the server side, the same session is logged as:

Listing on port 5566...
[connected] /190.104.119.195
/190.104.119.195 says hello
/190.104.119.195 says test
/190.104.119.195 says one two three
/190.104.119.195 says bye
[disconnected] /190.104.119.195

Then, it checks the EC2 instance tags, looking for exactly one MASTER node and zero or one BACKUP nodes. It's okay to have no backup node for a while: if the master node dies, the backup takes the lead and the service stays up and running from the client's point of view. It takes a few seconds to launch a new server, which then becomes the new backup node. If the backup node dies instead, no problem: our autoscaling group launches a new one, and the Elastic IP never leaves the master node.
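To make the tolerances in the experiment concrete, here is a minimal sketch of how they can be read. It assumes that a scalar tolerance means "equal to" and a list tolerance means "one of these values", which is how the experiment above uses them. The function within_tolerance and the observed counts are illustrative only; this is a simplified model, not the Chaos Toolkit's own code.

```python
def within_tolerance(tolerance, value):
    """Simplified model of a tolerance check (an assumption, not chaoslib code)."""
    # bool is tested first because True == 1 in Python
    if isinstance(tolerance, bool):
        return value is tolerance
    # a list is read as "the value must be one of these"
    if isinstance(tolerance, list):
        return value in tolerance
    # any other scalar must match exactly
    return value == tolerance


# Hypothetical counts right after the master was terminated and the backup
# was promoted, but before the replacement instance has booted.
observed = {"running": 1, "master": 1, "backup": 0}

print(within_tolerance([1, 2], observed["running"]))  # True
print(within_tolerance(1, observed["master"]))        # True
print(within_tolerance([0, 1], observed["backup"]))   # True
```

With these readings, the hypothesis still holds while only one instance is running, as long as that instance is tagged MASTER.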
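The echoserver probe referenced by the first probe of the experiment (module chaospythian.echo.probes) is a short Python function:

# -*- coding: utf-8 -*-
import socket

__all__ = ["echoserver"]


def echoserver(tcp_ip, tcp_port, message):
    """Return True if the server echoes the message back unchanged."""
    # The context manager closes the socket even if connect() or recv() raises.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((tcp_ip, tcp_port))
        s.sendall(message.encode('ascii'))
        data = s.recv(1024)
        # The server reads line by line and ends the session on "bye".
        s.sendall("bye\n".encode('ascii'))
    return data.decode('ascii').strip() == message.strip()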
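Before pointing the probe at the Elastic IP, it can be handy to exercise it locally. The sketch below is one way to do that, under a few assumptions: a small Python stand-in (the hypothetical LocalEchoHandler) replaces the Java server, it listens on a random free port on 127.0.0.1 rather than on 5566, and the probe is repeated inline so the example is self-contained.

```python
import socket
import socketserver
import threading


class LocalEchoHandler(socketserver.StreamRequestHandler):
    """Hypothetical stand-in for the Java EchoHandler: echo each line."""

    def handle(self):
        for raw in self.rfile:         # one iteration per received line
            self.wfile.write(raw)      # echo the line back unchanged
            if raw.strip() == b"bye":  # same end-of-session word as the Java server
                break


def echoserver(tcp_ip, tcp_port, message):
    """Same contract as the probe: True if the echo matches the message."""
    with socket.create_connection((tcp_ip, tcp_port), timeout=5) as s:
        s.sendall(message.encode("ascii"))
        data = s.recv(1024)
        s.sendall(b"bye\n")
    return data.decode("ascii").strip() == message.strip()


# Port 0 asks the OS for any free port, so this never collides with a
# real service listening on 5566.
server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), LocalEchoHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

result = echoserver("127.0.0.1", port, "testing text\n")
print(result)

server.shutdown()
server.server_close()
```

The message is the same one the experiment passes in its arguments, so a passing local run suggests the probe and the line-based protocol agree before any AWS resources are involved.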
Okay, let's continue. The next part of the Chaos Toolkit experiment is the method, where we terminate the master EC2 instance. If our Keepalived configuration works, it will associate the Elastic IP with the backup node and tag it as the new master, and our clients won't notice anything. When the hypothesis runs a second time, it tests once more that the service is up and running, that there is one master node, and that there are zero or one backup nodes. If it fails, we have found a weakness in our platform and we will have to fix it.

You can probe anything using the Chaos Toolkit: not only what happens if one server goes down, but also certificate validation, network connectivity, queries to database servers, cache servers, etc.
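The order of events described above (check the hypothesis, run the method, pause, check the hypothesis again) can be sketched as a small simulation. Everything here is hypothetical: the in-memory cluster dictionary stands in for the two EC2 instances, failover() models what Keepalived is expected to do, and none of the function names come from the Chaos Toolkit or from AWS.

```python
import time


def new_cluster():
    """Hypothetical in-memory model of the two instances and the Elastic IP."""
    return {
        "node-a": {"state": "MASTER", "running": True, "has_eip": True},
        "node-b": {"state": "BACKUP", "running": True, "has_eip": False},
    }


def running(cluster, state):
    """Return the running nodes tagged with the given state."""
    return [n for n in cluster.values() if n["running"] and n["state"] == state]


def steady_state(cluster):
    """Exactly one MASTER holding the Elastic IP, and zero or one BACKUP."""
    masters = running(cluster, "MASTER")
    backups = running(cluster, "BACKUP")
    return len(masters) == 1 and masters[0]["has_eip"] and len(backups) in (0, 1)


def terminate_master(cluster):
    """Model of the experiment's action: the MASTER instance goes away."""
    for node in running(cluster, "MASTER"):
        node["running"] = False
        node["has_eip"] = False


def failover(cluster):
    """Model of the expected Keepalived reaction: promote the backup."""
    for node in running(cluster, "BACKUP"):
        node["state"] = "MASTER"
        node["has_eip"] = True
        return


def run_experiment(cluster, pause_after=0.1, with_failover=True):
    if not steady_state(cluster):
        return "aborted: not steady before the method"
    terminate_master(cluster)
    if with_failover:
        failover(cluster)
    time.sleep(pause_after)  # the experiment above pauses for 10 seconds
    return "steady" if steady_state(cluster) else "deviated: weakness found"


print(run_experiment(new_cluster()))                       # steady
print(run_experiment(new_cluster(), with_failover=False))  # deviated: weakness found
```

The second call shows the case the experiment is designed to catch: if nothing promotes the backup, the second hypothesis check finds no master and reports a deviation.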