
Uninstall instructions #571

Open
nandanrao opened this issue Jun 21, 2016 · 5 comments

@nandanrao

It's really nice and easy to launch an Elasticsearch cluster (in my case, into a DC/OS cluster) with this library. However, it's a little unclear to me how to remove a cluster / uninstall. It could use some mention in the docs!

@nandanrao
Author

(If the only correct way is to manually /teardown via the Mesos framework ID, I'm happy to make a docs PR, but I suspect there may be another way?)

@frankscholten
Contributor

@nandanrao Thanks for opening this issue.

Indeed, tearing down via

curl -XPOST $MASTER/teardown -d 'frameworkId=794b66f4-2c4f-45cd-920b-8ee0b3555259-0001'

is the way to do it.
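
If you don't have the framework ID to hand, you should be able to look it up from the Mesos master's state endpoint. Something along these lines (a rough sketch: it assumes jq is installed, and the exact path may be /state or /master/state depending on your Mesos version):

curl -s $MASTER/state | jq -r '.frameworks[] | "\(.id)  \(.name)"'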

However, when testing this I might have found a bug. I did a teardown, and the scheduler and executors were killed as expected; Marathon then restarted the scheduler, but it did not launch a new executor. Instead the logs contained the following:

[DEBUG] 2016-06-22 12:37:16,061 class org.apache.mesos.elasticsearch.scheduler.ElasticsearchScheduler resourceOffers - Declined offer: id { value: "794b66f4-2c4f-45cd-920b-8ee0b3555259-O245" }, framework_id { value: "794b66f4-2c4f-45cd-920b-8ee0b3555259-0001" }, slave_id { value: "794b66f4-2c4f-45cd-920b-8ee0b3555259-S3" }, hostname: "172.17.0.8", resources { name: "ports",  type: RANGES,  ranges {  range {   begin: 37000,    end: 38000,   },  },  role: "*" }, resources { name: "cpus",  type: SCALAR,  scalar {  value: 2.0,  },  role: "*" }, resources { name: "mem",  type: SCALAR,  scalar {  value: 4096.0,  },  role: "*" }, resources { name: "disk",  type: SCALAR,  scalar {  value: 20000.0,  },  role: "*" }, url { scheme: "http",  address {  hostname: "172.17.0.8",   ip: "172.17.0.8",   port: 5051,  },  path: "/slave(1)" }, Reason: Cluster size already fulfilled
[DEBUG] 2016-06-22 12:37:22,042 class org.apache.mesos.elasticsearch.scheduler.ElasticsearchScheduler isHostnameResolveable - Attempting to resolve hostname: 172.17.0.5
[DEBUG] 2016-06-22 12:37:22,047 org.apache.mesos.Protos$TaskStatus <init> - Task status for elasticsearch_172.17.0.6_20160622T123139.777Z exists, using old state: TASK_RUNNING

"Cluster size already fulfilled" means that the scheduler thinks a task is still running, based on the state in ZooKeeper, even though that executor has been killed.

I am looking into the Mesos code to understand how teardown works at a lower level. We might have to add code to do proper ZK state cleanup on teardown. The question is how to do this and where in the framework. I asked a question in #general on the DC/OS community Slack: https://dcos-community.slack.com
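
In the meantime you can inspect what the scheduler has persisted in ZooKeeper yourself. Roughly (a sketch, assuming zkCli.sh is on your path and the default elasticsearch zNode name; the node is named after the cluster/framework name you configured):

zkCli.sh -server $ZK_HOST:2181 ls /elasticsearch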

@nandanrao
Author

Yes, I saw this as well and ran the Docker cleanup script, which I believe in this case ONLY removed the ZooKeeper node named after the Elasticsearch cluster. That SEEMS to have fixed it, although I did not look very closely.

@philwinder
Contributor

Personally, I always viewed shutdown as being of secondary importance. Who wants their ES cluster to be destroyed? ;-)

But seriously, in the past I simply stopped the scheduler. There would be some state left in ZooKeeper, which is kept in case the scheduler fails on its own and needs to recover. You can manually delete that, or just ignore it; there's very little in there.
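
Deleting it by hand would look something like this (again just a sketch, assuming the default elasticsearch zNode name; newer ZooKeeper releases use deleteall instead of rmr):

zkCli.sh -server $ZK_HOST:2181 rmr /elasticsearch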

@philwinder
Contributor

Oh, and also check out #550. If the scheduler is shut down, then the executors are shut down, and then the scheduler starts again, it will still think that the executors are running, because we never receive any updates from Mesos to tell us that they've gone. An issue with Mesos IMO, but a "ping" mechanism to make sure they are still there would work around it.
