ERROR: LXC container name not set! #1857
This might be due to the probe action. You can try changing line 343 of resource-agents/heartbeat/lxc.in (at commit fe1a2f8) to:
ocf_is_probe || LXC_validate
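For illustration, here is a hypothetical Python restatement of what the shell short-circuit ocf_is_probe || LXC_validate does. It assumes, as ocf_is_probe in ocf-shellfuncs appears to, that a probe is a "monitor" action with an interval of 0. The function names are made up for this sketch and are not the agent's code. Only probes skip validation, so a stop still fails when OCF_RESKEY_container is missing.

```python
def is_probe(env: dict) -> bool:
    # Assumed probe definition (mirrors ocf_is_probe): a "monitor"
    # action whose recurring interval is 0.
    return (env.get("__OCF_ACTION") == "monitor"
            and env.get("OCF_RESKEY_CRM_meta_interval") == "0")


def validate(env: dict):
    # Hypothetical stand-in for LXC_validate: only the container check.
    if not env.get("OCF_RESKEY_container"):
        return "ERROR: LXC container name not set!"
    return None  # validation passed


def probe_or_validate(env: dict):
    # Shell `a || b`: b runs only when a fails, so probes skip validation.
    if is_probe(env):
        return None
    return validate(env)


# A stop without the parameter still fails; a probe does not.
print(probe_or_validate({"__OCF_ACTION": "stop"}))
print(probe_or_validate({"__OCF_ACTION": "monitor",
                         "OCF_RESKEY_CRM_meta_interval": "0"}))
```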
It seems the agent already handles probe actions, so I'll have to investigate further what might be causing this.
Hey @oalbrigt, thanks for the reply!
Yes, of course I can try, but what's the point? As we can see, the OCF_RESKEY_container variable doesn't exist, or the agent just doesn't know anything about it. So even if I try it, the agent won't stop the container, for the same reason (resource-agents/heartbeat/lxc.in, line 184 at commit fe1a2f8).
@kgaillot Do you know what might cause OCF_RESKEY_ variables not to be set when doing
No, that's odd. Was the command tried without --force first? It shouldn't normally be necessary, so if it was needed, that might point to an issue.
Hey @kgaillot, thanks for the reply!
Well, I can try, if you tell me how to do that and if I find the cluster in the same state again.
Something like
Oh, you mean I should place
That sounds right.
Hey guys! I got it. I tried to stop the container:
And this is how it should look:
As you can see, some variables are missing, like
Any ideas? ^_^
That's strange. Did you create it without specifying
Yes, it's very, VERY strange. I create resources with
As you can see, almost a year passed before the bug appeared. This means I can create a resource with ANY method and it WILL work correctly until... something goes wrong.
So-o-o-o, I have no idea how to debug it further :(
Can you add the output from
Yep, sure, but I have it on Debian:
@iglov That is extremely odd. If you still have the logs from when that occurred, can you open a bug at bugs.clusterlabs.org and attach the output of
I would like to, but I can't, because there is a lot of business-sensitive information in there, like hostnames, general logs, process lists, even DRBD passwords :(
It would be helpful to at least get the scheduler input that led to the problem. At the time the problem occurred, one of the nodes was the designated controller (DC). Its log will contain messages like "Calculated transition ... saving inputs in ...". The last such message before the problem occurred is the interesting one, and the file name it mentions is the input. You can uncompress it and edit out any sensitive information, then email it to [email protected].
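A minimal Python sketch of that search, assuming the log lines are available in order (for example from the DC's syslog) and that the message reads "Calculated transition N, saving inputs in PATH" as quoted above. The helper name and the marker string are illustrative. It returns the input saved last before the first line containing the error marker, or None if the marker never appears.

```python
import re

# Assumed wording of the scheduler's log message; ".*?" tolerates
# optional text such as "(with errors)" between the two parts.
SAVED_INPUT = re.compile(r"Calculated transition \d+.*?saving inputs in (\S+)")


def last_input_before(lines, marker):
    """Return the last saved scheduler input logged before the first line
    containing marker, or None if the marker is never seen."""
    last = None
    for line in lines:
        if marker in line:
            return last
        match = SAVED_INPUT.search(line)
        if match:
            last = match.group(1)
    return None  # marker not found in the log
```

For example, passing the DC's log lines and "LXC container name not set" as the marker would point at the pe-input file worth inspecting.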
Alternatively, you can investigate the file yourself. I'd start by checking the resource configuration and making sure the resource parameters are set correctly there. If they're not, someone or something likely modified the configuration. If they are, the next thing I'd try is
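One way to do the configuration check, sketched in Python under the assumption that the scheduler input is bzip2-compressed CIB XML in which each primitive carries its parameters as nvpair elements under instance_attributes. The function names are hypothetical. A result of None means the primitive is missing entirely; an empty dict means it exists but has no parameters.

```python
import bz2
import xml.etree.ElementTree as ET


def load_cib(path):
    # Scheduler inputs are normally saved as pe-input-N.bz2; plain XML also works.
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rb") as f:
        return ET.parse(f).getroot()


def primitive_params(cib, rsc_id):
    """Return {name: value} from the primitive's instance_attributes,
    or None if no primitive with that id exists in the CIB."""
    for prim in cib.iter("primitive"):  # also finds primitives inside groups/clones
        if prim.get("id") == rsc_id:
            params = {}
            for attrs in prim.findall("instance_attributes"):
                for nv in attrs.findall("nvpair"):
                    params[nv.get("name")] = nv.get("value")
            return params
    return None
```

For an lxc resource, primitive_params(load_cib("pe-input-250.bz2"), "nsa-1.ny") would be expected to include a "container" entry if the configuration is intact.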
Hey @kgaillot! Thanks for the explanations and your time!
This looks good, doesn't it? I don't see anything wrong here. But if you still want, I can try to send you these pe-input files.
No, something's wrong. The resource parameters should be listed in
Yep, sorry, you're right, my bad. I looked for the resource nsa-1.ny in pe-input-250 (the last one before things broke), and that primitive isn't there at all. But it is in pe-input-249. Poof, it just disappeared...
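To spot such a disappearance without reading the XML by hand, one could diff the primitive IDs of two consecutive scheduler inputs. This Python sketch assumes each input is bzip2-compressed CIB XML with primitive elements identified by an id attribute; the helper names are hypothetical.

```python
import bz2
import xml.etree.ElementTree as ET


def load_cib(path):
    # Assumes bzip2-compressed CIB XML, as in pe-input-N.bz2.
    with bz2.open(path, "rb") as f:
        return ET.parse(f).getroot()


def primitive_ids(cib):
    # Every primitive, including ones nested in groups or clones.
    return {p.get("id") for p in cib.iter("primitive")}


def vanished_primitives(before, after):
    # IDs present in the earlier input but missing from the later one.
    return sorted(primitive_ids(before) - primitive_ids(after))
```

Assuming the usual default directory /var/lib/pacemaker/pengine, calling vanished_primitives on pe-input-249.bz2 and pe-input-250.bz2 from there would be expected to list nsa-1.ny.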
OS: Debian 11 (and Debian 10)
Kernel: 5.10.0-15-amd64
Env: resource-agents 1:4.7.0-1~bpo10+1, pacemaker 2.0.5-2, corosync 3.1.2-2, lxc 1:4.0.6-2
I'm just trying to add a new resource.
After ~5 minutes, I want to remove it:
pcs resource remove front-2.fr --force
I get an error and the cluster starts to migrate:
Mar 29 23:28:51 cse2.fr lxc(front-2.fr)[2103391]: ERROR: LXC container name not set!
As far as I can see in
/usr/lib/ocf/resource.d/heartbeat/lxc
the error is raised when the agent can't get the OCF_RESKEY_container
variable. This bug only occurs on clusters that have been running without a reboot for a long time. For example, after fencing I can add/remove lxc resources and everything is fine for a while.
The question is: why? And how do I debug it?
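The link between the cluster configuration and this error can be sketched as follows, assuming the OCF convention that each resource parameter is exported to the agent as an OCF_RESKEY_<name> environment variable. The helpers are hypothetical, not the agent's or Pacemaker's code: if the primitive's parameters are missing from the configuration, OCF_RESKEY_container is never set and the check fails with the same message.

```python
def ocf_env(params):
    # Assumed OCF convention: parameter <name> is exported as OCF_RESKEY_<name>.
    return {f"OCF_RESKEY_{name}": value for name, value in params.items()}


def container_name(env):
    # Hypothetical mirror of the agent's check: absent or empty is an error.
    name = env.get("OCF_RESKEY_container", "")
    if not name:
        raise ValueError("LXC container name not set!")
    return name


print(container_name(ocf_env({"container": "front-2.fr"})))  # front-2.fr
try:
    container_name(ocf_env({}))  # primitive parameters missing from the CIB
except ValueError as err:
    print("ERROR:", err)
```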