TLDR: Exit code 137 is caused by the oneview-django-1 container exiting.
The oneview-django-1 container can exit because of errors such as
TypeError: check_permissions() got an unexpected keyword argument 'groups'
First of all, exit code 137 doesn't necessarily mean an out-of-memory issue.
According to the exit code table in the Advanced Bash-Scripting Guide, an exit code of the form 128 + n means the process was terminated by fatal error signal n. So 137 = 128 + 9 means the process received signal 9, i.e. kill -9. From man kill, signal 9 is KILL (non-catchable, non-ignorable kill).
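A small Python sketch of the 128 + n decoding described above. The helper name describe_exit_code is hypothetical, and it only applies the shell convention: a program can also call exit(137) on its own, so a status of 137 is strong evidence of SIGKILL, not proof.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status using the 128 + n convention."""
    if code > 128:
        try:
            # 128 + n: the process was terminated by signal n.
            sig = signal.Signals(code - 128)
        except ValueError:
            # Not a known signal number, so treat it as an ordinary status.
            return f"exited with status {code}"
        return f"terminated by signal {sig.value} ({sig.name})"
    return f"exited with status {code}"


print(describe_exit_code(137))  # terminated by signal 9 (SIGKILL)
print(describe_exit_code(1))    # exited with status 1
```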
The actual reason for exit code 137
This PR contains the minimal changes to reproduce exit code 137:
1. Added a migration file.
2. In the GitHub Action, used sleep 999 to replace the other steps, such as install dev dependencies, lint, tests …
In the GitHub Action, we use actions/checkout@v3, which, for pull requests, checks out a merge commit of the PR branch with the origin/master branch under the hood.
In this PR, we have the migration file 0185_task_retain_regarding.py, while the master branch has two additional migration files, 0185_auto_20230814_1357.py and 0186_providercontact_status.py. Because the GitHub Action checkout creates the merge commit, all three migration files are present in the GitHub Action running environment.
Because of these conflicting migration files, the oneview-django-1 container keeps exiting with the error
CommandError: Conflicting migrations detected; multiple leaf nodes in the migration graph: (0185_task_retain_regarding, 0186_providercontact_status in oneview).
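To see why Django reports two leaf nodes, here is a simplified Python model of the migration graph after the merge. It is not Django's own code, and the dependencies are assumptions: both 0185 migrations are assumed to depend on the same parent, written here as the hypothetical placeholder 0184_previous, and 0186_providercontact_status is assumed to depend on 0185_auto_20230814_1357. A leaf is a migration that no other migration depends on; more than one leaf in an app is the conflict in the error above.

```python
from typing import Dict, List

# Hypothetical dependency map: migration name -> migrations it depends on.
# "0184_previous" is a placeholder for the real (unnamed) common parent.
MIGRATIONS: Dict[str, List[str]] = {
    "0184_previous": [],
    "0185_task_retain_regarding": ["0184_previous"],  # from this PR
    "0185_auto_20230814_1357": ["0184_previous"],  # from master
    "0186_providercontact_status": ["0185_auto_20230814_1357"],  # from master
}


def leaf_nodes(graph: Dict[str, List[str]]) -> List[str]:
    """Return the migrations that no other migration depends on."""
    parents = {parent for deps in graph.values() for parent in deps}
    return sorted(name for name in graph if name not in parents)


leaves = leaf_nodes(MIGRATIONS)
if len(leaves) > 1:
    # Two independent branches of history end in two separate leaves.
    print("Conflicting migrations detected; multiple leaf nodes:", leaves)
```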
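Then, in the bash sleep step
docker compose --file docker-compose-dev.yml exec django sleep 999
/bin/sleep is running inside the container, but the container gets killed. When a container's main process exits, the kernel sends SIGKILL to all remaining processes in the container's PID namespace, so this sleep process exits with 137.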
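The following Python sketch mimics that last step outside Docker: it starts sleep 999 as a child process and sends it SIGKILL, standing in for the kernel killing the processes left in a container whose main process has exited. The function name run_and_kill is hypothetical. Python's subprocess reports death by signal as a negative return code (-9), so the sketch converts it to the shell's 128 + n form to match what the GitHub Action shows.

```python
import signal
import subprocess
from typing import Sequence


def run_and_kill(cmd: Sequence[str]) -> int:
    """Start cmd, SIGKILL it, and return the exit status in 128 + n form."""
    proc = subprocess.Popen(list(cmd))
    # SIGKILL cannot be caught or ignored, so the child dies immediately.
    proc.send_signal(signal.SIGKILL)
    returncode = proc.wait()
    if returncode < 0:
        # subprocess uses -n for "terminated by signal n"; shells use 128 + n.
        return 128 - returncode
    return returncode


print(run_and_kill(["sleep", "999"]))  # 137
```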
To confirm this finding, I reproduced exit code 137 locally. To reproduce it, run
make run-be
in the first terminal window, and run
make bash
in the second terminal window, then quickly input 3. The process receives SIGKILL, i.e. kill -9, which is why it exited with exit code 137.
Bonus Issues
While investigating, I noticed that the celery container is constantly exiting too. This is something we might need to fix in the future, though our tests don't rely on celery.