Why does the job exit with code 137?

TL;DR: Exit code 137 is caused by the oneview-django-1 container exiting.

  1. Most likely it is caused by a migration file in the PR conflicting with migration files in the master branch.
  2. If there is no migration, some other error could be causing oneview-django-1 to exit, such as TypeError: check_permissions() got an unexpected keyword argument 'groups'

Screenshot 2023-08-16 at 18 26 41

First of all, exit code 137 doesn’t necessarily mean an out-of-memory issue.

According to the exit code table in the Advanced Bash-Scripting Guide, exit codes of the form 128 + n mean the process was terminated by fatal error signal n. Since 137 = 128 + 9, exit code 137 means the process received signal 9, i.e. kill -9. From man kill, signal 9 is KILL (non-catchable, non-ignorable kill).
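The 128 + n rule can be checked with a few lines of Python. This is only an illustrative sketch: the helper name describe_exit_code is made up for this example, and signal names are resolved with the standard library's signal module on a Unix-like system.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a shell exit code using the 128 + n convention."""
    if code > 128:
        signum = code - 128  # e.g. 137 - 128 = 9
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The number does not map to a signal known on this platform.
            name = f"unknown signal {signum}"
        return f"terminated by {name} (128 + {signum})"
    return f"exited on its own with status {code}"


print(describe_exit_code(137))  # terminated by SIGKILL (128 + 9)
```

Nothing in the number itself says who sent the SIGKILL. The OOM killer is one possible sender, but so is anything else that force-kills the process.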

The actual cause of exit code 137

This PR contains the minimal changes needed to demonstrate/reproduce exit code 137. The changes are: 1. add a migration file; 2. in the GitHub Action, use sleep 999 to replace the other steps such as installing dev dependencies, lint, tests …

In the GitHub Action we use actions/checkout@v3, which under the hood checks out a merge of the PR with the origin/master branch.

In this PR we have a migration file, 0185_task_retain_regarding.py, while the master branch has the additional migration files 0185_auto_20230814_1357.py and 0186_providercontact_status.py. Because the GitHub Action checkout creates the merge of the PR with master, the GitHub Action running environment ends up with all three of these migration files.

Because of these conflicting migration files, the oneview-django-1 container keeps exiting with the error CommandError: Conflicting migrations detected; multiple leaf nodes in the migration graph: (0185_task_retain_regarding, 0186_providercontact_status in oneview).
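To see why the merged checkout has two leaf nodes, here is a simplified model of the migration graph. It is a sketch, not Django's actual implementation. The dependency edges are assumptions: the parent name 0184_hypothetical_parent is invented for illustration, and it is assumed that both 0185 migrations depend on it and that 0186_providercontact_status depends on 0185_auto_20230814_1357.

```python
def find_leaf_migrations(dependencies):
    """Return migrations that no other migration depends on."""
    # Every migration that appears as someone's dependency is a parent.
    parents = {parent for deps in dependencies.values() for parent in deps}
    return sorted(name for name in dependencies if name not in parents)


# Assumed dependency edges for the merged checkout (PR + master).
merged_checkout = {
    "0184_hypothetical_parent": [],  # invented name, for illustration only
    "0185_task_retain_regarding": ["0184_hypothetical_parent"],  # from the PR
    "0185_auto_20230814_1357": ["0184_hypothetical_parent"],  # from master
    "0186_providercontact_status": ["0185_auto_20230814_1357"],  # from master
}

leaves = find_leaf_migrations(merged_checkout)
print(leaves)
# ['0185_task_retain_regarding', '0186_providercontact_status']
```

Under these assumptions the two leaves are the same two names reported in the CommandError. With a single chain of migrations there would be only one leaf.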

Screenshot 2023-08-16 at 18 20 21

Then, in the sleep step, docker compose --file docker-compose-dev.yml exec django sleep 999 runs /bin/sleep inside the container. When the container gets killed, the sleep process is killed with it and exits with 137.

To confirm the above finding, I managed to reproduce exit code 137 locally. To reproduce, run make run-be in the first terminal window and make bash in the second terminal window, then quickly input 3 in the first terminal to force kill the oneview-django-1 container. The container gets a SIGKILL, i.e. kill -9, which is why the process exits with exit code 137.
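The same effect can be seen without Docker. In the sketch below, a plain sleep process stands in for the process running inside the container, and sending it SIGKILL stands in for the container being force-killed. The helper name kill_and_report is made up for this example, and it assumes a Unix-like system with a sleep binary on the PATH.

```python
import signal
import subprocess


def kill_and_report(command):
    """Start a command, SIGKILL it, and return (signal number, shell exit code)."""
    proc = subprocess.Popen(command)
    proc.send_signal(signal.SIGKILL)  # same signal as kill -9
    proc.wait()
    # Python reports death by signal N as returncode -N;
    # a shell reports the same event as 128 + N.
    signum = -proc.returncode
    return signum, 128 + signum


signum, shell_code = kill_and_report(["sleep", "30"])
print(signum, shell_code)  # 9 137
```

No memory pressure is involved here. The process exits with 137 purely because it received SIGKILL.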

Screenshot 2023-08-16 at 18 14 50

Bonus issue found while investigating

The celery container is constantly exiting too. This is something we might need to fix in the future, though our tests don’t rely on celery.

Screenshot 2023-08-16 at 18 20 46