Library leaves dangling processes after use #1333

Open
@zhiltsov-max

Hi, I've run into a problem on Windows when removing files/directories with shutil.rmtree(). The problem itself is not new and already has known solutions, including git.util.rmtree(). However, just using this function was not enough in my case, so I dug deeper: there was an error about an open file handle. I discovered extra child git processes hanging in the process tree, and, unexpectedly, this was happening after all the library objects had been removed:
[image: screenshot of the process tree]

Then I tried to catch and trace the Popen calls made by the library and found this function:

GitPython/git/cmd.py

Lines 1190 to 1199 in 5b3669e

def get_object_header(self, ref: str) -> Tuple[str, str, int]:
    """ Use this method to quickly examine the type and size of the object behind
    the given ref.
    :note: The method will only suffer from the costs of command invocation
        once and reuses the command in subsequent calls.
    :return: (hexsha, type_string, size_as_int)"""
    cmd = self._get_persistent_cmd("cat_file_header", "cat_file", batch_check=True)
    return self.__get_object_header(cmd, ref)

Judging by the comments and names, this persistent behavior looks intended. There is also the AutoInterrupt class here:

GitPython/git/cmd.py

Lines 367 to 373 in 5b3669e

class AutoInterrupt(object):
    """Kill/Interrupt the stored process instance once this instance goes out of scope. It is
    used to prevent processes piling up in case iterators stop reading.
    Besides all attributes are wired through to the contained process object.
    The wait method was overridden to perform automatic status code checking
    and possibly raise."""

returned from:

return self.AutoInterrupt(proc, command)

The class comment indicates that the process should be killed when the object goes out of scope. To me, it looks like an attempt to imitate the C++ RAII idiom. Unfortunately, it does not work this way in Python because of the garbage collector and the scoping rules: an object with no references can be destroyed at any time after it loses its last reference, not necessarily right away. The "pythonic" way to get deterministic cleanup is a context manager (which this class is not).
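To illustrate the context-manager idea, here is a minimal sketch using only the standard library. It is not GitPython code: `managed_process` is a hypothetical helper, and the child is a small Python echo script standing in for a long-lived git process.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def managed_process(args):
    """Hypothetical helper: start a child process and always stop it on exit."""
    proc = subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    try:
        yield proc
    finally:
        # Close our ends of the pipes so no file handles stay open.
        for stream in (proc.stdin, proc.stdout):
            if stream is not None:
                stream.close()
        # Stop the child if it has not exited on its own, then reap it.
        if proc.poll() is None:
            proc.terminate()
        proc.wait()


# Stand-in for a persistent child: echoes every stdin line back (assumption,
# chosen so the sketch runs without git installed).
CHILD = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin:\n    print(line, end='', flush=True)",
]

with managed_process(CHILD) as proc:
    proc.stdin.write(b"HEAD\n")
    proc.stdin.flush()
    reply = proc.stdout.readline()

# After the block the child has been reaped, regardless of when (or whether)
# the garbage collector gets around to the Python object.
exited = proc.returncode is not None
```

The cleanup runs in `finally`, so it also happens when the body of the `with` block raises.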

My suggestion is to collect these processes and manage them at the library/repo level, or to wrap this in a context manager (which is better than just calling wait() manually, because it also covers exceptions). BTW, the created process is probably not reused after the first call.
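A rough sketch of what repo-level management could look like, again with the standard library only. `ProcessRegistry` and its methods are hypothetical names, not part of GitPython, and the child command is a placeholder for a real persistent git command.

```python
import subprocess
import sys


class ProcessRegistry:
    """Hypothetical owner of persistent child processes, keyed by name."""

    def __init__(self):
        self._procs = {}

    @staticmethod
    def _stop(proc):
        # Close our pipe ends, stop the child if still running, then reap it.
        proc.stdin.close()
        proc.stdout.close()
        if proc.poll() is None:
            proc.terminate()
        proc.wait()

    def get(self, name, args):
        """Return the live process registered under `name`, starting it if needed."""
        proc = self._procs.get(name)
        if proc is not None and proc.poll() is not None:
            # The old child died: release its handles before replacing it.
            self._stop(proc)
            proc = None
        if proc is None:
            proc = subprocess.Popen(
                args, stdin=subprocess.PIPE, stdout=subprocess.PIPE
            )
            self._procs[name] = proc
        return proc

    def close(self):
        """Deterministically stop every registered process."""
        for proc in self._procs.values():
            self._stop(proc)
        self._procs.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


# Placeholder child that just waits on stdin (assumption, so no git is needed).
CHILD = [sys.executable, "-c", "import sys; sys.stdin.read()"]

with ProcessRegistry() as registry:
    first = registry.get("cat_file_header", CHILD)
    second = registry.get("cat_file_header", CHILD)
    reused = first is second  # the second call reuses the running process

all_stopped = first.returncode is not None
```

With a single owner like this, a repo-level close() would only need to stop the registered processes and would not have to depend on a forced gc run.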

import git

r = git.Repo.init('test_repo')
r.index.commit('aa')
del r  # the repo object is gone, but the child git processes remain

Update: I see there is an undocumented repo.close(), which can help but may have undesired side effects because it forces a gc call.
