Skip to content

Explanation makes unclear reference to "first" and "second" imports #1804

Closed
@EliahKagan

Description

@EliahKagan

The git.cmd.Git.refresh class method includes this code:

GitPython/git/cmd.py

Lines 478 to 481 in d28c20b

# We get here if this was the init refresh and the refresh mode was not
# error. Go ahead and set the GIT_PYTHON_GIT_EXECUTABLE such that we
# discern the difference between a first import and a second import.
cls.GIT_PYTHON_GIT_EXECUTABLE = cls.git_exec_name

It is unclear what "the difference between a first import and a second import" is referring to. (I should have noticed this issue when working on #1725, in which I actually edited that comment to improve its formatting, capitalization, and punctuation, but I apparently did not notice it at that time.)

That code can run multiple times, accessing the same class-wide state each time, because Git.refresh can be called multiple times (and, being public, may reasonably be called by code outside of GitPython). But this is not related to how many imports, or import attempts, occur. The comment would make sense if Python's import system worked differently, but seems at odds with the actual design. As I understand it:

  • If an import succeeds, the resulting module object is cached in sys.modules. Subsequent imports use the same module object and do not rerun the module's code.

    For example, suppose hello.py contains:

    print("Hello, world!")

    Then running import hello from a REPL prints the greeting the first time it is run, but running it subsequent times produces no repeated side effect.

  • If an import fails, the new module object, which may itself be incompletely populated, is removed from sys.modules, before control returns via the raised exception propagating up the call stack to the code that attempted the import. Even if that module object turns out still to exist, it will not be reused in subsequent attempts to import the code. A subsequent import attempt will rerun the module's code from scratch, creating a new module object which serves as a new namespace. The state in the old module object is not used; global variables of the module, including classes, and thus also including their attributes, are all new.

    For example, suppose throw.py contains:

    import sys
    
    me = sys.modules[__name__]
    try:
        me.x += 1
    except AttributeError:
        me.x = 1
    
    try:
        a = sys.a
    except AttributeError:
        a = sys.a = []
    a.append(me)
    
    raise ValueError(x)

    Then running import throw from a REPL raises ValueError with a value of 1 each time it is run, and sys.a is populated with the abandoned module object from each attempt, all of which are separate objects. (Importing sys in the REPL allows it to be inspected, since import sys succeeds the first time it occurs, whether that is in the REPL or in throw.py, and thus all import sys get the same sys module object with the same a attribute.)

  • If the contents of sys.modules are explicitly modified (rather than implicitly, due to the effect of import and from statements), one of the above two situations still applies, depending on the nature of the mutation. The exception to this would be if a reference to the old module object is obtained and patched back into sys.modules. That would be a very unusual thing to do, except perhaps for purposes of debugging or experimentation, and I'm reasonably sure the intent of that comment--and the code it documents--is not to cover such cases. Furthermore, that would still not rerun top-level code of the module.

  • Python's import system is extensible and can be made to work differently from the usual rules. Of course, this cannot generally be anticipated by library code, and I likewise doubt this is a factor in the design or documentation.

I suspect that the code is correct and the comment incorrect, but since I'm not sure what was originally intended, I am not entirely sure how to fix it. It could be changed to say something like "discern the difference between the initial refresh at import time and subsequent calls," but this is very general and I worry it would paper over whatever really ought to be said. On the other hand, perhaps this really is all that was meant.

I haven't looked into the history of the Python import system but I don't think it operated in a fundamentally different way at the time this explanation was written. This code, and the original accompanying comment, was added in #640 (in 2b3e769). The reasoning given in that pull request, for the actual code change, continues to makes sense, and does not seem to have relied on any incorrect--nor any now-outdated--assumptions about the import system. As far as I can tell, it is only the comment that should be improved.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions