Description
Running is_dirty()
on my repo takes 5 minutes because it's a large repo, has text conversion enabled for diffs, and is_dirty()
is outputting a full diff. is_dirty()
should be a relatively simple operation, but since it uses git diff
instead of a plumbing command like git diff-files
it incurs the cost of displaying nice output for users.
The diff.astextplain.textconv
git option converts pdf files to text before diffing. This option appears to come with msys git. It's useful when diffing interactively, but a lot of overhead when just checking for dirty state.
GitPython doesn't look at the output of the diff, it just checks that it's not empty:
Lines 957 to 977 in 3470fb3
If we switch from the diff
to diff-index
, we can see that it's comparable in speed to turning off the text conversions:
$ time (git diff --abbrev=40 --full-index --raw | cat)
...snip...
:100644 100644 66130ffa9aa8b4bc98a1918a946919f94c9a819d 0000000000000000000000000000000000000000 M src/thread.cpp
real 5m14.842s
user 1m6.922s
sys 4m20.000s
$ time (git diff-index --quiet HEAD -- && echo "clean" || echo "dirty")
real 0m4.517s
user 0m0.203s
sys 0m32.281s
$ echo "*.pdf -diff=astextplain" > .gitattributes
$ time (git diff --abbrev=40 --full-index --raw | cat)
...snip...
:100644 100644 66130ffa9aa8b4bc98a1918a946919f94c9a819d 0000000000000000000000000000000000000000 M src/thread.cpp
real 0m4.394s
user 0m0.109s
sys 0m32.250s
(These timings are all after running these command several times. When I first ran git diff it took 26 minutes!)
Workaround
Add a .gitattributes that disables the text conversion:
*.pdf -diff=astextplain
Solution
I think is_dirty should instead use diff-files
and diff-index
. This answer looks like a good explanation of how they work.
Here's my rough replacement for is_dirty:
import git
def is_dirty(
repo,
index: bool = True,
working_tree: bool = True,
untracked_files: bool = False,
submodules: bool = True,
path=None,
):
default_args = []
if submodules:
default_args.append("--ignore-submodules")
if index:
try:
# Always want to end with -- (even without path).
args = default_args + ["--quiet", "--cached", "HEAD", "--", path]
repo.git.diff_index(*args)
except git.exc.GitCommandError:
return True
if working_tree:
try:
args = default_args + ["--quiet", "--", path]
repo.git.diff_files(*args)
except git.exc.GitCommandError:
return True
if untracked_files:
if len(repo._get_untracked_files(path, ignore_submodules=not submodules)):
return True
return False
repo = git.Repo.init("c:/code/project")
print("is_dirty", is_dirty(repo))
I'll try to find time to make a proper patch if that sounds good.