I use the following Python code with GitPython to extract the diff (the hunks) between two commits:
from git import Repo
# Notebook shell command: clone the repository into the current directory (/content)
!git clone https://github.com/apache/commons-math.git
repo = Repo("/content/commons-math")
file_path = 'commons-math-legacy/src/test/java/org/apache/commons/math4/legacy/distribution/EmpiricalDistributionTest.java'
parent = 'd080f0d8251d58728024955764a5c0c75acf8277'
commit = '9d1741bfe4a7808cfa0c313891a717adf98a3087'
# Diff parent..commit restricted to file_path, ignoring blank-line and end-of-line whitespace changes
hunks = repo.git.diff(parent, commit, file_path, ignore_blank_lines=True, ignore_space_at_eol=True)
The resulting diff reports the specified file as a new file created by adding 689 lines:
diff --git a/commons-math-legacy/src/test/java/org/apache/commons/math4/legacy/distribution/EmpiricalDistributionTest.java b/commons-math-legacy/src/test/java/org/apache/commons/math4/legacy/distribution/EmpiricalDistributionTest.java
new file mode 100644
index 000000000..dfdfdd946
--- /dev/null
+++ b/commons-math-legacy/src/test/java/org/apache/commons/math4/legacy/distribution/EmpiricalDistributionTest.java
@@ -0,0 +1,689 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
.
.
.
However, when I open the corresponding commit page on GitHub and check the details for EmpiricalDistributionTest.java, it shows that the file was renamed (its containing folder changed) and that only a few lines were updated. My first question: why don't the results from GitPython match the GitHub interface? My second question: how can I configure GitPython to get the same results as the GitHub web interface?
I found that this problem occurs when a file is moved to another folder and its content also changes in the same commit. In Java projects, when the package containing a class changes, both the folder path and the content of the file change. However, I have no idea why GitPython does not detect this as an update to an existing file. Thanks in advance for your help.
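A likely cause, offered as a sketch and not a confirmed diagnosis: the diff above is limited to the new path, so git never sees the old path and cannot pair the two files as a rename. The example below first lists the whole commit with rename detection enabled, then diffs the old and new paths together. `parse_name_status` and `diff_following_rename` are hypothetical helper names, and the sketch assumes GitPython forwards the keyword arguments as `--name-status` and `--find-renames`.

```python
def parse_name_status(output):
    """Parse `git diff --name-status` output into (status, old_path, new_path) tuples."""
    entries = []
    for line in output.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        status = fields[0]
        if status[0] in ("R", "C"):
            # Renames and copies list a similarity score plus both paths, e.g. "R095\told\tnew"
            entries.append((status, fields[1], fields[2]))
        else:
            # Other statuses (A, M, D, ...) list a single path
            entries.append((status, fields[1], fields[1]))
    return entries


def diff_following_rename(repo_dir, parent, commit, new_path):
    """Hypothetical helper: diff new_path against its pre-rename path, if one is detected."""
    from git import Repo  # GitPython; imported here so the parser above works without it

    repo = Repo(repo_dir)
    # No path argument here: restricting the diff to new_path would hide the old path
    listing = repo.git.diff(parent, commit, name_status=True, find_renames=True)
    for status, old_path, path in parse_name_status(listing):
        if path == new_path:
            # Passing both paths lets git report a rename with only the changed hunks
            return repo.git.diff(
                parent, commit, "--", old_path, path,
                find_renames=True, ignore_blank_lines=True, ignore_space_at_eol=True,
            )
    return ""
```

With the variables from the question, this would be called as `diff_following_rename("/content/commons-math", parent, commit, file_path)`. If git pairs the two files, the output should start with rename headers instead of `new file mode`; whether it matches GitHub's view exactly may depend on the similarity threshold git applies.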