Based on the discussion in the comments, I came up with the following solution using GitPython (only the relevant code is shown; the rest is omitted to avoid confusion):
```python
import git
from git import Repo
from git import RemoteProgress


class MyProgressPrinter(RemoteProgress):
    def update(self, op_code, cur_count, max_count=None, message=''):
        # Print the raw progress information git reports during the fetch
        print(op_code, cur_count, max_count, cur_count / (max_count or 100.0), message or "NO MESSAGE")


def _get_commits_info(self):
    # Note: fetch() returns one FetchInfo per fetched ref, so this collects
    # the tip commit of each ref rather than every commit in the history
    for fetch_info in self.repo.remotes.origin.fetch(progress=MyProgressPrinter()):
        self.commits_info.append(
            (fetch_info.commit.committed_date, fetch_info.commit))
    self.commits_info = sorted(self.commits_info, key=lambda x: x[0])  # sort by committed date


def _get_the_changed_components(self):
    self._get_commits_info()
    last_date = self.commits_info[-1][0]
    last_commit = self.commits_info[-1][1]
    since_date = last_date - self.time_period * 86400  # e.g. time_period is 7 (days); 86400 s per day
    since_commit = self._get_since_commit(since_date)  # finds since_commit in the sorted commits_info list
    for item in last_commit.diff(since_commit):
        if item.a_path.find('certain_path') != -1:
            self.paths.add(item.a_path)  # self.paths is a set()
```
However, the size of `self.paths` does not seem reasonable to me: it captures too many changes and I am not sure why. Basically, what I did was: find all the commits, sort them by `committed_date`, and then find a commit (`since_commit` in the code) whose `committed_date` is 7 days before the last commit. After that, I took the diff between the last commit in the sorted `commits_info` list and `since_commit`, and saved the `a_path` values into a set.
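The helper `_get_since_commit` is not shown above. A minimal sketch of what it might look like, assuming it should return the earliest commit whose `committed_date` is at or after `since_date`, is a binary search over the already-sorted list with `bisect`. The name `get_since_commit`, that boundary choice, and the `None` return for "every commit is older" are assumptions; plain strings stand in for GitPython commit objects.

```python
import bisect


def get_since_commit(commits_info, since_date):
    """Hypothetical helper: earliest commit with committed_date >= since_date.

    commits_info is a list of (committed_date, commit) tuples sorted by date,
    like self.commits_info above. Returns None if every commit is older.
    """
    dates = [committed_date for committed_date, _ in commits_info]
    # bisect_left finds the first index whose date is not earlier than since_date
    index = bisect.bisect_left(dates, since_date)
    if index == len(commits_info):
        return None
    return commits_info[index][1]


# Usage with plain strings standing in for commit objects
commits_info = [(100, "a1"), (200, "b2"), (300, "c3")]
print(get_since_commit(commits_info, 150))  # b2
```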
I also tried another way: I took the diff between every two consecutive commits in the sorted `commits_info` list, from `since_commit` all the way up to the last commit. This way the number of changes is even higher.

Any comments or help? Do you think this is the correct way of getting the diff for a time period, and is the higher number of changes just a coincidence?
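As a toy illustration of why the step-by-step approach can only report as many paths as the end-to-end diff or more: a file that differs between the two end points must have changed in at least one consecutive step, but a file that was edited and later reverted appears in the step-by-step union without appearing in the end-to-end diff. The sketch below is a simplification, not GitPython: each snapshot is a hypothetical dict mapping path to content.

```python
def changed_paths(old, new):
    """Paths whose content differs between two snapshots (dicts of path -> content)."""
    return {path for path in old.keys() | new.keys() if old.get(path) != new.get(path)}


def pairwise_changed_paths(snapshots):
    """Union of the changes between every pair of consecutive snapshots."""
    paths = set()
    for old, new in zip(snapshots, snapshots[1:]):
        paths |= changed_paths(old, new)
    return paths


# Hypothetical history: b.txt is edited and then reverted within the period
snapshots = [
    {"a.txt": "1", "b.txt": "x"},
    {"a.txt": "2", "b.txt": "y"},
    {"a.txt": "2", "b.txt": "x"},
]
print(sorted(changed_paths(snapshots[0], snapshots[-1])))  # ['a.txt']
print(sorted(pairwise_changed_paths(snapshots)))           # ['a.txt', 'b.txt']
```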
UPDATE and FINAL SOLUTION
It seems that comparing (diffing) two commits does not give the changes that happened between now and some time ago, because commits that get merged may include changes made before the period of interest. I found two solutions for this. The first is to count the number of `HEAD` changes from that time until the current date, which is not very accurate. For that we can use:
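```python
from git import Git

g = Git(self.repo_directory)
# since is the start date of the period, e.g. a datetime.date;
# %s prints each commit's subject line so merge messages can be counted
loginfo = g.log('--since={}'.format(since), '--pretty=tformat:%s')
```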
Then count the occurrences of the string `Merge pull request`, which basically counts the number of merges into the repo, each of which usually moves `HEAD`. This is not accurate either, but let's assume the count is 31. Then:
```python
for item in self.repo.head.commit.diff('develop~31'):
    if item.a_path.find('certain_path') != -1:
        self.paths.add(item.a_path)  # self.paths is a set()
```
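The counting step could look like the minimal sketch below, assuming `loginfo` holds one subject line per commit and that merges use GitHub's default `Merge pull request #N from ...` message. The function name and sample text are hypothetical. Keep in mind that `develop~N` walks N first-parent steps, so it lines up with the merge count only when every first-parent commit in that range is a merge. Counting merge commits directly with `git rev-list --count --merges --since=<date> HEAD` avoids depending on the message text.

```python
def count_merge_pull_requests(loginfo):
    """Count log lines containing the default GitHub merge message prefix."""
    return sum(1 for line in loginfo.splitlines() if "Merge pull request" in line)


# Hypothetical output of g.log('--since=...', '--pretty=tformat:%s')
loginfo = (
    "Merge pull request #12 from user/feature\n"
    "Fix typo\n"
    "Merge pull request #11 from user/fix"
)
n = count_merge_pull_requests(loginfo)
print(n)                       # 2
print("develop~{}".format(n))  # develop~2
```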
The second solution, which works and is straightforward:
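```python
import datetime as DT
from datetime import date

from git import Git


def _get_the_changed_components(self):
    g = Git(self.repo_directory)
    today = date.today()
    since = today - DT.timedelta(days=self.time_period)  # start of the period, e.g. 7 days ago
    # --name-only lists the files touched by each commit; the empty tformat prints no commit header
    loginfo = g.log('--since={}'.format(since), '--pretty=tformat:', '--name-only')
    files = loginfo.split('\n')
    for file in files:
        if file:  # skip the blank separator lines
            self.paths.add(file)
```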
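The parsing step can be tried without a repository. This is a minimal sketch assuming the log output has the shape produced by `--pretty=tformat: --name-only` (blank lines between commits, one path per line); the `parse_changed_files` name and its optional `path_filter` parameter, which mirrors the `certain_path` check of the earlier attempts, are hypothetical. Using a set means a file touched by several commits is counted once.

```python
def parse_changed_files(loginfo, path_filter=None):
    """Turn `git log --pretty=tformat: --name-only` output into a set of paths.

    Blank lines separate commits and are skipped; path_filter optionally keeps
    only paths containing that substring (e.g. 'certain_path').
    """
    paths = set()
    for line in loginfo.splitlines():
        if not line:
            continue
        if path_filter is None or path_filter in line:
            paths.add(line)
    return paths


# Hypothetical output covering two commits that both touch a.py
loginfo = "\nsrc/certain_path/a.py\ndocs/readme.md\n\nsrc/certain_path/a.py\n"
print(sorted(parse_changed_files(loginfo, "certain_path")))  # ['src/certain_path/a.py']
```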
`ref@{reflog-selector}` is just a way of specifying one particular commit hash. The reflog selector chooses how Git looks at the reflog for the given ref (the one preceding the `@`) and picks out one of its values. Use `git reflog <ref>` to show the reflog for that ref: the hash ID your reflog expression picks will be one of those hash IDs. … `git log --pretty=format:%H` or `git rev-list` to get the correct hash IDs.) Not that relative is wrong either, just that if you make a new commit, suddenly what was `~10` is now `~11`. Or, if your commit DAG is quite branchy, you might need `HEAD~3^2~2^2~4^2` or something crazy like that just to get there using relative motions. If you're writing code, this is absurd: the compiler can remember hash IDs! … `develop` points now" is `develop~10`, not `develop@{HEAD~10}`. See the gitrevisions documentation. Remember that for anything involving any historical commit, any name is just a method by which we have Git find a commit's hash ID.
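To make the relative suffixes concrete, here is a toy resolver over a hand-built parent map. The graph, commit names, and `resolve` function are hypothetical, and only `~n` and `^n` are modelled (real Git supports many more forms, including the `@{...}` reflog selectors). `~n` follows the first parent n times and `^n` picks the n-th parent; the last usage line shows how one new commit on top shifts a `~` offset by one.

```python
import re

# Hypothetical toy commit graph: each commit maps to its parents,
# first parent first, mimicking how Git stores parent links.
PARENTS = {
    "m2": ["c3", "f2"],  # merge commit
    "c3": ["m1"],
    "m1": ["c1", "f1"],  # merge commit
    "c1": [],
    "f2": ["f1"],
    "f1": ["c1"],
}


def resolve(spec, parents=PARENTS):
    """Resolve a name followed by ~n / ^n suffixes, e.g. 'm2~2^2'.

    A bare ~ or ^ means 1. Only the toy graph passed in is supported.
    """
    match = re.match(r"[^~^]+", spec)
    if match is None:
        raise ValueError("revision must start with a commit name: " + spec)
    commit, rest = match.group(0), spec[match.end():]
    steps = re.findall(r"([~^])(\d*)", rest)
    if "".join(op + digits for op, digits in steps) != rest:
        raise ValueError("unsupported revision syntax: " + spec)
    for op, digits in steps:
        n = int(digits) if digits else 1
        if op == "~":
            # ~n: follow the first parent n times
            for _ in range(n):
                commit = parents[commit][0]
        elif n > 0:
            # ^n: pick the n-th parent (^0 means the commit itself)
            commit = parents[commit][n - 1]
    return commit


print(resolve("m2~2"))    # m1
print(resolve("m2^2~1"))  # f1

# After one new commit on top, the same commit is one more ~ step away
with_new = dict(PARENTS, new=["m2"])
print(resolve("new~3", with_new))  # m1
```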