
I am using GitPython to find the files changed over a certain period of time (for example, between now and one week ago):

    from git import Repo

    repo = Repo(self.repo_directory)
    for item in repo.head.commit.diff('develop@{1 weeks ago}'):
        print("smth")

But nothing is printed, even when I change the number of weeks, which means no diff is detected for that time period. If I change 'develop@{1 weeks ago}' to 'HEAD@{1 weeks ago}', the number of changes is huge, which is not plausible for a single week. Any help is appreciated.

  • Remember that ref@{reflog-selector} is just a way of specifying one particular commit hash. The reflog selector chooses how Git looks at the reflog for the given ref (the one preceding the @) and picks out one of its values. Use git reflog <ref> to show the reflog for that ref: the hash ID your reflog expression picks will be one of those hash IDs.
    – torek
    Commented Dec 18, 2019 at 6:20
  • Really, look at your reflogs. Your reflogs are yours: they reflect the activity you did in your repository. They do not reflect activity in any other repository! If someone changed some file in their master ten years ago, and you connected your Git to their Git yesterday and got their new commits yesterday, your reflogs will say "this happened yesterday".
    – torek
    Commented Dec 18, 2019 at 6:22
  • Thanks, torek. So what do you suggest for finding the list of changed files (added, deleted, ...) in a repo, using GitPython or some other Python package you're aware of?
    – Alan
    Commented Dec 18, 2019 at 6:24
  • Perhaps close enough, yes. But you may want to work with raw hash IDs: they identify an exact commit, and don't rely on relative traversal of the graph. (Use git log --pretty=format:%H or git rev-list to get the correct hash IDs.) Not that relative is wrong either, just that if you make a new commit, suddenly what was ~10 is now ~11. Or, if your commit DAG is quite branchy, you might need HEAD~3^2~2^2~4^2 or something crazy like that just to get there using relative motions. If you're writing code, this is absurd: the compiler can remember hash IDs!
    – torek
    Commented Dec 18, 2019 at 6:42
  • The syntax for "ten first-parent steps back from where my own branch name develop points now" is develop~10, not develop@{HEAD~10}. See the gitrevisions documentation. Remember that for anything involving any historical commit, any name is just a method by which we have Git find a commit's hash ID.
    – torek
    Commented Dec 18, 2019 at 6:43
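Torek's point that ref@{...} only ever returns one of the hash IDs in your own reflog can be illustrated with a simplified pure-Python model. This is an approximation for illustration, not Git's actual lookup code, and ReflogEntry and resolve_at_date are hypothetical names. Entries are (old, new, timestamp) records, oldest first; a date selector returns the value the ref had at that time, and a date older than the whole reflog falls back to the oldest entry (Git prints a warning in that case).

```python
from dataclasses import dataclass
from typing import List, Optional

# Git's all-zero id, recorded as the "old" value when a ref is first created
NULL_SHA = "0" * 40


@dataclass
class ReflogEntry:
    old_sha: str
    new_sha: str
    timestamp: int  # seconds since the epoch, when the ref was updated *locally*


def resolve_at_date(entries: List[ReflogEntry], at_time: int) -> Optional[str]:
    """Simplified model of how `ref@{<date>}` picks a hash from a reflog.

    `entries` is ordered oldest first. This is a hypothetical illustration,
    not a reimplementation of Git's lookup.
    """
    if not entries:
        return None  # no reflog at all: nothing to pick from
    # The newest entry recorded at or before the requested time wins.
    for entry in reversed(entries):
        if entry.timestamp <= at_time:
            return entry.new_sha
    # The requested time predates the whole reflog: fall back to the oldest
    # entry (its new value if that entry created the ref).
    oldest = entries[0]
    return oldest.new_sha if oldest.old_sha == NULL_SHA else oldest.old_sha
```

Under this model, if develop's reflog starts at a clone made a few days ago, develop@{1 weeks ago} and develop@{10 weeks ago} both resolve to the commit recorded at clone time. If HEAD still points there, diffing against it would print nothing, which is consistent with what the question describes.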

2 Answers


develop@{1 weeks ago} uses the reflog:

Reference logs, or "reflogs", record when the tips of branches and other references were updated in the local repository.

That means your local Git repository may not have recorded any operation on develop a week ago, while it has recorded everything that happened to HEAD.

If develop was changed remotely and then its history imported locally, develop@{1 weeks ago} might not yield anything (because your local reflog would not reference it).

Only git log --since/--until works with arbitrary dates (not just the ones recorded in the reflog, which are limited to local operations and, by default, expire after 90 days).

But I don't know whether GitPython implements that.
Its git.refs.log module is based on reflog entries, which is not helpful in your case.
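A minimal sketch of the git log --since/--until suggestion, assuming the goal is the added/modified/deleted breakdown asked about in the comments. The helpers log_args and parse_name_status are hypothetical names; the flags themselves (--since, --until, --name-status, --pretty=format:) are standard git log options. The parser is plain Python, so it can be checked without a repository.

```python
from collections import defaultdict
from typing import Dict, List, Set


def log_args(since: str, until: str) -> List[str]:
    """Arguments for `git log` that list every file touched in a date window."""
    # An empty format suppresses the commit header, leaving only file lines.
    return [f"--since={since}", f"--until={until}", "--name-status", "--pretty=format:"]


def parse_name_status(output: str) -> Dict[str, Set[str]]:
    """Group paths from `git log --name-status` output by status letter."""
    changes: Dict[str, Set[str]] = defaultdict(set)
    for line in output.splitlines():
        if not line.strip():
            continue  # blank separator lines between commits
        fields = line.split("\t")
        status = fields[0][0]  # 'R100' -> 'R', 'M' -> 'M'
        path = fields[-1]  # for renames/copies the last field is the new path
        changes[status].add(path)
    return dict(changes)
```

With GitPython, the raw text could come from the command wrapper, for example parse_name_status(Repo(repo_dir).git.log('develop', *log_args('2019-12-11', '2019-12-18'))). Unlike develop@{...}, this window is applied to the commit dates in the history, so it also covers commits that arrived through a fetch.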

  • If I change @{1 week ago} to @{10 weeks ago}, it still gives the same result (nothing). So what do you suggest for getting all the changes (changed, deleted, ... files) in develop since, for example, one week ago, using GitPython? Using HEAD@{1 week ago} also doesn't seem correct, because the number of changes for the last week is huge in my example, which is not plausible.
    – Alan
    Commented Dec 18, 2019 at 5:09
  • @Alan What I suggested is that the "week ago" notation only works for work you have done locally, not for history imported from a remote repository.
    – VonC
    Commented Dec 18, 2019 at 5:10
  • I see. So how can I track the remote repository?
    – Alan
    Commented Dec 18, 2019 at 5:18
  • @Alan Possibly by providing an exact date instead of relying on the "week ago" notation, which in itself won't always be available.
    – VonC
    Commented Dec 18, 2019 at 5:24
  • I used repo.head.commit.diff('develop@{1 Dec 2019}') and changed the date from 16th Dec all the way down to 1st Dec, but there is still no output.
    – Alan
    Commented Dec 18, 2019 at 5:29

Based on the discussion in the comments, I came up with the following solution using GitPython (only the relevant code is shown; the rest is omitted to avoid confusion):

    from git import Repo, RemoteProgress


    class MyProgressPrinter(RemoteProgress):
        # RemoteProgress.update is an instance method, so it needs `self`
        def update(self, op_code, cur_count, max_count=None, message=''):
            print(op_code, cur_count, max_count, cur_count / (max_count or 100.0), message or "NO MESSAGE")


    def _get_commits_info(self):
        # Note: fetch() returns one FetchInfo per fetched ref (e.g. a branch tip),
        # not one entry per commit in the history.
        for fetch_info in self.repo.remotes.origin.fetch(progress=MyProgressPrinter()):
            self.commits_info.append((fetch_info.commit.committed_date, fetch_info.commit))
        # sort once, after the loop, by committed date (oldest first)
        self.commits_info = sorted(self.commits_info, key=lambda x: x[0])


    def _get_the_changed_components(self):
        self._get_commits_info()
        last_date, last_commit = self.commits_info[-1]
        since_date = last_date - self.time_period * 86400  # e.g. time_period = 7 (days); 86400 s per day
        since_commit = self._get_since_commit(since_date)  # finds since_commit in the sorted commits_info

        for item in last_commit.diff(since_commit):
            if 'certain_path' in item.a_path:
                self.paths.add(item.a_path)  # self.paths is a set()

However, self.paths ends up much larger than seems reasonable, since it captures too many changes, and I am not sure why. So basically, what I did was: find all the commits, sort them by committed_date, and then find the commit (since_commit in the code) whose committed_date is 7 days ago. After that, I took the diff between the last commit in the sorted commits_info list and since_commit, and saved the a_path values into a set.
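The answer does not show _get_since_commit. A hypothetical stand-in, assuming commits_info is a list of (committed_date, commit) tuples sorted oldest first as in the code above, could use binary search to return the oldest commit whose committed date falls on or after since_date:

```python
import bisect
from typing import Any, List, Optional, Tuple


def get_since_commit(commits_info: List[Tuple[int, Any]], since_date: int) -> Optional[Any]:
    """Hypothetical stand-in for `_get_since_commit`.

    `commits_info` holds (committed_date, commit) pairs sorted by date, oldest
    first. Returns the oldest commit with committed_date >= since_date, or None.
    """
    # Search on the dates only, so commit objects never need to be compared.
    dates = [committed_date for committed_date, _ in commits_info]
    index = bisect.bisect_left(dates, since_date)
    if index == len(commits_info):
        return None  # every commit predates the window
    return commits_info[index][1]
```

Note that a list sorted by date is not the commit graph: a commit picked this way need not be an ancestor of last_commit, so diffing the two may also pick up work from other branches.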

I also tried another way: I took the diff between every two consecutive commits in the sorted commits_info, from since_commit all the way up to the last commit. This way the number of changes is even higher.

Any comments or help? Do you think this is the correct way of getting the diff for a time period, and is the higher number of changes just an accident?
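A toy model can address the last question. Here plain dicts mapping path to content stand in for commit trees (an assumption for illustration, not GitPython objects), and changed_paths and union_of_consecutive_diffs are hypothetical helpers. Any path that differs between the two endpoints must differ in at least one adjacent pair, so the consecutive approach can never report fewer paths; a file that is changed and then changed back, or added and later removed, shows up in the consecutive diffs but cancels out in the endpoint diff.

```python
from typing import Dict, List, Set

# path -> file content; a toy stand-in for a commit's tree
Snapshot = Dict[str, str]


def changed_paths(a: Snapshot, b: Snapshot) -> Set[str]:
    """Paths added, deleted, or modified between two snapshots (like a tree diff)."""
    return {path for path in a.keys() | b.keys() if a.get(path) != b.get(path)}


def union_of_consecutive_diffs(snapshots: List[Snapshot]) -> Set[str]:
    """Union of changed_paths over every adjacent pair of snapshots."""
    result: Set[str] = set()
    for older, newer in zip(snapshots, snapshots[1:]):
        result |= changed_paths(older, newer)
    return result


s0 = {"a.txt": "1", "b.txt": "1"}
s1 = {"a.txt": "2", "b.txt": "1"}                  # a.txt edited...
s2 = {"a.txt": "1", "b.txt": "1", "tmp.txt": "x"}  # ...reverted; tmp.txt added
s3 = {"a.txt": "1", "b.txt": "2"}                  # tmp.txt removed; b.txt edited
print(changed_paths(s0, s3))                      # only b.txt differs end to end
print(union_of_consecutive_diffs([s0, s1, s2, s3]))  # a.txt, b.txt and tmp.txt
```

In a real repository, a list sorted only by date may also interleave commits from different branches, so adjacent pairs can belong to different lines of work, which would inflate the count further.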

UPDATE and FINAL SOLUTION

So it seems that diffing two commits doesn't give only the changes that happened between now and some time ago, because commits merged during that period may contain changes made before the period of interest. I found two solutions. The first is to count how many times HEAD changed between that date and now, which is not very accurate. For that we can use:

    from git import Git

    g = Git(self.repo_directory)
    # one commit subject (%s) per line, for commits newer than `since`
    loginfo = g.log('--since={}'.format(since), '--pretty=tformat:%s')

Then count the occurrences of the string "Merge pull request", which is roughly the number of pull requests merged into the repo (each merge usually moves HEAD). It's not exact, but let's assume the count is 31. Then:

    for item in self.repo.head.commit.diff('develop~31'):
        if 'certain_path' in item.a_path:
            self.paths.add(item.a_path)  # self.paths is a set()
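The counting step can be sketched as two small helpers, assuming the log output holds one commit subject per line (as with --pretty=tformat:%s) and that merges use the "Merge pull request" subject the answer counts, which is GitHub's default merge-commit wording. Both function names are hypothetical.

```python
def count_merge_prs(subjects_output: str) -> int:
    """Count commit subjects that start with "Merge pull request"."""
    return sum(1 for line in subjects_output.splitlines() if line.startswith("Merge pull request"))


def relative_rev(branch: str, steps: int) -> str:
    """Build a revision such as 'develop~31': `steps` first-parent commits back from `branch`."""
    return f"{branch}~{steps}"
```

Usage would be along the lines of self.repo.head.commit.diff(relative_rev('develop', count_merge_prs(loginfo))). As the answer says, this is only approximate: develop~N walks first parents, so any direct (non-merge) commits on develop within the window mean it does not reach back as far as the oldest counted merge.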

The solution that works and is straightforward:

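    from datetime import date, timedelta

    from git import Git


    def _get_the_changed_components(self):
        g = Git(self.repo_directory)
        since = date.today() - timedelta(days=self.time_period)  # e.g. 7 days ago
        # --name-only lists the files each commit touched; the empty tformat
        # suppresses the commit header, leaving blank lines between commits
        loginfo = g.log('--since={}'.format(since), '--pretty=tformat:', '--name-only')
        # skip the blank separator lines so '' is not added as a path
        self.paths.update(line for line in loginfo.split('\n') if line)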
  • Thanks. git log is supported in GitPython, which made this very easy.
    – Alan
    Commented Dec 28, 2019 at 18:22
