- Use git to count the objects in a repo:
$ git rev-list --objects --all | wc -l
6990030
$ git rev-list --objects --all | sort | uniq | wc -l
6990030
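The same count can be scripted in Python. This is a minimal sketch that assumes git is on PATH and Python 3.9+. run_rev_list and count_objects are hypothetical helper names and are not part of Git or GitPython. Each output line holds a SHA, optionally followed by a path. This version therefore tests uniqueness on the SHA alone, whereas `sort | uniq` above compares whole lines.

```python
import subprocess


def run_rev_list(repo_path: str) -> str:
    """Return the raw output of `git rev-list --objects --all` for repo_path."""
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--objects", "--all"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def count_objects(rev_list_output: str) -> tuple[int, int]:
    """Return (number of lines, number of distinct SHAs).

    Each line is "<sha>" or "<sha> <path>"; only the SHA decides uniqueness,
    and splitting once keeps paths that contain spaces intact.
    """
    shas = [line.split(" ", 1)[0] for line in rev_list_output.splitlines() if line]
    return len(shas), len(set(shas))
```

For example, count_objects(run_rev_list("/path/to/repo")) returns the line count and the distinct-SHA count for that repository.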
- Parse the output of git rev-list --objects --all, fetch each object with name_to_object, and count each type:
Commits: 909667, Tags: 2469, Trees: 4178263, Blobs: 1899631
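The per-type tally described above can be sketched as follows. This is a hedged sketch. tally_types and gitpython_object_type are hypothetical helpers. The adapter assumes GitPython's git.repo.fun.name_to_object(repo, name) accepts a hex SHA, and that the returned object reports its kind through .type. The tally takes the lookup as a parameter, so it can be exercised without GitPython installed.

```python
from collections import Counter
from typing import Callable, Iterable


def tally_types(shas: Iterable[str], object_type: Callable[[str], str]) -> Counter:
    """Count objects per type, given a lookup from SHA to type name."""
    return Counter(object_type(sha) for sha in shas)


def gitpython_object_type(repo_path: str) -> Callable[[str], str]:
    """Build a SHA -> type lookup backed by GitPython's name_to_object.

    GitPython is imported lazily so tally_types works without it.
    """
    import git
    from git.repo.fun import name_to_object

    repo = git.Repo(repo_path)
    # Commit, Tree, Blob and TagObject expose their kind as .type:
    # "commit", "tree", "blob" or "tag".
    return lambda sha: name_to_object(repo, sha).type
```

Here, shas would be the first field of each line of git rev-list --objects --all output. One example call is tally_types(shas, gitpython_object_type("/path/to/repo")).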
- Query what is ostensibly the same information using GitPython, once with each object database backend:
import argparse
import pathlib

import git


def log(testname, a, b):
    # Print one row: the GitCmdObjectDB result, then the GitDB result.
    print(testname, ':', a, b)


def main():
    parser = argparse.ArgumentParser(description='Git x ref.')
    parser.add_argument('repository', metavar='repository', type=pathlib.Path,
                        help='Path to Git repository.')
    args = parser.parse_args()

    # Open the same repository with both object database backends.
    repos = [
        git.Repo(str(args.repository), odbt=git.GitCmdObjectDB),
        git.Repo(str(args.repository), odbt=git.GitDB)
    ]

    # Object counts as reported by each object database.
    log('size()', *[r.odb.size() for r in repos])
    log('len(sha_iter())', *[sum(1 for x in r.odb.sha_iter()) for r in repos])
    # Number of trees yielded by Repo.iter_trees().
    log('len(iter_trees())', *[sum(1 for x in r.iter_trees()) for r in repos])


if __name__ == '__main__':
    main()
Result (first column GitCmdObjectDB, second column GitDB):
size() : 3839 8268978
len(sha_iter()) : 3839 8268978
len(iter_trees()) : 568851 568851
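One way to locate the divergence is to compare the SHAs that an object database yields against the set from git rev-list --objects --all. This hedged sketch uses compare_shas, a hypothetical helper. It assumes sha_iter() yields 20-byte binary SHAs, as gitdb documents, and normalizes them to hex. Hex strings are accepted too. The sketch reports duplicates and objects absent from git rev-list separately. For several million SHAs it keeps every SHA in memory.

```python
import binascii
from typing import Iterable, Union


def to_hex(sha: Union[bytes, str]) -> str:
    """Normalize a 20-byte binary SHA, or a hex SHA, to lowercase hex."""
    if isinstance(sha, bytes) and len(sha) == 20:
        return binascii.hexlify(sha).decode("ascii")
    if isinstance(sha, bytes):
        return sha.decode("ascii").lower()
    return sha.lower()


def compare_shas(reference: Iterable[str], odb_shas: Iterable[Union[bytes, str]]) -> dict:
    """Compare the SHAs an object database yields with a reference set."""
    ref = {to_hex(s) for s in reference}
    seen = [to_hex(s) for s in odb_shas]
    unique = set(seen)
    return {
        "yielded": len(seen),                     # what sum(1 for x in sha_iter()) counts
        "duplicates": len(seen) - len(unique),    # same SHA yielded more than once
        "extra": len(unique - ref),               # yielded but not listed by rev-list
        "missing": len(ref - unique),             # listed by rev-list but not yielded
    }
```

For example, call compare_shas(shas_from_rev_list, repo.odb.sha_iter()) on a repository opened with odbt=git.GitDB.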
In summary:
Git thinks there are 6,990,030 objects in the database.
GitDB thinks there are 8,268,978.
GitCmdObjectDB thinks there are 3,839.
Git thinks there are 4,178,263 trees in the database.
Both GitDB and GitCmdObjectDB think there are 568,851.
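GitPython documents Repo.iter_trees() as taking the same arguments as iter_commits(). That suggests it yields one root tree per commit reachable from the given revision, which defaults to HEAD, rather than every tree object. If so, it counts something different from git's 4,178,263. Under that assumption, this cross-check counts distinct tree objects by walking subtrees from each commit's root tree and deduplicating by SHA. count_distinct_trees and gitpython_tree_count are hypothetical helpers. The adapter assumes three things: GitPython's Tree.trees lists immediate subtrees, Tree.hexsha gives the SHA, and iter_commits(all=True) forwards --all to git rev-list.

```python
from typing import Callable, Hashable, Iterable, TypeVar

T = TypeVar("T")


def count_distinct_trees(
    roots: Iterable[T],
    key: Callable[[T], Hashable],
    subtrees: Callable[[T], Iterable[T]],
) -> int:
    """Count distinct trees reachable from roots, deduplicated by key (SHA).

    Uses an explicit stack so deep directory hierarchies cannot hit Python's
    recursion limit; a tree whose key was already seen is not walked again.
    """
    seen = set()
    stack = list(roots)
    while stack:
        tree = stack.pop()
        k = key(tree)
        if k in seen:
            continue
        seen.add(k)
        stack.extend(subtrees(tree))
    return len(seen)


def gitpython_tree_count(repo_path: str) -> int:
    """Count distinct trees over the commits of all refs, using GitPython."""
    import git

    repo = git.Repo(repo_path)
    # all=True is turned into --all on the underlying git rev-list call.
    roots = (c.tree for c in repo.iter_commits(all=True))
    return count_distinct_trees(roots, lambda t: t.hexsha, lambda t: t.trees)
```

Deduplicating by SHA matters here because unchanged directories are shared between commits and would otherwise be counted once per commit.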