Git, GitDB, and GitCmdObjectDB give different results when counting objects. #765

Closed
@ali1234

  1. Use git to count the objects in a repo:
$ git rev-list --objects --all | wc -l
6990030
$ git rev-list --objects --all | sort | uniq | wc -l
6990030
  2. Parse the output of git rev-list --objects --all, fetch each object with name_to_object, and count each type:
Commits: 909667, Tags: 2469, Trees: 4178263, Blobs: 1899631
  3. Query what is ostensibly the same information using GitPython:
import argparse
import pathlib

import git

def log(testname, a, b):
    print(testname, ':', a, b)

def main():

    parser = argparse.ArgumentParser(description='Git x ref.')
    parser.add_argument('repository', metavar='repository', type=pathlib.Path,
                        help='Path to Git repository.')

    args = parser.parse_args()

    repos = [
        git.Repo(str(args.repository), odbt=git.GitCmdObjectDB),
        git.Repo(str(args.repository), odbt=git.GitDB)
    ]

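    # Each line prints the GitCmdObjectDB value first, then the GitDB value.
    # Object count as reported directly by each object database.
    log('size()', *[r.odb.size() for r in repos])
    # Object count obtained by iterating over every SHA in each database.
    log('len(sha_iter())', *[sum(1 for x in r.odb.sha_iter()) for r in repos])
    # Number of trees yielded by Repo.iter_trees() for each repo.
    log('len(iter_trees())', *[sum(1 for x in r.iter_trees()) for r in repos])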


if __name__ == '__main__':
    main()

Result (first column is GitCmdObjectDB, second is GitDB):

size() : 3839 8268978
len(sha_iter()) : 3839 8268978
len(iter_trees()) : 568851 568851

So:

Git thinks there are 6,990,030 objects in the database.
GitDB thinks there are 8,268,978.
GitCmdObjectDB thinks there are 3,839.
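
One way to see which objects account for the gap between these totals is to compare the sets of object IDs rather than only their sizes. The following is a minimal sketch, not part of the original report: the helper names to_hex_ids and diff_ids are hypothetical, and it assumes that sha_iter() yields raw binary IDs that need converting to hex before they can be compared with git rev-list output.

```python
from typing import Iterable, Set, Tuple


def to_hex_ids(binary_shas: Iterable[bytes]) -> Set[str]:
    """Convert raw binary object IDs to lowercase hex strings.

    Assumption: the object database iterator yields bytes, not hex text.
    """
    return {sha.hex() for sha in binary_shas}


def diff_ids(git_ids: Iterable[str],
             odb_ids: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (IDs only in git's listing, IDs only in the object database)."""
    git_set, odb_set = set(git_ids), set(odb_ids)
    return git_set - odb_set, odb_set - git_set
```

A call might look like diff_ids(rev_list_ids, to_hex_ids(repo.odb.sha_iter())), where rev_list_ids holds the first field of each git rev-list --objects --all line. With several million IDs per side, the sets will need a substantial amount of memory.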

Git thinks there are 4,178,263 trees in the database.
Both GitDB and GitCmdObjectDB think there are 568,851.
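
The per-type figures from the second step above (parsing git rev-list --objects --all and counting each type) could be reproduced along the following lines. This is a sketch under stated assumptions, not the script used for the report: it asks git cat-file --batch-check for each object's type instead of calling name_to_object, and the function names parse_rev_list and count_repo_object_types are hypothetical.

```python
import subprocess
from collections import Counter
from typing import Iterable, Iterator


def parse_rev_list(lines: Iterable[str]) -> Iterator[str]:
    """Yield the object ID from each `git rev-list --objects` output line.

    Each line is "<sha>" or "<sha> <path>"; only the first field is kept.
    """
    for line in lines:
        line = line.strip()
        if line:
            yield line.split(" ", 1)[0]


def count_repo_object_types(repo_path: str) -> Counter:
    """Count reachable objects by type using the git command line."""
    rev_list = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--objects", "--all"],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    )
    shas = list(parse_rev_list(rev_list.stdout.splitlines()))
    if not shas:
        return Counter()
    # Substitute for name_to_object: print only the type of each object ID.
    types = subprocess.run(
        ["git", "-C", repo_path, "cat-file", "--batch-check=%(objecttype)"],
        input="\n".join(shas) + "\n",
        capture_output=True, encoding="utf-8", check=True,
    )
    return Counter(types.stdout.split())
```

Calling count_repo_object_types on the repository path returns a Counter keyed by commit, tag, tree, and blob. This version buffers the whole rev-list output in memory, which is simple but heavy for a repository with millions of objects.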
