git filter-branch - Detach many subdirectories into a new, separate Git repository

This question is based on "Detach subdirectory into separate Git repository".

Instead of detaching a single subdirectory, I want to detach a couple. For example, my current directory tree looks like this:

/apps
  /AAA
  /BBB
  /CCC
/libs
  /XXX
  /YYY
  /ZZZ

And I would like this instead:

/apps
  /AAA
/libs
  /XXX

The --subdirectory-filter argument to git filter-branch won't work because it gets rid of everything except for the given directory the first time it's run. I thought using the --index-filter argument for all unwanted files would work (albeit tedious), but if I try running it more than once, I get the following message:

Cannot create a new backup.
A previous backup already exists in refs/original/
Force overwriting the backup with -f

Any ideas? TIA





All answers
  •

    Instead of having to deal with a subshell and using extglob (as kynan suggested), try this much simpler approach:

    git filter-branch --index-filter 'git rm --cached -qr --ignore-unmatch -- . && git reset -q $GIT_COMMIT -- apps/AAA libs/XXX' --prune-empty -- --all
    

    As mentioned by void.pointer in his/her comment, this will remove everything except apps/AAA and libs/XXX from the current repository.
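
To see what the command above does without touching a real repository, here is a minimal sketch that builds a throwaway repo shaped like the tree in the question and runs the same index filter on it. The temporary directory, the file names and the demo identity are assumptions made for illustration; only the filter-branch invocation comes from the answer.

```shell
#!/bin/sh
# Sketch only: everything happens inside a temporary directory.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skips the warning pause in newer Git
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q repo
cd repo

# One commit per directory, mirroring the layout in the question.
for d in apps/AAA apps/BBB apps/CCC libs/XXX libs/YYY libs/ZZZ; do
    mkdir -p "$d"
    echo "content of $d" > "$d/file.txt"
    git add "$d"
    git commit -q -m "add $d"
done

# For every commit: empty the index, then restore only the wanted paths.
git filter-branch --index-filter \
    'git rm --cached -qr --ignore-unmatch -- . && git reset -q $GIT_COMMIT -- apps/AAA libs/XXX' \
    --prune-empty -- --all

kept=$(git ls-tree -r --name-only HEAD)   # paths in the rewritten HEAD
commits=$(git rev-list --count HEAD)      # commits that survived --prune-empty
echo "$kept"
echo "$commits commits remain"
```

Commits that only touched the removed directories become empty after filtering, which is why --prune-empty drops them.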


  •

    Manual steps with simple git commands

    The plan is to split each directory into its own repo, then merge them together. The following manual steps use easy-to-understand commands rather than clever scripts, and the same steps can merge any number of sub-folders into a single repository.

    Divide

    Let's assume your original repo is: original_repo

    1 - Split apps:

    git clone original_repo apps-repo
    cd apps-repo
    git filter-branch --prune-empty --subdirectory-filter apps master
    

    2 - Split libs

    git clone original_repo libs-repo
    cd libs-repo
    git filter-branch --prune-empty --subdirectory-filter libs master
    

    Continue if you have more than 2 folders. You should now have two new, temporary git repositories.

    Conquer by Merging apps and libs

    3 - Prepare the brand new repo:

    mkdir my-desired-repo
    cd my-desired-repo
    git init
    

    You will need to make at least one commit. If you skip the following three lines, the contents of your first repo will appear immediately under your new repo's root:

    touch a_file_and_make_a_commit # see user's feedback
    git add a_file_and_make_a_commit
    git commit -am "at least one commit is needed for it to work"
    

    With the temporary file committed, the merge command in the later section will stop as expected.

    Based on user feedback: instead of adding a random file like a_file_and_make_a_commit, you can add a .gitignore, a README.md, etc.

    4 - Merge apps repo first:

    git remote add apps-repo ../apps-repo
    git fetch apps-repo
    git merge -s ours --no-commit apps-repo/master # see below note.
    git read-tree --prefix=apps/ -u apps-repo/master
    git commit -m "import apps"
    

    Now you should see the apps directory inside your new repository. git log should show all relevant historical commit messages.

    Note: as Chris noted below in the comments, for newer versions of git (>= 2.9) you need to specify --allow-unrelated-histories with git merge.

    5 - Merge libs repo next in the same way:

    git remote add libs-repo ../libs-repo
    git fetch libs-repo
    git merge -s ours --no-commit libs-repo/master # see above note.
    git read-tree --prefix=libs/ -u libs-repo/master
    git commit -m "import libs"
    

    Continue if you have more than 2 repos to merge.

    Reference: Merge a subdirectory of another repository with git
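
The divide-and-merge steps above can be sketched end to end for a single directory. This assumes Git 2.9 or newer (it passes --allow-unrelated-histories, as the note says) and uses a throwaway original_repo with made-up file names. The branch is forced to master only so the commands match the ones in this answer.

```shell
#!/bin/sh
# Sketch only: split apps/ out of a demo repo, then graft it into a new repo.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skips the warning pause in newer Git
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Hypothetical source repository with two top-level directories.
git init -q original_repo
cd original_repo
git symbolic-ref HEAD refs/heads/master   # make the branch name predictable
for f in apps/AAA/a.txt libs/XXX/x.txt; do
    mkdir -p "$(dirname "$f")"
    echo "$f" > "$f"
    git add "$f"
    git commit -q -m "add $f"
done
cd ..

# Divide: a clone whose history holds only apps/, moved to the repo root.
git clone -q original_repo apps-repo
cd apps-repo
git filter-branch --prune-empty --subdirectory-filter apps master
cd ..

# Conquer: new repo with one seed commit, then import apps-repo under apps/.
git init -q my-desired-repo
cd my-desired-repo
git symbolic-ref HEAD refs/heads/master
echo "demo" > README.md
git add README.md
git commit -q -m "initial commit"

git remote add apps-repo ../apps-repo
git fetch -q apps-repo
git merge -s ours --no-commit --allow-unrelated-histories apps-repo/master
git read-tree --prefix=apps/ -u apps-repo/master
git commit -q -m "import apps"

imported=$(git ls-tree -r --name-only HEAD)
echo "$imported"
```

Note that --subdirectory-filter apps keeps everything under apps. To keep only apps/AAA as in the question, one option would be to filter on apps/AAA and read the tree back under --prefix=apps/AAA/.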


  •

    Why would you want to run filter-branch more than once? You can do it all in one sweep, so no need to force it (note that you need extglob enabled in your shell for this to work):

    git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch $(ls -xd apps/!(AAA) libs/!(XXX))" --prune-empty -- --all
    

    This should get rid of all the changes in the unwanted subdirectories and keep all your branches and commits (unless they only affect files in the pruned subdirectories, by virtue of --prune-empty) - no issue with duplicate commits etc.

    After this operation the unwanted directories will be listed as untracked by git status.

    The $(ls ...) is necessary so that the extglob is evaluated by your shell instead of the index filter, which uses the sh builtin eval (where extglob is not available). See How do I enable shell options in git? for further details on that.
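
As a small illustration of the expansion described above, this sketch shows what an extglob pattern such as apps/!(AAA) turns into before git ever sees it. It assumes bash is installed; the directory names are the ones from the question, created empty in a temporary directory.

```shell
#!/bin/sh
# Sketch only: show how bash expands the negated patterns.
set -e
demo=$(mktemp -d)
cd "$demo"
mkdir -p apps/AAA apps/BBB apps/CCC libs/XXX libs/YYY libs/ZZZ

# -O extglob turns the option on before bash parses the pattern.
unwanted=$(bash -O extglob -c 'echo apps/!(AAA) libs/!(XXX)')
echo "$unwanted"   # prints: apps/BBB apps/CCC libs/YYY libs/ZZZ
```

Because the pattern is matched against the files currently on disk, a directory that exists only in older commits would not appear in the list and so would not be removed by the filter.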


  •

    Answering my own question here... after a lot of trial and error.

    I managed to do this using a combination of git subtree and git-stitch-repo. These instructions are based on:

    First, I pulled out the directories I wanted to keep into their own separate repository:

    cd origRepo
    git subtree split -P apps/AAA -b aaa
    git subtree split -P libs/XXX -b xxx
    
    cd ..
    mkdir aaaRepo
    cd aaaRepo
    git init
    git fetch ../origRepo aaa
    git checkout -b master FETCH_HEAD
    
    cd ..
    mkdir xxxRepo
    cd xxxRepo
    git init
    git fetch ../origRepo xxx
    git checkout -b master FETCH_HEAD
    

    I then created a new empty repository, and imported/stitched the last two into it:

    cd ..
    mkdir newRepo
    cd newRepo
    git init
    git-stitch-repo ../aaaRepo:apps/AAA ../xxxRepo:libs/XXX | git fast-import
    

    This creates two branches, master-A and master-B, each holding the content of one of the stitched repos. To combine them and clean up:

    git checkout master-A
    git pull . master-B
    git checkout master
    git branch -d master-A 
    git branch -d master-B
    

    Now I'm not quite sure how/when this happens, but after the first checkout and the pull, the code magically merges into the master branch (any insight on what's going on here is appreciated!)

    Everything seems to have worked as expected, except that if I look through the newRepo commit history, there are duplicates when the changeset affected both apps/AAA and libs/XXX. If there is a way to remove duplicates, then it would be perfect.


  •

    I have written a git filter to solve exactly this problem. It has the fantastic name of git_filter and is located on GitHub here:

    https://github.com/slobobaby/git_filter

    It is based on the excellent libgit2.

    I needed to split a large repository with many commits (~100000) and the solutions based on git filter-branch took several days to run. git_filter takes a minute to do the same thing.


  •

    Use 'git splits' git extension

    git splits is a bash script, a wrapper around git filter-branch, that I created as a git extension based on jkeating's solution.

    It was made exactly for this situation. For your error, try using the git splits -f option to force removal of the backup. Because git splits operates on a new branch, it won't rewrite your current branch, so the backup is extraneous. See the readme for more detail, and be sure to use it on a copy/clone of your repo (just in case!).

    1. Install git splits.

    2. Split the directories into a local branch:

        # change into your repo's directory
        cd /path/to/repo
        # check out the branch
        git checkout XYZ
        # split multiple directories into new branch XYZ
        git splits -b XYZ apps/AAA libs/ZZZ

    3. Create an empty repo somewhere. We'll assume we've created an empty repo called xyz on GitHub that has the path git@github.com:simpliwp/xyz.git

    4. Push to the new repo:

        # add a new remote for the empty repo so we can push to it on GitHub
        git remote add origin_xyz git@github.com:simpliwp/xyz.git
        # push the branch to the empty repo's master branch
        git push origin_xyz XYZ:master

    5. Clone the newly created remote repo into a new local directory:

        # change current directory out of the old repo
        cd /path/to/where/you/want/the/new/local/repo
        # clone the remote repo you just pushed to
        git clone git@github.com:simpliwp/xyz.git


  •

    Yeah. Force overwriting the backup by passing the -f flag on subsequent calls to filter-branch; that gets you past the message. :) Otherwise I think you have the solution (that is, eradicate one unwanted directory at a time with filter-branch).
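
A minimal sketch of that directory-at-a-time approach, run on a throwaway repository. The directory names and demo identity are assumptions. The second pass is tried once without -f to reproduce the "Cannot create a new backup" refusal, then repeated with -f.

```shell
#!/bin/sh
# Sketch only: two filter-branch passes, the second one forced with -f.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skips the warning pause in newer Git
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q repo
cd repo
for d in apps/AAA apps/BBB libs/XXX libs/YYY; do
    mkdir -p "$d"
    echo "content of $d" > "$d/file.txt"
    git add "$d"
    git commit -q -m "add $d"
done

# First pass: drop apps/BBB. This leaves a backup under refs/original/.
git filter-branch --index-filter \
    'git rm --cached -qr --ignore-unmatch -- apps/BBB' --prune-empty -- --all

# Second pass without -f: filter-branch refuses because the backup exists.
if git filter-branch --index-filter \
    'git rm --cached -qr --ignore-unmatch -- libs/YYY' \
    --prune-empty -- --all >/dev/null 2>&1
then refused=no
else refused=yes
fi

# Second pass with -f: the old backup is overwritten and the rewrite runs.
git filter-branch -f --index-filter \
    'git rm --cached -qr --ignore-unmatch -- libs/YYY' --prune-empty -- --all

remaining=$(git ls-tree -r --name-only HEAD)
echo "refused without -f: $refused"
echo "$remaining"
```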


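  •
    git clone git@example.com:thing.git
    cd thing
    git fetch
    # Create a local branch for each remote branch so "git push --all" can push them later
    for originBranch in $(git branch -r | grep -v master); do
        branch=${originBranch#origin/}   # strip the "origin/" prefix
        git checkout "$branch"
    done
    git checkout master
    
    # Keep only dir1, dir2 and .gitignore in every commit
    git filter-branch --index-filter 'git rm --cached -qr --ignore-unmatch -- . && git reset -q $GIT_COMMIT -- dir1 dir2 .gitignore' --prune-empty -- --all
    
    # Point origin at the new repository and push the rewritten branches
    git remote set-url origin git@example.com:newthing.git
    git push --all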
    

  •

    Delete the backup present under the .git directory in refs/original like the message suggests. The directory is hidden.
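
One way to remove those backup refs without deleting files under .git by hand is sketched below, using git for-each-ref piped into git update-ref --stdin. Going through git also covers the case where the refs have been packed and no longer exist as loose files. The demo repository and its contents are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: create a backup via filter-branch, then delete it.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skips the warning pause in newer Git
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q repo
cd repo
for d in apps/AAA apps/BBB; do
    mkdir -p "$d"
    echo "content of $d" > "$d/file.txt"
    git add "$d"
    git commit -q -m "add $d"
done

# Any rewriting pass leaves the old branch tips under refs/original/.
git filter-branch --index-filter \
    'git rm --cached -qr --ignore-unmatch -- apps/BBB' --prune-empty -- --all
before=$(git for-each-ref --format='%(refname)' refs/original)

# Delete every backup ref that filter-branch created.
git for-each-ref --format='delete %(refname)' refs/original |
    git update-ref --stdin
after=$(git for-each-ref --format='%(refname)' refs/original)

echo "backup refs before: $before"
echo "backup refs after: ${after:-none}"
```

With the backup gone, the next filter-branch run should no longer need -f.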