4
\$\begingroup\$

I've been learning python for some time and I've come up with the idea to practice it a little. I created a script that removes files by their extension .html and .srt and also empty directories. It works good, but I've got a feeling that it could be improved. Can someone help me improve the code? Also I know it is not a good practice to use for-loop inside another for-loop. Files structure:

super secret folder
    |___empty folder
    |___folder1
        |___file01.jpg
        |___file02.srt
        |___file03.html
    |___folder2
        |___file01.jpg
        |___file02.srt
        |___file03.html
    |___folder3
        |___file01.jpg
        |___file02.srt
        |___file03.html
import os


mypath = "C:\\Users\\user\\super secret folder"


def extract_ext(file_path):
    split_tuple = os.path.splitext(file_path)

    return split_tuple[0], split_tuple[1]

def abs_path(my_file_path, file_name):
    return f"{my_file_path}\\{file_name}"

subfolders_list = os.listdir(mypath)

subfolders_paths = [os.path.join(mypath, path) for path in subfolders_list]

    
for folder_path in subfolders_paths:
    files_list = os.listdir(folder_path)
    
    if len(files_list) != 0:   
            
        for file_name in files_list:
            file_abs_path = abs_path(folder_path, file_name)
            _, file_extension = extract_ext(file_abs_path)
            
            
            if file_extension == ".html" or file_extension == ".srt":
                os.remove(file_abs_path)
    else:
        os.rmdir(folder_path)
\$\endgroup\$
2
  • \$\begingroup\$ I think I'll untag "Windows", because: good news, this script can easily be made portable. \$\endgroup\$
    – Reinderien
    Commented Jul 22, 2022 at 11:27
  • \$\begingroup\$ Python pathlib.Path overrides the division operator and adds OS specific separators: path1 / path2 = "path1Content\path2Content" on Windows and "path1Content/path2Content" on POSIX \$\endgroup\$
    – lukstru
    Commented Jul 22, 2022 at 12:21

2 Answers 2

3
\$\begingroup\$

Your code can be simplified through the usage of os.walk() and pathlib.Path.

from os import walk
from pathlib import Path


def cleanup(directory: Path, suffixes: set[str]) -> None:
    """Remove unwanted files with the given
    suffixes as well as empty directories.
    """
    
    for root, dirs, files in walk(directory):
        root = Path(root)

        for file in map(lambda filename: root / filename, files):
            if file.suffix in suffixes:
                file.unlink()

        if root.is_dir() and not dirs and not files:
            root.rmdir()


def main() -> None:

    cleanup(Path('folder'), {'.html', '.srt'})


if __name__ == '__main__':
    main()

This way it is more naturally readable, what the code does plus you have directory recursion. By using a separate function (cleanup()), you can also re-use the code with other directories or suffix sets.

Before

$ tree folder/
folder/
├── bar
│   ├── file01.jpg
│   ├── file02.html
│   └── file03.srt
├── foo
│   ├── file01.jpg
│   ├── file02.html
│   └── file03.srt
└── spamm

After

$ tree folder/
folder/
├── bar
│   └── file01.jpg
└── foo
    └── file01.jpg

2 directories, 2 files
\$\endgroup\$
2
\$\begingroup\$

For loops inside for loops are necessary for this use case, you can however decrease indention levels by extracting code blocks into methods:

for folder_path in subfolders_paths:    
    if not is_empty(folder_path): 
        remove_unwanted_files(folder_path)
    else:
        remove_path(folder_path)

Indention can further be decreased by extracting the conditional as well:

for folder_path in subfolders_paths:
    remove_empty_and_unwanted(folder_path)
\$\endgroup\$

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.