
10/14/2024: Cleanup Obsidian Attachments Directory


Goal

To trash attachments no longer used by Obsidian pages.

Introduction

I’ve been copying Obsidian pages, which also copies their attachments, then removing the attachment references and adding different attachments. This leaves orphaned images in my attachments directory.

The script below trashes any files in the attachments directory that are not referenced by a markdown page.
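
The idea reduces to a set difference: collect every file name embedded with Obsidian's `![[...]]` syntax, then subtract those names from the directory listing. Here is a minimal sketch of that idea. The page text and file names are made up for illustration and do not come from a real vault.

```python
import re

# Matches Obsidian embeds such as ![[photo.png]] or ![[photo.png|300]].
EMBED_PATTERN = re.compile(r'!\[\[([^\|\]]+)(?:\|\d+)?\]\]')

# Hypothetical page contents and attachment names, for illustration only.
pages = [
    "Front gate ![[gate.png]] and the side view ![[side.jpg|300]].",
    "No images on this page.",
]
attachments = {"gate.png", "side.jpg", "old-fence.png"}

# Gather every embedded file name across all pages.
referenced = set()
for text in pages:
    referenced.update(EMBED_PATTERN.findall(text))

# Anything in the directory that no page embeds is an orphan.
orphans = attachments - referenced
print(sorted(orphans))
```

Working with sets keeps the comparison independent of how many pages reference the same image.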

How I Use The Script

I created a small bash script that does the following so I don’t need to re-type the command.

./cmd-find-unreferenced-attachments.py \
    -m /home/medined/Dropbox/david/fences \
    -a /home/medined/Dropbox/david/fences/Attachments

Script

This script is located at https://github.com/medined/obsidian-attachment-cleanup.

Scroll down to the main function below and the logic of the script should be easy to follow.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import argparse
import logging
import os
import re
from send2trash import send2trash

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')


# The markdown and attachments directories are supplied on the command line; see main().

def traverse_markdown_files(directory):
    """
    Traverse the given directory to find all markdown files.

    Args:
        directory (str): The directory to traverse.

    Returns:
        list: A list of full paths to markdown files.
    """
    markdown_files = []
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith('.md'):
                markdown_files.append(os.path.join(root, file))
    return markdown_files

def extract_image_references(markdown_files):
    """
    Extract image references from the given markdown files.

    Args:
        markdown_files (list): A list of full paths to markdown files.

    Returns:
        set: A set of image references found in the markdown files.
    """
    image_references = set()
    # Accept any text after the pipe so embeds like ![[a.png|100x145]] also match.
    image_pattern = re.compile(r'!\[\[([^\|\]]+)(?:\|[^\]]*)?\]\]')
    for md_file in markdown_files:
        with open(md_file, 'r', encoding='utf-8') as file:
            content = file.read()
            matches = image_pattern.findall(content)
            for match in matches:
                image_references.add(match)
    return image_references

def list_all_attachments(directory):
    """
    List all attachment files in the given directory with full paths.

    Args:
        directory (str): The directory to list attachment files from.

    Returns:
        set: A set of full paths to attachment files.
    """
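    # Skip subdirectories so that only files are candidates for the trash.
    return set(
        os.path.join(directory, f)
        for f in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, f))
    )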

def find_unreferenced_files(all_attachments, image_references, attachments_dir):
    """
    Find unreferenced files by comparing all attachments with image references.

    Args:
        all_attachments (set): A set of full paths to all attachment files.
        image_references (set): A set of image references found in markdown files.
        attachments_dir (str): The directory where attachments are stored.

    Returns:
        set: A set of unreferenced attachment files.
    """
    return all_attachments - set(os.path.join(attachments_dir, ref) for ref in image_references)

def main():
    """
    Main function to find and move unreferenced attachment files to trash.
    """

    parser = argparse.ArgumentParser(description="Clean unreferenced attachment files.")
    parser.add_argument("-m", '--markdown_dir', required=True, help="Directory containing markdown files")
    parser.add_argument("-a", '--attachments_dir', required=True, help="Directory containing attachment files")
    args = parser.parse_args()
    
    markdown_dir = args.markdown_dir
    attachments_dir = args.attachments_dir

    markdown_files = traverse_markdown_files(markdown_dir)
    image_references = extract_image_references(markdown_files)
    all_attachments = list_all_attachments(attachments_dir)
    unreferenced_files = find_unreferenced_files(all_attachments, image_references, attachments_dir)

    for file in unreferenced_files:
        logging.info(f"Moving to trash: {file}")
        send2trash(file)
            

if __name__ == "__main__":
    main()

07/05/2024: Generating Placeholder Docstrings Using The ast Python Package


Goal

To generate placeholder docstrings in a legacy code base to reduce PyLint message clutter.

Introduction

In July 2024, running PyLint on the OpenDevin code base produced 713 messages, 234 of which were related to missing docstrings. It is possible, even easy, to configure PyLint to ignore these messages. However, doing so won’t nudge the code base towards better documentation. Once all placeholder docstrings are in place, future code (pull requests) can be rejected if docstrings are not present.

Additionally, a large number of PyLint messages makes it hard to find the important ones. For example, the message below might be important enough to fix: when check is True, a CalledProcessError exception is raised if the sub-process returns a non-zero exit code.

opendevin/runtime/docker/local_box.py:31:32: W1510: 
Using subprocess.run without explicitly set `check` is not recommended. (subprocess-run-check)
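
To make the warning concrete, here is a small sketch of the difference `check` makes. The child command is a stand-in chosen for illustration (a Python one-liner that exits with status 3), not anything taken from OpenDevin.

```python
import subprocess
import sys

# Stand-in child process that always fails with exit status 3.
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]

# Without check, the failure is only visible if the caller inspects returncode.
result = subprocess.run(failing_cmd)
silent_code = result.returncode

# With check=True, the same failure raises CalledProcessError.
try:
    subprocess.run(failing_cmd, check=True)
    raised_code = None
except subprocess.CalledProcessError as err:
    raised_code = err.returncode

print(silent_code, raised_code)
```

Setting `check` explicitly, whether to True or False, documents which of these two behaviors the caller intended.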

Aid To Conformance

If your goal is to have “real” docstrings, this tool can help. Add the placeholders, then search for “placeholder” and start to add real comments. This approach provides metrics about how many comments need to be added before the work is done.
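
One way to get that metric is to count the docstrings that still contain the word “Placeholder”. The sketch below is an assumption about how such a counter could look. `count_placeholder_docstrings` is a hypothetical helper, not part of the script in this post, and it inspects a source string rather than files on disk.

```python
import ast


def count_placeholder_docstrings(source):
    """Count module, class, and function docstrings containing 'Placeholder'."""
    tree = ast.parse(source)
    count = 0
    for node in ast.walk(tree):
        # Only these node types can carry a docstring.
        if isinstance(node, (ast.Module, ast.ClassDef,
                             ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc and 'Placeholder' in doc:
                count += 1
    return count


# Hypothetical module source: one real docstring, two placeholders.
sample = '''"""Placeholder Module Docstring"""

def documented():
    """Return the answer."""
    return 42

def pending():
    """Placeholder Function Docstring"""
'''

print(count_placeholder_docstrings(sample))
```

Running a counter like this before and after a documentation pass gives a simple measure of the remaining work.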

Solution

The following Python script adds module, class, and function placeholder docstrings everywhere they are missing.

NOTE: If an existing docstring contains a null byte (“\x00”), the script below will convert that docstring into a single-quote string instead of a triple-quote string. This is an issue with how Python and the astor package handle the null byte. Replace your “\0x00” with “\x00” to avoid this issue.

"""
This script adds placeholder module, class, and function docstrings.

It is intended to be used in legacy code bases to reduce the number of lint messages while still encouraging real docstrings to be added. Over time, the placeholder docstring can be replaced.
"""

import os
import ast
import astor
error_count = 0
errors = []


def find_classes_without_docstrings(tree):
    """
    Walks the abstract syntax tree looking for classes without a docstring.
    """
    classes_without_docstrings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            # ast.get_docstring returns None when the body does not start with a string.
            if ast.get_docstring(node) is None:
                classes_without_docstrings.append(node)
    return classes_without_docstrings


def find_functions_without_docstrings(tree):
    """
    Walks the abstract syntax tree looking for functions without a docstring.
    """
    functions_without_docstrings = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # ast.get_docstring returns None when the body does not start with a string.
            if ast.get_docstring(node) is None:
                functions_without_docstrings.append(node)
    return functions_without_docstrings


def docstring(s):
    """
    A helper function to wrap a string into a docstring node.
    """
    return ast.Expr(value=ast.Constant(s))


def add_placeholder_docstrings(file_path):
    """
    The function that gets the work done. It reads and parses the Python file, then
    finds where the docstrings are missing and adds them.
    """
    global error_count

    needs_writing = False
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            content = file.read()
            if not content.strip():
                with open(file_path, 'w', encoding='utf-8') as empty_file:
                    empty_file.write('"""\nPlaceholder Module Docstring\n"""\n'
                        )
                return True
            tree = ast.parse(content, filename=file_path)
        if not tree.body or not isinstance(tree.body[0], ast.Expr
            ) or not isinstance(tree.body[0].value, ast.Constant):
            tree.body.insert(0, docstring('Placeholder Module Docstring'))
            needs_writing = True
        for cls in find_classes_without_docstrings(tree):
            cls.body.insert(0, docstring('Placeholder Class Docstring'))
            needs_writing = True
        for func in find_functions_without_docstrings(tree):
            func.body.insert(0, docstring('Placeholder Function Docstring'))
            needs_writing = True
        if needs_writing:
            with open(file_path, 'w', encoding='utf-8') as file:
                file.write(astor.to_source(tree))
    except Exception as e:
        errors.append(f'Error processing {file_path}: {e}')
        print(f'Error processing {file_path}: {e}')
        error_count += 1
    return needs_writing


def check_directory_for_missing_docstrings(root_dir):
    """
    This is the directory walker. It looks for python files.
    """
    for subdir, _, files in os.walk(root_dir):
        for file in files:
            if file.endswith('.py'):
                file_path = os.path.join(subdir, file)
                if add_placeholder_docstrings(file_path):
                    print(f'Added placeholder docstrings in {file_path}')


check_directory_for_missing_docstrings('.')
print('---------------------------')
for error in errors:
    print(error)
print(f'Finished with {error_count} errors.')
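
The insertion step can be tried on a string without touching any files. This sketch swaps `astor.to_source` for the standard library's `ast.unparse` (available from Python 3.9) and uses `ast.get_docstring` for the missing-docstring check, so it is a variation on the script above rather than a copy of it. The `area` function is a made-up input. As with any AST round trip, comments and original formatting in the source are not preserved.

```python
import ast

# Hypothetical input: a function with no docstring.
source = "def area(w, h):\n    return w * h\n"

tree = ast.parse(source)
for node in ast.walk(tree):
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
        if ast.get_docstring(node) is None:
            # A docstring is an expression statement holding a string constant.
            placeholder = ast.Expr(value=ast.Constant('Placeholder Function Docstring'))
            node.body.insert(0, placeholder)

# Newly created nodes have no line numbers; fill them in before unparsing.
ast.fix_missing_locations(tree)
updated = ast.unparse(tree)
print(updated)
```

Because the whole file is regenerated from the tree, it is worth reviewing the resulting diff before committing the rewritten files.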