A simple program to sort all the items in a directory, be they file or folder, by descending size.
Sorting items in a directory by descending size is not as straightforward as you might think, whether you are using a graphical file browser or the command line, because operating systems do not calculate the total size of a directory's contents when browsing a directory tree. This article offers complete working programs to overcome this on most operating systems.
Perhaps you will find the following familiar:
Whether for work or for personal projects, I like to organize my digital assets by creating a parent directory, let's say one called Projects, and storing all the content for the individual projects in there. If a project is small and doesn't involve a lot of content, I'll use a single file, usually a text file. If a project involves more content, say a text file as well as a couple of screenshots, I'll create a folder for that project and place all the related assets in there. So, from my perspective, the single text file and the folder are equivalent in the sense that each represents a project. The only difference is that the folder represents a bigger project, one with more stuff.
Sometimes I want to see which of my projects is currently the largest, that is, which has the most stuff. This usually happens because I haven't worked on a particular area for some time, so when I come back to it, I want to see which project has the most content. My reasoning is that the project with the most content should be the most complete, and therefore probably the one I should start working on first, as it will be the easiest to finish.
For example, consider a directory with the following contents:
Name | Type | Size |
---|---|---|
Huge Project.txt | File | 2.6KB |
Larger Project | Folder | 1.07KB |
0 - Tiny Project | Folder | 0KB |
Basic Project.txt | File | 0.36KB |
Big Project.txt | File | 2.11KB |
Sorting the above directory by descending size should output:
Huge Project.txt 2.6KB
Big Project.txt 2.11KB
Larger Project 1.07KB
Basic Project.txt 0.36KB
0 - Tiny Project 0KB
However, this is not what we get when we click the Size column header in graphical file browsers on Windows, Mac, and Linux.
On Windows, the command line gets somewhat closer to the desired output, but the result is still not entirely correct:
dir /b /o:-d
Output:
Larger Project
0 - Tiny Project
Huge Project.txt
Big Project.txt
Basic Project.txt
There are various command combinations for directory content sorting on UNIX-based systems such as Mac and Linux. Most involve using du, sort, and ls. Other examples I found online threw find and grep into the mix as well.
Here are the ones I tried:
du -a -h --max-depth=1 | sort -hr
Output:
32K .
8.0K ./Larger Project
8.0K ./0 - Tiny Project
4.0K ./Huge Project.txt
4.0K ./Big Project.txt
4.0K ./Basic Project.txt
The -S switch on the ls command is supposed to do exactly what I'm looking for: sort items by descending size.
ls -S
Output:
'0 - Tiny Project' 'Larger Project' 'Huge Project.txt' 'Big Project.txt' 'Basic Project.txt'
The output is still off. I tried adding the -l (long) switch.
ls -lS
Output:
total 20
drwx---r-x 2 admin admin 4096 Sep 20 21:49 '0 - Tiny Project'
drwx---r-x 2 admin admin 4096 Sep 20 21:49 'Larger Project'
-rw-rw-r-- 1 admin admin 2667 Sep 20 21:49 'Huge Project.txt'
-rw-rw-r-- 1 admin admin 2164 Sep 20 21:49 'Big Project.txt'
-rw-rw-r-- 1 admin admin 368 Sep 20 21:49 'Basic Project.txt'
The output includes more detail, as expected, but the sort order is the same as before.
Although none of these commands produces the desired result, their output does highlight the root cause of the problem. When browsing a directory tree, operating systems do not recurse into folders to calculate the total size of their contents. Instead, tools such as ls report the size of the directory entry itself, which is typically a single file system block, commonly 4096 bytes (4 KB), so every folder appears to have the same size.
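To make the root cause concrete, here is a minimal Python sketch of the recursion that a plain directory listing skips. The helper name `total_size` is hypothetical, and the sketch assumes that symbolic links should not be counted.

```python
import os


def total_size(path):
    """Return the size of a file in bytes, or the combined size of every
    regular file beneath a directory (hypothetical helper for illustration)."""
    if os.path.islink(path):
        return 0  # assumption: symbolic links are ignored
    if os.path.isfile(path):
        return os.path.getsize(path)
    size = 0
    # os.walk visits the directory and all of its subdirectories;
    # by default it does not follow symlinked directories.
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                size += os.path.getsize(file_path)
    return size
```

By contrast, calling `os.path.getsize` directly on a folder returns only the size of the directory entry, which on Linux is typically the 4096 bytes seen in the `ls -lS` output.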
There must be at least a dozen free tools out there that solve this problem, but to be honest, I didn't even look. Writing a script/program that does the same thing and then sharing it here felt like it would be easier, involve less bloat, hopefully be useful to others, and definitely be more fun.
I've waffled on long enough. Here is the code:
There are some minor differences between the four implementations, but the general approach used for all four is the same:
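As a rough illustration, and not the original listing, here is a minimal Python sketch that assumes the approach boils down to three steps: total the size of each item (recursing into folders), sort the items by that total in descending order, and print each name with its size in KB. The function names `item_size`, `format_kb`, and `sorted_by_size` are hypothetical, as are the choices to skip symbolic links, to count unreadable items as empty, and to default to the current directory.

```python
import os
import sys


def item_size(path):
    """Total bytes for a file, or for everything inside a folder."""
    try:
        if os.path.islink(path):
            return 0  # assumption: symbolic links are not counted
        if os.path.isdir(path):
            with os.scandir(path) as entries:
                return sum(item_size(entry.path) for entry in entries)
        return os.path.getsize(path)
    except OSError:
        return 0  # assumption: unreadable items count as empty


def format_kb(size_bytes):
    """Format bytes as kilobytes (1 KB = 1024 bytes), at most two decimals."""
    text = f"{size_bytes / 1024:.2f}".rstrip("0").rstrip(".")
    return f"{text}KB"


def sorted_by_size(directory):
    """Return (name, size) pairs for each item, largest first."""
    with os.scandir(directory) as entries:
        items = [(entry.name, item_size(entry.path)) for entry in entries]
    # Sort by size descending; ties fall back to name for a stable listing.
    return sorted(items, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Assumption: default to the current directory when no path is given.
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, size in sorted_by_size(directory):
        print(f"{name}\t{format_kb(size)}")
```

With this formatting, the 2667-byte, 2164-byte, and 368-byte files from the `ls -lS` listing come out as 2.6KB, 2.11KB, and 0.36KB, matching the sizes in the example table.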
On the command line, pass the path to the directory you want to sort as the first parameter. I won't list every possible invocation, but here are a couple, assuming you've copied the code and saved it as a file named dir_desc (short for "directory descending") plus the appropriate file extension:
Using Python on Mac or Linux:
python3 dir_desc.py
Using PowerShell on Windows:
powershell -f dir_desc.ps1
I ported my original Python approach to PowerShell, C#, and Go, so that there is at least one version that should work on each of the three major operating systems.
And that's it. Another yak, shaved. I hope you found this useful.