yet another blog

Tue, 21 Mar 2006

A program to compute some statistics on a filesystem

I wanted to know how the files in my home directory are distributed by size, so I wrote a quick OCaml program to analyse it. Run on my home directory, it gives the following result (links are counted as files):

$ ./filesystem-analysis.ml ~
Total file size:  6120592065 bytes
Minimum non empty file size:    1.00 B 
Maximum file size:  163.77 MB
Average file size:   28.81 KB
Total number of files: 207476
 of which are empty files: 544
Total number of directories: 22932
File size distribution:
[   1.00 B  -    1.00 B ]     592 files - total average of  592.00 B 
[   2.00 B  -    3.00 B ]     574 files - total average of    1.68 KB
[   4.00 B  -    7.00 B ]     772 files - total average of    4.52 KB
[   8.00 B  -   15.00 B ]    2205 files - total average of   25.84 KB
[  16.00 B  -   31.00 B ]    4077 files - total average of   95.55 KB
[  32.00 B  -   63.00 B ]   15371 files - total average of  720.52 KB
[  64.00 B  -  127.00 B ]    7598 files - total average of  712.31 KB
[ 128.00 B  -  255.00 B ]    6258 files - total average of    1.15 MB
[ 256.00 B  -  511.00 B ]   10922 files - total average of    4.00 MB
[ 512.00 B  - 1023.00 B ]   21493 files - total average of   15.74 MB
[   1.00 KB -    2.00 KB]   24063 files - total average of   35.25 MB
[   2.00 KB -    4.00 KB]   35981 files - total average of  105.41 MB
[   4.00 KB -    8.00 KB]   27669 files - total average of  162.12 MB
[   8.00 KB -   16.00 KB]   16737 files - total average of  196.14 MB
[  16.00 KB -   32.00 KB]   16654 files - total average of  390.33 MB
[  32.00 KB -   64.00 KB]    7057 files - total average of  330.80 MB
[  64.00 KB -  128.00 KB]    4190 files - total average of  392.81 MB
[ 128.00 KB -  256.00 KB]    2905 files - total average of  544.69 MB
[ 256.00 KB -  512.00 KB]    1036 files - total average of  388.50 MB
[ 512.00 KB - 1024.00 KB]     738 files - total average of  553.50 MB
[   1.00 MB -    2.00 MB]     301 files - total average of  451.50 MB
[   2.00 MB -    4.00 MB]     115 files - total average of  345.00 MB
[   4.00 MB -    8.00 MB]     104 files - total average of  624.00 MB
[   8.00 MB -   16.00 MB]      38 files - total average of  456.00 MB
[  16.00 MB -   32.00 MB]      14 files - total average of  336.00 MB
[  32.00 MB -   64.00 MB]       8 files - total average of  384.00 MB
[  64.00 MB -  128.00 MB]       1 files - total average of   96.00 MB
[ 128.00 MB -  256.00 MB]       3 files - total average of  576.00 MB
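The histogram uses power-of-two buckets: bucket k holds sizes from 2^k to 2^(k+1) - 1 bytes. Judging from the numbers, the last column looks like an estimate rather than a measured sum: each bucket's file count times 2^k + 2^k/2 (integer division), roughly the middle of the bucket. For example, 574 files × 3 B = 1722 B = 1.68 KB, and 16654 files × 24 KB = 390.33 MB. The following is a minimal sketch of such a histogram under that assumption. It is not the original program's code, and the names (bucket_of_size, estimated_total, human_size and so on) are hypothetical.

```ocaml
(* Power-of-two file size histogram: a sketch, not the original program. *)

(* Index k of the bucket [2^k, 2^(k+1) - 1] that contains [size] (>= 1). *)
let bucket_of_size size =
  if size < 1 then invalid_arg "bucket_of_size: size must be >= 1";
  let rec go k s = if s <= 1 then k else go (k + 1) (s lsr 1) in
  go 0 size

(* Inclusive bounds of bucket k, as printed in the histogram. *)
let bucket_bounds k = (1 lsl k, (1 lsl (k + 1)) - 1)

(* Assumed estimate behind the "total average" column: the bucket's file
   count times 2^k + 2^k / 2, using integer division (so bucket 0 counts
   each file as 1 byte). *)
let estimated_total k count = count * ((1 lsl k) + ((1 lsl k) lsr 1))

(* Count non-empty file sizes per bucket; empty files are left out here.
   64 slots cover every size that fits in an OCaml int. *)
let histogram sizes =
  let counts = Array.make 64 0 in
  List.iter
    (fun s ->
       if s > 0 then begin
         let k = bucket_of_size s in
         counts.(k) <- counts.(k) + 1
       end)
    sizes;
  counts

(* Human-readable size with binary units, in the style of the output above. *)
let human_size bytes =
  let b = float_of_int bytes in
  if b < 1024. then Printf.sprintf "%.2f B" b
  else if b < 1024. ** 2. then Printf.sprintf "%.2f KB" (b /. 1024.)
  else if b < 1024. ** 3. then Printf.sprintf "%.2f MB" (b /. 1024. ** 2.)
  else Printf.sprintf "%.2f GB" (b /. 1024. ** 3.)

(* Print one line per non-empty bucket. *)
let print_histogram counts =
  Array.iteri
    (fun k n ->
       if n > 0 then begin
         let lo, hi = bucket_bounds k in
         Printf.printf "[%s - %s] %7d files - total average of %s\n"
           (human_size lo) (human_size hi) n
           (human_size (estimated_total k n))
       end)
    counts

(* Example with made-up sizes; a real run would collect them by walking
   the directory tree. *)
let () = print_histogram (histogram [0; 1; 3; 3; 5000; 70_000])
```

Power-of-two buckets keep the report to a few dozen lines whatever the largest file is, at the cost of coarse resolution for big files.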

The results are pretty much as expected. The 6 GB of files in my home directory are spread over about 207,000 files with an average size of about 29 KB, and there are about 23,000 directories (roughly 11% of the number of files). More than 90% of the files are smaller than 32 KB, but a small number of big files take up the vast majority of my disk space.
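These summary figures can be re-derived from the raw numbers in the output. The sketch below simply copies the totals and the per-bucket counts printed above (treating the byte total as a float so the literal also fits on 32-bit OCaml) and recomputes the average size, the directory-to-file ratio and the share of files below 32 KB (= 2^15 bytes, i.e. buckets 0 to 14). The helper names are illustrative only.

```ocaml
(* Figures copied from the program output above. *)
let total_bytes = 6_120_592_065.
let total_files = 207_476
let total_dirs = 22_932

(* Per-bucket file counts from the histogram; bucket k holds sizes
   from 2^k to 2^(k+1) - 1 bytes. *)
let bucket_counts =
  [| 592; 574; 772; 2205; 4077; 15371; 7598; 6258; 10922; 21493;
     24063; 35981; 27669; 16737; 16654; 7057; 4190; 2905; 1036; 738;
     301; 115; 104; 38; 14; 8; 1; 3 |]

(* Number of files smaller than 2^k bytes: sum of buckets 0 .. k-1. *)
let files_below_pow2 k =
  let n = ref 0 in
  for i = 0 to k - 1 do n := !n + bucket_counts.(i) done;
  !n

let percent part whole = 100. *. float_of_int part /. float_of_int whole

(* Average size in KB (1 KB = 1024 bytes, as in the output). *)
let average_kb = total_bytes /. float_of_int total_files /. 1024.

let () =
  Printf.printf "average file size: %.2f KB\n" average_kb;
  Printf.printf "directories / files: %.1f%%\n" (percent total_dirs total_files);
  Printf.printf "files below 32 KB: %.1f%%\n"
    (percent (files_below_pow2 15) total_files)
```

It reports an average of 28.81 KB (matching the program's own line), about 11.1% directories per file, and 92.0% of the files below 32 KB. The bucket counts also add up exactly to the 207,476 files reported in the total.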

A more interesting point is that a non-negligible share of the files (about 11%) are smaller than 64 bytes. For the distributed backup system, storing such files directly within the master index would save both time and space.
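The six buckets below 64 bytes hold 592 + 574 + 772 + 2205 + 4077 + 15371 = 23,591 of the 207,476 files. The sketch below illustrates the inline-storage idea. The entry type, the 64-byte threshold constant and the use of OCaml's Digest module (MD5) to reference larger files are all assumptions made for illustration, not the actual design of the backup system.

```ocaml
(* Hypothetical master-index entry for a backup system: small files are
   stored inline, larger ones only by the MD5 digest of their contents. *)
type entry =
  | Inline of string    (* the file's contents, kept in the index itself *)
  | Stored of Digest.t  (* 16-byte MD5 digest; contents would live elsewhere *)

(* Assumed cut-off, taken from the histogram above. *)
let inline_threshold = 64

let entry_of_contents contents =
  if String.length contents < inline_threshold then Inline contents
  else Stored (Digest.string contents)

(* File counts of the buckets from 1 B to 63 B in the output above. *)
let small_files = List.fold_left ( + ) 0 [592; 574; 772; 2205; 4077; 15371]

let () =
  Printf.printf "%d of 207476 files (%.1f%%) would be stored inline\n"
    small_files (100. *. float_of_int small_files /. 207476.)
```

For a file below the threshold, the contents are barely larger than a digest would be, so keeping them inline avoids a separate stored object (and its overhead) for each of these roughly 23,600 files.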

2006-03-21T20:07:50Z

Made with PyBlosxom