Disk Usage Reports

Rich web-based usage reports, to help you keep things under control.

Documentation


Requirements

  • A web server.
  • PHP 4+ CLI (command-line interface).
    Note: The web server does not need to be configured to execute PHP scripts.

Quick Start

Installation

  1. Download the latest version of Disk Usage Reports.
  2. Unzip the files into your Web server's public directory.
    • Linux Example:
      • /var/www/html
        • diskusage
          • css
          • data
          • images
          • js
          • lang
          • scripts
    • Windows Example:
      • C:\Inetpub\wwwroot
        • diskusage
          • css
          • data
          • images
          • js
          • lang
          • scripts
  3. If your web server executes PHP scripts, you must either secure the 'scripts' directory so it is not publicly accessible, or move the 'scripts' directory to a location on your server that is not publicly accessible.
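
As a rough sketch of the second option in step 3, the following shell function moves the 'scripts' directory out of the public directory. The function name and the paths in the example comment are illustrative assumptions, not part of Disk Usage Reports.

```shell
# Hypothetical helper: move the 'scripts' directory of an installation
# to a private location outside the web server's public directory.
# Usage: relocate_scripts <install-dir> <private-dir>
relocate_scripts() {
    rs_install=$1
    rs_private=$2
    if [ ! -d "$rs_install/scripts" ]; then
        echo "No 'scripts' directory found in $rs_install" >&2
        return 1
    fi
    if [ -e "$rs_private/scripts" ]; then
        echo "$rs_private/scripts already exists" >&2
        return 1
    fi
    mkdir -p "$rs_private" && mv "$rs_install/scripts" "$rs_private/scripts"
}

# Example (assumed paths):
#   relocate_scripts /var/www/html/diskusage /opt/diskusage
```

After a move like this, the commands in the rest of this document would be run against the new location (for example /opt/diskusage/scripts/process.php) instead of scripts/process.php.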

Overview of Generating Reports

There are two steps to generating the reports:

  1. Create a list of directories and files that are in the directory on which you wish to report.
  2. Process the list to generate the report files.

Step 1: Creating the List of Directories and Files

For Linux, Mac OS X, and BSD systems, the fastest way is to use the bash script scripts/find.sh, which uses the GNU find command. It has the following syntax:

Syntax: find.sh [-b|-ne] [-d <char|'null'>] [-] <directory-to-scan>
                [<find-test>, ...]

Arguments:

-b
Force the usage of the 'ls' command's -b argument to escape unusual characters
(e.g. a newline) in file names. Use this flag if you know that 'ls' supports
this argument on your system and you want to skip the use of 'mktemp' to check
for support.

-d <char|'null'>
Optionally specify the field delimiter for each line in the output.
Must be a single ASCII character or the word 'null' for the null character.
The default is the space character.

-ne
Force the script to execute even if the 'ls' command does not support the
--escape or -b arguments. This will cause problems if file names encountered
during the scan contain newlines.

- (minus sign)
If the <directory-to-scan> is the same as one of the options for this script
(e.g. '-d'), you must use a minus sign as an argument before it. You should
do this if you ever expect the <directory-to-scan> to start with a minus sign.

<directory-to-scan>
The directory that the list of sub-directories and files will be created for.

<find-test>
Optionally specify one or more tests that will be passed directly to the
'find' command. You must use the absolute path for any tests that match the
path, such as '-path'. Do not use any expressions that would change the output
of find, such as '-ls'. If using '-type', make sure that you do not exclude
directories. See the 'find' man page for details.

    Expression Examples:
        ! -name '.DS_Store' -a ! -name 'Thumbs.db'
        Exclude extra files created by Windows and Mac OS.

        ! -size 0c
        Exclude files that have a size of zero bytes.

        ! -path '/var/www/html/somesite/*'
        Exclude the contents of a directory from the results.

        ! -path '/var/www/html/somesite' -a ! -path '/var/www/html/somesite/*'
        Completely exclude a directory from the results.

        \( -type d -o -type f \)
        Only include directories and regular files.
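
Since the tests are passed to 'find' directly, one way to check an expression before a long scan is to try it with 'find' on its own. The sketch below does that on a throwaway directory. The helper name and the file names are made up for illustration, and the output of find.sh itself will be formatted differently.

```shell
# Hypothetical helper: preview which entries a set of tests would keep.
# Usage: preview_tests <directory> [<find-test>, ...]
preview_tests() {
    pt_dir=$1
    shift
    find "$pt_dir" "$@"
}

# Build a small throwaway tree to experiment with (names are illustrative).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/docs"
printf 'hello\n' > "$demo_dir/docs/readme.txt"
: > "$demo_dir/docs/empty.log"    # zero-byte file
: > "$demo_dir/Thumbs.db"         # file created by Windows

# Thumbs.db is left out of this listing.
preview_tests "$demo_dir" ! -name '.DS_Store' -a ! -name 'Thumbs.db'

# The zero-byte files are left out of this listing.
preview_tests "$demo_dir" ! -size 0c

rm -rf "$demo_dir"
```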

For Windows systems, the fastest way is to use the scripts/find.exe command. It has the following syntax:

Syntax: find.exe [OPTIONS] <directory-to-scan>

<directory-to-scan>
The directory that the list of sub-directories and files will be created for.

The OPTIONS are:

      -d <delim>
      The field delimiter for each line in the output.
      The default is the NULL character.

      -ds <directoryseparator>
      The directory separator used between directory names.
      The default is the directory separator for the operating system.

For other systems, you may use scripts/find.php. It has the following syntax:

Syntax: php find.php [-d <char|'null'>] [-ds <char>] [--force32bit]
                     [-] <directory-to-scan>

Arguments:

-d <char|'null'>
Optionally specify the field delimiter for each line in the output.
Must be a single ASCII character or the word 'null' for the null character.
The default is the space character.

-ds <directoryseparator>
Optionally specify the directory separator used between directory names.
The default is the directory separator for the operating system.

--force32bit
Force the script to execute on 32-bit versions of PHP.
This may lead to incorrect totals if find.php encounters files over 2 GB.

- (minus sign)
If the <directory-to-scan> is the same as one of the arguments for this script
(e.g. '-d'), you must use a minus sign as an argument before it. You should
do this if you ever expect the <directory-to-scan> to start with a minus sign.

<directory-to-scan>
The directory that the list of sub-directories and files will be created for.
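
Before relying on --force32bit, it can help to know whether the directory actually contains files beyond the 32-bit limit. The following is a minimal sketch, assuming a 'find' command that supports '-size' with the 'c' (bytes) suffix. The function name is hypothetical. 2147483647 is the largest value a signed 32-bit integer can hold.

```shell
# Hypothetical helper: list regular files larger than <bytes>.
# Usage: files_larger_than <directory> <bytes>
files_larger_than() {
    find "$1" -type f -size +"$2"c
}

# Example: list files too large for a signed 32-bit integer
# (more than 2147483647 bytes).
#   files_larger_than path/to/directory 2147483647
```

If the example prints nothing, no file in the directory exceeds that size.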

All the above scripts will output to STDOUT.

Here are some examples of their usage:

bash scripts/find.sh path/to/directory > list-of-files.dat

scripts\find.exe path\to\directory > list-of-files.dat

php scripts/find.php path/to/directory > list-of-files.dat

Here are some examples of the OPTIONS:

bash scripts/find.sh -d " " path/to/directory > list-of-files.dat

Explicitly use a space as the field delimiter in the output. A visible delimiter is useful if you want to visually inspect the output, since the NULL delimiter (the default for find.exe) does not display on the console.
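
If a list was already written with the NULL delimiter, it can still be inspected by translating the delimiter on its way to the console. This is a small sketch using 'tr'. The helper name is hypothetical, and the sample record is invented for illustration and does not reflect the actual field layout of the list file.

```shell
# Hypothetical helper: replace NULL delimiters with a visible character
# (a space by default) so records can be read on the console.
# Usage: show_nulls [<replacement-char>] < list-of-files.dat
show_nulls() {
    tr '\000' "${1:- }"
}

# Invented sample record with three NULL-delimited fields.
printf 'f\0001024\000docs/readme.txt\n' | show_nulls '|'
# prints: f|1024|docs/readme.txt
```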

scripts\find.exe -ds / path\to\directory > list-of-files.dat

If the files are on a Windows server but you will be processing the report on a Linux computer, you must force the script to use a forward slash as a directory separator using -ds.

Step 2: Processing the List and Generating the Report

Processing the output of the "find" scripts is done by the PHP script scripts/process.php. It has the following syntax:

Syntax: php process.php [OPTIONS] <report-directory> [<filelist>]

<report-directory>
The directory where the report files will be saved. This should point to a
directory under the 'data' directory.
    Examples:
        /var/www/html/diskusage/data/myreport
        C:\Inetpub\wwwroot\diskusage\data\myreport

<filelist>
The file that was created using one of the 'find' scripts (e.g. find.php).
If you omit this, process.php will attempt to read the file list from STDIN.

The OPTIONS are:

      - (minus sign)
      If the <report-directory> or <filelist> is the same as one of the
      OPTIONS for this script (e.g. "-d"), you must use a minus sign as an
      argument before it. You should do this if you ever expect the
      <report-directory> or <filelist> to start with a minus sign.

      -d <delim>
      The field delimiter used to split each line of the <filelist>.
      The default is the NULL character. This option is ignored if
      <filelist> has a header line (see notes).

      -ds <directoryseparator>
      Specify the directory separator used in the file list. This is useful
      if the list from step 1 was generated on a different operating system
      which uses a different directory separator. For example, Windows uses
      a backslash (\) while Linux/BSD/Mac/etc systems use a forward slash (/).
      The default is the directory separator for the operating system
      processing the report.  Will be ignored if <filelist> has a header
      line (see notes).

      -fp
      Display the full path of the directories in the report. This is off by
      default since it could potentially pose a security risk.

      -l <num>
      Lines in the <filelist> longer than <num> will not be processed.
      This is just a failsafe to prevent the script from processing a list
      file that is not formatted properly. The default is 1024.

      -mt <bytes>
      The maximum number of bytes that the 'directory tree' file can be.
      The default is 819200. If the 'directory tree' file gets larger than
      this number, then the script will act as if -nt had been specified.

      -n <reportname>
      This text will display in the header of the report.

      -nt
      Disable the directory tree that appears on the left side of the report.

      -q
      Do not output any text to STDOUT. The script will return a non-zero
      exit status if it fails.

      -ss <seconds>
      The minimum number of seconds that must elapse before another status
      message (e.g. 'Read X bytes, processed X lines...') is outputted.
      Default is 15 seconds.

      -su <suffix>
      Set the suffix of report files. This is '.txt' by default. You must
      also edit the 'suffix' variable in index.html to include any suffix
      besides the default or an empty suffix.

      -t <depth>
      Limit the "File Sizes", "Modified", and "File Types" totals to only
      <depth> directories deep in the report. This is useful if the directory
      being reported on has many files, which can cause the report to take a
      long time to generate. For example, if this is set to 3 the directory
      ./a, ./a/b and ./a/b/c will have these totals available, but ./a/b/c/d
      will not. The default is 6.

      -td <depth>
      Similar to -t but instead limits the "Top 100" list to only <depth>
      directories deep in the report. This is useful if the directory being
      reported on has many files, which can cause the report to take a long
      time to generate. The default is 3.

      -tz <timezone>
      Set the report timezone. Valid values are the timezone identifiers
      listed at http://php.net/manual/en/timezones.php. The default is the
      system's timezone (if it can be determined).

      -v
      Output additional information as the script executes.

      -vv
      Output more information than -v.

Notes:

      o You should set the -tz option as trying to determine the system's
        timezone is unreliable.

      o You may execute process.php on a different server from the 'find'
        script if you are worried about it using CPU time.

      o The directory separator used in <filelist> must be a forward slash
        if this script is executed on a *nix system.
        
      o If the <filelist> has a header line (starts with a #) then the -d
        and -ds OPTIONS will be ignored since the header explicitly
        defines what their values should be.
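
To tell in advance whether -d and -ds would be ignored, the first character of the list can be checked. This minimal sketch relies only on the rule in the last note (a header line starts with a #). The function name is hypothetical, and the contents of the header are not interpreted.

```shell
# Hypothetical helper: succeed if <filelist> begins with a '#' character.
# Usage: has_header_line <filelist>
has_header_line() {
    # Read the first byte; drop it if it is a NULL so the shell can hold it.
    hh_first=$(head -c 1 "$1" | tr -d '\000')
    [ "$hh_first" = "#" ]
}

# Example:
#   if has_header_line list-of-files.dat; then
#       echo "Header found: -d and -ds will be ignored"
#   fi
```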

Here are some examples of its usage:

php scripts/process.php path/to/report/dir list-of-files.dat

cat list-of-files.dat | php scripts/process.php /var/www/html/usage/data/myreport

php scripts\process.php c:\path\to\report\dir list-of-files.dat

Here are some examples of combining the 'find' scripts and process.php into one command:

bash scripts/find.sh path/to/directory | php scripts/process.php path/to/report/dir

scripts\find.exe c:\path\to\directory | php scripts\process.php c:\Inetpub\wwwroot\diskusage\data\myreport

php scripts/find.php path/to/directory | php scripts/process.php /var/www/html/diskusage/data/myreport
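
In a pipeline, the shell normally reports only the exit status of the last command, so a failure in the 'find' script can go unnoticed. For scheduled runs, one alternative is to go through a temporary list file and check each step. The sketch below assumes the scripts live in the directory named by SCRIPTS_DIR. That variable and the function name are hypothetical and not part of Disk Usage Reports.

```shell
# Assumed location of find.sh and process.php; override as needed.
SCRIPTS_DIR=${SCRIPTS_DIR:-scripts}

# Hypothetical wrapper: scan <directory-to-scan>, then build the report in
# <report-directory>. Returns non-zero if either step fails.
# Usage: generate_report <directory-to-scan> <report-directory>
generate_report() {
    gr_list=$(mktemp) || return 1
    if bash "$SCRIPTS_DIR/find.sh" "$1" > "$gr_list" &&
       php "$SCRIPTS_DIR/process.php" -q "$2" "$gr_list"; then
        gr_status=0
    else
        gr_status=1
    fi
    rm -f "$gr_list"
    return "$gr_status"
}

# Example:
#   generate_report path/to/directory /var/www/html/diskusage/data/myreport
```

The -q option keeps process.php quiet, which suits unattended runs where only the exit status is checked.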

Here are some examples of the OPTIONS:

php scripts/process.php -n "My Report" path/to/report/dir list-of-files.dat

Show the name of the report in the header. It will appear as "Disk Usage Report for: My Report".

php scripts/process.php -t 1 -td 1 diskusageinstall/data/myreport

Only show the "File Sizes", "Modified", and "File Types" totals and the "Top 100" list for the root directory. This can speed up report generation if there are a lot of subdirectories, and it also reduces the total size of the report.

php scripts\process.php -tz "America/New_York" c:\path\to\report\dir list-of-files.dat

Set the timezone to U.S. Eastern Time.

php scripts/process.php -d ":" -ds "/" -fp -l 1024 -mt 819200 -n "My Report" -nt -ss 15 -su ".txt" -t 6 -td 6 -tz "America/New_York" - diskusageinstall/data/myreport - list-of-files.dat

An example that combines most of the OPTIONS.

Where to Save Reports

By default, Disk Usage Reports will look for all reports within the data directory of your installation. You can change this by editing the reportsBaseURL variable in index.html.

Let's assume you installed Disk Usage Reports at /var/www/html/reports (or C:\Inetpub\wwwroot\reports for Windows).

You would want to save your reports as directories within /var/www/html/reports/data (or C:\Inetpub\wwwroot\reports\data for Windows).

Here are some examples of process.php with this in mind:

php scripts/process.php /var/www/html/reports/data/myreport list-of-files.dat

php scripts\process.php C:\Inetpub\wwwroot\reports\data\myreport list-of-files.dat

Viewing Reports

URLs for viewing reports are in the following format:
http://mysite.com/path/to/diskusage/?reportpath

By default, the reportpath is relative to http://mysite.com/path/to/diskusage/data/. You can change this by editing the reportsBaseURL variable in index.html.

Let's continue the example in Where to Save Reports, where we saved our report to /var/www/html/reports/data/myreport (or C:\Inetpub\wwwroot\reports\data\myreport on Windows).

Our URL for the report would be:
http://mysite.com/reports/?myreport

Which will load the report from:
http://mysite.com/reports/data/myreport/
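
The mapping from a report directory to its viewing URL can be sketched as a small shell function. It assumes the default reportsBaseURL (reports live under the data directory) and forward slashes in paths. The function name is hypothetical.

```shell
# Hypothetical helper: print the viewing URL for a report directory.
# Usage: report_url <install-url> <data-dir> <report-dir>
report_url() {
    ru_base=${1%/}              # drop a trailing slash from the URL
    ru_data=${2%/}              # drop a trailing slash from the data dir
    ru_path=${3#"$ru_data"/}    # report path relative to the data dir
    printf '%s/?%s\n' "$ru_base" "$ru_path"
}

report_url http://mysite.com/reports /var/www/html/reports/data \
    /var/www/html/reports/data/myreport
# prints: http://mysite.com/reports/?myreport
```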

Organizing Your Reports

You can organize your reports into subdirectories as necessary.

For example, let's assume a report was created at /var/www/html/reports/data/edu/mit (or C:\Inetpub\wwwroot\reports\data\edu\mit on Windows). This report is for MIT and is organized into a directory called "edu".

You would view that report by browsing to http://mysite.com/reports/?edu/mit

You can take this a step further by creating a historical archive of reports by organizing them by date.

For example, let's assume the report was created at /var/www/html/reports/data/edu/mit/2011-06 (or C:\inetpub\wwwroot\reports\data\edu\mit\2011-06 on Windows).

You would view that report by browsing to http://mysite.com/reports/?edu/mit/2011-06
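
For a monthly archive like this, the dated directory name can be generated rather than typed by hand. This is a minimal sketch using the 'date' command. The function name is hypothetical and the paths are the ones from the example above.

```shell
# Hypothetical helper: print a report directory that ends in the current
# year and month (YYYY-MM).
# Usage: dated_report_dir <data-dir> <report-name>
dated_report_dir() {
    printf '%s/%s/%s\n' "${1%/}" "$2" "$(date +%Y-%m)"
}

dated_report_dir /var/www/html/reports/data edu/mit
```

The printed path could then be passed to process.php as the <report-directory>.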

Securing Reports

It is possible that a person could guess the path to a report. For example, you could guess that the report for the mathematics department is at http://mysite.com/reports/?math

An easy way to avoid this issue is by including extra characters in the report directory that act as a password.

For example, by naming the report directory math_diw9481 (which would be viewed at http://mysite.com/reports/?math_diw9481) you make it very difficult to guess the Web address for a report.
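
The extra characters are harder to guess if they are generated randomly. The sketch below appends 8 random hexadecimal characters read from /dev/urandom. The function name is hypothetical and the suffix length is an arbitrary choice.

```shell
# Hypothetical helper: print <name>_<8 random hexadecimal characters>.
# Usage: obscured_report_name <name>
obscured_report_name() {
    # Read 4 random bytes and print them as hex without spaces or newlines.
    or_suffix=$(od -An -N4 -tx1 /dev/urandom | tr -d ' \n')
    printf '%s_%s\n' "$1" "$or_suffix"
}

obscured_report_name math
```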

Optional Report Settings

There are several optional settings that can be edited in index.html. A description is included for each setting.

About Version Numbers

As of 1.0.0, version numbers follow the Semantic Versioning guidelines at semver.org as closely as possible.

Releases will be numbered with the following format: <major>.<minor>.<patch>

The following are some of the rules that will be followed:

  • Breaking backwards compatibility will increase the <major>.
  • New additions that do not break backwards compatibility will increase the <minor>.
  • Bug fixes and minor changes will increase the <patch>.