Write `ls` with UNIX source code in C

This blog post is about how to use C (UNIX source code) to re-implement the command line ls utility for UNIX systems. I enjoy hacking and find Operating Systems interesting so I wanted to re-implement ls without using #include <sys/stat.h>.

sys/stat.h is overpowered in my opinion, and takes the fun out of using ls within C source code. I wanted to develop a version of ls from first principles to absorb OS concepts like file i-node numbers. I also wanted to provide a simple solution rather than convoluted examples which are hard to understand.

Overview

The command we want to use should have the syntax:

$ ./ls .

. on the command line is a hidden file that is present in UNIX filesystems that represents the directory path of the current folder. In this example, we will be listing all the files within a folder, including the meta files . and .. for simplicity. .. in UNIX represents the file path of the parent directory.

Enable argument passing

The first thing we need to do is ensure our main function only accepts a single filepath argument.

#include <stdio.h>

int main(int argc, char *argv[]) {

  if (argc != 2) {
    printf("Please enter 1 argument for the directory to list.\n");
    return -1;
  }
}

The above code checks the argument count (argc) equals 2 and exits the program with a -1 return code if not.

You can see that to enable printf we have #include <stdio.h>.

Open the directory

We will be using #include <dirent.h> for accessing the dirent type, opendir and readdir commands. For accessing the files in the directory, we need to open the directory using opendir and specify the folder path as a string.

#include <dirent.h>
#include <stdio.h>

int main(int argc, char *argv[]) {

  if (argc != 2) {
    printf("Please enter 1 argument for the directory to list.\n");
    return -1;
  }
  
  DIR *directory = opendir(argv[1]);

  if (!directory) {
    printf("The directory doesn't exist or is unreadable.\n");
    return -1;
  }
}

We have also added error handling to ensure the directory was successfully opened. !directory is a really simple NULL check. NULL is equivalent to 0 in C, and 0 resolves to the BOOL value NO. if (!directory) thus means if directory is not NULL.

Read the directory recursively

#include <dirent.h>
#include <stdio.h>

int main(int argc, char *argv[]) {

  if (argc != 2) {
    printf("Please enter 1 argument for the directory to list.\n");
    return -1;
  }
  
  DIR *directory = opendir(argv[1]);

  if (!directory) {
    printf("The directory doesn't exist or is unreadable.\n");
    return -1;
  }
  
  printf("Printing out directory contents of %s:\n\n", argv[1]);
  
  struct dirent *direntry = readdir(directory);

  while (direntry != NULL) {
    printf("Filename: %s\ninode number: %llu\n\n", direntry -> d_name, direntry -> d_ino);
    direntry = readdir(directory);
  }
}

We now complete the implementation with recursive readdir. direntry is only guaranteed to have 2 members as part of the POSIX standard (Source): d_name and d_ino. These correspond to the file name and file inode number respectively.

EXTRA: What is an inode number?

inodes are pointers to disk memory where the files are actually stored.

Files are stored as byte blocks e.g. 1024 bytes, addressed using data block pointers. The inode stores the memory addresses of these data blocks for fast access in non-hierarchical file systems.

The inode numbers have different formats depending on the filesystem. They could include 6 direct pointers, 1 singly indirect block and 1 doubly indirect block, for example. These chained index blocks are essentially triply or doubly linked lists and exponentially increase the maximum file size for an Operating System. Small files can be addressed using direct pointers only.

The advantages for inodes are faster look-up times for files due to smaller search radii, caching, and close proximity to data. We can store the inodes close to the files on disk and this lookup is further sped up in custom-built SoCs.

TL;DR: We store the addresses of the file blocks in inodes as memory addresses (also known as pointers).

Testing our work

gcc readdir.c -o ls
./ls .

Fingers crossed...

Printing out directory contents of .:

Filename: .
inode number: 1921047

Filename: ..
inode number: 1915157

Filename: readdir.c~
inode number: 14415621

Filename: .DS_Store
inode number: 9054655

Filename: ls.dSYM
inode number: 14476348

Filename: readdir.c
inode number: 14429305

Filename: ls
inode number: 14487301

Adding additional file information

I've adapted the code for getting additional file information from the man pages for stat. Resave the file as readdir_stat.c to get the new stat features:

#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

#define DEST_SIZE 100

int print_stat(const char *directory_path, char *entry_name) {
  struct stat sb;
  char filepath[DEST_SIZE];
  strcpy(filepath,directory_path);
  strcat(filepath,(const char *) "/");
  strcat(filepath,entry_name);
  strcat(filepath,(const char *) "\0");

  if (stat(filepath, &sb) == -1) {
    printf("stat failed");
    return -1;
  }

  printf("File type: ");

  switch (sb.st_mode & S_IFMT) {
    case S_IFBLK:  printf("block device\n");            break;
    case S_IFCHR:  printf("character device\n");        break;
    case S_IFDIR:  printf("directory\n");               break;
    case S_IFIFO:  printf("FIFO/pipe\n");               break;
    case S_IFLNK:  printf("symlink\n");                 break;
    case S_IFREG:  printf("regular file\n");            break;
    case S_IFSOCK: printf("socket\n");                  break;
    default:       printf("unknown?\n");                break;
  }

  printf("Mode: %lo (octal)\n", (unsigned long) sb.st_mode);
  printf("Link count: %ld\n", (long) sb.st_nlink);
  printf("Ownership: UID=%ld   GID=%ld\n", (long) sb.st_uid,
	 (long) sb.st_gid);
  printf("Preferred I/O block size: %ld bytes\n", (long) sb.st_blksize);
  printf("File size: %lld bytes\n", (long long) sb.st_size);
  printf("Blocks allocated: %lld\n", (long long) sb.st_blocks);
  printf("Last status change: %s", ctime(&sb.st_ctime));
  printf("Last file access: %s", ctime(&sb.st_atime));
  printf("Last file modification: %s\n", ctime(&sb.st_mtime));

  return 0;
}

We can use the helper function in main after some tweaks for error handling. I like using helper functions because they make code clearer and easy to partition.

int main(int argc, char *argv[]) {

  if (argc != 2) {
    printf("Please enter 1 argument for the directory to list.\n");
    return -1;
  }
  
  DIR *directory = opendir(argv[1]);

  if (!directory) {
    printf("The directory doesn't exist or is unreadable.\n");
    return -1;
  }
  
  printf("Printing out directory contents of %s:\n\n", argv[1]);
  
  struct dirent *direntry = readdir(directory);

  struct stat sb;
  
  while (direntry != NULL) {
    printf("Filename: %s\ninode number: %llu\n", direntry -> d_name, direntry -> d_ino);

    int code = print_stat(argv[1], direntry -> d_name);

    if (code != 0) {
      return code;
    }
    
    direntry = readdir(directory);
  }

  return 0;
}

Testing our work v2

Note here how I have used the -g flag to get debugging symbols. We can debug the program with LLDB if we want to using this flag. I used this to fix the dreaded [1] 82115 segmentation fault ./ls_stat . error message. This is a SIGSEGV Signal i.e Segmentation Fault that happens when we access memory out of the bounds of our program.

$ gcc -g readdir_stat.c -o ls_stat

$ ./ls_stat .

# LLDB
$ lldb ./ls_stat .

Using our new ls_stat function:

Printing out directory contents of .:

Filename: .
inode number: 12944739708
File type: directory
Mode: 40755 (octal)
Link count: 13
Ownership: UID=501   GID=20
Preferred I/O block size: 4096 bytes
File size: 416 bytes
Blocks allocated: 0
Last status change: Thu Aug 26 14:28:10 2021
Last file access: Thu Aug 26 14:37:34 2021
Last file modification: Thu Aug 26 14:28:07 2021

Filename: ..
inode number: 12966193218
File type: directory
Mode: 40755 (octal)
Link count: 9
Ownership: UID=501   GID=20
Preferred I/O block size: 4096 bytes
File size: 288 bytes
Blocks allocated: 0
Last status change: Thu Aug 26 14:28:08 2021
Last file access: Thu Aug 26 14:28:07 2021
Last file modification: Thu Aug 26 14:28:07 2021

Filename: ls_stat
inode number: 13080755973
File type: regular file
Mode: 100755 (octal)
Link count: 1
Ownership: UID=501   GID=20
Preferred I/O block size: 4096 bytes
File size: 50312 bytes
Blocks allocated: 104
Last status change: Thu Aug 26 14:28:10 2021
Last file access: Thu Aug 26 14:28:15 2021
Last file modification: Thu Aug 26 14:28:10 2021

Filename: ls_stat.dSYM
inode number: 13080755326
File type: directory
Mode: 40755 (octal)
Link count: 3
Ownership: UID=501   GID=20
Preferred I/O block size: 4096 bytes
File size: 96 bytes
Blocks allocated: 0
Last status change: Thu Aug 26 14:22:42 2021
Last file access: Thu Aug 26 14:23:55 2021
Last file modification: Thu Aug 26 14:22:42 2021

Filename: readdir.c~
inode number: 13080351695
File type: regular file
Mode: 100644 (octal)
Link count: 1
Ownership: UID=501   GID=20
Preferred I/O block size: 4096 bytes
File size: 606 bytes
Blocks allocated: 8
Last status change: Thu Aug 26 13:39:43 2021
Last file access: Tue Aug 24 14:03:13 2021
Last file modification: Sun Aug 22 18:49:25 2021

Filename: readdir_stat.c~
inode number: 13080749917
File type: regular file
Mode: 100644 (octal)
Link count: 1
Ownership: UID=501   GID=20
Preferred I/O block size: 4096 bytes
File size: 654 bytes
Blocks allocated: 8
Last status change: Thu Aug 26 13:47:58 2021
Last file access: Thu Aug 26 13:44:50 2021
Last file modification: Thu Aug 26 13:44:46 2021

Filename: readdir.c
inode number: 13080746629
File type: regular file
Mode: 100644 (octal)
Link count: 1
Ownership: UID=501   GID=20
Preferred I/O block size: 4096 bytes
File size: 628 bytes
Blocks allocated: 8
Last status change: Thu Aug 26 13:39:45 2021
Last file access: Thu Aug 26 13:39:46 2021
Last file modification: Thu Aug 26 13:39:43 2021

Filename: readdir_stat.c
inode number: 13080750376
File type: regular file
Mode: 100644 (octal)
Link count: 1
Ownership: UID=501   GID=20
Preferred I/O block size: 4096 bytes
File size: 2342 bytes
Blocks allocated: 8
Last status change: Thu Aug 26 14:28:03 2021
Last file access: Thu Aug 26 14:28:05 2021
Last file modification: Thu Aug 26 14:28:03 2021

Filename: ls
inode number: 13080725018
File type: regular file
Mode: 100755 (octal)
Link count: 1
Ownership: UID=501   GID=20
Preferred I/O block size: 4096 bytes
File size: 50047 bytes
Blocks allocated: 104
Last status change: Thu Aug 26 11:48:25 2021
Last file access: Thu Aug 26 11:48:25 2021
Last file modification: Sun Aug 22 19:07:33 2021

Also, you can see we now have .dSYM files in our directory which are debugging symbols. This is because we used the -g flag earlier.

Looks like a lot more than we'd need in day to day coding, but we can now use this whenever we want to instead of ls. Feel free to customise this because you probably won't need anything too different to the pre-baked ls for most things (it's very well documented and designed).

I hope you enjoyed playing around with UNIX source code. I found the project fun because I learnt something useful outside of iOS development.

PS: I implemented ls using readdir_r (recursive ➰) and this also performed the same. I abandoned this code snippet because the iterative solution was simpler to reason about.