Sunday, August 26, 2012

Traversing or walking directories with Scala

Traversing or walking through directories to find all files is probably a topic that has already been written about a lot. Nevertheless, I will add my own thoughts on it.
I have seen quite a few implementations of this, even in Scala, but I still thought I could make it simpler and also support parallelism. I don't really know whether parallelism actually speeds things up, though, since you're probably traversing a single disk, and the disk head can only be in one place at a time.
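If you'd rather measure than guess, a small timing helper is enough to compare the sequential and parallel variants on your own disk. This is a minimal sketch: 'Timing' and 'timed' are hypothetical names, not part of the code in this post, and a single wall-clock measurement is only a rough indication.

```scala
object Timing {
  // Hypothetical helper (not from this post): evaluates `body` once,
  // prints the elapsed wall-clock time in milliseconds and returns the result.
  def timed[A](label: String)(body: => A): A = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1000000
    println(s"$label took $elapsedMs ms")
    result
  }
}
```

With the FileList code from this post you could, for example, compare Timing.timed("sequential")(dir.walk.size) with Timing.timed("parallel")(dir.par.walk.size). Run each variant more than once: the first traversal also fills the operating system's file-system cache, which tends to make later runs faster no matter which variant you use.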

Even so, here's my solution:
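import java.io.File
import scala.collection.GenSeq
import scala.collection.parallel.ParSeq
import scala.language.implicitConversions

class GenFileList(val files:GenSeq[File]) {
  import FileList.file2FileList
  // A plain file yields itself; anything else is walked recursively (via the implicit conversion).
  private def fileOrSubDir: (File) => GenSeq[File] = f => if (f.isFile) GenSeq(f) else f.walk

  // Every file below this list, descending into subdirectories.
  def walk:GenSeq[File] = {
    files.flatMap(fileOrSubDir)
  }

  // Same as walk, but only keeps the files accepted by fileFilter.
  def walkWithFilter(fileFilter: File => Boolean):GenSeq[File] = {
    walk.filter(fileFilter)
  }
}

class ParFileList(override val files:ParSeq[File]) extends GenFileList(files)

class FileList(override val files:List[File]) extends GenFileList(files) {
  // Switches the top-level listing to a parallel collection.
  def par = new ParFileList(files.par)
}

object FileList {
  // listFiles() returns null for non-directories and on I/O errors, so fall back to Nil.
  def apply(dir:File) = new FileList(Option(dir.listFiles()).map(_.toList).getOrElse(Nil))
  implicit def file2FileList(dir:File):FileList = FileList(dir)
}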
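The recursion lives in 'fileOrSubDir': a plain file yields itself, anything else is turned into a FileList through the implicit conversion and walked again, and flatMap stitches the results into one sequence. GenSeq and the built-in parallel collections belong to the Scala versions of the time; since Scala 2.13 the Gen* types are deprecated and parallel collections live in the separate scala-parallel-collections module. As a sketch under that assumption, the same sequential idea can be written as an implicit class on java.io.File ('FileWalk' and 'RichDir' are hypothetical names, not from this post):

```scala
import java.io.File

object FileWalk {
  // Hypothetical enrichment (not from this post) that adds walk/walkWithFilter to java.io.File.
  implicit class RichDir(val dir: File) extends AnyVal {
    // listFiles() returns null for non-directories and on I/O errors; treat that as empty.
    def children: List[File] = Option(dir.listFiles()).map(_.toList).getOrElse(Nil)

    // Same shape as fileOrSubDir: a file yields itself, anything else is walked recursively.
    def walk: List[File] = children.flatMap(f => if (f.isFile) List(f) else f.walk)

    // Walk everything, then keep only the files accepted by fileFilter.
    def walkWithFilter(fileFilter: File => Boolean): List[File] = walk.filter(fileFilter)
  }
}
```

Usage mirrors the original: import FileWalk._ and call dir.walk or dir.walkWithFilter(...). Extending AnyVal makes RichDir a value class, so the wrapper object is usually not allocated.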

Using it is very simple: the FileList object provides an implicit conversion from your directory (a java.io.File) into a FileList, so you can just do this:
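import java.io.File
import FileList.file2FileList
val dir = new File(".")

println("Traversing without a filter:")
dir.walk.foreach(println)

println("Traversing with a filter:")
// Example filter, chosen for illustration: only .scala files
dir.walkWithFilter(_.getName.endsWith(".scala")).foreach(println)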
The 'walk' method just returns a GenSeq[File], so you can use any collection operation on it to process the files it finds.
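As an example of such follow-up processing, here is a minimal sketch that counts files per extension and sums their sizes. The 'WalkReport' object and its helpers are hypothetical, not part of the code above. They take a plain Seq[File]; for the GenSeq returned by 'walk', calling .seq first gives a sequential Seq.

```scala
import java.io.File

object WalkReport {
  // Hypothetical helper: the extension of a file name, or "" if it has none
  // (names starting with a dot, such as ".bashrc", also count as having none).
  def extensionOf(f: File): String = {
    val name = f.getName
    val dot = name.lastIndexOf('.')
    if (dot > 0) name.substring(dot + 1) else ""
  }

  // Number of files per extension.
  def countByExtension(files: Seq[File]): Map[String, Int] =
    files.groupBy(extensionOf).map { case (ext, fs) => ext -> fs.size }

  // Total size in bytes; File.length returns 0L for files that do not exist.
  def totalBytes(files: Seq[File]): Long = files.map(_.length).sum
}
```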

To traverse in parallel, just add 'par':
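import java.io.File
import FileList.file2FileList
val dir = new File(".")

println("Traversing in parallel without a filter:")
// On a parallel collection, foreach may print the files in any order.
dir.par.walk.foreach(println)

println("Traversing in parallel with a filter:")
// Example filter, chosen for illustration: only .scala files
dir.par.walkWithFilter(_.getName.endsWith(".scala")).foreach(println)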
Pretty neat, if I may say so.
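One thing worth noticing: 'par' only parallelizes the top level. Inside ParFileList's 'walk', each subdirectory is handled by f.walk, and the implicit conversion turns it into an ordinary, sequential FileList. If you want every level to fan out, here is a minimal sketch using scala.concurrent.Future instead of parallel collections. That is a swapped-in technique, not the approach above, and 'ParallelWalk' is a hypothetical name.

```scala
import java.io.File
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ParallelWalk {
  // listFiles() returns null for non-directories and on I/O errors; treat that as empty.
  private def children(dir: File): List[File] =
    Option(dir.listFiles()).map(_.toList).getOrElse(Nil)

  // Lists `dir`, then walks every subdirectory in its own Future, at every depth.
  // Nothing blocks inside the futures; Future.sequence keeps the subdirectory order.
  def walk(dir: File)(implicit ec: ExecutionContext): Future[List[File]] = {
    val (files, subDirs) = children(dir).partition(_.isFile)
    val nested = subDirs.map(sub => Future(sub).flatMap(d => walk(d)))
    Future.sequence(nested).map(results => files ::: results.flatten)
  }

  // Blocking convenience wrapper for callers that just want the list.
  def walkBlocking(dir: File)(implicit ec: ExecutionContext): List[File] =
    Await.result(walk(dir), Duration.Inf)
}
```

With scala.concurrent.ExecutionContext.Implicits.global imported, ParallelWalk.walkBlocking(new File(".")) returns the same files as a sequential walk, possibly in a different order. Whether it is any faster still depends on your storage, for the disk-head reasons mentioned at the start.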