Often while writing unit tests in Scala, I need to read a file that lives in the resources folder. There are a few variations on how this can be done, particularly when I want to use the contents of the file as a DataFrame in Spark. Here are some examples that I want to keep for myself as notes.
All these examples work with JDK 1.8u144, Scala 2.11.8 and Spark 2.3.1. Test them out first if your software versions differ substantially from these.
- Get the contents of a text file as an Iterator[String]:
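A minimal sketch of this variation, assuming the file sits on the test classpath (for example under `src/test/resources`). The object and method names are illustrative, not part of any library. The lookup is wrapped in `Option` because `getResourceAsStream` returns `null` when the resource is missing.

```scala
import scala.io.Source

object ResourceLines {
  // Hypothetical helper: returns None when the resource is not on the classpath.
  // A leading "/" makes the lookup relative to the classpath root.
  def resourceLines(path: String): Option[Iterator[String]] =
    Option(getClass.getResourceAsStream(path))
      .map(stream => Source.fromInputStream(stream, "UTF-8").getLines())
}
```

A call such as `ResourceLines.resourceLines("/sample.txt")` (with `/sample.txt` standing in for your own file) yields the lines lazily, so the underlying stream stays open until the iterator is exhausted.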
- Get a text file as a Spark RDD[String], with individual lines as rows:
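One way this might look, assuming the resource is a plain file on disk (as it normally is when tests run from sbt or Maven) rather than an entry inside a jar. `ResourceRdd`, `resourcePath` and `linesRdd` are hypothetical names used only for this sketch.

```scala
import java.io.File

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object ResourceRdd {
  // Resolve a classpath resource to a local filesystem path.
  // Going through toURI avoids problems with URL-encoded characters such as spaces.
  def resourcePath(name: String): String = {
    val url = Option(getClass.getResource(name))
      .getOrElse(throw new IllegalArgumentException(s"Resource not found: $name"))
    new File(url.toURI).getPath
  }

  // Each line of the file becomes one element of the RDD.
  def linesRdd(spark: SparkSession, localPath: String): RDD[String] =
    spark.sparkContext.textFile(localPath)
}
```

In a test this could be called as `ResourceRdd.linesRdd(spark, ResourceRdd.resourcePath("/sample.txt"))`, where `spark` is a local `SparkSession` and `/sample.txt` is a placeholder file name.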
- Get a text file as a [n x 1] Spark DataFrame with individual lines as rows:
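A sketch of the DataFrame variant using `spark.read.text`, which produces a single string column named `value` with one row per line. The object and method names are assumptions for illustration, and the path is expected to be a local file path, for instance one resolved from `getClass.getResource(...)`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ResourceTextFrame {
  // Returns an [n x 1] DataFrame: one row per line, in a column called "value".
  def linesFrame(spark: SparkSession, localPath: String): DataFrame =
    spark.read.text(localPath)
}
```

If typed rows are more convenient, `spark.read.textFile(localPath)` returns a `Dataset[String]` instead.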
- Read a line-delimited JSON file into a Spark DataFrame:
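A possible shape for the JSON case. By default `spark.read.json` expects one JSON object per line and infers the schema from the data. `ResourceJsonFrame` and `jsonFrame` are made-up names for this sketch, and the path is again assumed to be a local file path.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ResourceJsonFrame {
  // Reads line-delimited JSON: every line must be a complete JSON object.
  // The schema is inferred by scanning the input.
  def jsonFrame(spark: SparkSession, localPath: String): DataFrame =
    spark.read.json(localPath)
}
```

For larger fixtures, passing an explicit schema via `spark.read.schema(...)` avoids the extra inference pass, at the cost of maintaining the schema by hand.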
- The final one is a niche use case: we have a bunch of events in an Avro file and would like to read them one by one. Here we use an iterator style, i.e. we stream the file lazily instead of loading it all at once. For more notes on this, see here.
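The lazy Avro read could be sketched as below, assuming the Avro Java library is on the classpath and the file is an Avro object container file with its schema embedded. This uses Avro's generic API (`DataFileStream` with `GenericDatumReader`), which may differ from the approach in the notes referenced above. `AvroEvents` and `records` are illustrative names.

```scala
import java.io.InputStream

import org.apache.avro.file.DataFileStream
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}

import scala.collection.JavaConverters._

object AvroEvents {
  // Wraps the stream in a lazy iterator; records are decoded only as they are consumed.
  // The writer schema is read from the file header, so none is passed in here.
  // The caller remains responsible for closing the input stream.
  def records(in: InputStream): Iterator[GenericRecord] = {
    val reader = new GenericDatumReader[GenericRecord]()
    val stream = new DataFileStream[GenericRecord](in, reader)
    stream.iterator().asScala
  }
}
```

With a resource file this might be used as `AvroEvents.records(getClass.getResourceAsStream("/events.avro"))`, where `/events.avro` is a placeholder name; individual fields are then available through `record.get("fieldName")`.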