Channel: Pensieve

Java Lambda Recipes 1 – Read Data From File Into Pojo

Today I was refactoring a piece of code that had way too many IF..ELSE statements. We get data from an upstream team as files in S3. Each file contains the details of one customer. Unfortunately, the details are not encoded in any standard format like XML or JSON. The files are structured as follows:

  • Each line contains one attribute only.
  • The attribute key and value are separated by “:”.
  • There may be extra whitespace around the key and the value.
  • The file can have blank or wrongly formatted lines.
  • The file can have attributes that we are not interested in.

The existing function accepted the location of the file and returned a Person object. It worked like this:

  • Read lines from the file.
  • Use a series of IF blocks to check which attribute is present on each line and parse the value accordingly.

Here is what I ended up with after refactoring the code to use lambdas. You can find the complete code at GitHub.

First the necessary classes:
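The original class listing did not survive on this page, so here is a minimal sketch of what the model could look like. The post only mentions a Person object, an Attribute enum, and a Country enum with COUNTRY and LOCALE keys; the wrapper class `PersonModel`, the NAME and AGE attributes, the Person fields, and the `fromKey` helper are assumptions made up for illustration.

```java
import java.util.Arrays;
import java.util.Optional;

// Hypothetical wrapper so the sketch fits in one file.
class PersonModel {

    /** The keys we care about; NAME and AGE are assumed for illustration. */
    enum Attribute {
        NAME, AGE, COUNTRY, LOCALE;

        /** Case-insensitive lookup that tolerates surrounding whitespace. */
        static Optional<Attribute> fromKey(String key) {
            String cleaned = key.trim();
            return Arrays.stream(values())
                    .filter(a -> a.name().equalsIgnoreCase(cleaned))
                    .findFirst();
        }
    }

    /** Each country carries the locale string used by the LOCALE key. */
    enum Country {
        US("us_US"), DE("de_DE");

        final String locale;

        Country(String locale) {
            this.locale = locale;
        }
    }

    /** Immutable value object built from the parsed attributes. */
    static final class Person {
        private final String name;
        private final int age;
        private final Country country;

        Person(String name, int age, Country country) {
            this.name = name;
            this.age = age;
            this.country = country;
        }

        String getName() { return name; }
        int getAge() { return age; }
        Country getCountry() { return country; }

        @Override
        public String toString() {
            return "Person{name=" + name + ", age=" + age + ", country=" + country + "}";
        }
    }
}
```

Returning an Optional from `fromKey` lets callers skip unknown keys without catching the IllegalArgumentException that `Enum.valueOf` would throw.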

Now the interesting lambda stuff. Our method takes a list of lines as input and returns the Person object as output. Let’s first create a stream of the input lines and extract the attributes as a map. We’ll do the following operations (in order) on each line:

  1. Split the line at “:”.
  2. Filter out all the lines that do not have exactly 2 elements after the split. Remember, every attribute is written as “key : value”.
  3. Filter out all the lines whose key is not present in the Attribute enum, as we are only interested in a few selected attributes.
  4. Now that we only have the pairs we are interested in, we’ll collect the result into a Map. Pay special attention to the collect() operation. It took a while to get it right.
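The four steps above can be sketched roughly as follows. This is not the original code: the class name `AttributeParser`, the `isKnown` helper, and the NAME and AGE attributes are assumptions, and so is the choice to let a later duplicate key overwrite an earlier one in collect().

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical class name; only the Attribute enum is mentioned in the post.
class AttributeParser {

    enum Attribute { NAME, AGE, COUNTRY, LOCALE }

    /** True if the (already trimmed) key names one of the Attribute constants. */
    static boolean isKnown(String key) {
        return Arrays.stream(Attribute.values())
                .anyMatch(a -> a.name().equalsIgnoreCase(key));
    }

    static Map<Attribute, String> parse(List<String> lines) {
        return lines.stream()
                // 1. Split the line at ":".
                .map(line -> line.split(":"))
                // 2. Drop blank or malformed lines (anything but key + value).
                .filter(parts -> parts.length == 2)
                // 3. Drop keys we are not interested in.
                .filter(parts -> isKnown(parts[0].trim()))
                // 4. Collect into a map, trimming the extra whitespace.
                .collect(Collectors.toMap(
                        parts -> Attribute.valueOf(parts[0].trim().toUpperCase(Locale.ROOT)),
                        parts -> parts[1].trim(),
                        (first, second) -> second, // assumption: last duplicate wins
                        () -> new EnumMap<Attribute, String>(Attribute.class)));
    }
}
```

Calling `AttributeParser.parse(Arrays.asList("name : John", "", "hobby: chess"))` would keep only the NAME entry. Without the merge function, toMap() throws an IllegalStateException on a duplicate key, which is one way collect() can be tricky to get right.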

There’s only one more function that needs to be explained: getCountry(). The complication with Country is that the COUNTRY key might not always be present. When it is absent, we fall back to the LOCALE key. Locale values look like us_US, de_DE, etc. The old code had two methods, one each for the COUNTRY and LOCALE keys, which meant we could iterate over the Country enum twice. We can simplify that by defining a Predicate for each case and then choosing the appropriate one, so the enum is searched only once.
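Here is a rough sketch of that idea, not the original getCountry(). The class name `CountryResolver`, the `locale` field on Country, the Optional return type, and the assumption that COUNTRY values match the enum constant names (ignoring case) are all mine.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical class name, used only to keep the sketch self-contained.
class CountryResolver {

    enum Attribute { COUNTRY, LOCALE }

    enum Country {
        US("us_US"), DE("de_DE");

        final String locale;

        Country(String locale) {
            this.locale = locale;
        }
    }

    static Optional<Country> getCountry(Map<Attribute, String> attributes) {
        String country = attributes.get(Attribute.COUNTRY);
        String locale = attributes.get(Attribute.LOCALE);
        if (country == null && locale == null) {
            return Optional.empty(); // neither key was present in the file
        }

        // One predicate per case; assumes COUNTRY values equal the enum names.
        Predicate<Country> byCountry = c -> c.name().equalsIgnoreCase(country);
        Predicate<Country> byLocale = c -> c.locale.equalsIgnoreCase(locale);

        // Prefer COUNTRY, fall back to LOCALE, then scan the enum once.
        Predicate<Country> matcher = (country != null) ? byCountry : byLocale;
        return Arrays.stream(Country.values()).filter(matcher).findFirst();
    }
}
```

Because the predicate is chosen before the stream runs, the lookup logic lives in one place and adding another fallback key only means adding another predicate.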

That’s it! I tested it with a few test cases:

And the results:

