Today I was refactoring a piece of code that had way too many if..else statements. We get data from an upstream team as files in S3. Each file contains the details of a customer. Unfortunately, the details are not encoded in any standard format such as XML or JSON. The structure of these files is:
- Each line contains one attribute only.
- The attribute key and value are separated by “:”.
- There may or may not be extra blank spaces.
- The file can have blank or wrongly formatted lines.
- The file can have attributes that we are not interested in.
There was a function that accepted the location of the file and returned a Person object. It worked like this:
- Read lines from the file.
- Use a series of if blocks to check which attribute is present on each line and parse the value accordingly.
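To make the starting point concrete, here is a rough sketch of that style. It is not the original code: the class name `IfElseParserSketch`, the `NAME`, `AGE` and `COUNTRY` keys and the `Person` fields are assumptions for illustration, and it takes the lines directly instead of reading them from S3.

```java
import java.util.List;

// Hypothetical reconstruction of the "before" style; names and fields are assumed.
class IfElseParserSketch {

    static class Person {
        String name;
        String age;
        String country;
    }

    static Person parse(List<String> lines) {
        Person person = new Person();
        for (String line : lines) {
            String[] parts = line.split(":");
            if (parts.length != 2) {
                continue; // blank or wrongly formatted line
            }
            String key = parts[0].trim();   // tolerate extra blank spaces
            String value = parts[1].trim();
            if (key.equals("NAME")) {
                person.name = value;
            } else if (key.equals("AGE")) {
                person.age = value;
            } else if (key.equals("COUNTRY")) {
                person.country = value;
            }
            // One more branch per attribute; keys we do not care about match nothing.
        }
        return person;
    }
}
```

Every new attribute adds another branch, which is how the chain grows out of hand.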
The following is what I ended up with after refactoring the code to use lambdas. You can find the complete code on GitHub.
First the necessary classes:
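A minimal sketch of what these classes might look like. The `Attribute` and `Country` enums and the `Person` type are named in this post, but the specific constants, the `code` and `locale` fields, the `isKnown` helper and the wrapper class `ModelSketch` are assumptions made so the snippet compiles on its own.

```java
import java.util.Arrays;

class ModelSketch {

    // The keys we are interested in; any other key in the file is ignored. (Assumed set.)
    enum Attribute {
        NAME, AGE, COUNTRY, LOCALE;

        // True if the key matches one of the constants above exactly.
        static boolean isKnown(String key) {
            return Arrays.stream(values()).anyMatch(a -> a.name().equals(key));
        }
    }

    // Each country carries the value expected under COUNTRY and under LOCALE. (Assumed fields.)
    enum Country {
        US("US", "us_US"),
        DE("DE", "de_DE"),
        UNKNOWN("", "");

        final String code;
        final String locale;

        Country(String code, String locale) {
            this.code = code;
            this.locale = locale;
        }
    }

    // Immutable result object. (Assumed fields.)
    static final class Person {
        final String name;
        final int age;
        final Country country;

        Person(String name, int age, Country country) {
            this.name = name;
            this.age = age;
            this.country = country;
        }
    }
}
```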
Now the interesting lambda stuff. Our method takes a list of lines as input and returns the Person object as output. Let’s first create a stream of the input lines and extract the attributes as a map. We’ll do the following operations (in order) on each line:
- Split the line at “:”.
- Filter out all the lines that do not have exactly 2 elements after the split. Remember, all our attributes are present as “key : value”.
- Filter out all the lines whose key is not present in the Attribute enum as we are only interested in a few selected attributes.
- Now that we only have the pairs we are interested in, we’ll collect the result into a Map. Pay special attention to the collect() operation. It took a while to get it right.
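The four steps above can be sketched as a single stream pipeline. This is an approximation under stated assumptions, not the code from the repository: the class name `AttributeExtractorSketch`, the `extract` method, the `isKnown` helper and the enum constants are hypothetical, and the small enum is repeated so the snippet compiles on its own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class AttributeExtractorSketch {

    enum Attribute {
        NAME, AGE, COUNTRY, LOCALE; // assumed set of interesting keys

        static boolean isKnown(String key) {
            return Arrays.stream(values()).anyMatch(a -> a.name().equals(key));
        }
    }

    static Map<Attribute, String> extract(List<String> lines) {
        return lines.stream()
                // 1. Split each line at ":".
                .map(line -> line.split(":"))
                // 2. Drop blank or malformed lines (anything but exactly key and value).
                .filter(parts -> parts.length == 2)
                // 3. Drop keys that are not in the Attribute enum.
                .filter(parts -> Attribute.isKnown(parts[0].trim()))
                // 4. Collect into a map, trimming the extra blank spaces.
                .collect(Collectors.toMap(
                        parts -> Attribute.valueOf(parts[0].trim()),
                        parts -> parts[1].trim(),
                        (first, second) -> second)); // on a repeated key, keep the later value
    }
}
```

One thing to watch in `collect()`: the two-argument `Collectors.toMap` throws an `IllegalStateException` when the same key appears twice, so this sketch passes a merge function as the third argument. Keeping the later value is a choice made here, not something the file format dictates.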
There’s only one more function that needs to be explained: getCountry(). The complication with Country is that the COUNTRY key might not always be present. When it is absent, we use the LOCALE key instead. Locale values look like us_US, de_DE, etc. In the old code there were two methods, one for the COUNTRY key and one for the LOCALE key, which meant we would iterate the Country enum twice. We can simplify that by defining a Predicate for each of the two cases and then choosing the appropriate one, so the enum is iterated only once.
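A sketch of that idea, with assumptions called out: the `code` and `locale` fields on `Country`, the `UNKNOWN` fallback, the case-insensitive comparison and the wrapper class `CountryLookupSketch` are all hypothetical, and the enums are repeated so the snippet compiles on its own.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Predicate;

class CountryLookupSketch {

    enum Attribute { COUNTRY, LOCALE }

    enum Country {
        US("US", "us_US"),
        DE("DE", "de_DE"),
        UNKNOWN("", ""); // assumed fallback when nothing matches

        final String code;
        final String locale;

        Country(String code, String locale) {
            this.code = code;
            this.locale = locale;
        }
    }

    static Country getCountry(Map<Attribute, String> attributes) {
        String countryValue = attributes.get(Attribute.COUNTRY);
        String localeValue = attributes.get(Attribute.LOCALE);

        // One predicate per case; equalsIgnoreCase(null) is simply false.
        Predicate<Country> byCode = c -> c.code.equalsIgnoreCase(countryValue);
        Predicate<Country> byLocale = c -> c.locale.equalsIgnoreCase(localeValue);

        // Prefer COUNTRY when it is present, otherwise fall back to LOCALE.
        Predicate<Country> matcher = countryValue != null ? byCode : byLocale;

        // A single pass over the enum, whichever predicate was chosen.
        return Arrays.stream(Country.values())
                .filter(matcher)
                .findFirst()
                .orElse(Country.UNKNOWN);
    }
}
```

The choice between the two keys happens once, up front, and the lookup itself no longer cares which key the value came from.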
That’s it! I tested it with a few test cases:
And the results: