In a first blog post, I explained how to list all the Data Platform MVP by parsing a single web page. In a second blog post, I parsed the individual pages of each MVP to get information about her awards. The next blog post was about social media and how to connect to the GitHub API to retrieve more info about activities on GitHub. The previous blog posts ([1] and [2]) illustrated how to parse rather complex Json responses and retrieve the commits’ activity and the issues’ activity of a repository.
In this new blog post, I’ll try solve an issue that is there since the first blog post. When parsing the MVP website and retrieving the first and last names, I had an issue with some special characters: my own name was retrieved as “Cédric Charlier” in place of “Cédric Charlier”. For some reasons some special characters are encoded (but some are not). Anyway we need to decode them.
From a technical perspective, I’ll solve this issue with a recursion but also with some error handling in Power Query/M language.
The first step is to analyze the problem that we’ve in front of us. The special characters encoded have always the same pattern. A ampersand (“&”) followed by a cross (“#”), a set of digits and a semi-colon (“;”) to end the fragment. We’ll first need to identify this kind of pattern in the string. I’ll simplify to only search for the ampersand and the semi-colon.
let ConvertHtmlSpecialChars = (string as text) => let start = Text.PositionOf(string, "&"), end = Text.PositionOf(string, ";"),
Then, I’ll extract this fragment from the string. Then I’ll confirm that this framment is effectively starting by “&#”. If it’s the case then I’ll return the number in this fragment.
value = Text.Range(string, start, end-start+1), cleanValue = if Text.StartsWith(value, "&#") then Text.Start( Text.End( value , Text.Length(value)-2 ) ,Text.Length(value)-3 ) else null,
Now, that I’ve this number, I can translate it to a normal character with the help of the function
Character.FromNumber
and replace the fragment by the new value.replace = Character.FromNumber(Number.FromText(cleanValue)), replaced = Text.Replace(string, value, replace),
Now, I need to handle the fact that I can have more than one special character in my string and I must process each of them. Once again recursion will help me!
continue = if string=replaced then replaced else ConvertHtmlSpecialChars(replaced) in continue
That’s it? Not really. My function is crashing easily. If I’ve no special characters in my string, the function
Text.Range(string, start, end-start+1)
will crash because -1 (start’s value) is not a valid. But I’ve many other places where I could have an issue.
A little bit of M theory to understand the basics of error handling in this language. When an error is thrown by a function this error is propagated to the caller and so on. It’s really practical because you can catch the error at a higher level and you don’t need to intercept all the potential errors.
In this case, I’ll catch the potential error between the replacement and my recursive condition.
newString = try replaced otherwise string,
The keyword
try
means that we’re expecting a potential issue during the call to the function replaced. The keyword otherwise
sepcifies that if we effectively have an error during the call to replaced then we need to use another function in place of. In this case the function is just the value of string (the initial value). So if at any moment during the substitution, I’ve an error then I’ll use the original string.
I still need to update the recursive condition to use this new function:
continue = if string=newString then newString else ConvertHtmlSpecialChars(newString) in continue in ConvertHtmlSpecialChars
Now, I’ll need to add the call to this function to the table dealing the MVPs’ names:
#"Html Special Chars" = Table.TransformColumns( #"Renamed Columns" ,{{"MVP name", ConvertHtmlSpecialChars}} ) in #"Html Special Chars"
And now I can compare the before
and after
https://seddryck.wordpress.com/2017/01/08/error-handling-in-power-bi-m-language/