domingo, 8 de enero de 2017

Do … While loop with Power BI (M language).

In a first blog post, I explained how to list all the Data Platform MVP by parsing a single web page. In a second blog post, I parsed the individual pages of each MVP to get information about her awards. The next blog post was about social media and how to connect to the GitHub API to retrieve more info about activities on GitHub. The previous blog post illustrated how to parse a rather complex Json response and retrieve the commits’ activity of a repository.
In this new blog post, I’ll make an additional query to GitHub to retrieve the issues’ activity of a repository. At the opposite of the previous calls to the GitHub API, the response will be cut into different pages. It means that one call to the API won’t be enough to retrieve the whole list of issues. Indeed, issues are returned 42 by 42 and I’ll need to retrieve one, two or more pages of information to have my count of issues created during the last 52 weeks.
Let’s start by taking a look to the GitHub API to retrieve information about issues’ activity: /repos/:login/:repos/issues But, that’s a bit more complex. If you use this method as-is, it will return only the open issues and in a random sort. Hopefully, we can override this behavior by applying modifiers:
/repos/:login/:repos/issues?state=all&sort=created
State is there to identify the state of the issues to be returned, I’m enforcing to receive all the issues independently of the current state. With the sort parameterI’m requesting to order the items by creation date. By default the sort will be descending, I don’t need to explicitly specify it.
With this request, I’ll retrieve the 42 last issues created in a repository. If I want  to retrieve the 42 previous, I’ll need to add an additional parameter.
/repos/:login/:repos/issues?state=all&sort=created&page=2
The parameter page let’s me navigate in my list of issues (ordered by creation date).
If I want to retrieve all the issues, I need to go through all the pages. But how many pages do I need to retrieve … it depends some repositories have no issues, others have thousands.
Many programming languages are imperative (and not functional or declarative) and have a structure do...while to perform loops that are stopped when a condition is met (in this case probably the condition is that the page returns no issue). The M language is a declarative language and has no structure to handle loops. It’s not really a problem because we can overcome this limitation by using a recursion.
The initial state of the recursion will be to perform a query on the GitHub API with parameter page set to 1. Then we’ll parse the response and check if we’ve issues created during the last 52 weeks. If it’s the case, we’ll perform and additional request incrementing the parameter page of 1 unit until we receive no issue created during the last 52 weeks.
Because the function is recursive we need to provide an additional parmeter (page).
1
2
3
4
5
6
7
let
    GetIssuesDuringPeriod = (
        login as text
        , repository as text
        , countOfWeeks as number
        , page as number
    ) =>
Then we’ll call the GitHub API with the correct modifiers and the parameter page.
1
2
3
4
5
6
7
8
9
10
let
    #"List of issues" =
        Json.Document(
            Web.Contents(
                "https://api.github.com/repos/"
                & login & "/" & repository
                & "/issues?state=all&sort=created&page="
                & Text.From(page)
            )
        ),
We’ll consume the result of the query by analyzing the response and counting all the issues created since less than 52 weeks.
The first function returns the date corresponding to 52 weeks ago. That date will be compared to the value of the field created_at of each issue. If the date is anterior then we’ll add 1 to our count of issues. If not we won’t change our count.
To achieve this I’m using a projection named List.Accumulate. I’m setting the initial value (s .. stands for seed) to 0 and for each issue, I’m adding 1 to the seed or not based on the date comparaison. This function is returning the seed when all items have been iterated.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
#"StartDate" =
        Date.AddWeeks(
            DateTime.LocalNow()
            ,  -countOfWeeks
        ),
 
    #"CountOfIssues" =
        List.Accumulate(
            #"List of issues"
            , 0
            , (s, c) =>
                if
                    DateTime.FromText(c[created_at], "en-us")
                    >=#"StartDate"
                then
                    s+1
                else
                    s
        ),
Now, I’ll need to take a decision and decide if I need to iterate on an additional page or not. If the count returned in the previous function was 0 it means that I had no issue created during the last 52 weeks on this page. Due to the enforced ordering in my request, I know that I can stop there. If the result was greater than 1, I’ll perform and additional iteration on my recursive function to check if I’ve additional issues on the next page. To achieve this I’ll recall the function itself and increment the parameter page. The result of this new call will be added to the result of the current call.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
#"Recursive" =
        if
            #"CountOfIssues"> 0
        then
            #"CountOfIssues"
            + GetIssuesDuringPeriod(
                login
                , repository
                , countOfWeeks
                , page+1
            )
        else
            0
in
    #"Recursive"
in
    GetIssuesDuringPeriod
Now, I can add the result of this function to the table about GitHub repositories:
1
2
3
4
5
6
7
8
9
10
11
12
13
#"Added issue activity" =
        Table.AddColumn(
            #"Added commit activity"
            , "Count of issues created this year"
            , each GetIssuesDuringPeriod(
                [Login]
                , [#"Repository's name"]
                , 52
                , 1
            )
        )
in
    #"Added issue activity"
And I can create the following visualization to check the popularity of repositories by language (C#, R, JavaScript)
mvp-analysis-powerbi-011
and by MVP
mvp-analysis-powerbi-010
The next blog post of this series, will about parsing special html characters and translating them into correct Unicode values.