Call third parties in parallel

Problem

I am trying to speed up some reporting on all employees, but I am having difficulty determining the “proper” place to parallelize my tasks. In order, this is what my code currently does:

  1. Ask for all developers
  2. Create a Developer object for each employee
  3. Make a REST call to Jira to get the JSON of all tickets for the employee
  4. Make a REST call to Github, using info from the Jira JSON, to get pull request information
  5. Hydrate JiraTicket objects with the Jira and Github JSON
  6. Return the developers

I have the code below to make the requests to Github and Jira. The getJiraTicketsAsync method takes the JSON from Jira, which contains info on all tickets. The getPullRequestURLsFromTicketId method makes another call to Jira, because the development information is not included in the response to the JQL call that returns the tickets JSON. My question is: am I parallelizing in the proper place? Should I move it up to a higher level and just call createDeveloper() in parallel?

private Set<JiraTicket> getJiraTicketsAsync(JsonArray ticketsArray)
{
    List<Callable<JiraTicket>> tasks = createJiraTicketTasks(ticketsArray);
    Set<JiraTicket> tickets = new HashSet<>(tasks.size());
    try
    {
        ExecutorService service = Executors.newCachedThreadPool();
        List<Future<JiraTicket>> futures = service.invokeAll(tasks);
        for(Future<JiraTicket> future: futures){
            try
            {
                tickets.add(future.get());
            } catch (ExecutionException e)
            {
                logger.error("There was an error getting a jira ticket", e);
            }
        }
        service.shutdown();
    } catch (InterruptedException e)
    {
        logger.error("Interrupted hydrating jira ticket", e);
    }
    return tickets;
}



private List<Callable<JiraTicket>> createJiraTicketTasks(JsonArray ticketsArray)
{
    List<Callable<JiraTicket>> tasks = new ArrayList<>(ticketsArray.size());
    for (int n = 0; n < ticketsArray.size(); n++)
    {
        int index = n;
        tasks.add(()->{
            JsonObject ticketJson = ticketsArray.get(index).getAsJsonObject();
            JsonArray pullRequestsJsonArray = getPullRequestURLsFromTicketId(ticketJson.get("id").getAsString());
            ticketJson.add("pullRequestUrls", pullRequestsJsonArray);
            return ticketHydrator.hydrateJiraTicket(ticketJson);
        });
    }
    return tasks;
}

Solution

The first consideration is that your method:

private Set<JiraTicket> getJiraTicketsAsync(JsonArray ticketsArray)

is not asynchronous at all.

Because it returns a Set of JiraTicket, the caller has to wait until the whole collection has been built.

So the method name is not accurate and could lead to misunderstanding.
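For the name to fit, the method would have to hand back something like a CompletableFuture and let the caller decide when to wait. A minimal sketch of that idea, assuming String as a stand-in for JiraTicket and a hypothetical hydrate method in place of the real REST calls and hydration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

final class AsyncTickets {

    // Hypothetical stand-in for the REST calls and hydration of one ticket.
    static String hydrate(String ticketId) {
        return "ticket:" + ticketId;
    }

    // Returns at once; the Set becomes available when every ticket is done.
    static CompletableFuture<Set<String>> getJiraTicketsAsync(List<String> ticketIds,
                                                              ExecutorService pool) {
        List<CompletableFuture<String>> futures = ticketIds.stream()
                .map(id -> CompletableFuture.supplyAsync(() -> hydrate(id), pool))
                .collect(Collectors.toList());

        return CompletableFuture
                .allOf(futures.toArray(new CompletableFuture[0]))
                // Every future is already complete here, so join() does not block.
                .thenApply(ignored -> futures.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toSet()));
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            CompletableFuture<Set<String>> pending =
                    getJiraTicketsAsync(Arrays.asList("A-1", "A-2", "A-3"), pool);
            // The caller is free to do other work before waiting on the result.
            System.out.println(pending.join());
        } finally {
            pool.shutdown();
        }
    }
}
```

The pool is passed in rather than created inside the method, so it can be shared across calls and shut down in one place.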

In your code:

ExecutorService service = Executors.newCachedThreadPool();
List<Future<JiraTicket>> futures = service.invokeAll(tasks);
for(Future<JiraTicket> future: futures){
    try
    {
        tickets.add(future.get());
    } catch (ExecutionException e)
    {
        logger.error("There was an error getting a jira ticket", e);
    }
}
service.shutdown();

The problem here is the for loop in which you consume the Future objects (and invokeAll itself only returns once every task has completed).
Of course you parallelize the requests across multiple threads, so you will probably gain some efficiency, but your code is still blocking.

Writing concurrent code is not trivial: you have to break up your flow and rethink it, starting from its dependencies.

You need the JiraTicket in order to get the info from Github. So you should think of each task like this:

  1. Get the user's JiraTicket
  2. Get the info from Github
  3. Go on to the next JiraTicket

Once these concurrent tasks have finished, you can do the sequential work, such as creating the Developer object and so on.
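One way to express that per-ticket dependency chain is with CompletableFuture stages, so that each ticket runs its own Jira-then-Github sequence independently of the others. This is only a sketch: fetchTicket and fetchPullRequests are hypothetical stand-ins for the real REST calls, and plain strings stand in for the JSON and JiraTicket types.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

final class TicketPipeline {

    // Hypothetical stand-in for the Jira REST call.
    static String fetchTicket(String ticketId) {
        return "ticket " + ticketId;
    }

    // Hypothetical stand-in for the Github REST call, which needs the ticket first.
    static String fetchPullRequests(String ticket) {
        return ticket + " + pull requests";
    }

    // One independent chain per ticket: step 2 starts as soon as step 1 is done.
    static CompletableFuture<String> hydrateAsync(String ticketId, ExecutorService pool) {
        return CompletableFuture
                .supplyAsync(() -> fetchTicket(ticketId), pool)
                .thenApplyAsync(TicketPipeline::fetchPullRequests, pool);
    }

    static List<String> hydrateAll(List<String> ticketIds, ExecutorService pool) {
        // Start every chain before waiting on any of them; joining inside the
        // same stream pipeline would run the tickets one after another.
        List<CompletableFuture<String>> chains = ticketIds.stream()
                .map(id -> hydrateAsync(id, pool))
                .collect(Collectors.toList());
        // Sequential part: collect the results once the concurrent part is done.
        return chains.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            hydrateAll(Arrays.asList("A-1", "A-2", "A-3"), pool)
                    .forEach(System.out::println);
        } finally {
            pool.shutdown();
        }
    }
}
```

Creating the Developer objects would then happen after hydrateAll returns, outside the concurrent part.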

In a comment I suggested using a queue instead of any other sort of collection.
The reason is that a shared collection has to be synchronized, and that pushes you back into blocking code.

If you want to speed things up and take advantage of concurrency, you should avoid blocking code.

A queue decouples the producer part, where you call the Jira and Github servers, from the consumer part, where you create the Developer object and complete the report.
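A rough sketch of that split, using a BlockingQueue from java.util.concurrent. The fetchTicket method is a hypothetical stand-in for the Jira and Github calls, and the "report line" strings stand in for the real Developer and report objects. Error handling is left out: if a producer failed, this consumer would wait forever, so real code would need a timeout or a failure marker on the queue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

final class QueueReport {

    // Hypothetical stand-in for the Jira and Github REST calls.
    static String fetchTicket(String ticketId) {
        return "ticket " + ticketId;
    }

    static List<String> buildReport(List<String> ticketIds, ExecutorService pool) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        // Producers: each task pushes its result as soon as it is ready.
        for (String id : ticketIds) {
            pool.execute(() -> queue.add(fetchTicket(id)));
        }

        // Consumer: handles results in completion order, waiting only for
        // the next item rather than for the whole batch.
        List<String> report = new ArrayList<>();
        try {
            for (int i = 0; i < ticketIds.size(); i++) {
                report.add("report line for " + queue.take());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("Interrupted while building report", e);
        }
        return report;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            buildReport(Arrays.asList("A-1", "A-2", "A-3"), pool)
                    .forEach(System.out::println);
        } finally {
            pool.shutdown();
        }
    }
}
```

The report lines come out in whatever order the producers finish, which may differ from the order of the ticket ids.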

How much you gain in your case depends on the number of objects you are dealing with and on how long your flow takes now.

There are many options that can help you with the asynchronous part, such as JMS, Reactive Streams, or a design based on the Publisher/Subscriber pattern.
