ContribsGH
The many ways to implement a REST service (and other asynchronous software systems) using Scala.
1. Introduction - Main characteristics of Scala
The purpose of this series of posts is to present a set of solutions to a concrete, simple but non-trivial, software design and implementation problem, exemplifying several ways in which Scala can be used as a great tool for general software engineering and as a simple and powerful platform for the development of asynchronous software systems.
Scala is a powerful and expressive programming language that sits at the center of a software development environment with high-quality software tools and a huge eco-system of (mostly license-free) libraries covering any conceivable domain of application.
It is a statically typed language with an advanced type system that puts in a developer's hands not only a great vehicle for expressing software abstractions but also a great thinking tool, one that can make a big difference on the difficult path to good solutions for hard software design problems.
Scala is a multi-paradigm programming language that combines the fundamental ideas and abstractions of modern typed functional-programming languages, like Haskell, with an enhanced version of the traditional object-oriented features of languages like Java.
It was created to be appropriate for quickly developing small programs as well as for effectively programming in the large: it was designed, from the very beginning, to be scalable (and that is, in fact, the origin of its name, which stands for "scalable language").
There is an incredibly simple and powerful platform for the development of asynchronous (parallel and distributed) programs based on Scala. This is a field of computing, of extraordinary importance nowadays, which demands from software professionals a particular approach to the development of software systems that is very difficult to master without resorting to the proper abstractions. Scala provides these abstractions in a variety of ways, some of which will be practically exemplified in our posts.
Finally, Scala is a language that after a period of big hype (which seems to be unavoidable around significant innovative ideas in the world of software development) has reached maturity, and now shows a steadily growing adoption trend as the center of an advanced software engineering platform, not based on hype but on intrinsic quality.
2. The many ways of Scala
All mainstream programming languages offer many ways to solve a given problem, but Scala is especially well suited to developing alternative solutions that can differ in quite deep (even conceptual) ways.
Any one of those options will present a specific mix of desirable software qualities and will be more or less suited than the others depending on the specific conditions / restrictions put in place by the problem.
The options that Scala gives you as a developer depend not only on the number of suitable libraries at your disposal. You also have options in alternative ways to approach the problem and conceive a solution for it, even using - or combining - diverse programming paradigms.
In more concrete terms, Scala can be used at least in the following ways:
- As an enhanced Java, simply taking advantage of its better (less verbose, more expressive) syntax and its full compatibility with the Java eco-system.
- As a practical means to take advantage of the power of the functional programming paradigm, fruitfully applying only its basic concepts.
- As a means to combine functional programming with the best of the object-oriented paradigm, taking advantage of a well-designed and robust standard library which contains, for example, a rich set of collection classes (strict and lazy, mutable and immutable, sequential and parallel)
- As a mature platform for the development of asynchronous message-based software systems, applying the actor paradigm implemented by Akka.
- As an extremely efficient Big Data platform based on Spark, suitable for implementing data extraction and transformation pipelines (batch or streaming) for ML or BI purposes.
- As an advanced functional programming tool for developing efficient, declarative, pure functional programs.
This list can be viewed as a scale for the progressive adoption of Scala, showing another of its advantages: despite its (unfair) reputation for being difficult to learn for an individual programmer (and therefore difficult to adopt for a programming team), Scala can be embraced gradually, easing the path to the full enjoyment of all its advantages.
The advantages of adopting Scala, gradually or not, can be enormous for the efficiency of your development process and the quality of the resulting products, because you get:
- Conciseness of notation within a modern syntax. Some experienced programmers estimate the average ratio of lines of code (LOC) between a Java program and an equivalent Scala one to be about 4 to 1. That ratio can be even greater, but conciseness is not an objective in itself …
- Readability. A feature whose importance cannot be overstated, because it underlies many software-quality attributes. Scala helps you write readable programs, not only because of its concise syntax, but also because it lets you naturally use many powerful and expressive abstractions.
- Type safety and type inference. The Scala compiler can infer the types of almost all kinds of expressions, adding to the conciseness of Scala programs, but also helping to avoid type errors which in dynamically-typed languages are true time bombs destined to explode at the least expected moment.
- Efficiency. Scala is compiled to JVM bytecode, which the JVM's just-in-time compiler turns into native machine code, so Scala programs typically run much faster than programs written in interpreted languages.
- All the great features of a statically typed modern functional programming language, like a simple evaluation model (amenable to the application of simple strategies for designing programs and reasoning about their correctness) and a big assortment of potent abstractions and convenient constructs: first-class functions, parametric types, case classes, pattern matching, for comprehensions, laziness, type classes, etc.
Through this series of posts we will show a set of concrete solutions to a program design and implementation problem, exemplifying several of the aforementioned ways to use Scala.
The code of the solutions will be explained assuming only a basic knowledge of a few functional programming features that can be found in almost any modern programming language, which include parametric classes (the so-called generic classes of Java and many present-day OO languages) and functional combinators like map, filter, flatMap and fold (whose concrete meaning in the context of our code will be succinctly explained anyway).
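Of these combinators, flatMap is perhaps the least familiar. The following minimal sketch, with made-up repository and contributor names (not taken from the code of our solutions), shows how it differs from map:

```scala
// Illustrative sketch: flatMap applies a list-returning function to every
// element and flattens the resulting lists into a single list.
object FlatMapExample {
  // Hypothetical data: repository names and a lookup of their contributors
  val repos: List[String] = List("repo-a", "repo-b")

  def contributorsOf(repo: String): List[String] = repo match {
    case "repo-a" => List("alice", "bob")
    case "repo-b" => List("bob", "carol")
    case _        => Nil
  }

  // map keeps one inner list per repository:
  // List(List("alice", "bob"), List("bob", "carol"))
  val nested: List[List[String]] = repos.map(contributorsOf)

  // flatMap concatenates them: List("alice", "bob", "bob", "carol")
  val flat: List[String] = repos.flatMap(contributorsOf)
}
```

Note that flatMap does not remove duplicates: "bob" appears once per repository he contributes to, which is exactly what is needed before adding up contributions per contributor.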
3. The problem statement
We are requested to implement a REST service based on the GitHub REST API v3 with an endpoint that, given the name of one organization, returns a list of contributors to the organization, sorted by the number of their contributions.
Each organization has many repositories and each repository has many contributors. The endpoint must respond to a GET request at port 8080 and address /org/{org_name}/contributors. It has to respond with a list of contributors and the total number of their contributions to all repositories of the organization, in the following JSON format: { "name": <contributor_login>, "contributions": <no_of_contributions> }
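To make the response format concrete, here is a minimal sketch that renders contributors in that format without any JSON library. The Contributor class and the function names are hypothetical, and the sketch assumes logins need no JSON escaping (GitHub logins contain only alphanumeric characters and hyphens).

```scala
object JsonFormat {
  // Hypothetical domain class; the solution's own classes may differ
  final case class Contributor(login: String, contributions: Int)

  // Renders one contributor as { "name": ..., "contributions": ... }
  def toJson(c: Contributor): String =
    s"""{ "name": "${c.login}", "contributions": ${c.contributions} }"""

  // Renders the whole response as a JSON array of contributors
  def toJsonArray(cs: List[Contributor]): String =
    cs.map(toJson).mkString("[", ", ", "]")
}
```

For example, toJson(Contributor("alice", 42)) yields { "name": "alice", "contributions": 42 }.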
The service should handle the GitHub API rate limit using a token set as an environment variable named GH_TOKEN, and should take into account the pagination of the responses of the GitHub API.
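Both requirements can be sketched in a few lines of plain Scala. In this sketch, under stated assumptions, the token is sent in an Authorization header (authenticated requests get a much higher rate limit than anonymous ones), and fetchPage is a hypothetical function returning the items of one page. Stopping at the first page with fewer than perPage items is a simplification; the GitHub API also signals the next page through the Link header.

```scala
object GitHubAccess {
  // Token read from the GH_TOKEN environment variable (None if unset)
  val token: Option[String] = sys.env.get("GH_TOKEN")

  // Headers to send with every request: an Authorization header when a token exists
  def authHeaders(maybeToken: Option[String]): Map[String, String] =
    maybeToken.fold(Map.empty[String, String])(t => Map("Authorization" -> s"token $t"))

  // Collects the items of all pages (numbered from 1) of a paginated endpoint.
  // fetchPage is a hypothetical stand-in for the real HTTP call.
  def fetchAllPages[A](perPage: Int)(fetchPage: Int => List[A]): List[A] = {
    @annotation.tailrec
    def loop(page: Int, acc: Vector[A]): List[A] = {
      val items = fetchPage(page)
      val all   = acc ++ items
      // A short (or empty) page means there are no more pages
      if (items.size < perPage) all.toList else loop(page + 1, all)
    }
    loop(1, Vector.empty)
  }
}
```

With a fake five-element "endpoint" and perPage = 2, the loop requests pages 1, 2 and 3 and returns all five elements in order.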
4. Structure of our first solution
Our first solution to the stated problem is a Scala program consisting of four small components. It was developed around a very simple data flow pattern consisting of a sequence of processes such that the output produced by each one is consumed by the next.
Seen from the outside, using our program as a component of a bigger one, the data flows from an initial producer — the REST GitHub API — to the final consumer of the generated JSON response.
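A data flow of this kind maps naturally onto function composition. The sketch below is illustrative only: the three stages are stubs with hypothetical names standing in for our components, and andThen chains them so that the output of each stage becomes the input of the next.

```scala
object Pipeline {
  // (login, contributions) pairs, a simplified stand-in for the domain classes
  type Totals = List[(String, Int)]

  // Stage 1: stub standing in for the REST client (returns fixed data)
  val fetch: String => Totals = _ => List(("bob", 5), ("alice", 7))

  // Stage 2: stand-in for the processing module (sorts by contributions, descending)
  val process: Totals => Totals = _.sortBy { case (_, n) => -n }

  // Stage 3: stand-in for the REST server's rendering of the response
  val render: Totals => String = _.map { case (login, n) => s"$login: $n" }.mkString(", ")

  // The whole data flow: each stage consumes the output of the previous one
  val serve: String => String = fetch andThen process andThen render
}
```

Calling Pipeline.serve with any organization name returns "alice: 7, bob: 5".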
For convenience we have added to the program a sample final consumer: a simple Web client that allows a user to make a request without writing by hand the URL required by the endpoint, and to interpret the response without having to scrutinize a JSON string.
Needless to say, the user can use a command-line tool, like curl, instead of the provided Web client, or even just type the required URL directly into a Web browser. However, our simple client is nicer, and it took so few lines of code that it was certainly worth the effort.
The components of our program are:
- a REST client,
- a REST server,
- a processing module, which takes the outputs of the REST client, processes them in very simple ways, and builds the structure needed as input by the REST server, and
- a set of domain classes used by the other components.
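The core work of the processing module, adding up each contributor's contributions over all repositories and sorting the totals, can be sketched as follows. The RepoContributor class is hypothetical and this is not the solution's actual code; groupMapReduce requires Scala 2.13 or later (on older versions, groupBy followed by map does the same job).

```scala
object Processing {
  // Hypothetical record: one contributor's contributions to one repository
  final case class RepoContributor(repo: String, login: String, contributions: Int)

  // Sums each contributor's contributions over all repositories and sorts the
  // totals in descending order, breaking ties alphabetically by login.
  def totals(rcs: List[RepoContributor]): List[(String, Int)] =
    rcs
      .groupMapReduce(_.login)(_.contributions)(_ + _) // Map(login -> total)
      .toList
      .sortBy { case (login, total) => (-total, login) }
}
```

Sorting on -total gives descending order without needing a reversed Ordering; the login in second position makes the result deterministic when two contributors have the same total.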
Our REST client is used to ask GitHub for the repositories of a given organization and the contributors to each one of them. We access the GitHub services using Spray, an open-source toolkit for building REST/HTTP-based integration layers on top of Scala. We have chosen Spray mainly because of its simplicity. It has all we need to send our requests to the GitHub REST API and consume the corresponding responses, writing just a few lines of Scala code.
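Whatever HTTP toolkit is used, the client needs two GitHub API v3 endpoints: the repositories of an organization (GET /orgs/{org}/repos) and the contributors to a repository (GET /repos/{owner}/{repo}/contributors), both paginated with the per_page (at most 100) and page query parameters. A minimal sketch of the URL construction follows; it does not use Spray, and the function names are hypothetical.

```scala
object GitHubUrls {
  val base = "https://api.github.com"

  // One page of the repositories of an organization
  def reposUrl(org: String, page: Int, perPage: Int = 100): String =
    s"$base/orgs/$org/repos?per_page=$perPage&page=$page"

  // One page of the contributors to a repository of the organization
  def contributorsUrl(org: String, repo: String, page: Int, perPage: Int = 100): String =
    s"$base/repos/$org/$repo/contributors?per_page=$perPage&page=$page"
}
```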
Our REST server uses Lift, a powerful Web framework for Scala. Using Lift we are able to implement our endpoint using a set of domain-specific languages (DSLs) designed for that purpose, again using very few lines of Scala code. Besides, Lift allows us to very easily build the simple Web client mentioned before, for entering the data needed to make a request and display its response in a human-readable format.
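Lift's REST support relies on Scala pattern matching over the segments of the request path. The following plain-Scala sketch (not Lift code; the Route types are hypothetical) shows the underlying idea for our endpoint /org/{org_name}/contributors:

```scala
object Routing {
  // Hypothetical result of routing a request path
  sealed trait Route
  final case class Contributors(org: String) extends Route
  case object NotFound extends Route

  // Splits a path like "/org/scala/contributors" into its non-empty segments
  // and matches the resulting list against the shape of the endpoint.
  def route(path: String): Route =
    path.split("/").filter(_.nonEmpty).toList match {
      case "org" :: org :: "contributors" :: Nil => Contributors(org)
      case _                                      => NotFound
    }
}
```

The pattern "org" :: org :: "contributors" :: Nil matches only three-segment paths with the fixed words in place, binding the middle segment to org.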
Of course, we had alternatives to the use of Spray / Lift, for example Akka-HTTP (which, incidentally, was based on Spray and superseded it). But using simpler toolkits allows us to focus on the explanation of the code of our solution minimizing the interference that would result from explaining the code needed by more sophisticated toolkits.
By the way, the implementation of our REST client and server components (which will remain almost unaltered across the different solutions proposed to the problem) does not exceed 60 lines of Scala code. The differences between solutions will reside mainly in our processing module, which in the first version has only around 10 lines of Scala code.
5. Next installments and further work to enhance our first solution
In the next installments of this series we will explain the code of several solutions to our problem.
We will start, in the second post, by discussing a version based on a synchronous implementation of the processing module which, as mentioned before, is responsible for building the input to the REST server module using the output of the REST client module. This first version turns out to be a simple and clear solution to the problem, but it has a serious drawback: it is very inefficient. The REST calls to the GitHub API made by the processing module are synchronous: every call starts only when the preceding one has finished, blocking the (only) execution thread used.
The third and subsequent posts will present alternative ways to implement our REST service more efficiently, starting with a version based on an asynchronous implementation of the processing module using Scala futures, which will require changing just a few lines of the code of our first version for big savings in the time needed to serve a request.
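The difference between the two approaches can be sketched with a simulated slow call standing in for a GitHub request (slowCall and the other names are hypothetical, and this is not the code of the upcoming solutions). The sequential version waits for each call before starting the next; the asynchronous one wraps each call in a Future and combines them with Future.traverse, which keeps the results in the original order.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SyncVsAsync {
  // Stand-in for a blocking REST call: sleeps 200 ms, then returns a fake count
  def slowCall(repo: String): Int = {
    Thread.sleep(200)
    repo.length
  }

  // Synchronous: each call starts only after the previous one has finished
  def sequential(repos: List[String]): List[Int] = repos.map(slowCall)

  // Asynchronous: every call runs in its own Future; blocking(...) tells the
  // global execution context that the call blocks, so it may add threads
  def concurrent(repos: List[String]): Future[List[Int]] =
    Future.traverse(repos)(repo => Future(blocking(slowCall(repo))))

  // Waits for all the concurrent calls (fine for a demo, not inside a server)
  def awaitConcurrent(repos: List[String]): List[Int] =
    Await.result(concurrent(repos), 10.seconds)
}
```

With four repositories, sequential takes at least 800 ms, while awaitConcurrent typically finishes in roughly the time of a single call.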
The third solution to the problem will use Akka typed actors. The first time a request is made for an organization, it will be about as efficient as the second solution. But a second request for the same organization will be an order of magnitude faster, because the actors created to serve the first request will remain alive in memory, effectively working as a cache for our REST service.
A fourth solution will enhance the second, matching the advantage of the third one we just mentioned. It will implement a cache for our service using Redis, an in-memory data structure store that can be used as a (NoSQL) database, cache, and message broker. This solution will exhibit an efficiency comparable in all respects to that of the third, and will be able to scale up in exactly the same way (limited only by the number of cores available for parallel processing in the server). Besides, the code changes needed to implement the Redis cache will be much smaller (again just a few lines) than the changes demanded by the implementation of the Akka actors. Nevertheless, one advantage of the third version will remain: it will be capable of scaling out in an astonishingly simple way (raising the limit to the number of cores available for parallel processing in a cluster of servers).
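The caching idea behind both the third and the fourth solutions can be illustrated with a local, in-memory stand-in. This sketch uses the standard library's concurrent TrieMap, not Redis and not actors; the names are hypothetical, and entries never expire, unlike what a real service would need.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.concurrent.TrieMap

object CacheSketch {
  // Thread-safe in-memory cache keyed by organization name
  private val cache = TrieMap.empty[String, List[(String, Int)]]

  // Counts how many times the expensive computation actually ran
  val computations = new AtomicInteger(0)

  // Hypothetical expensive step (in our service: the GitHub API calls plus processing)
  private def computeContributors(org: String): List[(String, Int)] = {
    computations.incrementAndGet()
    List((s"$org-dev", 1))
  }

  // Returns the cached result if present; otherwise computes and stores it
  def contributors(org: String): List[(String, Int)] =
    cache.getOrElseUpdate(org, computeContributors(org))
}
```

A second request for the same organization is answered from the cache without recomputing, which is where the order-of-magnitude speedup described above comes from.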
While presenting new ways to solve our software development problem, we will also make some changes to its specification. Doing so, we will be able to give an idea not only of the many ways to approach a solution to the problem in Scala, but also of how Scala can ease the maintenance of a given solution, as a direct consequence of how clear and understandable our programs are.
To help in the comprehension of the code of the presented solutions, we will precede their explanation with asides containing Scala snippets selected to illustrate some necessary notions, seeking also to show how easy it can be to learn the basics of the language just by playing with some "Scala by example".
A caveat is in order. All the solutions presented in this series will ignore some aspects of the software development process that are relevant in a real project, most notably testing and code documentation. We do so in order to focus on our main goal: showing several ways that Scala gives us to solve the problem of implementing a simple REST API, in an attempt to vividly exemplify how Scala shines as a great programming language for general software engineering and, in particular, for the development of parallel and distributed software systems.
6. Downloading all we need to explore our first solution and starting with “Scala by example”
To better follow the discussion of the code of our first solution, you should download it from ContribsGH-S.
Our first aside. Using this format we will present:
- Pointers to external sources for software download, installation or documentation.
- Code snippets intended to be run on a Scala REPL console to further explain / exemplify the code of our solution.
- Comments, considered interesting for understanding the explained code or the example snippets, frequently providing type explanations.
- Asides can be ignored if considered redundant or useless, without impairing the understanding of the main text.
Having the code available will allow you to play with it and use it as a base for your own variations. For that purpose you will need to install a pair of tools, selected among the many great ones available for Scala.
Downloading and installing the selected software tools
Download sbt (the “simple build tool” for Scala) from Scala sbt and install it. With sbt you will be able to run the downloaded code of our program as well as to open a REPL console to further explore Scala running our aside snippets or any others of your own.
If desired, download an IDE, like IntelliJ IDEA from JetBrains, and install it together with the Scala plugin. This will allow you to import the downloaded code (see the related documentation at the JetBrains site) and modify it to experiment with your own variations. Alternatively, you can just use your preferred text editor to modify the code and then use sbt to compile and run it.
Once sbt is installed on your computer, run it inside the download directory of the code. After a while (the first time this can take a few minutes, as sbt will have to download all needed dependencies) you will see the prompt sbt:ContribsGH-S>, where you can type jetty:start to run the HTTP server hosting our first solution. You can also type console to start a Scala REPL inside sbt. An included README.md file gives you detailed instructions on calling the implemented service, whether by issuing the specified URL or using the provided Web client.
Having a Scala REPL console available will allow you to play interactively with Scala and, if necessary, learn very quickly the basics needed to understand the code of our first and subsequent solutions. To help you on that path, we will include here and there asides containing snippets of Scala code with brief explanations of their meaning and use, in an attempt to show how easy it can be to learn "Scala by example".
Let's start with a small dose of Scala lists.
Scala lists by example
In Scala a list from the standard library belongs to the parametric type List[T], where T is the type of the list elements. A parametric type is a type parameterized by (one or more) other type(s). For our purposes you can view these types as container types or collections, whose elements belong to the type(s) used as parameter(s).
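To see that a parametric type is simply a container parameterized by the type of its contents, here is a tiny one of our own. Pair is a hypothetical class written only for this illustration; it is not part of the standard library or of our solution.

```scala
object ParametricExample {
  // A hypothetical parametric container holding two values of the same type T
  final case class Pair[T](first: T, second: T) {
    // Transforms both values; the element type may change from T to U
    def map[U](f: T => U): Pair[U] = Pair(f(first), f(second))
  }

  val ints: Pair[Int] = Pair(1, 2)
  val labels: Pair[String] = ints.map(i => s"#$i") // Pair("#1", "#2")

  // List is itself parametric: here its element type is Pair[Int]
  val pairs: List[Pair[Int]] = List(Pair(1, 2), Pair(3, 4))
}
```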
The basic List constructor in Scala is ::, an infix operator which takes as arguments a value of type T and a list of type List[T] (which can be the empty list Nil) and returns a new list having as head the first argument and as tail the second one. Scala lists have, for convenience, another constructor named, appropriately, List. To construct the list containing the integers 1 and 2 we can use the expression 1 :: 2 :: Nil or, equivalently, the expression List(1, 2).
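A set of expressions matching the description below can be written as follows. The concrete values, including the names used in the last expression, are our own illustrative choices. The expressions are wrapped in an object so the block compiles as a unit; in a REPL you can paste the lines inside the object one by one.

```scala
object ListsByExample {
  // The two list constructors build the same list
  val viaCons: List[Int] = 1 :: 2 :: Nil // same as List(1, 2)

  // 1-2: a list of integers and a list of strings
  val l1: List[Int] = List(1, 2, 3, 4)
  val l2: List[String] = List("scala", "lists", "by", "example")

  // 3: map with a full lambda expression; elt names the argument
  val l2_upper: List[String] = l2.map(elt => elt.toUpperCase)

  // 4: the same, with _ standing for the argument
  val l2_upper2: List[String] = l2.map(_.toUpperCase)

  // 5-6: sum defined with foldLeft, starting from 0 and adding from the left
  def sum(l: List[Int]): Int = l.foldLeft(0)(_ + _)
  val total: Int = sum(l1) // 10

  // 7: extract the last names, keep those shorter than 7 characters,
  // and sort them by length, then alphabetically
  val people: List[String] = List("Alan Turing", "Grace Hopper", "Ada Lovelace",
    "Donald Knuth", "Barbara Liskov", "Edsger Dijkstra")
  val shortLastNames: List[String] =
    people
      .map(_.split(" ").last)
      .filter(_.length < 7)
      .sortBy(lastName => (lastName.length, lastName)) // Knuth, Hopper, Liskov, Turing
}
```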
In the example:
- The first two expressions construct a list of integers l1 and a list of strings l2.
- The third expression applies the higher-order function map to l2, giving a new list of strings l2_upper. map applies (maps) a function of type A => B to the elements of a List[A], giving as result a List[B].
- The fourth expression is equivalent to the third one, but the function to be mapped is expressed using a shorthand notation where the character _ represents its argument (a role assumed by elt in the previous expression, where the function was written using the so-called "lambda expression" notation without any abbreviations).
- The fifth expression defines a function sum, which is called in the following expression with l1 as argument. sum is defined using the combinator foldLeft, which applies an "accumulation function" to the elements of a list to get an accumulated result. foldLeft applies a two-argument function of type (B, A) => B to a List[A], giving as result a value of type B. The accumulation process starts with an initial value for the accumulator (a value of type B, 0 in the example) and sequentially applies the accumulation function (+ in the example) to the previous value of the accumulator and the current element of the list (starting from the left) to get the new value of the accumulator. The result of foldLeft is the last value of the accumulator.
- The last expression takes a list of strings with the first and last names of some well-known personalities, extracts from those strings the last names, filters out the last names with 7 or more characters, and sorts the remaining last names first by their length and then alphabetically (when they have the same length). filter applies a filtering condition (a function of type A => Boolean) to a List[A], giving as result a new List[A] containing only the elements of the original list that satisfy the filtering condition. sortBy takes as argument a function that, given an element of the list to be sorted, returns a tuple with the values that will be used for sorting the list, ordered by their relevance for the sorting process.
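Two details of this description can be made visible with a couple of extra expressions of our own. Using a String accumulator shows the left-to-right order in which foldLeft combines the elements (foldRight, shown for contrast, starts from the right), and sorting a list of tuples shows that tuples are compared component by component, which is what makes the sortBy key in the last expression work.

```scala
object FoldTrace {
  // foldLeft combines the initial value with the first element, then the result
  // with the second element, and so on: "(((0 + 1) + 2) + 3)"
  val leftTrace: String = List(1, 2, 3).foldLeft("0")((acc, elt) => s"($acc + $elt)")

  // foldRight starts from the last element instead: "(1 + (2 + (3 + 0)))"
  val rightTrace: String = List(1, 2, 3).foldRight("0")((elt, acc) => s"($elt + $acc)")

  // Tuples are ordered by their first component, then by the second one
  val sortedTuples: List[(Int, String)] =
    List((6, "Turing"), (5, "Knuth"), (6, "Hopper")).sorted
}
```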
Copy the example expressions to a Scala console to get them evaluated, one by one. Write and evaluate similar expressions to improve your understanding of the example. In particular, experiment with expressions like the last one (sequences of transformations of a list). Similar expressions will be very useful in our REST server module.
If the examples provided whet your appetite enough to impel you to deepen your understanding of Scala, there is a world of options to explore: great online courses and tutorials, excellent documentation, and wonderful books. We suggest you start your exploration at Scala.
And now (finally) we are ready for the detailed discussion of our solutions to the stated problem. Want to take a look? Please follow this link.