Follow along for the rest of the series (twitter: @christianposta, RSS/blog:
blog.christianposta.com)
Before we can build a microservice and reason about the data it uses
(produces, consumes, etc.), we need a reasonably good, crisp
understanding of what that data represents. For example, before we
can store information into a database about “bookings” for our TicketMonster
and its migration to microservices, we need to understand “what is a
booking.” Just like in your domain, you may need to understand what an
Account, an Employee, or a Claim is.
To do that, we need to dig into what “it” is in reality. For example, “what is
a book”? Try to stop and think about that, as it’s a fairly simple example.
What is a book, and how would we express it in a data model?
Is a book something with pages? Is a newspaper a book (it has pages)? So
maybe a book has a hard cover? Or is it something that’s not released/published
every day? If I write a book (which I did :), Microservices for Java Developers,
the publisher may have an entry for me with a single row representing my
book. But a bookstore may have five of my books. Is each one a book? Or are they
copies? How would we represent this? What if a book is so long it has to be
broken down into volumes? Is each volume a book, or all of them combined?
What if many small compositions are combined together? Is the combination
the book, or each individual one? So basically I can publish a book, have
many copies of it in a bookstore, each one with multiple volumes. What, then,
is a book?
Where do we draw the boundaries? The work in the Domain-Driven Design
community helps us deal with this complexity in the domain. We draw a
bounded context around the Entities, Value Objects, and Aggregates that
*model* our domain. Stated another way, we build and refine a model that
represents our domain, and that model is contained within a boundary that
defines our context. And this is explicit. These boundaries end up being our
microservices, or the components within the boundaries end up being
microservices, or both. Either way, microservices are about boundaries, and so is
DDD.
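To make those building blocks concrete, here is a minimal sketch in Java (all class names are hypothetical, not taken from any real TicketMonster code) of a Value Object and an Aggregate root living inside one bounded context:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Value Object: identity is defined entirely by its attributes.
final class Seat {
    final String row;
    final int number;
    Seat(String row, int number) { this.row = row; this.number = number; }
    @Override public boolean equals(Object o) {
        if (!(o instanceof Seat)) return false;
        Seat s = (Seat) o;
        return row.equals(s.row) && number == s.number;
    }
    @Override public int hashCode() { return Objects.hash(row, number); }
}

// Aggregate root: an Entity with its own identity; every state change
// goes through it, which is what keeps invariants local to the boundary.
class Booking {
    private final String bookingId;
    private final List<Seat> seats = new ArrayList<>();

    Booking(String bookingId) { this.bookingId = bookingId; }

    void addSeat(Seat seat) {
        if (seats.contains(seat)) {
            throw new IllegalStateException("seat already on this booking");
        }
        seats.add(seat);
    }

    String id() { return bookingId; }
    int seatCount() { return seats.size(); }
}
```

The Seat has no identity of its own (two seats with the same row and number are interchangeable), while the Booking does; that distinction is exactly what the DDD vocabulary gives us a name for.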
Our data model (how we wish to represent concepts in a physical data
store…note the explicit difference here) is driven by our domain model, not
the other way around. When we have this boundary, we know, and can make
assertions about, what is “correct” in our model and what is incorrect. These
boundaries also imply a certain level of autonomy. Bounded context “A” may
have a different understanding of what a “book” is than bounded context “B”
(e.g., maybe bounded context “A” is a search service that searches for titles,
where a single title is a “book”; maybe bounded context “B” is a checkout
service that processes a transaction based on how many books (titles plus copies)
you’re buying, etc.).
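As a sketch of that autonomy (hypothetical class names, assuming a search context and a checkout context like the ones just described), each context defines its own “book” and neither has to coordinate with the other:

```java
// Search context: a "book" is just a searchable title.
class SearchBook {
    final String isbn;
    final String title;
    SearchBook(String isbn, String title) { this.isbn = isbn; this.title = title; }
    boolean matches(String query) {
        return title.toLowerCase().contains(query.toLowerCase());
    }
}

// Checkout context: a "book" is a line item -- a title plus how many
// physical copies are being bought, priced for a transaction.
class CheckoutLineItem {
    final String isbn;
    final int copies;
    final long unitPriceCents;
    CheckoutLineItem(String isbn, int copies, long unitPriceCents) {
        this.isbn = isbn; this.copies = copies; this.unitPriceCents = unitPriceCents;
    }
    long totalCents() { return copies * unitPriceCents; }
}
```

The only thing shared across the boundary is an identifier (the ISBN here); what a “book” *means* stays local to each context.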
You may stop and say, “Wait a minute… Netflix doesn’t say anything about
Domain-Driven Design… neither does Twitter… nor LinkedIn… why should I
listen to this about DDD?”
The journey to microservices is just that: a journey. It will be different for each
company. There are no hard and fast rules, only tradeoffs. Copying what works
for one company just because it appears to work at this one instant is an
attempt to skip the process/journey and will not work. And the point to make
here is that your enterprise is NOT Netflix. In fact, I’d argue that for however
complex the domain is at Netflix, it’s NOT as complicated as it is at your
legacy enterprise. Searching for and showing movies, posting tweets, updating
a LinkedIn profile, etc., are all a lot simpler than your Insurance Claims
Processing systems. These internet companies went to microservices because
of speed to market and sheer volume/scale (posting a tweet to Twitter is
simple… posting tweets and displaying tweet streams for 500 million users is
incredibly complex). Enterprises today are going to have to confront
complexity in BOTH the domain as well as scale. So accept the fact that this is
a journey that balances domain, scale, and organizational changes. It will be
different for each organization. Don’t ignore it.
During the booking process we may call into the SeatAvailability aggregate
and ask it to reserve a seat on a plane. This seat reservation would be
implemented as a single transaction (for example, “hold seat 23A”) that returns a
reservation ID. We can associate this reservation ID with the Booking and
submit the Booking knowing the seat was at one point “reserved.” Each of
these operations (reserve a seat, accept a booking) is an individual transaction, and
each can proceed independently without any kind of two-phase commit or
two-phase locking. Note that using a “reservation” here is a business requirement.
We don’t do seat assignment here; we just reserve the seat. This requirement
would potentially need to be ferreted out through iterations of the model,
because the language of the use case at first may simply say “allow a
customer to pick a seat.” A developer could infer that this requirement
means “pick from the remaining seats, assign it to the customer, remove it
from inventory, and don’t sell more tickets than seats.” These would be extra,
unnecessary invariants that add burden to our transactional
model, invariants the business doesn’t really hold. The business is
certainly okay taking bookings without complete seat assignments and even
overselling the flight.
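A minimal sketch of that idea, assuming a hypothetical in-memory SeatAvailability (the real aggregate would of course sit behind its own datastore):

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch of the SeatAvailability aggregate: reserving a
// seat is one small, local, atomic operation that returns a
// reservation ID -- no two-phase commit with the Booking aggregate.
class SeatAvailability {
    private final Set<String> reservedSeats = new HashSet<>();

    // Atomic within this aggregate: either the seat is free and we
    // hold it, or we return empty. Note this is a reservation, not a
    // seat assignment -- the business is fine settling assignments later.
    synchronized Optional<String> reserve(String seat) {
        if (!reservedSeats.add(seat)) {
            return Optional.empty(); // already held
        }
        return Optional.of(UUID.randomUUID().toString());
    }
}
```

The booking side only ever holds the returned reservation ID; no lock or transaction spans the two aggregates.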
This is an example of letting the true domain guide you toward smaller,
simplified, yet fully atomic transactional boundaries for the individual
aggregates involved. The story cannot end here, though, because we now have
to reconcile the fact that there are all these individual transactions that need to
come together at some point. Different parts of the data are involved (i.e., I
created a booking and seat reservations, but these are not settled transactions
with respect to getting a boarding pass/ticket, etc.).
Ideally our Aggregates would use commands and domain events directly,
as first-class citizens (that is, any operation is implemented as a command,
and any response is implemented as a reaction to events), and we could more
cleanly map between the events we use internal to our bounded context and
those we use between contexts. Instead of inserting into the database
ourselves, we could publish events (e.g., NewBookingCreated) to a message
queue and have a listener consume them from the queue and insert them
idempotently into the database, without having to use XA/2PC transactions.
We could insert the events into a dedicated event store that acts
as both a database and a messaging publish-subscribe topic (this is probably
the preferred route). Or we can continue to use an ACID database and
stream changes to that database into a persistent, replicated log like Apache
Kafka using something like Debezium, deducing the events with some kind
of event/stream processor. Either way, the point is that we want to
communicate between boundaries with immutable, point-in-time events.
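As a sketch of the consuming side (hypothetical event and class names; a real consumer would read from the queue or log rather than being called directly), an idempotent handler is what makes redelivery safe without XA/2PC:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an immutable, point-in-time event published
// between contexts, and a consumer that applies it idempotently,
// so redelivery from the queue/log is harmless.
final class NewBookingCreated {
    final String bookingId;
    final String reservationId;
    final long occurredAtEpochMs;
    NewBookingCreated(String bookingId, String reservationId, long occurredAtEpochMs) {
        this.bookingId = bookingId;
        this.reservationId = reservationId;
        this.occurredAtEpochMs = occurredAtEpochMs;
    }
}

class BookingProjection {
    // Stand-in for the consumer's own database table, keyed by booking ID.
    private final Map<String, NewBookingCreated> byId = new HashMap<>();

    // Idempotent: applying the same event twice leaves exactly one row.
    void on(NewBookingCreated event) {
        byId.putIfAbsent(event.bookingId, event);
    }

    int size() { return byId.size(); }
}
```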
This comes with some great advantages.
Another interesting concept that emerges from this approach is the ability
to implement a pattern known as “Command Query Responsibility
Segregation” (CQRS), where we separate our read model and our write model into
separate services. Remember, we noted that the internet companies don’t have
very complex domain models. This is evident in their write models being
simple (insert a tweet into a distributed log, for example). However, their read
models are crazily complicated because of their scale. CQRS helps separate
these concerns. On the flip side, in an enterprise, the write models might be
incredibly complicated while the read models may be simple flat select queries
and flat DTO objects. CQRS is a powerful separation-of-concerns pattern to
evaluate once you’ve got proper boundaries and a good way to propagate data
changes between aggregates and between bounded contexts.
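A toy sketch of that separation (hypothetical names, with a plain String standing in for a real event type) might look like:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical CQRS sketch: the write side enforces invariants and
// emits events; the read side keeps a flat, query-friendly view.

// Write model: handles a command, enforces a write-side invariant.
class BookingWriteModel {
    private final Map<String, Integer> seatsByBooking = new HashMap<>();

    // Returns a record of what happened (a String here for brevity;
    // a real system would emit a typed domain event).
    String handleAddSeats(String bookingId, int seats) {
        if (seats <= 0) throw new IllegalArgumentException("seats must be positive");
        seatsByBooking.merge(bookingId, seats, Integer::sum);
        return "SeatsAdded:" + bookingId + ":" + seats;
    }
}

// Read model: a flat projection, trivially queryable.
class BookingReadModel {
    private final Map<String, Integer> view = new HashMap<>();

    void apply(String event) {
        String[] parts = event.split(":");
        if (parts[0].equals("SeatsAdded")) {
            view.merge(parts[1], Integer.parseInt(parts[2]), Integer::sum);
        }
    }

    int seatCount(String bookingId) { return view.getOrDefault(bookingId, 0); }
}
```

Because the two models only meet through events, each side can be scaled, stored, and evolved independently.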
So what about the rule that a service should have only one database and not
share it with any other service? In this scenario, we may have listeners that
subscribe to the stream of events and insert data into a shared database that
the primary aggregates might end up using. This “shared database” is perfectly fine.
Remember, there are no rules, just tradeoffs. In this instance we may have
multiple services working in concert with the same database, and as
long as we (our team) own all the processes, we don’t negate any of our
advantages of autonomy. So when you hear someone say “a microservice
should have its own database and not share it with anyone else,” you can respond,
“well, kinda” :)
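A sketch of that arrangement, with a Map standing in for the shared database and hypothetical listener services that one team owns end to end:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: two listener processes owned by the same team
// writing into one shared database (a map here). Because one team
// owns both writers and the schema, autonomy is not lost.
class SharedBookingStore {
    final Map<String, String> rows = new ConcurrentHashMap<>();
}

class BookingListener {
    private final SharedBookingStore store;
    BookingListener(SharedBookingStore store) { this.store = store; }
    void onBookingEvent(String bookingId, String payload) {
        store.rows.put("booking:" + bookingId, payload);
    }
}

class ReservationListener {
    private final SharedBookingStore store;
    ReservationListener(SharedBookingStore store) { this.store = store; }
    void onReservationEvent(String reservationId, String payload) {
        store.rows.put("reservation:" + reservationId, payload);
    }
}
```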
This approach brings even more benefits on top of the benefits
of communicating via events (discussed above):
• Now you can treat your database as a “current state” view, not as the
true record (the event log is the true record)
• You can introduce new applications, re-read past events, and
examine their behavior in terms of “what would have happened”
• You get perfect audit logging for free
• You can introduce new versions of your application and perform quite
exhaustive testing on them by replaying the events
• You can more easily reason about database versioning/upgrades/schema
changes by just replaying the events into the new database
• You can migrate to completely new database technology (e.g., maybe you
find you’ve outgrown your relational DB and you want to switch to a
specialized database/index)
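Replay itself can be sketched very simply (hypothetical names; the event format here is an assumption for illustration): a brand-new “database” starts empty and is rebuilt entirely from the log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of replay: the event log is the true record, so
// a new store (a map here, standing in for any new database
// technology) can be reconstructed from scratch by replaying events.
class EventLog {
    private final List<String> events = new ArrayList<>();
    void append(String event) { events.add(event); }
    List<String> all() { return new ArrayList<>(events); }
}

class NewDatabase {
    final Map<String, Integer> bookingsPerCustomer = new HashMap<>();

    void replay(List<String> events) {
        for (String e : events) {
            // assumed event format: "BookingCreated:customerId"
            String[] parts = e.split(":");
            if (parts[0].equals("BookingCreated")) {
                bookingsPerCustomer.merge(parts[1], 1, Integer::sum);
            }
        }
    }
}
```

The same replay loop is what powers the bullets above: new projections, what-if analysis, exhaustive testing, and database migrations.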
The Hardest Part About Microservices: Your Data was published on July 14, 2016.
© 2018 Christian Posta. Powered by Jekyll using the Minimal Mistakes theme.