Hi. I am Travis, and I live and work in a place long ago abandoned by the gods: New York City. By day, I am a web developer at Blue Apron. By night, I am a normal person.

Intro to Avro

Remember the good old days when you only had one app and life was easy? There was only one database (one source of truth), and you had never even heard of Kafka? You heard rumors of micro-services and eventual consistency, and it scared you, but you never thought it would reach your humble little engineering team.

Those days are over. Now you have a thousand new problems to solve, and you need to know about things like Avro. With 50 micro-services, you need a fast, compact data serialization format that lets every service communicate effectively (i.e., speak the same language, and maybe even learn some new words along the way). Avro is one of many solutions to this problem; another is Protocol Buffers.

Data serialization is one of the most boring things I can think of, so it would be great if it didn't require any more than 5 minutes to get this part out of the way. That's why I had high hopes when I found this post from Salsify which details their very-much-appreciated efforts to Ruby DSL-ify Avro schemas. To get started, grab the gemsies:

# We'll get to why >= 0.8 is a good thing
gem 'avro-builder', '~> 0.8'
gem 'avromatic'

After running bundle install, the next thing you'll want to do is define your schema and a model based on it. There are a number of ways to do this, but I went the includable module route because it seemed cool:

# app/models/dsl/user.rb
record :user do
  required :id, :int
  required :name, :string
  required :email, :string
end
# app/models/avro/user.rb
module Avro
  class User
    include Avromatic::Model.build(schema_name: 'user')
  end
end
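
For reference, the DSL record above corresponds to an ordinary Avro JSON schema, the kind of thing that would otherwise live in a user.avsc file. The sketch below writes that schema out by hand as a Ruby hash, using only the json standard library. I'm assuming avro-builder produces exactly this shape; its real output may differ in details such as a namespace.

```ruby
require 'json'

# Hand-written Avro JSON schema, assumed to match the `record :user` DSL above.
# Each `required` field maps to a plain (non-null) Avro type.
USER_SCHEMA = {
  'type' => 'record',
  'name' => 'user',
  'fields' => [
    { 'name' => 'id',    'type' => 'int' },
    { 'name' => 'name',  'type' => 'string' },
    { 'name' => 'email', 'type' => 'string' }
  ]
}.freeze

# The JSON text you would otherwise keep in a pre-generated user.avsc file.
USER_AVSC = JSON.pretty_generate(USER_SCHEMA)

puts USER_AVSC
```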

At this point, I found myself quite confused. I needed to set up a SchemaStore so the Ruby code could be generated at runtime. But the AvroTurf implementation only works with .avsc files, and you only get those files by generating them with the Avro::Builder gem. That means generating them beforehand, and now there are all these things I have to do, and I'm overwhelmed! So I made Salsify a thingy: Avro::Builder now has its own SchemaStore, and you don't have to do that anymore (hence version 0.8.0 and above). In other words, I can use the DSL files directly in my app without generating .avsc schema files first.
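
The general idea of a store that works from source definitions is "build the schema on first lookup, then cache it," rather than reading files generated ahead of time. Here is a gem-free sketch of that idea. LazySchemaStore and its loader block are hypothetical and are not avro-builder's API. The loader merely stands in for evaluating a DSL file such as app/models/dsl/user.rb.

```ruby
require 'json'

# Hypothetical schema store illustrating "build on first lookup, then cache".
# The loader block stands in for evaluating a DSL file; NOT the avro-builder API.
class LazySchemaStore
  def initialize(&loader)
    @loader = loader
    @cache = {}
  end

  # Returns the JSON schema text for +name+, invoking the loader at most once.
  def find(name)
    @cache.fetch(name) do
      definition = @loader.call(name)
      raise KeyError, "no schema named #{name}" if definition.nil?
      @cache[name] = JSON.generate(definition)
    end
  end
end

# Hypothetical in-memory table standing in for the DSL files on disk.
DEFINITIONS = {
  'user' => {
    'type' => 'record', 'name' => 'user',
    'fields' => [{ 'name' => 'id', 'type' => 'int' }]
  }
}.freeze

loader_calls = 0
store = LazySchemaStore.new do |name|
  loader_calls += 1
  DEFINITIONS[name]
end

store.find('user') # first lookup builds the schema
store.find('user') # second lookup is served from the cache
```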

Great, now there are just a few more steps to get this working. You'll want to set up a schema registry (for example, Confluent's Schema Registry), which versions your schemas so they can evolve over time (read: you can change a field without breaking 90% of your services). Once that is running, all that's left is to configure the app to use the schema registry, with Avro::Builder as the schema store. Then we can use the Ruby schema definitions directly in the app:

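Avromatic.configure do |avro|
  # Load schemas straight from the Ruby DSL files (avro-builder >= 0.8)
  avro.schema_store = Avro::Builder::SchemaStore.new(path: 'app/models/dsl')
  # Schema registry URL (this one carries user:password credentials)
  avro.registry_url = 'http://blueapron:avro@localhost:21000'
  # Build the AvroTurf::Messaging instance the models use to encode/decode messages
  avro.build_messaging!
end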
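
To make the registry's job concrete, here is a toy, in-memory model of what it does for us in this post. Each distinct schema gets an integer ID, and each subject (here, user) keeps an ordered list of versions. A receiver can then look a schema up by its ID. This is an illustration only: ToySchemaRegistry and its methods are hypothetical, not the real registry's HTTP API or AvroTurf's client.

```ruby
# Hypothetical in-memory stand-in for a schema registry (illustration only).
class ToySchemaRegistry
  def initialize
    @schemas_by_id = {}                        # id => schema JSON
    @ids_by_schema = {}                        # schema JSON => id
    @versions = Hash.new { |h, k| h[k] = [] }  # subject => [id, ...]
  end

  # Registers +schema_json+ under +subject+ and returns its id.
  # Registering an identical schema again returns the existing id.
  def register(subject, schema_json)
    id = (@ids_by_schema[schema_json] ||= begin
      new_id = @schemas_by_id.size + 1
      @schemas_by_id[new_id] = schema_json
      new_id
    end)
    @versions[subject] << id unless @versions[subject].include?(id)
    id
  end

  # What a receiver does with the id it finds in a message.
  def fetch(id)
    @schemas_by_id.fetch(id)
  end

  # Ids registered under +subject+, oldest first.
  def versions(subject)
    @versions[subject].dup
  end
end

registry = ToySchemaRegistry.new
v1 = '{"type":"record","name":"user","fields":[{"name":"id","type":"int"}]}'
v2 = '{"type":"record","name":"user","fields":[{"name":"id","type":"int"},' \
     '{"name":"location","type":"string","default":"jersey_shore"}]}'

registry.register('user', v1) # => 1
registry.register('user', v1) # => 1 (same schema, same id)
registry.register('user', v2) # => 2 (the schema evolved)
```

The IDs roughly mirror the registry log lines in this post: id = 1 at first, then id = 2 once the schema changes.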

Now we can write code super quickly, and we get the indispensable feature of the AvroTurf::Messaging API. It embeds the schema ID in each Avro message (as opposed to the full JSON schema), so messages are dramatically smaller. On the receiver's end, the schema can be requested from the registry using this ID, so the receiver knows how to decode the message into a complete object.
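
Concretely, AvroTurf::Messaging frames each message in the Confluent wire format. The message starts with one magic byte (0), then the schema ID as a 4-byte big-endian integer, then the Avro-encoded record. That is the \u0000\u0000\u0000\u0000\u0001 at the front of the encoded strings in this post. The gem-free sketch below shows the framing and compares its size with the full schema. frame and unframe are hypothetical helpers, not AvroTurf methods, and the JSON schema is a hand-written stand-in used only for the size comparison.

```ruby
require 'json'

MAGIC_BYTE = 0

# Prepend the 5-byte header: magic byte 0, then the schema id as a
# 4-byte big-endian unsigned integer (pack directive 'N').
def frame(schema_id, avro_payload)
  [MAGIC_BYTE, schema_id].pack('CN') + avro_payload.b
end

# Split a framed message back into [schema_id, avro_payload].
def unframe(message)
  magic, schema_id = message.unpack('CN')
  raise ArgumentError, "unexpected magic byte #{magic}" unless magic == MAGIC_BYTE
  [schema_id, message.byteslice(5, message.bytesize - 5)]
end

# Hand-written full JSON schema for the three-field user record, for comparison.
USER_SCHEMA_JSON = JSON.generate(
  {
    'type' => 'record', 'name' => 'user',
    'fields' => [
      { 'name' => 'id', 'type' => 'int' },
      { 'name' => 'name', 'type' => 'string' },
      { 'name' => 'email', 'type' => 'string' }
    ]
  }
)

header = frame(1, '')
puts "header: #{header.bytesize} bytes, full schema: #{USER_SCHEMA_JSON.bytesize} bytes"
```

Whatever the schema grows to, the per-message overhead stays at 5 bytes.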

Here's what it looks like to create an Avro object in Ruby and encode it into a message that can be sent over the wire:

[1] pry(main)> user = Avro::User.new(id: 10, name: 'Ronnie', email: 'come@me.bro')
{
       :id => 10,
     :name => "Ronnie",
    :email => "come@me.bro"
}
[2] pry(main)> user.avro_message_value
I, [2016-08-11T15:21:23.533555 #35501]  INFO -- : Registered schema for subject `user`; id = 1
"\u0000\u0000\u0000\u0000\u0001\u0014\fRonnie\u0016come@me.bro"

Notice the log statement: because we're encoding a message, the gem (AvroTurf) registers the schema with the schema registry under the hood, so that any receiver will know how to interpret it. On the receiving end, we can easily decode this string:

[1] pry(main)> Avro::User.avro_message_decode("\u0000\u0000\u0000\u0000\u0001\u0014\fRonnie\u0016come@me.bro")
I, [2016-08-11T18:36:20.510070 #40099]  INFO -- : Fetching schema with id 1
{
       :id => 10,
     :name => "Ronnie",
    :email => "come@me.bro"
}

This time, the gem requests the schema from the registry and decodes the message into a convenient little PORO.
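
If you're curious where those bytes come from, here is a hand-rolled, gem-free sketch of Avro's binary encoding for this one record. It follows the Avro specification and is not taken from AvroTurf. Ints and string lengths are zig-zag varints, strings are raw UTF-8 bytes, and the record is just its fields concatenated in schema order, with no names or separators. The helper names (write_long, encode_user, and so on) are made up for illustration, and the schema is hard-coded.

```ruby
require 'stringio'

# Field order of the user schema; Avro writes fields in this order with no
# names or separators, so order is all the decoder has to go on.
USER_FIELDS = [[:id, :int], [:name, :string], [:email, :string]].freeze

# int/long: zig-zag encode, then emit 7 bits per byte, low bits first,
# setting the high bit on every byte except the last.
def write_long(n)
  z = (n << 1) ^ (n >> 63)
  out = String.new(encoding: Encoding::BINARY)
  while z >= 0x80
    out << ((z & 0x7F) | 0x80)
    z >>= 7
  end
  out << z
end

def read_long(io)
  z = 0
  shift = 0
  loop do
    byte = io.readbyte
    z |= (byte & 0x7F) << shift
    break if byte < 0x80
    shift += 7
  end
  (z >> 1) ^ -(z & 1) # undo zig-zag
end

# string: byte length as a long, then the raw UTF-8 bytes.
def write_string(str)
  bytes = str.encode(Encoding::UTF_8).b
  write_long(bytes.bytesize) + bytes
end

def read_string(io)
  io.read(read_long(io)).force_encoding(Encoding::UTF_8)
end

def encode_user(user)
  USER_FIELDS.map do |field, type|
    type == :int ? write_long(user.fetch(field)) : write_string(user.fetch(field))
  end.join
end

def decode_user(payload)
  io = StringIO.new(payload)
  USER_FIELDS.each_with_object({}) do |(field, type), user|
    user[field] = type == :int ? read_long(io) : read_string(io)
  end
end

user = { id: 10, name: 'Ronnie', email: 'come@me.bro' }
header = [0, 1].pack('CN') # magic byte + schema id 1
message = header + encode_user(user)
decoded = decode_user(message.byteslice(5, message.bytesize - 5))
```

The record body comes out at 20 bytes, versus 47 for the same record as compact JSON. An appended string field would add only a zig-zag length byte plus its bytes. That is exactly the \u0018jersey_shore tail (24 is the zig-zag form of 12) in the evolved message later in this post.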

Suppose I want to add a new field to User, so I update my schema file:

record :user do
  required :id, :int
  required :name, :string
  required :email, :string
  required :location, :string, default: 'jersey_shore'
end

What happens when I encode a message using this new schema and send it to another app that still uses the old schema? If you follow a few guidelines, such as giving every new field a default value, schemas can be fully backward and forward compatible. For example:

# The sender with the new schema
[1] pry(main)> user = Avro::User.new(id: 10, name: 'Ronnie', email: 'come@me.bro')
{
          :id => 10,
        :name => "Ronnie",
       :email => "come@me.bro",
    :location => "jersey_shore"
}
[2] pry(main)> user.avro_message_value
I, [2016-08-11T18:39:48.867867 #40454]  INFO -- : Registered schema for subject `user`; id = 2
"\u0000\u0000\u0000\u0000\u0002\u0014\fRonnie\u0016come@me.bro\u0018jersey_shore"


# The receiver with the old schema
[1] pry(main)> Avro::User.avro_message_decode("\u0000\u0000\u0000\u0000\u0002\u0014\fRonnie\u0016come@me.bro\u0018jersey_shore")
I, [2016-08-11T18:40:34.095267 #40586]  INFO -- : Fetching schema with id 2
{
       :id => 10,
     :name => "Ronnie",
    :email => "come@me.bro"
}

Neat! Nothing breaks! Because the message carries schema ID 2, the receiving service fetches that newer schema from the registry and decodes the message with it. It then simply drops the field its own schema doesn't know about, ending up with an object it understands. If this process were reversed, the receiver with the new schema would still be able to understand a message encoded with the old schema, because location has a default value.
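
The rule at work here is Avro's schema resolution: the receiver decodes with the writer's schema and then matches fields by name against its own reader schema. Below is a simplified, gem-free sketch of just the record-field part of that rule. resolve, NO_DEFAULT and the field lists are hypothetical names for illustration, not AvroTurf or Avromatic code. Real resolution also handles type promotion, aliases, unions and nested records.

```ruby
# Marker for reader fields that have no default value.
NO_DEFAULT = Object.new.freeze

# Simplified record resolution. +writer_datum+ is a record already decoded
# with the writer's schema; +reader_fields+ lists the reader's fields as
# [name, default] pairs. Writer fields the reader doesn't know are dropped;
# reader fields the writer didn't send take their default, else it's an error.
def resolve(writer_datum, reader_fields)
  reader_fields.each_with_object({}) do |(name, default), out|
    if writer_datum.key?(name)
      out[name] = writer_datum[name]
    elsif !default.equal?(NO_DEFAULT)
      out[name] = default
    else
      raise ArgumentError, "#{name} is missing and has no default"
    end
  end
end

OLD_READER = [[:id, NO_DEFAULT], [:name, NO_DEFAULT], [:email, NO_DEFAULT]].freeze
NEW_READER = (OLD_READER + [[:location, 'jersey_shore']]).freeze

from_new_sender = { id: 10, name: 'Ronnie', email: 'come@me.bro', location: 'jersey_shore' }
from_old_sender = { id: 10, name: 'Ronnie', email: 'come@me.bro' }

resolve(from_new_sender, OLD_READER) # location is dropped
resolve(from_old_sender, NEW_READER) # location is filled in from the default
```

This is also why the default matters. Without it, the new reader would have no value for location when it decodes an old message, and resolution would fail.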

Hopefully this eases, in some infinitesimal way, the migration to a micro-service architecture. It's certainly not an easy transition... just ask literally anyone who has tried. Over time these things will get easier to build, but until then it's pretty cumbersome. All that really matters in Rubyland is that we can do it in style.
