Thursday, November 29, 2012

On the value of testing


In an attempt to subscribe to behavior-driven development, I've been using rspec and jasmine to test my Ruby and JavaScript code, respectively. Along the way I've had some thoughts about the value of testing, in particular for collaborative projects. Although there's plenty of discussion online about this, I found it helpful to jot down my thoughts.

Tests mitigate many problems that arise with collaborative projects:
  1. You learn ways to develop a component so that it can be used in isolation, i.e., so it can be used in your tests of the component, without whatever other infrastructure will be in place or the code into which the component will be plugged. This forces a degree of modularity and an interface to the component that is sufficient for testing, and thus likely sufficient for other components to interact with it in sensible ways.
  2. Someone can read the tests of a component to learn how a component is used (say, if the documentation is not quite sufficient, or worse, absent). 
  3. Tests help by forcing well-defined, stable interfaces for APIs. Two developers can be working on two separate components that must interact. If they get together and agree on the interfaces, they can each encode the specification in tests of their components, and use the tests as a means of keeping the interfaces stable.
  4. Tests can even play a role in ensuring that APIs and interfaces don't change between minor versions when subscribing to the semantic versioning ideals. 
  5. A habit of developing components that can stand alone well enough to be tested provides a means of involving a developer who is not yet familiar with the other components of a system or the big picture of the project. The developer can be given a set of tests that his component must satisfy, so he doesn't need to know enough to design the interface; that can be provided by another developer. Instead he must simply implement the component so that the tests are satisfied.
  6. When a developer leaves the company or project, the tests survive as a record of how things are supposed to work. 
  7. The philosophy that "code wins arguments" can be put into practice by proving that a bug exists with a test that fails. Then a successful fix will be evidenced by a passing test. Each of these steps can be committed to the repository and referenced from the bug tracker so that everyone knows the bug is fixed, can see how it was fixed, and if the bug ever resurfaces, then the test that proved its existence will fail and the bug can be squashed again.
  8. Tests provide a means of noting the absence of a feature so that it is not forgotten, while allowing a developer to continue on her current trajectory and not be distracted by everything that crops up.
  9. Tests allow one to delegate work. If you notice something is missing from another developer's code or want to communicate to another developer what needs to be done, writing some tests is a good way to do this. It has the benefit that you can communicate how you want something to work, and it shows the other developer that you care enough about the feature to spend the time to write the specs. Everyone is busy and might not want to simply be told what to do, so this makes it easier to ask them to spend time developing it: it becomes something of a joint effort.
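
To make points 7 and 8 concrete, here is a minimal sketch. It uses Minitest (which ships with Ruby) rather than the rspec setup I mentioned above, only so that it runs without extra gems. The slugify function and the bug it describes are hypothetical, made up purely for illustration.

```ruby
require "minitest/autorun"

# Hypothetical component under test (not from any real project).
def slugify(title)
  title.strip.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-|-\z/, "")
end

class SlugifyTest < Minitest::Test
  # Point 7: a regression test. It would be committed first, failing,
  # to prove the (hypothetical) bug exists, and it passes once the fix lands.
  def test_collapses_repeated_separators
    assert_equal "hello-world", slugify("Hello,  World!")
  end

  # Point 8: a placeholder that records a missing feature without
  # interrupting the current work. Skipped tests show up in the report.
  def test_transliterates_accented_characters
    skip "not implemented yet"
  end
end
```

Running the file reports one passing test and one skip, so the missing feature stays visible every time the suite runs.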

Monday, October 1, 2012

Why putting off what you can do tomorrow until tomorrow is not procrastination

Perhaps this is obvious, but it helps tremendously in my productivity.

There are a number of advantages to saving things that can be done later, until later:

  1. You might decide later that you didn't even need to do it, and thus you saved time.
  2. You are able to get the more important things done first.
  3. You avoid any additional recursion (i.e., any unexpected task that the task might induce you to do will be avoided).
  4. By the time you end up doing it, you might know a more efficient way to do it, be more capable to do it, or be less likely to make a mistake in doing it because of what you've learned along the way, doing other, higher-priority things.

It's this last one that I've found to be an especially rich source of saved time, particularly in the technical arena. Often small tidbits of knowledge make the difference between huge amounts of wasted time and quickly getting a task done. This has many causes.

So while it might sound like procrastination, saving tasks until later and instead focusing on more pressing ones may enable a fortuitous discovery that makes the postponed task much easier.

Along these lines, I've been using Trello to quickly note something I'd like to come back to later. This way I can postpone tasks regret-free. And in addition to getting higher-priority things done first, who knows, I might be able to do the queued task faster or better later.

Kevin

Saturday, September 22, 2012

Getting upserts working for single-collection inheritance in MongoMapper

I've found that when one does an upsert using one of the modifiers (increment, set, push, etc.) on a derived document class (i.e. using single collection inheritance), if the document doesn't exist, then the newly created document doesn't have _type set. 

I've gotten around this by overriding criteria_hash() in MongoMapper::Plugins::Sci::ClassMethods as follows:


Let me know if you see any problems this could introduce or whether there are ways in which it's not a comprehensive solution.

Hopefully this will make its way into the master branch.

Cheers,
Kevin

Tuesday, August 28, 2012

Adding find_and_modify to MongoMapper and the simplicity of plugins

I've been using MongoMapper as my ORM for a while now, but for some reason find_and_modify() is conspicuously absent. find_and_modify is important for any situation in which you want to lock a record while you work with it, say by marking a state property.
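
To illustrate why the find and the modify need to happen as one atomic step, here is a rough in-memory sketch. It does not talk to MongoDB at all: InMemoryCollection is a hypothetical stand-in that only mimics the semantics (find the first matching document and update it in a single step), with a Mutex playing the role of the server's atomicity.

```ruby
require "thread"

# Hypothetical stand-in for a collection; NOT MongoMapper or the Mongo driver.
class InMemoryCollection
  def initialize(docs)
    @docs = docs
    @lock = Mutex.new
  end

  # Atomically find the first document matching `query` and merge `update`
  # into it. Returns the modified document, or nil when nothing matches.
  def find_and_modify(query, update)
    @lock.synchronize do
      doc = @docs.find { |d| query.all? { |key, value| d[key] == value } }
      doc.merge!(update) if doc
      doc
    end
  end
end

jobs = InMemoryCollection.new(Array.new(5) { |i| { id: i, state: "new" } })

# Three workers race to claim jobs by flipping the state property.
claimed = Queue.new
workers = Array.new(3) do |worker_id|
  Thread.new do
    while (job = jobs.find_and_modify({ state: "new" }, { state: "working", worker: worker_id }))
      claimed << job[:id]
    end
  end
end
workers.each(&:join)

ids = []
ids << claimed.pop until claimed.empty?
puts ids.sort.inspect # each job id appears exactly once
```

If the find and the update were two separate calls, two workers could both see the same "new" job before either marked it, which is exactly the race that find_and_modify avoids.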

So I worked up a little MongoMapper plugin that did the trick:



Then to use it, simply add the plugin to your document model:


class FindAndModifyTest
  include MongoMapper::Document
  plugin FindAndModifyPlugin
end

Here's an rspec test that shows how it works:





Voilà

Kevin



Thursday, August 23, 2012

Rails: When should I override in a subclass or override in the original class?

Despite the typically enormous flexibility of Rails methods, sometimes I find one that is not quite flexible enough. One has several options:

  1. Write a wrapper that calls the original function.
  2. Write a completely separate implementation.
  3. Override the method in a subclass (this can call super, so it's a bit like #1).
  4. Override the method in the original class by reopening the class.

Here's a concrete example:

I was implementing an API authorization token based on the description in RailsCasts episode #352, Securing an API. This is to avoid #5 in the top 10 security threats facing web applications.

Since my API is used for sending and receiving data from a client web application (based on EmberJS), I wanted to send an API key with each main page load, and then all subsequent Ajax calls would use this key. All this happens behind the scenes. 

But I ran into a problem. How am I to send the API key in the web page served by my test suite (Jasminerice)? 

I'd prefer not to muck with the Jasminerice source code, as this would make it a pain to upgrade with each new release. Instead I decided to use a helper that Jasminerice was already using, and just provide a new implementation of the built-in csrf_meta_tags helper, which could serve my API's CSRF token in addition to the built-in one Rails uses.

This required #4 from the above list, overriding the method in the original class. This is necessary for Jasminerice to use my implementation. #1 and #2 don't work because Jasminerice won't call a method it doesn't know about. #3 won't work because Jasminerice won't have the subclass in its scope (say I put it in my own ApplicationHelper class, which would work for my local rails app, but not inside Jasminerice). 

That left me with overriding the original implementation. Here's my code:

module ActionView
  module Helpers
    module CsrfHelper
      alias_method :orig_csrf_meta_tags, :csrf_meta_tags
      # I override this function so that I can add my 
      # json_csrf_token tag as well
      def csrf_meta_tags
        api_key = Rack::Utils.escape_html ApiKey.create.access_token
        json_tag = ""
        [orig_csrf_meta_tags, json_tag].join("\n").html_safe
      end
    end
  end
end

In the above, I approximated a wrapper by aliasing the original method and then invoking it, adding my own key to the mix. 
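
Stripped of Rails, the alias-and-wrap pattern looks roughly like the sketch below. LibraryHelpers and LibraryPage are hypothetical stand-ins for code I don't control (they are not Rails or Jasminerice classes), and the tag contents are made up.

```ruby
# Hypothetical library code that I cannot (or do not want to) edit.
module LibraryHelpers
  def meta_tags
    '<meta name="csrf-token" content="abc"/>'
  end
end

class LibraryPage
  include LibraryHelpers

  # The library calls meta_tags itself; it knows nothing about my code.
  def render
    meta_tags
  end
end

# Option #4: reopen the original module, keep the old method under a
# new name, and define a replacement that wraps it.
module LibraryHelpers
  alias_method :orig_meta_tags, :meta_tags

  def meta_tags
    [orig_meta_tags, '<meta name="api-key" content="xyz"/>'].join("\n")
  end
end

html = LibraryPage.new.render
puts html # both tags, even though LibraryPage itself was never touched
```

Because method lookup goes through the module at call time, the library picks up the new definition without any change on its side. On Ruby 2.0 and later, Module#prepend is another way to get a similar wrapper without the alias.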

I would only recommend approach #4 when one doesn't have access to or doesn't want to modify some code that uses the function for which a substitute implementation is desired. 

There is one main drawback, though: any other code that relies on the original implementation gets the new behavior too, and may break (it isn't possible for only some components to use the new implementation).

Side note:

Another idea I had was to somehow customize the Jasminerice behavior through its Railtie configuration, although I wasn't able to find a way to do this. If anyone has suggestions, I'd be interested to hear them.

That's it for now.

Kevin

Sunday, August 12, 2012

Testing private methods using singleton methods

I generally use rspec for Ruby unit tests. However, at first glance, it seems that good test coverage is in conflict with good encapsulation. Specifically, private methods cannot be directly tested without some additional thought.

One strategy that is widely advocated is to only test private methods indirectly, through their use by public methods. While perfectly valid, this approach may let use cases slip through the cracks. For example, it may not always be feasible to use the public methods to test all the uses of the private method that one may want to test. This is actually the reason for unit tests in the first place: not all use cases and exceptions can be uncovered by higher-level interaction (i.e., using the application, integration tests, etc.).

Another strategy is to put all the private methods in a module and include them. However, this means your code gets scattered in multiple places, and that module may not have any other use outside of the one class.

Enter instance_eval

With instance_eval, one can directly call the private method as follows:

obj.instance_eval{ my_private_method }.should 

Voilà.

Perhaps one rule of thumb is to use instance_eval sparingly, and only in unit tests (never in integration tests). 
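
Here is a small self-contained sketch of the idea in plain Ruby, without rspec. The Account class and its formatted_balance method are hypothetical, invented just for this illustration.

```ruby
# Hypothetical class with a private helper method.
class Account
  def initialize(balance)
    @balance = balance
  end

  def summary
    "Balance: #{formatted_balance}"
  end

  private

  def formatted_balance
    format("$%.2f", @balance)
  end
end

account = Account.new(12.5)

# Calling the private method from outside raises NoMethodError.
begin
  account.formatted_balance
rescue NoMethodError => e
  puts "direct call failed: #{e.class}"
end

# instance_eval runs the block with `self` set to the object,
# so the private method can be called with an implicit receiver.
puts account.instance_eval { formatted_balance }

# Object#send is another common way to reach a private method.
puts account.send(:formatted_balance)
```

Both approaches bypass encapsulation on purpose, which is why I'd keep them confined to unit tests.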

For more discussion on this topic (and a debate about whether private methods should be tested), see this StackOverflow post.

Kevin


Tuesday, May 1, 2012

Alternative naming for associations in MongoMapper

MongoMapper is a mature Object Relational Mapper written in Ruby for use with MongoDB. It roughly follows the conventions used in Rails' ActiveRecord / ActiveModel. As my use of MongoDB has grown more sophisticated, I have had some trouble discovering how to get what I need done with MongoMapper. Fortunately, this is often not a deficiency of MongoMapper; it's more that I wasn't aware of all its features.

In general on my blog, I'll point out a few features that I felt were incompletely explained in the documentation. Here I address how to use alternative naming for associations.

Using non-default naming for associations

By default, associations are named after the class name:

class Plum
  include MongoMapper::Document
  one :pit
end
class Pit
  include MongoMapper::EmbeddedDocument
end


Running the following to create a Plum with a Pit:


p = Plum.new
p.build_pit
p.save


will result in a document in MongoDB like this:

{
   "_id" : ObjectId("4f9ff50ba0b7f9b71500004a"),
   "pit" : {
      "_id" : ObjectId("4f9ff51ba0b7f9b71500004b")
   }
}

However, what if you want to have the class called Pit, but have the association name called dapit? Well, you can use the :class option when declaring the association:


class Pit; end

class Plum
  include MongoMapper::Document

  one :dapit, :class => Pit 
end

class Pit
  include MongoMapper::EmbeddedDocument
  embedded_in :plum
end


Notice that in the above, I needed to declare the class of the embedded document ahead of time, so that I could refer to it in the class definition of Plum.

With the above associations, you can do the following:


p = Plum.new
p.build_dapit
p.dapit.class.name
# => "Pit"


You can do the same with a non-embedded association:


class Pit; end

class Plum
  include MongoMapper::Document

  one :dapit, :class => Pit
end

class Pit
  include MongoMapper::Document
  belongs_to :plum
end


With this version you can do the following:


p = Plum.new
p.create_dapit
p.save
p.dapit.plum

However, what if you also want to rename the belongs_to association that points back to the plum? This is a bit more complicated. You need to also specify the foreign_key so that both directions of the association use the same key name for the ObjectID:


class Pit; end

class Plum
  include MongoMapper::Document

  one :dapit, :class => Pit, :foreign_key => :daplum_id
end

class Pit
  include MongoMapper::Document
  belongs_to :daplum, :class => Plum
end


You might wonder why I specify the foreign key daplum_id when I'm declaring the association to dapit. Well, for one associations, the ObjectID of the referring Plum is stored in the pit collection. Then when the pit is requested, MongoMapper finds the pit document that points back to the plum. So the key name that needs to be changed to match da new naming convention is plum_id -> daplum_id, and it's stored in the pit collection, but this renaming is declared in the Plum class, not the Pit class.

You can use the above version as follows:


p = Plum.new
p.create_dapit
p.save
p.dapit.daplum

And the document for a pit now looks like this, using the alternatively named key:

{
   "_id" : ObjectId("4fa0002aa0b7f9653800002b"),
   "daplum_id" : ObjectId("4fa00026a0b7f9653800002a")
}

Hope that all makes sense. 

Saturday, April 28, 2012

Are many-to-many associations in MongoDB fast enough?



MongoDB has a lot of potential. It's schema-less, dynamically typed, distributed, replicated, fast, has a rich query language, is broadly supported, and the list goes on.

One thing I found hard to get my mind around is when to denormalize and when to keep things normalized. The major reason to keep documents denormalized is that there is no JOIN operation, as there is in SQL. This means that it's often faster to store the associated data in the parent document, rather than join it in on demand.

However, there are circumstances in which it's not desirable to denormalize things. For example, if the associated data is not unique to a single document or collection, then denormalization will require replicating data, and it would be difficult to keep all copies in sync.

I recently ran into such a situation and decided to do some benchmarking to see how slow such queries were. For the purposes of discussion, I have devised the following example layout that is identical to my original problem, but easier to explain. 

Suppose you have sets of items, and each item in each set has ratings by various people. Without JOINs, I was worried about whether it would be slow to query the ratings for all the items of a set. Here I compare two means of querying such an arrangement.

I use the same data in each case, and the end result is the same. There is a collection of items. There is a collection of sets of items; each set is a document that contains an array of ObjectIDs for items. Then there is a collection of ratings; each rating is a document listing the person who gave the rating, the item the rating is for, and the rating itself.

The question is how fast the various ways of querying all the ratings for items in a particular set are. In one case, individually.js, I query the ratings collection once for each item in each set. In the other case, using_in.js, I issue one query per set, providing the array of item ObjectIDs and using the $in operator to get all the ratings for the set of items. While the latter is certainly fewer queries, either way needs to find all the items. Will the one with fewer (but more complicated) queries be faster?

I have put my code into this Gist.

To run the benchmark, run the setup.js script using the Mongo console:

mongo < setup.js

This will use (or create) a database named benchmarking.

The critical difference between the two test cases lies in how the ratings for a set are queried. This nested pair of for loops queries them one by one.


// Find user 1's ratings for all items in all sets
cur.forEach(function(set) {
  set.items.forEach(function(item_id) {
    var rating = db.ratings.findOne({user: 1, item_id: item_id});
    sum += rating.rating;
  });
});


However, there is an alternative, which is to have a single query that provides all the item_ids that are needed for a given set, as an array, using the $in operator to retrieve those ratings:


// Find user 1's ratings for all items in all sets
cur.forEach(function(set) {
  var res = db.ratings.find({user: 1, item_id: {$in: set.items}});
  res.forEach(function(rating) { sum += rating.rating; });
});


The two benchmark test cases can be run as follows, using the time command to keep track of elapsed runtime:

time mongo < individually.js

And:

time mongo < using_in.js

Benchmarking results:

Individually fetching each rating: 1.76s
Using the $in operator: 0.31s

This is on my rather old Mac Pro.

Thus, for associations like this, $in is clearly the better choice, performing nearly 6x faster. Also in absolute terms, the query is not slow. Each set gets the 20 associated ratings in less than 1ms.
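
As a quick check of the "nearly 6x" figure, the arithmetic can be worked through in a few lines of Ruby. The timings and the 20 ratings per set are the numbers reported above; nothing else is assumed.

```ruby
# Timings reported above, in seconds.
individually_s = 1.76
using_in_s     = 0.31

speedup = individually_s / using_in_s
puts format("speedup: %.1fx", speedup) # prints "speedup: 5.7x"

# Queries issued per set in each approach (20 ratings per set).
items_per_set        = 20
queries_individually = items_per_set # one findOne per item
queries_using_in     = 1             # one find with $in per set
puts "queries per set: #{queries_individually} vs #{queries_using_in}"
```

So a 20-fold reduction in the number of queries bought a bit under a 6-fold reduction in wall-clock time on this data.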