How you design your model dataset methods can significantly affect the flexibility of your API for your model classes, as well as the performance. The goal of this guide is to provide an example of how to design your model dataset methods for maximum flexibility and performance.
In general, it is recommended that you have a single method per task for maximum flexibilty. For example, let's say you need to retrieve all albums released in a given year, ordered by number of units sold descending, and only care about the id, name and number of units sold. One way to do this is in your application code (outside the model), you can call the dataset methods directly:
Album. select(:id, :name, :copies_sold). where(:release_year=>params[:year].to_i). order(Sequel.desc(:copies_sold)). all
One issue with this design is that it ties you to your current database schema, and will make it necessary to change your application code if your schema changes. In general, it is better to encapsulate your code into a dataset method (or a class method, but a dataset method is more flexible):
class Album < Sequel::Model dataset_module do def all_albums_released_in_year(year) select(:id, :name, :copies_sold). where(:release_year=>year). order(Sequel.desc(:copies_sold)). all end end end
Then your application code just needs to call your dataset method:
The advantage of this approach is that you can change your schema at any point in the future, and you should only need to change your model code, you should never need to change your application code.
After designing your dataset methods for flexibility, stop. Don't worry about performance until you need to worry about performance. However, assuming you have profiled your application and profiling shows you can benefit from optimizing the above method, you can then consider the performance impact of future design choices.
First, considering that the root cause of the performance issue may not be
at the Sequel level, it may be at
the database itself. Use
EXPLAIN or the equivalent to analyze
the query plan for the query in use, and see if there is something you can
do to optimize it, such as adding an appropriate index.
Second, assuming the performance issue is at the Sequel level, you need to understand that one of the best ways to improve performance in most ruby code is to reduce the number of objects allocated. Here is the above code with comments showing datasets allocated:
def all_albums_released_in_year(year) select(:id, :name, :copies_sold). # new dataset allocated where(:release_year=>year). # new dataset allocated order(Sequel.desc(:copies_sold)). # new dataset allocated all end
Third, you need to understand that Sequel has optimizations specifically
designed to reduce the number of objects allocated, by caching intermediate
data structures. However, these optimizations apply only to frozen
datasets, and can only provide a benefit if related methods are called
multiple times on the same dataset. One easy way to do this is to use
frozen model datasets is to use the
extension before creating your model class:
Note that freezing datasets is not enough to enable caching in this case.
The reason for this is that
order can potentially receive arbitrary arguments, and
enabling caching for them could easily lead to unbounded cache size (denial
of service due to memory exhaustion). You need to signal to Sequel that particular arguments to
these methods should be cached, and you can do that by calling methods
dataset_module blocks such as
order. These methods will add dataset methods to the model
that can use caching to optimize performance. Here is an example using
class Album < Sequel::Model dataset_module do select :with_name_and_units, :id, :name, :copies_sold order :by_units_sold, Sequel.desc(:copies_sold) def all_albums_released_in_year(year) with_name_and_units. by_units_sold. where(:release_year=>year). all end end end
Performance aside, this does provide a slightly nicer and more readable internal API, though naming such methods can be problematic.
order here, Sequel expects that the created
dataset methods may be called more than once on the same dataset, and it
knows that the arguments to the underlying
order methods are fixed, so it can cache the resulting
datasets. Let's comment the above example with dataset allocations:
def all_albums_released_in_year(year) with_name_and_units. # cached dataset returned by_units_sold. # cached dataset returned where(:release_year=>year). # new dataset allocated all end
Note that the order of methods here is important. If you instead change the method chain to filter the dataset first, then no caching happens:
def all_albums_released_in_year(year) where(:release_year=>year). # new dataset allocated with_name_and_units. # new dataset allocated by_units_sold. # new dataset allocated all end
This is because any time a new, uncached dataset is returned by a dataset method, all subsequent methods in the method chain cannot benefit from caching.
Usually, when you are designing methods to process data based on user input, the user input affects the rows selected, and not the columns selected or the order in which the rows are returned. Sequel is aware of this and has model dataset methods that specifically take user input (arguments), interpret them as a filter condition and either:
Return all matching rows in an array (
Iterate over all matching rows (
Return first matching row (
Return first column in first matching row, assumes only a single column is selected (
After calling these methods on a cached dataset a number of times (currently 3), Sequel will automatically build an optimized loader, cache it, and use it for future loads. So the above example changes to:
def all_albums_released_in_year(year) with_name_and_units. # cached dataset returned by_units_sold. # cached dataset returned where_all(:release_year=>year) # cached loader used end
This can significantly improve performance, up to 3x for complex method chains that only return a few rows.
So the general advice on designing datasets for performance is:
Use frozen datasets
dataset_modulemethods to create named dataset methods that return cached datasets
If any filtering is to be done, have it done last using
By following this advice, you can significantly increase the performance of your model dataset code.
The best way to further increase performance at the Sequel level is to switch to using prepared statements. This does require more significant changes to the API. Here's an example using prepared statements:
class Album < Sequel::Model ALBUMS_RELEASED_IN_YEAR = select(:id, :name, :copies_sold). where(:release_year=>:$year). order(Sequel.desc(:copies_sold)). prepare(:all, :all_albums_released_in_year) def self.all_albums_released_in_year(year) ALBUMS_RELEASED_IN_YEAR.call(:year=>year) end end
Note that when using prepared statements, you need to use a class method instead of a dataset method, as the SQL for the prepared statement must be fixed for the class. This limits the flexibility of the method, since you can no longer call it on arbitrary datasets on the class.