Impedance mismatch problem is a widely known one, with many tools trying to solve it the wrong way. It’s so famous with so many failed attempts, that it got recognized as the Vietnam war of Computere Science.
Before ORM tools became popular, applications were usually built by first modeling the database, and then creating queries and mapping those queries to objects. Multiple queries per object was immediately recognized as a problem without an adequate solution. While this widely recognized problem was correct for some databases, more advanced could work around this problem. Even today, these workarounds are not mainstream knowledge and the multiple queries problem is often used as an excuse for SQL alternatives.
As ORM tools matured developers started building applications by modeling objects and modeling database to match those objects. Further improvements to ORM tooling also allowed creation of database tables and sometimes even simple migrations.
Unfortunately, ORM tools promised database independence; this resulted in features that could only be implemented in all databases and could not cover non-trivial scenarios. At the same time, more advanced databases (such as Postgres) which focused on features instead of popularity were not fully utilized and looked bad on trivial benchmarks.
ORM tools created all kind of problems by trying to map objects to databases in this way, but lets not reiterate those problems here.
So what went wrong and how to actually solve the Impedance mismatch problem?
To cross the chasm, objects must be modeled in the database too. The problem is that this is not available in databases such as MSSQL or MySQL. This makes it a no go for most ORM tools, since they want to support all databases.
But can this work in real world as well? Well, since this is advanced database territory, it’s not really without issues. For example, managing objects in Oracle is a real pain. And behaviors are not always obvious – let’s take a look at this flat out wrong behavior in Oracle:
SELECT VALUE(main), VALUE(optional) FROM source1 main LEFT JOIN source2 optional ON main.ID = optional.mainID
Guess what’s the VALUE(optional) if the left join fails to find a match? Well, in Oracle it’s an instance of object source2 with all fields null. We can use a workaround with CASE WHEN to work around this issue, but unfortunately it’s not the only one. Fortunately Postgres behaves as expected and doesn’t suffer from this particular issue.
Probably the biggest problem is the schema evolution. Since object oriented features are not very popular, managing objects has all kind of issues. Changing type is not a simple ALTER TYPE operation. Dependency issues are probably the most annoying ones. When you start using objects in a table column you can’t just change them as easily as before.
If managing objects in the database is such a pain, is it really worth it?
Well, it’s worth as much as you want to minimize the mismatch.
Applications are not about classes or tables, they are about models. Domain model is the most important thing in an application. Relations and classes are just a representation of that model in used technology.
So, how to solve this problem?
By moving up the ladder of abstraction. The same way as we started to use C instead of assembly, we should start abstracting the model and going down the ladder of abstraction only when necessary. If database and classes can be created from some model this solves all kind of issues. NoSQL argues about improved developer productivity since they can reason about their model more easily and have less code to maintain.
This, while hard, is a solvable problem – but it can’t be solved in a mainstream way. Mainstream ORMs which offer similar features do it through templates. Classes generated from the model should never, ever, be modified by hand. Unfortunately, nontrivial applications can’t be easily expressed in a template.
What about databases, since ORMs are known for horrible SQL queries?
The problem with horrible SQL queries is a programmer/framework one. A bad SQL query is usually result of a missing feature. Extendable frameworks you to solve this easily, just plug-in this use case until the root cause has been fixed.
Criticism of widely accepted viewpoints
1) The object-to-table mapping problem
While it’s not widely known that any object can be mapped to a row in a table and NoSQL solutions are exactly about that, this is usually not the best approach to every mapping. Sometimes a single object should be mapped to a single row, but most of the time it should span several tables and this distinction depends entirely on the model. The solution is to use SQL databases which support this feature.
2) The Schema-Ownership Conflict
This is not an actual technical issue and since the model should be shared across all “departments” it should reflect both developers and DBAs viewpoints. Also, this is actually a framework/tooling problem. If inadequate tooling is creating problems for some department, sometime they will take measures as drastic as entirely forbidding tooling. Solution to this is to stop using bad tooling. If some framework doesn’t support even the most basic database features such as bulk operations, stop using it.
3) The Dual-Schema Problem
Moving up an abstraction this is not a problem anymore. Both database and classes are representation of a single model. Freezing the model in code or database is a result of inadequate abstraction. If renaming a field is risky because it may break something it’s obvious that there is something wrong with the development process.
4) Entity Identity issues
Concurrency is hard, deal with it. The best way to deal with it is to rely on databases for transactions. MVCC doesn’t suffer from many problems found in non-MVCC databases. Cache invalidation is one of the hardest computer science problems. LISTEN/NOTIFY and Advanced Queuing goes a long way to help with cache invalidation. Again, this is an inadequate database and/or a bad framework problem.
5) The Data Retrieval Mechanism Concern
Oracle and Postgres support collections of objects. This allows them to provide complex aggregates in a single query.
LINQ drastically simplifies interaction with the database from code. Type safety and code familiarity are powerful tools. Manual SQL is always an option, but the main goal is mapping to objects.
Languages which do not support LINQ can, in best case, fall back to method chaining but that leaves a lot to be desired.
6) The Partial-Object Problem and the Load-Time Paradox
Domain-Driven Design provides an excellent modeling tool. Understanding the model provides insight about interactions with objects. When cache understands the model all kind of issues are mitigated. The problem is that for cache to be able to understand the model, either the developer has to specify a lot of information or there must be a higher abstraction which understands the model and know when to invalidate the cache.
E.g. When views are expressed in a model they are there for a business reason. Unfortunately, views are rarely used in ORM tooling since they are really complicated to manage.
Object-relational databases offer ways to bridge the Impedance mismatch problem. But they alone are not enough.
Domain-Driven Design provides language for better modeling. But it alone is not enough.
Combination of a DSL for describing the domain model using DDD concepts and compilers which can maintain application components is a viable solution to solving the Impedance mismatch problem.