15

Object orientation has helped me a lot in implementing many algorithms. However, object-oriented languages sometimes guide you toward a "straightforward" approach, and I doubt that this approach is always a good thing.

OO is really helpful for coding algorithms quickly and easily. But could OOP be a disadvantage for performance-sensitive software, i.e. for how fast the program executes?

For example, storing graph nodes in a data structure seems "straightforward" in the first place, but if Node objects contain many attributes and methods, could this lead to a slow algorithm?
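
For concreteness, here is roughly the kind of thing I have in mind (a made-up sketch; the attribute names are arbitrary):

#include <string>
#include <vector>

// The kind of "straightforward" node I mean: each node is a full object with
// many attributes, some methods, and pointers to other node objects.
struct Node
{
    std::string label;
    double weight = 0.0;
    bool visited = false;
    std::vector<Node*> neighbours;   // many references between many objects

    void visit() { visited = true; }
    // ... more attributes and methods ...
};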

In other words, could many references between many different objects, or using many methods from many classes, result in a "heavy" implementation?

11
  • 1
    Quite a strange question. I can understand how OOP helps at the level of architecture. But the level of algorithm implementation is normally built upon abstractions that are very alien to anything OOP stands for. So, chances are, performance is not the biggest problem for your OOP algorithm implementations. As for performance, with OOP the single biggest bottleneck is normally related to virtual calls.
    – SK-logic
    Commented Dec 17, 2011 at 17:17
  • @SK-logic > Object orientation tends to manipulate everything by pointer, which implies a heavier workload on the memory-allocation side; non-localized data tends not to be in the CPU cache; and, last but not least, it implies a lot of indirect branching (virtual functions), which is deadly for the CPU pipeline. OO is a good thing, but it can certainly have a performance cost in some cases.
    – deadalnix
    Commented Dec 17, 2011 at 17:45
  • If the nodes in your graph have a hundred attributes, you'll need a place to store them regardless of the paradigm used for the actual implementation, and I don't see how any single paradigm has an edge on this in general. @deadalnix: Maybe the constant factors can be worse because certain optimizations become harder. But note that I say harder, not impossible - for instance, PyPy can unbox objects in tight loops and JVMs have been inlining virtual function calls since forever.
    – user7043
    Commented Dec 17, 2011 at 17:55
  • Python is good for prototyping algorithms, and yet you frequently do not need a class when implementing a typical algorithm in it.
    – Job
    Commented Dec 17, 2011 at 19:20
  • 1
    +1 For relating object orientation to algorithms, something that is overlooked these days, both in the software industry and in academia ...
    – umlcat
    Commented Dec 17, 2011 at 20:01

10 Answers

21

Object orientation may prevent certain algorithmic optimizations because of encapsulation. Two algorithms may work particularly well together, but if they are hidden behind OO interfaces, the possibility of exploiting their synergy is lost.

Look at numerical libraries. A lot of them (not only those written in the 60s or 70s) are not OOP. There is a reason for that -- numerical algorithms work better as a set of decoupled modules than as OO hierarchies with interfaces and encapsulation.
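
As a rough illustration (not taken from any particular library), numerical code in that style tends to look like a free function over raw arrays rather than a method on a class hierarchy:

#include <cstddef>

// y := a*x + y, in the style of a BLAS-like routine: a plain free function
// over raw arrays, with no object hierarchy, interfaces or encapsulation.
void axpy(std::size_t n, double a, const double* x, double* y)
{
    for (std::size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}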

7
  • 3
    The main reason for that is that only C++ has figured out how to use expression templates to make the OO version just as efficient.
    – DeadMG
    Commented Dec 17, 2011 at 17:17
  • 7
    Look at the modern C++ libraries (STL, Boost) - they're not OOP at all either, and not just because of performance. Algorithms normally cannot be represented well in an OOP style. Things like generic programming are much better suited to low-level algorithms.
    – SK-logic
    Commented Dec 17, 2011 at 17:20
  • 3
    Wha-wha-what? I guess I come from a different planet than quant_dev and SK-logic. No, a different universe. With different laws of physics and everything.
    – Mike Nakis
    Commented Dec 17, 2011 at 22:32
  • 6
    @MikeNakis: the difference in viewpoint lies in (1) whether a certain piece of computational code can benefit in terms of human readability from OOP at all (which numerical recipes don't); (2) whether the OOP class design aligns with the optimal data structure and algorithm (see my answer); (3) whether each layer of indirection delivers sufficient "value" (in terms of work done per function call, or conceptual clarity per layer) to justify the overhead (due to indirection, function calls, layers, or data copying); and (4) finally, whether the sophistication of the compiler/JIT/optimizer is the limiting factor.
    – rwong
    Commented Dec 17, 2011 at 23:36
  • 3
    @MikeNakis, what do you mean? Do you think the STL is an OOP library? Generic programming does not go well with OOP anyway. And needless to say, OOP is too narrow a framework, suitable for only a very few practical tasks and alien to everything else.
    – SK-logic
    Commented Dec 18, 2011 at 14:50
8

This isn't really about object orientation so much as about containers. If you used a doubly linked list to store pixels in your video player, it would suffer.

However, if you use the correct container, there is no reason a std::vector should be slower than an array, and since you have all the common algorithms already written for it - by experts - it's probably quicker than your home-rolled array code.
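
For instance, a minimal sketch of leaning on the standard container and algorithms rather than a home-rolled array:

#include <algorithm>
#include <numeric>
#include <vector>

int main()
{
    std::vector<int> values = {5, 3, 8, 1};

    std::sort(values.begin(), values.end());                      // expert-written, O(n log n)
    int sum = std::accumulate(values.begin(), values.end(), 0);   // no hand-rolled loop needed

    return sum == 17 ? 0 : 1;   // 1 + 3 + 5 + 8 == 17
}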

1
  • 2
    Because compilers are sub-optimal (or the rules of the programming language forbid taking advantage of certain assumptions or optimizations), there is indeed an overhead which cannot be removed. Also, certain optimizations, e.g. vectorization, have data-organization requirements (e.g. structure-of-arrays instead of array-of-structures) which OOP can either enhance or hinder. (I recently worked on a std::vector optimization task.)
    – rwong
    Commented Dec 18, 2011 at 1:32
8

What determines performance?

The fundamentals: data structures, algorithms, computer architecture, hardware. Plus overhead.

An OOP program can be designed to align exactly with the choice of data structures and algorithms that is deemed optimal by CS theory. It will have the same performance characteristics as the optimal program, plus some overhead. The overhead can usually be minimized.

However, a program that is initially designed only with OOP concerns in mind, without regard for the fundamentals, may be initially sub-optimal. The sub-optimality is sometimes removable by refactoring; sometimes it is not, and a complete rewrite is required.

Caveat: does performance matter in business software?

Yes, but time-to-market (TTM) is more important, by orders of magnitude. Business software places the emphasis on the adaptability of the code to complex business rules. Performance measurements should be taken throughout the development life cycle. (See the section: what does optimal performance mean?) Only marketable enhancements should be made, and they should be gradually introduced in later versions.

What does optimal performance mean?

In general, the issue with software performance is this: in order to prove that "a faster version exists", that faster version must come into existence first (i.e. there is no proof other than itself).

Sometimes that faster version is first seen in a different language or paradigm. This should be taken as a hint to improvement, not a judgment of inferiority of some other languages or paradigms.

Why are we doing OOP if it may hinder our search for optimal performance?

OOP introduces overhead (in space and execution) in return for improving the "workability", and hence the business value, of the code. This reduces the cost of further development and optimization. See @MikeNakis's answer.

Which parts of OOP may encourage an initially sub-optimal design?

The parts of OOP that (i) encourage simplicity/intuitiveness, (ii) favor colloquial design methods over the fundamentals, and (iii) discourage multiple tailored implementations of the same purpose:

  • KISS
  • YAGNI
  • DRY
  • Object design (e.g. with CRC cards) without giving equal thought to the fundamentals

Strict application of some OOP guidelines (encapsulation, message passing, do one thing well) will indeed result in slower code at first. Performance measurements will help diagnose those issues. As long as the data structures and algorithms align with the theory-predicted optimal design, the overhead can usually be minimized.

What are the common mitigations to OOP overheads?

As previously stated: using data structures that are optimal for the design.

Some languages support code inlining which can recover some runtime performance.
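
For example - a minimal sketch, assuming an optimizing C++ compiler - trivial accessors defined in the class body are typically inlined away entirely:

class Counter
{
public:
    int value() const { return value_; }   // trivial accessor, implicitly inline
    void increment() { ++value_; }

private:
    int value_ = 0;
};

int run(Counter& c)
{
    c.increment();
    return c.value();   // an optimizing compiler will usually inline both calls,
                        // leaving plain loads/stores with no call overhead
}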

How could we adopt OOP without sacrificing performance?

Learn and apply both the OOP and the fundamentals.

It is true that strict adherence to OOP may prevent you from writing a faster version. Sometimes a faster version can only be written from scratch. This is why it helps to write multiple versions of code using different algorithms and paradigms (OOP, generic, functional, mathematical, spaghetti), and then use optimization tools to make each version approach the observed maximal performance.

Are there types of code that will not benefit from OOP?

(Expanded from the discussion between [@quant_dev], [@SK-logic] and [@MikeNakis])

  1. Numerical recipes, which originate from mathematics.
    • The mathematical equations and transforms themselves can be understood as objects.
    • Very sophisticated code transformation techniques are needed to generate efficient executable code. The naive ("white-board") implementation will have abysmal performance.
    • However, today's mainstream compilers are unable to do so.
    • Specialized software (MATLAB, Mathematica, etc.) has both JIT compilers and symbolic solvers able to generate efficient code for some sub-problems. These specialized solvers can be seen as special-purpose compilers (mediators between human-readable code and machine-executable code) which will themselves benefit from an OOP design.
    • Each sub-problem requires its own "compiler" and "code transformations". Therefore, this is a very active open research area with new results appearing every year.
    • Because research takes a long time, software writers have to carry out the optimization on paper and transcribe the optimized code into software. The transcribed code might indeed be unintelligible.
  2. Very low level code.
1
  • So true that TTM is the highest priority, but the overhead from OOP is so horrible that it increases TTM. In other words, performance isn't a problem until it is, and with OOP... it's a problem. Commented Mar 19, 2020 at 17:48
6

OOP is obviously a good idea, and like any good idea it can be over-used. In my experience it is way over-used. Poor performance and poor maintainability result.

It has nothing to do with the overhead of calling virtual functions, and not much to do with what the optimizer/JIT does.

It has everything to do with data structures that, while having the very best big-O performance, have very bad constant factors. This is done on the assumption that if there is any performance-limiting problem in the app, it is elsewhere.

One way this manifests is in the number of times per second new is performed, which is assumed to have O(1) performance but can execute hundreds to thousands of instructions (including the cost of the matching delete or GC time). That can be mitigated by saving used objects for reuse, but that makes the code less "clean".
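
One common form of "saving used objects" is a small free list or pool - a rough sketch, ignoring thread safety and ownership subtleties:

#include <vector>

// A very small object pool: reuse nodes instead of paying for new/delete each time.
struct Node { int value = 0; Node* next = nullptr; };

class NodePool
{
public:
    Node* acquire()
    {
        if (free_.empty())
            return new Node();      // only hit the allocator when the pool is empty
        Node* n = free_.back();
        free_.pop_back();
        return n;
    }

    void release(Node* n) { free_.push_back(n); }  // keep the object for later reuse

    ~NodePool() { for (Node* n : free_) delete n; }

private:
    std::vector<Node*> free_;
};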

Another way it manifests is in the way people are encouraged to write property functions, notification handlers, calls to base-class functions - all kinds of subterranean function calls that exist to try to maintain consistency. At maintaining consistency they have limited success, but they are wildly successful at wasting cycles. Programmers understand the concept of normalized data, but they tend to apply it only to database design. They do not apply it to data-structure design, at least partly because OOP tells them they don't have to. As simple a thing as setting a Modified bit in an object can result in a tsunami of updates running through the data structure, because no class worth its code takes a Modified call and just stores it.
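
As a caricature of the two styles (hypothetical classes, purely to illustrate the point):

#include <functional>
#include <utility>
#include <vector>

// The "consistency-maintaining" style: one flag set triggers a cascade of notifications.
class ObservedDocument
{
public:
    void setModified(bool m)
    {
        modified_ = m;
        for (auto& observer : observers_)  // property handlers, base-class hooks, etc.
            observer(m);                   // each of these may update yet more state
    }
    void addObserver(std::function<void(bool)> f) { observers_.push_back(std::move(f)); }

private:
    bool modified_ = false;
    std::vector<std::function<void(bool)>> observers_;
};

// The "just store it" style: set the bit and let interested code ask for it later.
class PlainDocument
{
public:
    void setModified(bool m) { modified_ = m; }
    bool isModified() const { return modified_; }

private:
    bool modified_ = false;
};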

Maybe the performance of a given app is just fine as written.

On the other hand, if there is a performance problem, here's an example of how I go about tuning it. It's a multi-stage process. At each stage, some particular activity accounts for a large fraction of the time and could be replaced by something faster. (I did not say "bottleneck". These are not the kinds of things that profilers are good at finding.) This process often requires, in order to get the speedup, wholesale replacement of a data structure. Often that data structure is there only because it's recommended OOP practice.

5

Yes, the object-oriented mindset can definitely be neutral or negative when it comes to high-performance programming, at both the algorithmic and implementation levels. If OOP replaces algorithmic analysis, it can lead you into premature implementation, and at the lowest level the OOP abstractions have to be put aside.

The issue stems from OOP's emphasis on thinking about individual instances. I think it's fair to say that the OOP way of thinking about an algorithm is to think about a specific set of values and implement it that way. If that is your highest-level path, you are unlikely to realize a transformation or restructuring that would lead to Big-O gains.

At the algorithmic level, it is often thinking about the bigger picture and about the constraints or relationships between values that leads to Big-O gains. An example might be that there's nothing in the OOP mindset that would lead you to transform "sum a contiguous range of integers" from a loop into (max + min) * n / 2.
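
Concretely, a trivial sketch of the two versions:

// Sum of the integers min, min+1, ..., max, two ways.
long long sum_loop(long long min, long long max)
{
    long long sum = 0;
    for (long long i = min; i <= max; ++i)   // O(n)
        sum += i;
    return sum;
}

long long sum_closed_form(long long min, long long max)
{
    long long n = max - min + 1;
    return (max + min) * n / 2;              // O(1)
}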

At the implementation level, although computers are "fast enough" for most application-level algorithms, in low-level performance-critical code one worries a great deal about locality. Again, the OOP emphasis on thinking about an individual instance and the values of one pass through the loop can be a negative. In high-performance code, instead of writing a straightforward loop, you might want to partially unroll the loop, group several load instructions at the top, then transform them in a group, then write them in a group. All the while you would be paying attention to intermediate calculations and, hugely, to cache and memory access - issues for which the OOP abstractions are no longer valid and, if followed, can be misleading. At this level, you have to know about and think about the machine-level representations.

When you look at something like Intel's Performance Primitives, you have literally thousands of implementations of the Fast Fourier Transform, each one tweaked to work better for a specific data size and machine architecture. (Fascinatingly, it turns out that the bulk of these implementations are machine-generated: see Markus Püschel, Automatic Performance Programming.)

Of course, as most of the answers have said, for most development and most algorithms, OOP is irrelevant to performance. As long as you aren't "prematurely pessimizing" and adding lots of non-local calls, the this pointer is neither here nor there.

4

But could OOP be a disadvantage for performance-sensitive software, i.e. for how fast the program executes?

Often Yes!!! BUT...

In other words, could many references between many different objects, or using many methods from many classes, result in a "heavy" implementation?

Not necessarily. It depends on the language/compiler. For example, an optimizing C++ compiler, provided that you don't use virtual functions, will often squash your object overhead down to zero. You can do things like write a wrapper over an int, or a scoped smart pointer over a plain old pointer, which performs just as fast as using the plain old data type directly.
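
A minimal sketch of such a zero-overhead wrapper (the Meters name is made up for illustration):

// A thin, non-virtual wrapper over an int. With optimizations on, arithmetic on
// Meters typically compiles to the same code as arithmetic on a plain int.
class Meters
{
public:
    explicit Meters(int v) : value_(v) {}
    int value() const { return value_; }
    Meters operator+(Meters other) const { return Meters(value_ + other.value_); }

private:
    int value_;
};

int total_distance(int a, int b)
{
    return (Meters(a) + Meters(b)).value();  // usually optimized down to a single add
}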

In other languages like Java, there is a bit of overhead per object (often quite small in many cases, but astronomical in some rare cases with really teeny objects). For example, an Integer there is considerably less efficient than an int (it takes 16 bytes as opposed to 4 on 64-bit). Yet this isn't just blatant waste or anything of that sort. In exchange, Java offers things like reflection on every single user-defined type uniformly, as well as the ability to override any function not marked as final.

Yet let's take the best-case scenario: an optimizing C++ compiler which can optimize object interfaces down to zero overhead. Even then, OOP will often degrade performance and prevent it from reaching the peak. That might sound like a complete paradox - how could it be? The problem lies in:

Interface Design and Encapsulation

The problem is that even when a compiler can squash an object's structure down to zero overhead (which is at least very often true for optimizing C++ compilers), the encapsulation and interface design (and the dependencies accumulated) of fine-grained objects will often prevent the most optimal data representations for objects that are intended to be aggregated en masse (which is often the case for performance-critical software).

Take this example:

class Particle
{
public:
    ...

private:
    double birth;                // 8 bytes
    float x;                     // 4 bytes
    float y;                     // 4 bytes
    float z;                     // 4 bytes
    /*padding*/                  // 4 bytes of padding
};
Particle particles[1000000];     // 1mil particles (~24 megs)

Let's say our memory access pattern is to simply loop through these particles sequentially and move them around each frame repeatedly, bouncing them off the corners of the screen and then rendering the result.

Already we can see a glaring 4 byte padding overhead required to align the birth member properly when particles are aggregated contiguously. Already ~16.7% of the memory is wasted with dead space used for alignment.

This might seem moot because we have gigabytes of DRAM these days. Yet even the most beastly machines we have today often only have a mere 8 megabytes when it comes to the slowest and biggest region of the CPU cache (L3). The less we can fit in there, the more we pay for it in terms of repeated DRAM access, and the slower things get. Suddenly, wasting 16.7% of memory no longer seems like a trivial deal.

We can easily eliminate this overhead without any impact on field alignment:

class Particle
{
public:
    ...

private:
    float x;                     // 4 bytes
    float y;                     // 4 bytes
    float z;                     // 4 bytes
};
Particle particles[1000000];     // 1mil particles (~12 megs)
double particle_birth[1000000];  // 1mil particle births (~8 megs)

Now we've reduced the memory from 24 megs to 20 megs. With a sequential access pattern, the machine will now consume this data quite a bit faster.

But let's look at this birth field a bit more closely. Let's say it records the starting time when a particle is born (created). Imagine the field is only accessed when a particle is first created, and every 10 seconds to see if a particle should die and become reborn in a random location on the screen. In that case, birth is a cold field. It's not accessed in our performance-critical loops.

As a result, the actual performance-critical data is not 20 megabytes but a 12-megabyte contiguous block. The actual hot memory we're accessing frequently has shrunk to half its original size! Expect significant speed-ups over our original 24-megabyte solution (this doesn't need to be measured - I've already done this kind of thing a thousand times, but feel free to measure if in doubt).

Yet notice what we did here. We completely broke the encapsulation of this particle object. Its state is now split between a Particle type's private fields and a separate, parallel array. And that's where granular object-oriented design gets in the way.

We can't express the optimal data representation when confined to the interface design of a single, very granular object like a single particle, a single pixel, even a single 4-component vector, possibly even a single "creature" object in a game, etc. A cheetah's speed will be wasted if it's standing on a teeny island that's 2 sq. meters, and that's what very granular object-oriented design often does in terms of performance. It confines the data representation to a sub-optimal nature.

To take this further, let's say that since we're just moving particles around, we can actually access their x/y/z fields in three separate loops. In that case, we can benefit from SoA style SIMD intrinsics with AVX registers that can vectorize 8 SPFP operations in parallel. But to do this, we must now use this representation:

float particle_x[1000000];       // 1mil particle X positions (~4 megs)
float particle_y[1000000];       // 1mil particle Y positions (~4 megs)
float particle_z[1000000];       // 1mil particle Z positions (~4 megs)
double particle_birth[1000000];  // 1mil particle births (~8 megs)
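
As a rough sketch of what that buys us (assuming AVX is available; the constant velocity is made up for illustration):

#include <immintrin.h>

// Advance the X positions by a constant velocity, 8 floats at a time with AVX.
void advance_x(float* particle_x, int count, float velocity_x)
{
    __m256 v = _mm256_set1_ps(velocity_x);
    int i = 0;
    for (; i + 8 <= count; i += 8)
    {
        __m256 x = _mm256_loadu_ps(&particle_x[i]);
        _mm256_storeu_ps(&particle_x[i], _mm256_add_ps(x, v));
    }
    for (; i < count; ++i)           // scalar remainder
        particle_x[i] += velocity_x;
}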

Now we're flying with the particle simulation, but look what happened to our particle design. It has been completely demolished, and we're now looking at 4 parallel arrays and no object to aggregate them whatsoever. Our object-oriented Particle design has gone sayonara.

This has happened to me many times working in performance-critical fields where the users demand speed, with correctness being the only thing they demand more. These teeny little object-oriented designs had to be demolished, and the cascading breakages often meant that we had to use a slow deprecation strategy towards the faster design.

Solution

The above scenario only presents a problem for granular object-oriented designs. In those cases, we often end up having to demolish the structure in order to express more efficient representations: SoA representations, hot/cold field splitting, padding reduction for sequential access patterns (padding is sometimes helpful for performance with random-access patterns in AoS cases, but is almost always a hindrance for sequential access patterns), and so on.

Yet we can take that final representation we settled on and still model an object-oriented interface:

// Represents a collection of particles.
class ParticleSystem
{
public:
    ...

private:
    double particle_birth[1000000];  // 1mil particle births (~8 megs)
    float particle_x[1000000];       // 1mil particle X positions (~4 megs)
    float particle_y[1000000];       // 1mil particle Y positions (~4 megs)
    float particle_z[1000000];       // 1mil particle Z positions (~4 megs)
};

Now we're good. We can get all the object-oriented goodies we like. The cheetah has a whole country to run across as fast as it can. Our interface designs no longer trap us into a bottleneck corner.

ParticleSystem can potentially even be abstract and use virtual functions. It's moot now; we're paying for the overhead at the level of the collection of particles instead of per particle. The overhead is 1/1,000,000th of what it would be if we were modeling objects at the individual-particle level.

So that's the solution for true performance-critical areas that handle a heavy load, and it applies to all kinds of programming languages (this technique benefits C, C++, Python, Java, JavaScript, Lua, Swift, etc.). And it can't easily be labeled "premature optimization", since it relates to interface design and architecture. We can't write a codebase modeling a single particle as an object, with a boatload of client dependencies on Particle's public interface, and then change our minds later. I've had to do that a lot when called in to optimize legacy codebases, and it can end up taking months of carefully rewriting tens of thousands of lines of code to use the bulkier design. Ideally this affects how we design things upfront, provided that we can anticipate a heavy load.

I keep echoing this answer in some form or another in many performance questions, and especially ones that relate to object-oriented design. Object-oriented design can still be compatible with the highest-demand performance needs, but we have to change the way we think about it a little bit. We have to give that cheetah some room to run as fast as it can, and that's often impossible if we design teeny little objects that barely store any state.

1
  • Fantastic. This is what I was actually looking for in terms of combining OOP with high performance demands. I really cannot understand why it isn't upvoted more.
    – pbx
    Commented Nov 6, 2018 at 11:10
2

In theory it could lead to slowness, but even then it would not be a slow algorithm, it would be a slow implementation. In practice, object orientation will allow you to try various what-if scenarios (or revisit the algorithm in the future) and thus arrive at algorithmic improvements, which you could never hope to achieve if you had written it the spaghetti way in the first place, because the task would be daunting. (You would essentially have to rewrite the whole thing.)

For example, by having divided the various tasks and entities into clean-cut objects, you may be able to easily come in later and, say, embed a caching facility between some objects (transparently to them), which could yield a thousand-fold improvement.
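
For instance, a rough sketch of such a transparent caching layer (the interface and names are hypothetical):

#include <map>
#include <memory>
#include <utility>

// Hypothetical interface between two components.
class Lookup
{
public:
    virtual ~Lookup() = default;
    virtual double compute(int key) = 0;
};

// A caching decorator slotted in between caller and callee, transparent to both.
class CachingLookup : public Lookup
{
public:
    explicit CachingLookup(std::unique_ptr<Lookup> inner) : inner_(std::move(inner)) {}

    double compute(int key) override
    {
        auto it = cache_.find(key);
        if (it != cache_.end())
            return it->second;                 // cache hit: skip the expensive call
        double result = inner_->compute(key);  // cache miss: delegate, then remember
        cache_[key] = result;
        return result;
    }

private:
    std::unique_ptr<Lookup> inner_;
    std::map<int, double> cache_;
};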

Generally, the types of improvements you can achieve by using a low-level language (or clever tricks with a high-level language) give constant-factor (linear) improvements, which do not figure in big-O notation. With algorithmic improvements you may be able to achieve non-linear improvements. That's priceless.

2
  • 1
    +1: the difference between spaghetti and object-oriented code (or code written in a well-defined paradigm) is: each rewritten version of good code brings new understanding of the problem. Each rewritten version of spaghetti never brings any insight.
    – rwong
    Commented Dec 18, 2011 at 2:38
  • @rwong couldn't have explained it better ;-)
    – umlcat
    Commented Dec 19, 2011 at 15:53
0

It's related, and often overlooked.

It's not an easy answer; it depends on what you want to do.

Some algorithms perform better with plain structured programming, while others are better with object orientation.

Before object orientation, many schools taught algorithm design with structured programming. Today, many schools teach object-oriented programming while ignoring algorithm design and performance.

Of course, there were schools that taught structured programming and didn't care about algorithms at all.

0

Performance all comes down to CPU and memory cycles in the end. But the percentage difference between the overhead of OOP messaging and encapsulation and a more wide-open programming semantics may or may not be a significant enough percentage to make a noticeable difference in your application's performance. If an app is disk- or data-cache-miss-bound, any OOP overhead may be completely lost in the noise.

But, in the inner loops of real-time signal and image processing and other numerical compute bound applications, the difference may well be a significant percentage of CPU and memory cycles, which can make any OOP overhead much more costly to execute.

The semantics of a particular OOP language may or may not expose enough opportunities for the compiler to optimize away those cycles, or for the CPU's branch prediction circuits to always guess correctly and cover those cycles with pre-fetch and pipelining.

0

A good object-oriented design helped me to speed up an application considerably. I had to generate complex graphics algorithmically, and I did it through Microsoft Visio automation. It worked, but was incredibly slow. Fortunately, I had inserted an extra level of abstraction between the logic (the algorithm) and the Visio part: my Visio component exposed its functionality through an interface. This allowed me to easily replace the slow component with another one that created SVG files and was at least 50 times faster! Without a clean object-oriented approach, the code for the algorithm and the Visio control would have been entangled in a way that would have turned the change into a nightmare.
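
The structure was roughly like the following sketch (made-up names; the actual application was C#, but the shape is the same):

// Hypothetical interface separating the algorithm from the drawing backend.
class IDiagramRenderer
{
public:
    virtual ~IDiagramRenderer() = default;
    virtual void drawBox(double x, double y, double w, double h) = 0;
};

class VisioRenderer : public IDiagramRenderer     // slow: drives Visio via automation
{
public:
    void drawBox(double x, double y, double w, double h) override { /* Visio automation calls */ }
};

class SvgRenderer : public IDiagramRenderer       // fast: writes SVG directly
{
public:
    void drawBox(double x, double y, double w, double h) override { /* emit SVG elements */ }
};

// The algorithm only sees the interface, so swapping backends is a one-line change.
void generateDiagram(IDiagramRenderer& renderer)
{
    renderer.drawBox(0, 0, 100, 50);
    // ... algorithmic generation of the rest of the graphic ...
}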

2
  • Did you mean O.O. design applied with a procedural language, or O.O. design and an O.O. programming language?
    – umlcat
    Commented Dec 20, 2011 at 15:55
  • I am speaking of a C# application. Both the design and the language are O.O. Whereas the O.O.-ness of the language will introduce some small performance hits (virtual method calls, object creation, member access via interfaces), the O.O. design helped me in creating a much faster application. What I want to say is: forget about performance hits due to O.O. (language and design). Unless you are doing heavy calculations with millions of iterations, O.O. will not harm you. Where you usually lose a lot of time is I/O. Commented Dec 20, 2011 at 16:16
