11

A recent article on YCombinator includes a comment listing principles of a great programmer.

#7. Good programmer: I optimize code. Better programmer: I structure data. Best programmer: What's the difference?

Acknowledging that these are subjective and contentious concepts, does anyone have a position on what this means? I do, but I'd like to edit this question later with my thoughts so as not to bias the answers.

2

8 Answers

20

Nine times out of ten, when you structure your code/models well, optimization will become obvious. How many times have you seen a hornet's nest of code and found it totally suboptimal, whereupon restructuring it made lots of redundancies extremely obvious?

A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away. - Antoine de Saint-Exupery

A well-structured system will be minimal in nature, and because it is minimal it will be optimized: how little there is to it relates directly to how little it does to accomplish its goal.

Edit: To expand on the point others have taken away from this, it's also completely accurate to see the statement as identifying the relation between code and data. That relation is this: if you change the structure of your data, you will need to change your code to respect the altered structure. If you wish to optimize your code, chances are you will need to change the structure of your data so that your code can handle the data more efficiently.

That said, there is a totally separate possibility that may have been alluded to here: this fellow, having relations with YCombinator, may be referring to code AS data in the LISP tradition of homoiconicity. In my mind it's a stretch to surmise this as the meaning, but it is YCombinator, so I wouldn't rule out that the quote is simply saying LISPers are the "Best Programmers".

4
  • 1
    This does not speak to "data" and how 'there is no difference between optimizing code and structuring data'. Optimizing code does not restructure bad data unless this is some kind of self-digesting, Turing-complete machine. Commented Oct 8, 2012 at 14:48
  • 1
    @NewAlexandria the model mentioned is the "data". Often, bad code and a bad model go hand in hand. Fixing one entails fixing the other.
    – user40980
    Commented Oct 8, 2012 at 14:51
  • 1
    @NewAlexandria I refer to structuring your models as structuring "data". My point is simply that structuring data and structuring code are synonymous, because they're interdependent parts of the system as a whole: structuring either one well will require changes to the other. Is this perhaps more of what you were looking for? I was trying to explain how structure and optimization are the same, not how code and data are related; perhaps I misunderstood your question, if that was the confusing part? Commented Oct 8, 2012 at 14:52
  • I think this is the closest to elucidating the correct sense of the topic. I certainly knew how this works, but hoped that someone saw something more profound in the question I cited. Commented Oct 8, 2012 at 16:34
4

I think the author is hinting that any restructuring of the data leads to code restructuring. Therefore, restructuring the data with the goal of optimizing your system will force you to optimize your code as well, prompting the "what's the difference?" response.

Note that an "uber-excellent programmer" may reply to "what's the difference?" that there is some difference left: once you venture into optimizing for better use of the CPU cache, you may keep the layout of your data structures the same, yet changing the order in which you access them can make a great deal of difference.

3
  • Interesting take on it. I was under the impression that the similarity between structure and optimization was the topic of the statement, not the relation between code and data, though you're absolutely right about that relation and that it explains the statement as well. Feels like picking apart a koan :) Commented Oct 8, 2012 at 14:57
  • Sometimes the data restructure permits code restructure, but I think sometimes when you are done, the new code has very little in common with the old code. Commented Oct 8, 2012 at 16:50
  • OTOH, aligning data for cache line size can have a great impact. ;-p
    – Macke
    Commented Oct 8, 2012 at 19:33
3

Consider the most obvious example of this - "searching for user data is too slow!"

If your user data is not indexed or at least sorted, then restructuring your data will quickly yield increased code performance. If the data is structured properly and you're just iterating through the collection (rather than using the indexes or doing something like a binary search) then modifying the code yields increased code performance.
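As a rough illustration of the two cases above, here is a minimal Python sketch. The `users` records and all function names are made up for the example; they are not from any real system. It shows a plain scan, a restructured (indexed) version of the data, and a code change that exploits data that is already sorted.

```python
from bisect import bisect_left

# Hypothetical user records, in no particular order.
users = [
    {"id": 42, "name": "Ada"},
    {"id": 7, "name": "Grace"},
    {"id": 19, "name": "Edsger"},
]

def find_linear(records, user_id):
    """No structure to exploit: check every record until one matches."""
    for record in records:
        if record["id"] == user_id:
            return record
    return None

# Case 1: restructure the data. Build an index keyed by id once,
# then each lookup is a single dictionary access.
users_by_id = {record["id"]: record for record in users}

# Case 2: the data is already sorted, so change the code instead
# and use binary search rather than iterating.
sorted_users = sorted(users, key=lambda record: record["id"])
sorted_ids = [record["id"] for record in sorted_users]

def find_binary(user_id):
    """Binary search over the sorted ids; returns None when absent."""
    pos = bisect_left(sorted_ids, user_id)
    if pos < len(sorted_ids) and sorted_ids[pos] == user_id:
        return sorted_users[pos]
    return None
```

All three lookups return the same record; what differs is how much work each one does as the collection grows.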

Programmers are problem solvers. While it is useful to distinguish between algorithms and data structures, they can rarely exist in isolation. The best programmers know this, and don't isolate themselves unnecessarily.

2

I don't agree with the statement above, at least not without further explanation. I see coding as an activity that uses data structures, and data structures generally influence the code. So in my opinion there is a difference between the two.

I think the author should have written the last part as "Best programmer: I optimize both."

There is a great book (at least it was when it was published) by Niklaus Wirth called Algorithms + Data Structures = Programs.

1

Optimizing code can sometimes improve speed by a factor of two, and occasionally by a factor of ten or even twenty, but that's about it. That may sound like a lot, and if 75% of a program's execution time is spent in a five-line routine whose speed could easily be doubled, such an optimization may well be worth making. On the other hand, one's selection of data structures may affect execution speed by many orders of magnitude. A modern hyper-optimized multi-threaded processor running super-optimized code to look up data by key in a 10,000,000-item linear linked list stored in RAM would be slower than a much slower processor running a rather simply coded nested hash table. Indeed, if one had the data laid out properly, even a 1980s computer fetching data from a hard drive might beat the modern CPU using the inferior data structure.
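To make the gap concrete, here is a small Python sketch that counts key comparisons instead of measuring time. The names are invented for illustration, a flat bucket table stands in for the "nested hash table", and the list is scaled down to 10,000 items so it runs instantly.

```python
class Node:
    """One cell of a singly linked list."""
    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node

def build_linked_list(items):
    """Build a linked list that preserves the order of items."""
    head = None
    for key, value in reversed(items):
        head = Node(key, value, head)
    return head

def linked_list_lookup(head, key):
    """Walk the list from the head; return (value, comparisons made)."""
    comparisons = 0
    node = head
    while node is not None:
        comparisons += 1
        if node.key == key:
            return node.value, comparisons
        node = node.next
    return None, comparisons

def build_buckets(items, bucket_count):
    """A deliberately simple hash table: a list of chained buckets."""
    buckets = [[] for _ in range(bucket_count)]
    for key, value in items:
        buckets[hash(key) % bucket_count].append((key, value))
    return buckets

def bucket_lookup(buckets, key):
    """Inspect only the one bucket the key hashes to."""
    comparisons = 0
    for stored_key, value in buckets[hash(key) % len(buckets)]:
        comparisons += 1
        if stored_key == key:
            return value, comparisons
    return None, comparisons

items = [(i, f"record-{i}") for i in range(10_000)]
head = build_linked_list(items)
buckets = build_buckets(items, bucket_count=16_384)
```

Looking up the last key walks every node of the linked list, while the bucket table only inspects the entries that share one bucket. No amount of tuning the loop in `linked_list_lookup` closes that gap.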

That having been said, designing efficient data structures often requires more complex trade-offs than optimizing code. For example, in many cases the data structures which allow data to be accessed most efficiently are less efficient to update (sometimes by orders of magnitude) than those which allow fast updates, and those which allow the fastest updates may allow the slowest access. Further, in many cases, data structures which are optimal for large data sets may be comparatively inefficient with small ones. A good programmer should weigh those competing factors against the amount of programmer time required to implement and maintain the various data structures, and strike a decent balance among them.

1

To articulate my best guess at what the article means, I'll assume an unspoken subtext (which seems to be missing in the article) that any programmer should understand about optimization:

  • optimization comes only after you've got the program up and running correctly:
    • make it run correctly, then make it run fast
    • this principle is the point of Knuth's maxim, "premature optimization is the root of all evil"
  • if and when you've determined that optimization is not premature, you must first measure properly to determine what actually needs optimizing, and then measure again and again during optimization to see what effect your changes are having.
    • if your code runs in development, the profiler is your friend in this.
    • if your code runs in production, you must instrument your code, and make friends with your logging system instead.
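For the development case, the measuring step might look something like the following sketch, which uses Python's standard `cProfile` and `pstats` modules. The `sum_of_squares` workload and the `profile_call` helper are hypothetical stand-ins for whatever code you suspect is slow.

```python
import cProfile
import io
import pstats

def sum_of_squares(n):
    """Stand-in for the code under suspicion."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def profile_call(func, *args):
    """Run func under the profiler; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    # Show the five entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, report.getvalue()

result, report = profile_call(sum_of_squares, 100_000)
```

Printing `report` shows where the time went. Re-running the same call after each change is the "again and again" part.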

Now, then: your measurements will tell you where in your code the machine is burning the most cycles. A "good" programmer will focus on optimizing those parts of the code, rather than wasting time optimizing the irrelevant parts.

However, you can often make larger gains by looking at the system as a whole, and finding some way to allow the machine to do less work. Frequently, these changes require reworking the organization of your data; thus, a "better" programmer will find himself structuring data more often than not.

The "best programmer" will have a thorough mental model of how the machine works, a good grounding in algorithm design, and a practical understanding of how they interact. This allows him to consider the system as an integrated whole -- he will see no difference between optimizing the code and the data, because he evaluates them at an architectural level.

0

Data structures drive a lot of things relative to performance. I think that we can look at problems hard and long with a preconceived idea about the ideal data structure and, in this context of thinking, even create proofs (often by induction) of optimality. For example, if we put a sorted list into an array and evaluate things like the cost of inserting an element, we might conclude that on average we need to shift half of the array for each insertion. With binary search, we can find a matching item (or determine that there is none) in about log n steps.
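Those two costs can be checked with a short Python sketch. The helper names are invented for the example, and it counts element shifts and search steps rather than timing anything.

```python
from bisect import bisect_left

def insert_sorted(array, value):
    """Insert value into a sorted list; return how many elements shifted."""
    pos = bisect_left(array, value)
    shifted = len(array) - pos  # everything at or after pos moves right
    array.insert(pos, value)
    return shifted

def binary_search_steps(array, value):
    """Return (index or -1, number of halving steps taken)."""
    lo, hi, steps = 0, len(array), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if array[mid] < value:
            lo = mid + 1
        elif array[mid] > value:
            hi = mid
        else:
            return mid, steps
    return -1, steps

# 1,000 even numbers; try inserting each odd number into a fresh copy.
base = list(range(0, 2000, 2))
shifts = [insert_sorted(list(base), odd) for odd in range(1, 2000, 2)]
average_shift = sum(shifts) / len(shifts)
worst_search = max(binary_search_steps(base, v)[1] for v in base)
```

With insertion points spread evenly over the array, `average_shift` comes out at roughly half the array length, while `worst_search` stays near log2 of the length.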

Alternatively, we can defer our decision about the data structure (avoiding premature optimization) and study the data coming in and the context where we will use it: how big it is, what latencies occur and which ones matter to users, and how much memory we have versus how much we would use with the data representations we know or can devise.

In an area like sorting and searching, there is a lot to know. Truly great programmers have been working on this for a long time. Understanding these problems well is useful, and it is a great thing if you know more methods than you did when you finished your undergraduate data structures class. Binary trees can provide superior performance for insertions in exchange for higher memory use. Hash tables provide even bigger improvements, but for more memory still. A radix tree and radix sort can carry improvements even further.
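As one example of such a method, here is a minimal sketch of a least-significant-digit radix sort in Python. It assumes non-negative integers, and the `base` parameter is a choice made for this example. It sorts by distributing values into buckets digit by digit rather than by comparing them with each other.

```python
def radix_sort(values, base=256):
    """LSD radix sort for non-negative integers; returns a new list."""
    result = list(values)
    if not result:
        return result
    if min(result) < 0:
        raise ValueError("this sketch only handles non-negative integers")
    largest = max(result)
    place = 1
    # One stable bucketing pass per digit, least significant first.
    while place <= largest:
        buckets = [[] for _ in range(base)]
        for value in result:
            buckets[(value // place) % base].append(value)
        result = [value for bucket in buckets for value in bucket]
        place *= base
    return result
```

Each pass is linear in the number of items, and the number of passes depends on how many digits the largest key has, which is why the shape of the keys matters as much as the code.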

Creative structuring of the data can help reframe a problem and open the door to new algorithms that make hard applications faster and sometimes impossible tasks possible.

-1

Best programmer: What's the difference?

Best programmer? No. Lousy programmer. I'm assuming the word "optimization" means those things that programmers typically try to optimize: memory or CPU time. In this sense, optimization goes against the grain of almost every other software metric. Understandability, maintainability, testability, etc.: these all get short shrift when optimization is the goal -- unless what one is trying to optimize is human understandability, maintainability, testability, etc. Not to mention cost. Writing a speed- or space-optimal algorithm costs considerably more in developer time than naively coding the algorithm as presented in some text or journal. A lousy programmer doesn't know the difference. A good one does. The best programmer knows how to determine exactly what needs to be optimized and does so judiciously.
