Comment by mpweiher
10 years ago
"Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious." -- Fred Brooks, The Mythical Man Month (1975)
via "Objects have not failed" -- Guy Steele, http://www.dreamsongs.com/ObjectsHaveNotFailedNarr.html
"Bad programmers worry about the code. Good programmers worry about data structures and their relationships." -Linus Torvalds
Though Guy Steele's idea, that OO encourages "data-first" because the code is encapsulated, sounds contentious.
Or maybe he is just encouraging OO programmers to think more in this vein?
The tricky part is that the smartness of data structures is context-sensitive.
One of the most common design errors in OO systems seems to be building systems that beautifully encapsulate a single object’s state… and then finding that the access patterns you actually need involve multiple objects, but that it’s impossible to write efficient implementations of those algorithms because the underlying data points are all isolated within separate objects, and often not stored in a cache-friendly way either.
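A hypothetical sketch of that trade-off (the Particle/ParticleSystem names are invented for illustration, not taken from anyone's codebase): an object-per-entity layout versus a "structure of arrays" layout, where an algorithm that only needs one field per entity ends up touching very different amounts of memory:

```cpp
#include <iostream>
#include <memory>
#include <vector>

// Object-first layout: each Particle encapsulates its own state, and a
// collection of them is a vector of pointers to separately allocated objects.
class Particle {
public:
    Particle(float x, float y, float mass) : x_(x), y_(y), mass_(mass) {}
    float x() const { return x_; }
private:
    float x_, y_, mass_;  // plus whatever other per-particle state exists
};

// Data-first layout: the same information as parallel, contiguous arrays.
struct ParticleSystem {
    std::vector<float> x, y, mass;
};

// Summing x over the object layout chases a pointer per particle and pulls
// in unrelated fields alongside the one float the algorithm actually needs.
float sum_x(const std::vector<std::unique_ptr<Particle>>& particles) {
    float total = 0.0f;
    for (const auto& p : particles) total += p->x();
    return total;
}

// The same computation over the data-first layout is a linear scan of one
// contiguous array, which is about as cache-friendly as it gets.
float sum_x(const ParticleSystem& ps) {
    float total = 0.0f;
    for (float x : ps.x) total += x;
    return total;
}

int main() {
    std::vector<std::unique_ptr<Particle>> objects;
    ParticleSystem soa;
    for (int i = 0; i < 1000; ++i) {
        objects.push_back(std::make_unique<Particle>(float(i), 0.0f, 1.0f));
        soa.x.push_back(float(i));
        soa.y.push_back(0.0f);
        soa.mass.push_back(1.0f);
    }
    std::cout << sum_x(objects) << " " << sum_x(soa) << "\n";
}
```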
Another common design problem seems to be sticking with a single representation of important data even though it’s not a good structure for all of the required access patterns. I’m surprised by how often it does make sense to invest a bit of run time converting even moderately large volumes of data into some alternative or augmented structure, if doing so then sets up a more efficient algorithm for the expensive part of whatever you need to do. However, again it can be difficult to employ such techniques if all your data is hidden away within generic containers of single objects, and if the primary tools you have to build your algorithms are generic algorithms operating over those containers and methods on each object that operate on its own data in isolation.
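As a minimal, made-up illustration of "spend a bit of run time converting the data first": repeated lookups by id over a flat vector become cheap after a single O(n) pass that builds a hash index (the Order type and function names here are assumptions for the sketch):

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

struct Order {
    std::uint64_t id;
    std::string customer;
    double amount;
};

// Without conversion: every query is a linear scan, O(n) per lookup.
const Order* find_linear(const std::vector<Order>& orders, std::uint64_t id) {
    for (const auto& o : orders)
        if (o.id == id) return &o;
    return nullptr;
}

// With conversion: pay O(n) once to build an augmented structure (an index
// of pointers into the original data), then each lookup is amortised O(1).
// The pointers stay valid as long as the vector itself is not modified.
std::unordered_map<std::uint64_t, const Order*>
build_index(const std::vector<Order>& orders) {
    std::unordered_map<std::uint64_t, const Order*> index;
    index.reserve(orders.size());
    for (const auto& o : orders) index.emplace(o.id, &o);
    return index;
}

int main() {
    std::vector<Order> orders = {{1, "ada", 10.0}, {2, "bob", 20.0}};
    auto index = build_index(orders);
    std::cout << index.at(2)->customer << "\n";  // prints: bob
}
```

Whether the conversion pays off depends, as above, on the access pattern: many lookups amortise the build cost, a single lookup does not.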
The more programming experience I gain, the less frequently I seem to find single objects the appropriate granularity for data hiding.
Well said.
The exercise of compartmentalising and creating atomic islands of objects that dutifully encapsulate data becomes difficult during reassembly, simply because we recreate the need for a declarative style of data access in an imperative (OO) world. It's ye olde object-relational impedance mismatch.
A (relational) data model is a single unit and has to be seen that way. Creating imperative sub-structures (such as encapsulating data into objects) breaks this paradigm, with serious consequences when you later attempt to rejig the object-covered data into an on-demand style of architecture. The whole model (database?) must be treated as a single design construct, and all operations against the entire model must be sensitive to this notion, even if we access one table at a time. Yes, at specific times we may be interested in the contents of a single table, or a few tables joined together declaratively for a particular use case, but the entire data model is a single atomic structure "at rest".
When paradigmatic lines like this are drawn, I side with the world-view that getting the data model "right" first is the way to go.
Fred Brooks and Linus Torvalds speak from experience in the trenches.
This also comes up in relational databases. There might be a nice, canonical way to represent what the data really is (i.e. what it represents), but then the access patterns for how it is used mean that a different representation is better (usually somewhat denormalized). Fortunately, relational algebra enables this (one of Codd's main motivations).
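Outside an actual database, the same idea can be sketched with in-memory stand-ins for tables (these structs are hypothetical, purely to show the shape of it): a normalized form that says what the data is, and a denormalized view derived from it for one specific read path:

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Normalized ("canonical") form: each fact is stored exactly once.
struct Customer { std::string name; std::string city; };
struct Order    { std::uint64_t customer_id; double amount; };

// Denormalized form for a hot read path: a report that wants the customer's
// name and city next to every order, without a join per row at read time.
struct OrderReportRow {
    std::string customer_name;
    std::string city;
    double amount;
};

// The denormalized view is derived from the canonical data, which is the
// relational-algebra point: keep the clean model and still serve awkward
// access patterns from a computed representation.
std::vector<OrderReportRow> denormalize(
        const std::unordered_map<std::uint64_t, Customer>& customers,
        const std::vector<Order>& orders) {
    std::vector<OrderReportRow> rows;
    rows.reserve(orders.size());
    for (const auto& o : orders) {
        const Customer& c = customers.at(o.customer_id);
        rows.push_back({c.name, c.city, o.amount});
    }
    return rows;
}

int main() {
    std::unordered_map<std::uint64_t, Customer> customers = {
        {1, {"Ada", "London"}}, {2, {"Grace", "Arlington"}}};
    std::vector<Order> orders = {{1, 10.0}, {2, 20.0}, {1, 5.0}};
    for (const auto& row : denormalize(customers, orders))
        std::cout << row.customer_name << ", " << row.city << ", "
                  << row.amount << "\n";
}
```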
A programming language is even more about data processing than a database is. But it still seems that data structures/objects represent something. I recently came up with a good way to resolve what that is:
This definition allows for the messy, denormalized-like data structures you get when you optimize for performance. It also accounts for elegant algebra-like operators that can be easily composed to model different computations (like the +, . and * of regular expressions).
Have you noticed yet that programming ideologies go around in a circle? Programmers may currently be defending data-first (/data-oriented/...) programming, but it isn't the first time they have done so.
The way I experienced it:
micro-services/small tools "that do one thing well"/... (pro: reasonably generic, con: loads of tools, requires expert users, if one tool out of 30 is not working well things break)
data-first/data-oriented programming (really easy to manipulate data, very, VERY hard to maintain consistency)
database-oriented programming (enforce consistency; otherwise data-oriented. Works well: where in data-oriented programming your data would have gone inconsistent, in this paradigm you get errors instead. Needless to say, "every operation errors out" is better than "our data suddenly became crap", but it still blocks entire departments/systems unpredictably)
event-driven programming (really easy to make button X do Y) (to some extent built into database-oriented programming, also available separately. Works well, but gets extremely confusing when programs get larger)
object-oriented programming (massively simplifies the "I have 200 different message types and forgot how they interact" problems of event-driven programming, also provides the consistency of database-oriented programming)
<web browsers start here>
event-driven programming with UI designers (makes event-driven programming, and later object-oriented event-driven programming, accessible for idiots)
declarative object-oriented programming / aspect-oriented programming / J2EE and cousins / "Generating forms from data" / Django
micro-services/"do one thing well" (same disadvantages as during the 80s)
data-first/data-oriented programming (same disadvantages as end-of-80s) * you are here *
How much do you want to bet that servers that enforce data consistency and store it are next?
So, so much truth in this. I've also seen batch processing with terminal IDs move to real-time programming and operating systems, and then move back to browsers and session IDs (i.e. HTTP GET/PUT are the new punch cards).
There is truth to that but when I think about it, I don't see a problem. Old ideas can be good ones.
I was already well familiar with the Fred Brooks quote, but that rather interesting linked essay by Guy Steele, which quotes it, was new to me. Thanks!
Algorithms + Data Structures = Programs by Niklaus Wirth - does no one read this book anymore? It takes both, and both must be considered throughout the coding process. It's all math / computing is a matter of encoding and logic. 'A' = 41 hex = 01000001 binary. An array of 3 x 3 is also an array of 1 x 9. - A L Turing
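Both of those closing claims are easy to check mechanically; a few illustrative lines of C++ (assuming an ASCII execution character set):

```cpp
#include <cstring>
#include <iostream>

int main() {
    // 'A' is 0x41, i.e. 01000001 in binary, in ASCII.
    static_assert('A' == 0x41, "assumes an ASCII character set");

    // A 3 x 3 array and a 1 x 9 array are the same nine contiguous values.
    int grid[3][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
    int flat[9];
    std::memcpy(flat, grid, sizeof(grid));
    for (int i = 0; i < 9; ++i) std::cout << flat[i] << ' ';
    std::cout << '\n';  // prints: 1 2 3 4 5 6 7 8 9
}
```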