Moving data around your code - different approaches
I was thinking about this the other day, and I realized I can't particularly think of a label to attach to this concept. Anyone out there got a name for this? This could be due to my lack of formal computer science training, so I thought I'd ask, at the risk of looking stupid (I'm having an off day, so that's a very real possibility).
If you have a series of functions, say: a(x), which in turn calls b(x), which calls c(x), and so on, and you end up with a fairly long call chain, out to say m(x), and you subsequently decide that m(x) needs an additional piece of information that is contained in a local variable in the body of the a(x) function - how do you get it to m(x)?
- You could simply make it a global variable! That's easy and generally a pretty bad idea.
- You could modify every function signature between a(x) and m(x) to look like b(x, y), c(x, y). That's a lot of work though. And what happens if something *else* needs to get passed around?
- In an object oriented language, you could make x an object, and attach y to it, so that you mostly pass it around just as before.
- Along the same lines, you could make x a hash table, so that you can look arguments up by their name, to have some idea what they are.
- You could just make x a list, and remember what position everything is in. That's not so nice though if you have to revisit the code and you didn't document where everything is, and why it's there.
- Other things? You could make a(x)'s scope visible to m(x), I suppose, although that sounds like a fairly involved and potentially hairy answer.
It seems the answers are basically either 1) add the parameter to all the functions and everything that calls them, or 2) use some kind of composite type to stash the one or more values in.
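To make the two options concrete, here is a minimal sketch with the chain shortened to three links. The functions a, b, c and the values of x and y are hypothetical stand-ins, and a plain dict plays the role of the composite type.

```python
# Option 1: thread y explicitly through every signature.
def a1(x):
    y = x * 10            # local value that only a() knows about
    return b1(x, y)

def b1(x, y):             # b() doesn't use y, but must accept and forward it
    return c1(x + 1, y)

def c1(x, y):             # end of the chain, which finally needs y
    return x + y

# Option 2: bundle everything into one composite value (here a dict),
# so the intermediate signatures keep a single argument.
def a2(x):
    ctx = {"x": x, "y": x * 10}
    return b2(ctx)

def b2(ctx):              # b() forwards the bundle without knowing about y
    return c2({**ctx, "x": ctx["x"] + 1})

def c2(ctx):
    return ctx["x"] + ctx["y"]

print(a1(2), a2(2))       # prints: 23 23
```

The trade-off is visible even at this size: option 1 touches every signature but documents exactly what each function needs, while option 2 leaves b2 alone at the cost of looking values up by name.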
It seems that this is a reasonably common thing to have to do, so it must have a name. How well does your language handle that kind of refactoring?
about 1 hour later:
Dynamically-scoped variables are somewhere between 'global variable' and 'make a(x)'s scope visible to m(x)'.
about 1 hour later:
The "Parameterise from Above" and "Encapsulate Context" patterns are approaches to dealing with this.
http://accu.org/index.php/journals/1432
some hours later:
To expand on Ben's somewhat cryptic comment:
It seems you want to propagate a variable along the call chain, regardless of where in the program text the functions are defined (which is what lexical scoping goes by).
That's what dynamic scoping was invented for: shell variables are dynamically scoped, as are Emacs Lisp's. Some languages have both (e.g. Perl, where "my" is lexical and "local" is dynamic; Common Lisp has both as well).
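Python itself is lexically scoped, so the sketch below only emulates a dynamically scoped binding using the standard contextvars module. The names current_y, a, b and m are hypothetical; the point is that m() sees whatever binding a() established, even though b() never mentions y.

```python
from contextvars import ContextVar

# Emulated "dynamic variable"; unbound (None) outside any a() call.
current_y = ContextVar("current_y", default=None)

def m(x):
    # m() reads whichever binding of y is active when it runs,
    # no matter which functions sit between it and a().
    return (x, current_y.get())

def b(x):
    return m(x)                      # b() knows nothing about y

def a(x):
    token = current_y.set("from a")  # bind y for the duration of this call
    try:
        return b(x)
    finally:
        current_y.reset(token)       # restore the previous binding, much like Perl's local

print(a(1))   # (1, 'from a')
print(b(1))   # (1, None): outside a(), y is unbound again
```

The try/finally plus reset is what makes this behave like a dynamic binding rather than a plain global: the value is visible only while a() is on the call stack.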
some hours later:
Yeah, I forgot about dynamic scoping; that would be another way of accomplishing the same thing. Thanks.
some hours later:
"In an object oriented language, you could make x an object, and attach y to it, so that you mostly pass it around just as before."
1) You can make this sort of composite data type even in a non object-oriented language (e.g., a struct in C).
2) In an object-oriented language, you can make substantial pieces of your call chain member functions of a single class, with y as member data (i.e., the "make y a global" solution without a lot of the drawbacks).
When y is generated inside of a(x) as a representation of some state that is relevant to both a(x) and m(x), this is often a good sign that capturing the relationship between a and m in a class will be useful.
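A minimal sketch of point 2, assuming a hypothetical class Pipeline whose methods stand in for a(), b() and m(). The shared value y lives on the instance, so it behaves like a global for these methods only.

```python
class Pipeline:
    """Hypothetical class grouping the call chain; y is member data."""

    def __init__(self):
        self.y = None

    def a(self, x):
        self.y = x * 10        # state relevant to both a() and m()
        return self.b(x)

    def b(self, x):
        return self.m(x + 1)   # b()'s signature is unchanged

    def m(self, x):
        return x + self.y      # m() reads y from the instance

p = Pipeline()
print(p.a(2))   # 23
```

Unlike a true global, each Pipeline instance carries its own y, so two independent runs do not trample each other.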
some hours later:
I doubt it has a CS name, as the symptoms seem more of a practical programming issue than something theoretical and computer sciencey. :)
In any case, I'm going to point out that the reason you can't just access the information in m(x) is either (1) because a(x)..l(x) mutate x in some way, causing x to lose this information over time, or (2) that the information is expensive to recompute.
The only way to conclusively prevent (1) from happening is to make everything side effect free, presumably using persistent data structures, which of course has various other implications as well. You'll still have to explicitly hold on to needed old information of course, but this should be vastly easier than a solution using side effects.
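As a small illustration of the side-effect-free route, the sketch below uses a frozen dataclass, a simple immutable value rather than a full persistent data structure. Record, a and b are hypothetical names. Each step returns a modified copy, so the original value stays intact for as long as someone keeps a reference to it.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Record:            # hypothetical stand-in for x; fields cannot be reassigned
    name: str
    total: int

def b(x):
    # Returns a modified copy; the caller's x is never mutated.
    return replace(x, total=x.total + 5)

def a(x):
    original = x         # holding on to the old value explicitly
    later = b(b(x))
    return original.total, later.total

print(a(Record("r", 1)))   # (1, 11)
```

You still have to hold on to `original` deliberately, but nothing further down the chain can take it away from you.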
(2) on the other hand is actually an optimisation problem - this is where we do go back into CS territory, as you're looking at a storage vs runtime tradeoff. Remembering previous results is just an increase in storage complexity (and hopefully lookup has less runtime penalty than recomputation, or you're making things worse, not better).
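One standard way to trade storage for runtime here is memoization. The sketch below uses functools.lru_cache; expensive(), a() and m() are hypothetical, and the counter exists only to show that the body runs once.

```python
from functools import lru_cache

calls = 0   # counts how often the real computation runs

@lru_cache(maxsize=None)     # unbounded cache: more storage, less recomputation
def expensive(n):
    # Hypothetical stand-in for an expensive computation.
    global calls
    calls += 1
    return sum(i * i for i in range(n))

def m(n):
    return expensive(n)      # m() recomputes instead of receiving the value...

def a(n):
    first = expensive(n)     # ...but the second call is just a cache lookup
    return first, m(n)

print(a(1000), calls)        # (332833500, 332833500) 1
```

Whether this is a win depends on exactly the condition mentioned above: the cache lookup has to be cheaper than recomputing, and the cache itself costs memory.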
Of course, (1) and (2) can crop up in tandem, which is even more horrible to deal with in practice.
My gut feeling is that most cases of where this has happened to me have been me messing up the design of the data structure. Of course, the language and runtime should really be flexible enough that the data structures are easy to adapt. I'd go so far as to say that (Java-style?) OOP tends to work against you in such situations because the functionality is married to the data representation.
some hours later:
@Phil - maybe I wasn't entirely clear. No, scratch that, I'm sure I wasn't clear. I wasn't talking about x per se, but about making some other piece of data (y) available to the m() function. It's not really a big, complex, theoretical problem, but a practical one that crops up a lot: how does one transport that data with the fewest changes to the code? I was actually thinking about this regarding some Erlang code, so side effects are not really the problem.
some hours later:
This is what a Monad is for. Well, one of the things. :) I suspect you'll find most formal CS things cover it from that angle.
http://en.wikipedia.org/wiki/Monad_(functional_programming)#State_monads
some hours later:
In functional programming you can use a state monad and easily extend it.
https://secure.wikimedia.org/wikipedia/en/wiki/Monad_%28functional_programming%29
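To show the idea outside Haskell, here is a minimal State monad sketch in Python. State, unit, bind and get_state are hypothetical names, and a dict serves as the threaded state. Strictly speaking, a read-only y only needs a Reader monad, but State covers that case too.

```python
class State:
    """Wraps a function run: state -> (value, new_state)."""

    def __init__(self, run):
        self.run = run

    def bind(self, f):
        # Run self, pass its value to f, then run the State that f returns
        # with the updated state.
        def step(s):
            value, s2 = self.run(s)
            return f(value).run(s2)
        return State(step)

def unit(value):
    return State(lambda s: (value, s))   # produce a value, leave state untouched

get_state = State(lambda s: (s, s))      # expose the current state as the value

def m(x):
    # m() pulls y out of the threaded state; nothing in between mentions it.
    return get_state.bind(lambda st: unit(x + st["y"]))

def b(x):
    return unit(x + 1).bind(m)

def a(x):
    # a() supplies the initial state, including y, when it runs the chain.
    value, _ = b(x).run({"y": x * 10})
    return value

print(a(2))   # 23
```

Extending the state later means adding another key to the dict that a() passes in; b() does not change.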
some hours later:
Just make the needed "global" value a local variable in your main program. C and Fortran have data structures for some of these issues, but a little care in building your main program usually solves things. A class might be your answer, but you should be able to accomplish everything with straight C and value passing. If you need things to be fast, you will do things the Fortran way and pass everything by reference. Good luck.
some hours later:
Please reconsider your stance on global variables. It definitely is a bad idea to put local data into global variables, and it is right to advise newbie programmers against them because they can be very seductive at first, but there is one case where they are actually preferable: global data.
You can try to abstract everything into some locality or object and behave as if you could have several of those, but most programs have some kind of global data. Putting that data in locals, or inventing objects of which you have exactly one instance, might be an interesting challenge, but it works against every effort to produce clean, understandable and maintainable code.
For the case where it is not global data, I suggest just adding more parameters. It might look tedious at first, but only at first. It is a simple, robust solution and has the immediate advantage of documenting what your functions need. If what your code needs changes, there is hardly any point in keeping the interface constant. If your interface does not match what the callers or the callees need, it is a bad interface. And nothing produces bad code like bad interfaces (greetings to Java).
1 day later:
I'm thinking that if a(), b(), etc. are together doing some task one could call "foo", you could build a struct called a "foo context" and make them all take a pointer to that struct. This is more efficient than passing multiple variables separately, and you only have to modify three places: put the variable into the struct definition, make a() set the new variable, and make m() use it.
Of course YMMV in any given situation.
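Here is a Python rendering of the "foo context" idea, assuming a hypothetical FooContext dataclass. Python passes object references, so handing the same instance down the chain is roughly analogous to passing a pointer to a C struct. The three comments mark the three places that change when y is added.

```python
from dataclasses import dataclass

@dataclass
class FooContext:
    # Hypothetical context shared by every step of the "foo" task.
    x: int
    y: int = 0           # change 1: the new field goes into the struct definition

def a(ctx):
    ctx.y = ctx.x * 10   # change 2: a() sets it
    return b(ctx)

def b(ctx):
    ctx.x += 1           # intermediate steps keep their one-argument signature
    return m(ctx)

def m(ctx):
    return ctx.x + ctx.y # change 3: m() uses it

print(a(FooContext(x=2)))   # 23
```

Because every function mutates the same shared instance, this approach reintroduces the mutation concerns raised in the earlier comment about x losing information along the chain.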
some days later:
If you didn't want to refactor your signatures, you could look at putting the value into some kind of DB and getting it back out, or perhaps harnessing STM (software transactional memory). The problem with globals is the lack of checks and balances, which makes moving to parallel execution hard; by stashing things in a DB, or employing DB techniques for MM, you mitigate that.
some days later:
It is an unsolved problem.
Program text was first thought of as a text, with jumps to increase flexibility.
Then came routines, to reduce redundancy. But in the end the model is still the long text, because every function call could be replaced with the full text of the function.
So there is a lack of imagination about moving away from the "long text", with which you will always run into representation problems.
Also, functions carry a hierarchical notion that is not always the best way to structure interaction and separation of concerns.
I don't know any answers.