Comments on: To the Ravening Hordes: Thank You! http://laurentszyster.be/blog/to-the-ravening-hordes/ Python on Peers Fri, 18 May 2012 14:56:12 +0000 http://wordpress.org/?v=1.5.1.3
by: Laurent Szyster http://laurentszyster.be/blog/to-the-ravening-hordes/#comment-212 Sat, 11 Feb 2006 01:42:51 +0000

Multiple Python implementations?

It took me four days to take that one in. Why on earth would you want multiple implementations of the CPython VM?

Why? For the sake of code base fragmentation and the certainty of unpredictable problems of interoperability and portability?

The reason Linux and free software are *so* successful is that it’s always the *same* implementation, the same sources.

CPython is CPython is CPython.

No matter what Linux, Windows or SchmOS, that’s why Python is *so* pervasive. Microsoft managers are not porting IronPython to their VM out of pure joy, they are doing it because C# can’t compete against Python.

*They* learned the Linux lesson: they co-opted Python quietly instead of fighting it openly. Apple did that with GNU, its sales rocketed … and CPython runs on an iPod.

Yes *that* CPython.

As a general-purpose portable VM, PyPy is like pie in the sky: it is much more practical to port the C sources. Stackless and Psyco are fine pieces of high technology, customized Python VMs for demanding applications like massive MUDs or number crunching.

But CPython is not Lisp and it is not Java.

It is made to prototype, test and apply C libraries.

No more. No less.

And it excels at it.

]]>
by: Glyph Lefkowitz http://laurentszyster.be/blog/to-the-ravening-hordes/#comment-209 Tue, 07 Feb 2006 03:50:13 +0000

No disrespect intended towards Allegra either; my comments are meant to explain the lessons I’ve learned from experience. To quote Clay Shirky:

"""
Learning from experience is the worst possible way to learn something. Learning from experience is one up from remembering. That’s not great. The best way to learn something is when someone else figures it out and tells you: “Don’t go in that swamp. There are alligators in there.”
"""

Many of the problems I raised were hypothetical. Trust me, though, figuring out why a cycle is happening is often not as easy as seeing objects in gc.garbage - there are quite a few cycle-debugging tools in twisted.python.reflect, and sometimes even they aren’t adequate.
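One way to see which objects took part in a cycle can be sketched with the standard `gc` module alone. This is a minimal illustration for current CPython 3, not the `twisted.python.reflect` tooling mentioned above; the `Node` class and `find_cycle_members` helper are hypothetical names made up for the example.

```python
import gc


class Node:
    """Hypothetical stand-in for any object that may end up in a cycle."""

    def __init__(self, label):
        self.label = label
        self.other = None

    def __repr__(self):
        return "Node(%r)" % self.label


def find_cycle_members():
    """Build an a <-> b cycle and report what the collector found unreachable."""
    gc.collect()                        # start from a clean slate
    old_flags = gc.get_debug()
    gc.set_debug(gc.DEBUG_SAVEALL)      # keep unreachable objects in gc.garbage
    try:
        a, b = Node("a"), Node("b")
        a.other, b.other = b, a         # the reference cycle
        del a, b                        # no names left, only the cycle itself
        gc.collect()
        found = [o for o in gc.garbage if isinstance(o, Node)]
    finally:
        gc.set_debug(old_flags)
        del gc.garbage[:]               # do not leave the saved objects behind
    return sorted(found, key=lambda node: node.label)


members = find_cycle_members()
print(members)
```

This only lists the members of a small, known cycle. In a large object graph the hard part, as argued above, is working out which of many references closes the loop.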

However, I have heard soft noises from the PyPy and python-dev folks that if the PyPy concept is proven a goodly number of the original Python team may move over to mostly working on PyPy instead. Guido, as you have noted, is not terribly interested in crazy interpreter hacks to improve speed, but to approach the widest possible audience, Python will need such crazy hacks (and Armin Rigo is _very_ interested in them). I believe Guido would rather be working on language design issues with a self-hosting interpreter than squeezing the last 2% of performance out of a switch statement (which the CPython team has done admirably over the last few releases - Python’s speed currently beats Ruby and Perl, and that was definitely not true of 1.5.2).

Don’t misinterpret me - I am not saying Guido has said that CPython is toast as soon as PyPy is ready. It may be a few more years before it is really ready, and if it were ready *today* I believe it would be a decade before CPython were visibly lacking for maintainership. However, if PyPy becomes 50x faster than CPython (which I believe is possible given the techniques they’re working on) then the micro-optimizations that you’ve made using finalizers are going to be a lot less substantial than the benefits of supporting multiple Python implementations.

]]>
by: Laurent Szyster http://laurentszyster.be/blog/to-the-ravening-hordes/#comment-208 Tue, 07 Feb 2006 02:07:26 +0000

Errata: I found the edit-comment function on this blog ;-)

]]>
by: Laurent Szyster http://laurentszyster.be/blog/to-the-ravening-hordes/#comment-207 Tue, 07 Feb 2006 02:03:58 +0000

Hi Glyph,

Thanks for that long comment. Here are my short answers, out of order:

This is Python.

No jwords allowed here ;-)

Python programming happens at the prompt, it’s one cycle:

write, test, read,

We all know that what you don’t test does not work. That’s why stable and tested sources that seldom change are often preferred over unstable or alpha sources. Allegra is written in layers of tested implementations that apply others. For instance, sync_stdio.py is the first application and test of select_trigger.py and thread_loop.py, and http_client.py is the second test of http_reactor.py, tcp_client.py and dns_client.py.

Cycles Not Hard To Debug

Catching a finalization cycle is damn easy. Let’s set up the boilerplate:

>>> from allegra import loginfo, async_loop, finalization
>>> class Continuation (finalization.Finalization):
...     def __init__ (self, label): self.label = label
...     def __call__ (self, finalized): loginfo.log (self.label)
...     def __repr__ (self): return self.label
...

… dispatch a cycle …

>>> a = finalization.Finalization ()
>>> b = Continuation ('b')
>>> c = Continuation ('c')
>>> a.finalization = b
>>> b.finalization = c
>>> c.b = b
>>> del a, b, c
>>> async_loop.dispatch ()
debug
async_dispatch_start

<allegra.finalization.Finalization object at 0x009E2470>
debug
async_dispatch_stop

… and collect it …

>>> import gc
>>> gc.collect ()
4
>>> gc.garbage
[b, c]

Can you think of something simpler? Here they are nicely ordered. The first finalization was logged, the two cycling continuations were not collected but are referenced as garbage.
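The same experiment can be approximated without Allegra, using only the standard library. This is a rough sketch under stated assumptions: the `Finalized` class is a hypothetical stand-in and not the `allegra.finalization` API, and it targets CPython 3.4 or later, where objects with `__del__` caught in a cycle are finalized by the collector instead of being parked in `gc.garbage` as they were in the 2.x session above.

```python
import gc

log = []


class Finalized:
    """Hypothetical stand-in for a finalization: logs its label when destroyed."""

    def __init__(self, label):
        self.label = label
        self.next = None

    def __del__(self):
        log.append(self.label)


def run_chain():
    """No cycle: reference counting finalizes a, then b, right at `del a`."""
    del log[:]
    a, b = Finalized("a"), Finalized("b")
    a.next = b
    del b                       # b is still referenced by a.next
    del a                       # refcount hits zero: a, then b, are finalized
    return list(log)


def run_cycle():
    """Cycle: nothing is finalized until the cyclic collector runs."""
    del log[:]
    gc.collect()
    gc.disable()                # keep automatic collection out of the picture
    try:
        c, d = Finalized("c"), Finalized("d")
        c.next, d.next = d, c   # c <-> d reference cycle
        del c, d
        before = list(log)      # reference counting alone freed nothing
        gc.collect()
        after = sorted(log)     # order inside a cycle is not guaranteed
    finally:
        gc.enable()
    return before, after


chain = run_chain()
before, after = run_cycle()
print(chain, before, after)
```

On such a CPython the chain is finalized in order at `del`, while the cycle is finalized only once `gc.collect ()` runs, which is the timing difference this thread is arguing about.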

CPython Only, So What?

Finalizations certainly don’t work with IronPython yet:

http://blogs.msdn.com/cbrumme/archive/2004/02/20/77460.aspx

but CPython got it right.

I’m sure that nobody can force the BDFL to do “something insane (…) to improve performance”. Finalizations worked with 2.2, they also work with 2.4, and they will most probably still work with 3.0.

No Harm Intended

Now let’s make this clear and public. I never intended to harm Twisted with a scavenged library. It’s just that I’m too lazy and could not move away from a familiar Medusa design, enhanced by five years of … hacking.

Feel free to use it under the GPL 2.0.

And why not contribute?

I’m sure there are plenty of Twisted Minions grinding their fantastic testing teeth to tear apart my misunderstanding of socket programming (no irony intended, alas).

Kind regards,

]]>
by: Glyph Lefkowitz http://laurentszyster.be/blog/to-the-ravening-hordes/#comment-206 Tue, 07 Feb 2006 00:30:35 +0000

I have thought long and hard about this, and I think I understand why you believe it’s related to the functionality of Deferreds. I think that they are fundamentally different, and I believe there are at least a few reasons your finalization module is a bad idea.

Cycles are a problem. You ask “is it that hard to code without cycles?” In fact, it is. That’s why Python added the GC in the first place, because in reality, it is a problem pretty frequently. Of course, it never seems like it, because few objects define __del__ so you never *notice* when you create a cycle unless you always run your code under a memory profiler, with a debug build of Python and a slew of crazy options turned on. There are some problems which are notoriously difficult to solve without cycles, and some cycles which are extremely difficult to find using cleanup code. It will be maddening to debug when your finalization module does fail, because it will only be on extremely complex object graphs, and when you finally track it down, you will realize “Oh, it’s just a cycle. There’s nothing I could have done that would have made this bug easier to find.”

These objects would already be participating in a cycle if it weren’t for the fact that async_loop is an implicit reference through a global variable, rather than a proper object that you could have multiples of. The reactor “module” in Twisted has this problem to some degree too, and it has been a constant low-priority task to fix it for years. It is difficult, but for extremely esoteric tasks it would be worth it. It would certainly make testing easier.

You aren’t really close enough to the GC implementation to make this continue to work in future Python versions. A much more performant version of the GC might end up delaying finalization by several seconds in order to “take advantage of subtle effects in cache latency on x86” or something insane like that. PyPy, Jython, and IronPython may already be in this situation (and their finalizers DEFINITELY don’t behave like Python’s already). If a version of Python is ever released that breaks Allegra, what can you do? I suppose that you can decide that, just as you decided not to use the latest and greatest in async networking (Twisted), you will cease to use the latest and greatest in Python versions too, and internally maintain a fork of Python 2.5 until the project changes fundamentally.

Of course, it’s always possible that Allegra will become hugely popular and, because breaking it breaks too much Allegra-using Python code, the Python core team will revert the changes that dramatically improve performance for everyone else because it breaks your insane hack. It doesn’t strike me that this is a particularly better outcome.

Testing such code is hard. If you had more tests, you may have run into this problem already: your test may want to assert things about the object which needs to be finalized in order to trigger the callbacks. I am saying this from experience - for different reasons, Deferreds have a __del__ method too. I originally made the same case - who would put a Deferred into a cycle, they are one-off objects, etc. It turns out that users did put Deferreds into cycles though, all the time, and the finalizing implementation had to be adjusted to account for that.

Finally, if you do want to persist with this, weakref callbacks may have slightly fewer problems than overriding __del__, as they are more recent and better-integrated into the garbage collector (and ever so slightly less likely to break due to circular refs; although this is still a problem).

]]>