@o11c

o11c@programming.dev · 11 months ago

ReplaceFile exists to get everyone else’s semantics though?

o11c@programming.dev · 1 year ago

It’s because unicode was really broken, and a lot of the obvious breakage came from people mixing the two. So they did fix some of the obvious breakage, but they left a lot of the subtle breakage (in addition to breaking a lot of existing correct code, and introducing a completely nonsensical bytes class).

o11c@programming.dev · 1 year ago

Python 2 had one mostly-working str class, and a mostly-broken unicode class.

Python 3, for some reason, got rid of the one that mostly worked, leaving no replacement. The closest you can get is to spam surrogateescape everywhere, which is both incorrect and has a significant performance cost - and that still leaves several APIs unavailable.
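For anyone who hasn't seen it, a minimal sketch of what "spamming surrogateescape" means: bytes that aren't valid UTF-8 get smuggled through str as lone surrogates. They survive a round trip, but the resulting string isn't real Unicode text, since strict UTF-8 encoding rejects it. The helper names are just for illustration.

```python
def bytes_to_text(raw: bytes) -> str:
    # Undecodable bytes 0x80-0xFF become lone surrogates U+DC80-U+DCFF.
    return raw.decode("utf-8", errors="surrogateescape")


def text_to_bytes(text: str) -> bytes:
    # The same error handler maps those surrogates back to the original bytes.
    return text.encode("utf-8", errors="surrogateescape")


legacy = b"caf\xe9"  # Latin-1 "café", not valid UTF-8
text = bytes_to_text(legacy)
print(repr(text))  # 'caf\udce9'
assert text_to_bytes(text) == legacy

try:
    text.encode("utf-8")  # strict encoding rejects the smuggled surrogate
except UnicodeEncodeError as exc:
    print("not real text:", exc.reason)
```

Every boundary (file names, environment, subprocess output, sockets) has to remember to pass the error handler both ways, which is where the cost comes from.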

Simply removing str indexing would’ve fixed the common user mistake if that was really desirable. It’s not like unicode indexing is meaningful either, and now large amounts of historical data can no longer be accessed from Python.

o11c@programming.dev · 1 year ago

Unfortunately both of those are used in common English or computer words. The only letter pairs not used are: bq, bx, cf, cj, dx, fq, fx, fz, hx, jb, jc, jf, jg, jq, jv, jx, jz, kq, kz, mx, px, qc, qd, qg, qh, qj, qk, ql, qm, qn, qp, qq, qr, qt, qv, qx, qy, qz, sx, tx, vb, vc, vf, vj, vm, vq, vw, vx, wq, wx, xj, zx.
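A list like that can be derived mechanically. A minimal sketch, assuming you supply your own word list (the comment doesn't say which corpus was used, so any output depends entirely on that input):

```python
from itertools import product
from string import ascii_lowercase


def unused_pairs(words: list[str]) -> list[str]:
    """Return every two-letter pair that never appears inside any word."""
    seen: set[str] = set()
    for word in words:
        w = word.lower()
        # Collect every adjacent character pair in the word.
        seen.update(w[i:i + 2] for i in range(len(w) - 1))
    return [a + b for a, b in product(ascii_lowercase, repeat=2) if a + b not in seen]
```

Feeding it a dictionary file plus identifiers scraped from source code would approximate "common English or computer words".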

Personally I have mappings based on <CR>, and press it twice to get a real newline.

o11c@programming.dev · 1 year ago

The problem is that there’s a severe hole in the ABCs: there is no distinction between “container whose elements are mutable” and “container whose elements and size are mutable”.

(Relatedly, there’s no distinction between sequences that support slice operations and those that don’t, e.g. deque.)
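A small demonstration of both holes using collections.abc. FixedGrid is a hypothetical example class: its elements are mutable but its size is not, and the only ABC it can honestly claim is the immutable Sequence.

```python
from collections import deque
from collections.abc import MutableSequence, Sequence

# The only mutable ABC also requires the size-changing methods.
print(sorted(MutableSequence.__abstractmethods__))
# ['__delitem__', '__getitem__', '__len__', '__setitem__', 'insert']


class FixedGrid(Sequence):
    """Hypothetical fixed-length container: items are mutable, size is not."""

    def __init__(self, n: int) -> None:
        self._items = [None] * n

    def __getitem__(self, i):
        return self._items[i]

    def __len__(self) -> int:
        return len(self._items)

    def __setitem__(self, i, value) -> None:
        self._items[i] = value


grid = FixedGrid(3)
grid[0] = "x"
# No ABC expresses "element-mutable"; to isinstance it is just a Sequence.
print(isinstance(grid, MutableSequence))  # False

# deque is registered as a MutableSequence, yet rejects slices.
d = deque([1, 2, 3])
print(isinstance(d, MutableSequence))  # True
try:
    d[0:2]
except TypeError as exc:
    print("deque slicing fails:", exc)
```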

o11c@programming.dev · 1 year ago

Even logging can sometimes be enough to hide the heisenbug.

Logging to a file descriptor can sometimes be avoided by logging to memory (which, for crash safety, can include an mmap’ed file, since the kernel will still write back the dirty pages as long as the whole system doesn’t go down). But logging from every thread to a single section of memory can also be problematic (even without mutexes, atomics can be expensive and certainly have side effects); sometimes you need a separate per-thread log, combined afterwards in the log-reader tool.
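A minimal Python sketch of the per-thread approach, assuming plain in-memory lists rather than an mmap'ed file and using time.monotonic_ns() as the merge key (both are assumptions; a real tool would more likely use a fixed-size ring buffer per thread). Each thread appends only to its own buffer, so the hot path takes no shared lock; the shared lock is taken once per thread, at registration.

```python
import heapq
import threading
import time

_registry: list[list[tuple[int, str, str]]] = []
_registry_lock = threading.Lock()  # taken once per thread, not per log call
_local = threading.local()


def log(message: str) -> None:
    buf = getattr(_local, "buf", None)
    if buf is None:
        # First log call on this thread: create and register a private buffer.
        buf = _local.buf = []
        with _registry_lock:
            _registry.append(buf)
    buf.append((time.monotonic_ns(), threading.current_thread().name, message))


def merged_log() -> list[tuple[int, str, str]]:
    # Reader side: each buffer is already time-ordered, so a k-way merge suffices.
    return list(heapq.merge(*_registry))
```

The merge happens only when someone reads the log, which keeps the cost out of the code path being debugged.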

o11c@programming.dev · 1 year ago

I don’t remember the last time I used ctrl-C. It’s always select or "+y.

o11c@programming.dev · 1 year ago

I haven’t managed to break into the JS-adjacent ecosystem, but tooling around TypeScript is definitely a major part of the problem:

- Following a basic tutorial, I somehow ended up spending multiple seconds just to transpile and run “Hello, World!”.
- There are at least 3 different ways of specifying the files and settings you want to use, and some of them cause others to be ignored entirely, even though it looks like they should be used.
- Embracing duck typing means many common type errors simply cannot be caught. It also means dynamic type checks are impossible, even though JS itself supports them (admittedly with oddities, e.g. string vs String).
- There are at least 3 incompatible ways to define and use a “module”, and it’s not clear which are actually useful or intended to be used, or what the outputs are supposed to be for different environments.

At this point I’m seriously considering writing my own sanelanguage-to-JS transpiler or using some other one (maybe Haxe? but I’m not sure its object model allows full performance tweaking), because I’ve written literally dozens of other languages without this kind of pain.

WASM has its own problems (we shouldn’t be quick to call asm.js obsolete … also, C’s object model is not what people think it is) but that’s another story.

At this point, I’d be happy with some basic code reuse. Have a “generalized fibonacci” module taking 3 inputs, and call it 3 ways: from a web browser on the client side, as a web browser request to a server (which is running nodejs), or as a nodejs command-line program. Transpiling one of the callers should not force the others to be transpiled, but if multiple callers need to be transpiled at once, it should not typecheck the library internals multiple times. I should also be able to choose whether to produce a “dynamic” library (which can be recompiled later without recompiling the dependencies) or a “static” one (only output a single merged file), and whether to minify.

I’m not sure the TS ecosystem is competent enough to deal with this.

o11c@programming.dev · 1 year ago

and I already explained that Union is a thing.

o11c@programming.dev · 1 year ago

That still doesn’t explain why duck typing is ever a thing beyond “I’m too lazy to write extends BaseClass”. There’s simply no reason to want it.

o11c@programming.dev · 1 year ago

Then - ignoring dunders that have weird rules - what, pray tell, is the point of protocols, other than backward compatibility with historical fragile ducks (at the cost of future backwards compatibility)? Why are people afraid of using real base classes?

The fact that it is possible to subclass a Protocol is useless since you can’t enforce subclassing, which is necessary for maintainable software refactoring, unless it’s a purely internal interface (in which case the Union approach is probably still better).

That PEP link includes broken examples so it’s really not worth much as a reference.

(for that matter, the Sequence interface is also broken in Python, in case you need another historical example of why protocols are a bad idea).

o11c@programming.dev · 1 year ago

chunks: [AtomicPtr>; 64], appears before the explanation of why 64 works, and was confusing at first glance, since this is completely different from the previous use of 64, which was arbitrary. At first I was expecting a variable-size array of fixed-size arrays (using something like an rwlock, you can copy/grow the internal vector without blocking; if there was a writer, the last reader of the old allocation frees it).

Instead of separate flags, what about a single (fixed-size, if chunks are) atomic bitset? This would increase contention slightly, but only briefly during growth, not during accesses. Many architectures actually have dedicated atomic bit operations, though sadly it’s hard to get compilers to generate them.

The obvious API addition is for a single thread to push several elements at once, which can be done more efficiently.

o11c@programming.dev · 1 year ago

Aside: note that requests is sloppy there; it should use either raise ... from e to make the cause explicit, or raise ... from None to hide it. Default propagation is supposed to imply that the second exception was unexpected.
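The difference is visible on the exception objects. A minimal sketch with a hypothetical FetchError and parser (not requests' actual code): "from e" sets __cause__, so the traceback says "The above exception was the direct cause of the following exception", while "from None" sets __suppress_context__ so only the new exception is shown. Plain propagation (no "from") leaves only __context__, printed as "During handling of the above exception, another exception occurred".

```python
class FetchError(Exception):
    """Hypothetical library-level exception."""


def parse_port(text: str) -> int:
    try:
        return int(text)
    except ValueError as e:
        # Explicit chaining: the ValueError is recorded as the direct cause.
        raise FetchError(f"bad port {text!r}") from e


def parse_port_quiet(text: str) -> int:
    try:
        return int(text)
    except ValueError:
        # Hide the context: the traceback shows only the FetchError.
        raise FetchError(f"bad port {text!r}") from None
```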

o11c@programming.dev · 1 year ago

In practice, Protocols are a way to make “superclasses” that you can never add features to (for example, readinto, despite being critical for performance, is utterly broken in Python). This should normally be avoided at almost all costs, but for some reason people hate real base classes?
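To make the "can never add features" point concrete, a minimal sketch with hypothetical class names (the readinto here is a simplified stand-in, not io's real contract). A method added later to a real base class reaches every subclass; a class that merely satisfies a Protocol structurally gets nothing, and nothing forces it to subclass.

```python
from abc import ABC, abstractmethod
from typing import Protocol


class ReaderBase(ABC):
    """Real base class: new methods with defaults reach every subclass."""

    @abstractmethod
    def read(self, n: int) -> bytes: ...

    # Added "later": a default readinto built on read(). Subclasses written
    # before it existed still get a working implementation.
    def readinto(self, buf: bytearray) -> int:
        data = self.read(len(buf))
        buf[: len(data)] = data
        return len(data)


class ReaderProto(Protocol):
    def read(self, n: int) -> bytes: ...


class SubclassReader(ReaderBase):
    def __init__(self, data: bytes) -> None:
        self._data = data

    def read(self, n: int) -> bytes:
        chunk, self._data = self._data[:n], self._data[n:]
        return chunk


class DuckReader:
    """Satisfies ReaderProto for a type checker, but inherits nothing."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def read(self, n: int) -> bytes:
        chunk, self._data = self._data[:n], self._data[n:]
        return chunk
```

Adding readinto to ReaderProto instead would silently make every existing DuckReader-style class non-conforming.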

If you really want to do something like the original article, where there’s a C-implemented class that you can’t change, you’re best off using a (named) Union of two similar types, not a Protocol.
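A minimal sketch of the named-Union approach, assuming io.BytesIO stands in for the C-implemented class you can't change and JoinedChunks is a hypothetical pure-Python type with a similar API. The set of accepted types is closed and spelled out, and a type checker verifies that every member provides the method being called.

```python
import io
from typing import Union


class JoinedChunks:
    """Hypothetical pure-Python type mirroring BytesIO.getvalue()."""

    def __init__(self, chunks: list[bytes]) -> None:
        self._chunks = list(chunks)

    def getvalue(self) -> bytes:
        return b"".join(self._chunks)


# Named union: exactly these two types, nothing that merely "quacks".
ByteSource = Union[io.BytesIO, JoinedChunks]


def payload_size(src: ByteSource) -> int:
    # A type checker confirms getvalue() exists on each member of the union.
    return len(src.getvalue())
```

Adding a third type later is an explicit edit to ByteSource, which is the point.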

I suppose they are useful for operator overloading but that’s about it. But I’m not sure if type checkers actually implement that properly anyway; overloading is really nasty in a dynamically-typed language.

o11c@programming.dev · 1 year ago

All of these can be done with raw strings just fine.

For the first pathlib bug case, PATH-like lookup is common, not just for binaries but also for data and conf files. If users explicitly request ./foo, they will be very upset if your program instead looks at /defaultpath/foo. Also, God forbid you dare pass a Path("./--help") to some program: pathlib normalizes it to --help, which the program will then parse as an option. If you’re using os.path.dirname, this works just fine.
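A minimal sketch of the failure, with resolve_data_file as a hypothetical PATH-like lookup that follows the common convention of using names containing "/" verbatim. Once a name has gone through pathlib, the "./" that marked an explicit request is gone.

```python
import os
from pathlib import PurePosixPath


def resolve_data_file(name: str, search_dirs: list[str]) -> str:
    """Hypothetical PATH-like lookup: names containing '/' are used verbatim."""
    if "/" in name:
        return name  # explicit request such as ./foo: never search
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate
    raise FileNotFoundError(name)


# pathlib drops the leading "./", so the explicit request becomes a search.
print(str(PurePosixPath("./foo")))     # foo
print(str(PurePosixPath("./--help")))  # --help: now looks like an option
print(os.path.join(".", "--help"))     # ./--help: still clearly a file name
```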

For the second pathlib bug case, dir/ is often written so that you’ll cause explicit errors if there’s a file by that name. Also there are programs like rsync where the trailing slash outright changes the meaning of the command. Again, os.path APIs give you the correct result.
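A minimal sketch of the trailing-slash case on Linux/macOS (POSIX stat semantics; Windows differs). Writing the path with a trailing slash makes stat() fail with ENOTDIR if a regular file has that name, but pathlib strips the slash and the check quietly succeeds. stat_outcome is a hypothetical helper for the demo.

```python
import os
import tempfile
from pathlib import PurePosixPath


def stat_outcome(path: str) -> str:
    """Report whether stat() accepts the path as written."""
    try:
        os.stat(path)
    except NotADirectoryError:
        return "ENOTDIR"
    return "ok"


with tempfile.NamedTemporaryFile() as f:
    as_dir = f.name + "/"  # the caller insists this must be a directory
    print(stat_outcome(as_dir))                      # ENOTDIR: error surfaces
    print(stat_outcome(str(PurePosixPath(as_dir))))  # ok: pathlib dropped the '/'
```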

For the article mistake, backslash is a perfectly legal character in non-Windows filenames and should not be treated as a directory component separator. Thankfully, pathlib doesn’t make this mistake at least. OTOH, / is reasonable to treat as a directory component separator on Windows (and some native APIs already handle it, though normalization is always a problem).
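The pure path classes make this easy to check without touching the filesystem: on POSIX a backslash stays inside a single component, while the Windows flavour accepts both separators.

```python
from pathlib import PurePosixPath, PureWindowsPath

posix = PurePosixPath("some\\path")
print(posix.parts)    # ('some\\path',): one component, backslash kept

windows = PureWindowsPath("some/path")
print(windows.parts)  # ('some', 'path'): '/' accepted as a separator
print(str(windows))   # some\path
```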

I also just found that the pathlib.Path constructor ignores extra kwargs. But Python has never bothered much with safety anyway, and this is minor compared to the outright bugs the other issues cause.

o11c@programming.dev · 1 year ago

One problem is that Rust doesn’t support tagged unions. enum is regrettably solving a different problem, but since it’s the only hammer we have, it’s abused for this kind of thing. This often leads to having to write match error ... unreachable.

o11c@programming.dev · 1 year ago

The default handling is pretty important.

What I find more interesting are 1. the two-argument form of iter, and 2. the __getitem__ auto-implementation that causes there to be two incompatible definitions of Iterable.
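Both features in a few lines. The two-argument iter(callable, sentinel) keeps calling until the sentinel comes back, which is handy for chunked reads. A class defining only __getitem__ works with iter() through the legacy sequence protocol, yet collections.abc.Iterable rejects it because its check looks only for __iter__. Squares is just an illustrative name.

```python
import io
from collections.abc import Iterable
from functools import partial

# 1. Two-argument iter(): call the function until it returns the sentinel.
stream = io.BytesIO(b"abcdefg")
chunks = list(iter(partial(stream.read, 3), b""))
print(chunks)  # [b'abc', b'def', b'g']


# 2. Legacy sequence protocol: __getitem__ alone makes iter() work,
#    indexing from 0 until IndexError is raised.
class Squares:
    def __getitem__(self, i: int) -> int:
        if i >= 4:
            raise IndexError(i)
        return i * i


print(list(Squares()))                  # [0, 1, 4, 9]
print(isinstance(Squares(), Iterable))  # False: the ABC only checks __iter__
```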

(btw your comments are using accidental formatting; use backticks: __next__)

o11c@programming.dev · 1 year ago

The problem with pathlib is that it normalizes away critical information so can’t be used in many situations.

./path should not be the same as path, which in turn should not be the same as path/; pathlib treats all three as identical.

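Also, the article is wrong about “Path('some\\path') becomes some/path on Linux/Mac.” On those systems backslash is an ordinary filename character, so pathlib keeps some\path as a single component.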

o11c@programming.dev · 1 year ago

Honestly you probably should think about how to translate them. Python at least rolls its own .mo parser so it can support multiple languages in a single process; it’s much more difficult in C unless you push it to the clients (which requires pushing the parameterization as well).
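Because gettext.GNUTranslations parses .mo data in pure Python, each language can live in its own translations object in the same process. A minimal sketch, assuming ASCII-only messages and no metadata header entry; build_mo is a hypothetical stand-in for msgfmt, included only so the example runs without files on disk.

```python
import gettext
import io
import struct


def build_mo(catalog: dict[str, str]) -> bytes:
    """Hypothetical helper: encode a tiny ASCII-only catalog as .mo bytes."""
    keys = sorted(catalog)
    ids = strs = b""
    entries = []
    for key in keys:
        k, v = key.encode("ascii"), catalog[key].encode("ascii")
        entries.append((len(k), len(ids), len(v), len(strs)))
        ids += k + b"\0"  # .mo strings are NUL-terminated
        strs += v + b"\0"
    n = len(keys)
    key_index = 7 * 4                # header is seven 32-bit words
    value_index = key_index + 8 * n  # each index entry: (length, offset)
    ids_start = value_index + 8 * n
    strs_start = ids_start + len(ids)
    # magic, revision, count, key index, value index, hash size, hash offset
    out = struct.pack("<7I", 0x950412DE, 0, n, key_index, value_index, 0, 0)
    for klen, koff, _, _ in entries:
        out += struct.pack("<2I", klen, ids_start + koff)
    for _, _, vlen, voff in entries:
        out += struct.pack("<2I", vlen, strs_start + voff)
    return out + ids + strs


# One translations object per language, all alive in the same process.
translators = {
    "fr": gettext.GNUTranslations(io.BytesIO(build_mo({"Hello": "Bonjour"}))),
    "de": gettext.GNUTranslations(io.BytesIO(build_mo({"Hello": "Hallo"}))),
}


def greet(lang: str) -> str:
    return translators[lang].gettext("Hello")
```

In real code you would load compiled catalogs with gettext.translation(domain, localedir, languages=[lang]) and pick the object per request, rather than touching the process-global locale.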

Non-.pot-based internationalization formats are almost always braindead and should be avoided.

o11c@programming.dev · 1 year ago

Note that by messing with a particular module’s __path__ you can turn it into a “package” that loads from arbitrary directories.
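A minimal sketch of the trick: a plain module object becomes a "package" as soon as it has a __path__ list, and submodule imports then search exactly those directories. The name demo_plugins and the temporary directory are hypothetical; appending to an existing package's __path__ works the same way.

```python
import importlib
import os
import sys
import tempfile
import types

# A directory that is not on sys.path and has no __init__.py.
plugin_dir = tempfile.mkdtemp()
with open(os.path.join(plugin_dir, "hello.py"), "w") as f:
    f.write("GREETING = 'hi from an arbitrary directory'\n")

# Give a bare module a __path__ and register it, so imports treat it as a package.
demo = types.ModuleType("demo_plugins")
demo.__path__ = [plugin_dir]
sys.modules["demo_plugins"] = demo

importlib.invalidate_caches()  # make sure finders notice the new file
hello = importlib.import_module("demo_plugins.hello")
print(hello.GREETING)
```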