Comment by NoboruWataya
1 day ago
I don't hear nearly as much about Julia as I used to. A few years ago the view was that it was about to replace Python as the language of choice for data science. Seems like that didn't happen?
I think the hype has slowed down, but the growth statistics haven't. Personally, I think Julia is the only language where I can implement something like Makie without running into a maintenance nightmare, and with Julia, GPU programming is actually fun, high level, and composes well, which I miss in most other languages. So I don't really care whether it replaces Python or not. I do think that to replace Python, Julia will need to solve compilation latency, ship AOT binaries, and maybe interpret more of the glue code, which currently introduces quite a lot of compilation overhead without much gain in performance.
I don't know about everyone else, but slow Julia compilation continues to cause me ongoing suffering to this day. I don't think they're ever going to "fix" this. On a standard GitHub Actions Windows worker, installing the public Julia packages I use, precompiling, and compiling the sysimage takes over an hour. That's not an exaggeration. I had to juice the worker up to a custom 4x sized worker to get the wall clock time to something reasonable.
It took me days to get that build to work; doing this compilation once in CI so you don't have to do it on every machine is trickier than it sounds in Julia. The "obvious" way (install packages in Docker, run container on target machine) does not work because Julia wants to see exactly the same machine that it was precompiled on. It ends up precompiling again every time you run the container on other machines. I nearly shed a tear the first time I got Julia not to precompile everything again on a new machine.
R and Python are done in five minutes on the standard worker and it was easy; it's just the amount of time it takes to download and extract the prebuilt binaries. Do that inside a Docker container and it's portable as expected. I maintain Linux and Windows environments for the three languages and Julia causes me the most headaches, by far. I absolutely do not care about the tiny improvement in performance from compiling for my particular microarch; I would opt into prebuilt x86_64 generic binaries if Julia had them. I'm very happy to take R's and Python's prebuilt binaries.
I am very interested in improving the user-experience around precompilation and performance, may I ask why you are creating a sysimage from scratch?
> I would opt into prebuilt x86_64 generic binaries if Julia had them
The environment variable JULIA_CPU_TARGET [1] is what you are looking for; it controls which micro-architectures Julia emits code for and supports multi-versioning.
As an example Julia is built with [2]: generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)
[1] https://docs.julialang.org/en/v1/manual/environment-variable...
[2] https://github.com/JuliaCI/julia-buildkite/blob/9c9f7d324c94...
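For readers who have not set this variable before, here is a minimal sketch of how the target string quoted above could be assembled and handed to a child Julia process. The variable names and the choice to precompile via `Pkg.precompile()` are illustrative assumptions, not something taken from the linked build scripts, and the command is only constructed here, not run.

```julia
# Illustrative only: the entries are the ones quoted in the comment above.
targets = [
    "generic",                           # baseline entry
    "sandybridge,-xsaveopt,clone_all",
    "haswell,-rdrnd,base(1)",
]

# JULIA_CPU_TARGET takes the entries joined by semicolons.
cpu_target = join(targets, ";")

# Build (but do not run) a command that precompiles the active project with
# the variable set in the child's environment. Assumes `julia` is on PATH.
precompile_cmd = addenv(
    `julia --project -e 'using Pkg; Pkg.precompile()'`,
    "JULIA_CPU_TARGET" => cpu_target,
)

println(cpu_target)
```

Calling `run(precompile_cmd)` would then start the precompilation; whether the resulting cache is reused on a different host still depends on the issues discussed in the replies.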
I have a monorepo full of Julia analysis scripts written by different people. I want to run them in a Docker container on ephemeral Linux EC2 instances and on user Windows workstations. I don't want to sit through precompilation of all dependencies whenever a new machine runs a particular version of the Julia project for the first time because it takes a truly remarkable amount of time. For the ephemeral Linux instances running Julia in Docker, that happens on every run. Precompiling at Docker build time doesn't help you; it precompiles everything again when you run that container on a different host computer. R and Python don't work like this; if you install everything during the Docker image build, they will not suddenly trigger a lengthy recompilation when run on a different host machine.
I am intimately familiar with JULIA_CPU_TARGET; it's part of configuring PackageCompiler and I had to spend a fair amount of time figuring it out. Mine is [0]. It's not related to what I was discussing there. I am looking for Julia to operate a package manager service like R's CRAN/Posit PPM or Python's PyPI/Conda that distributes compiled binaries for supported platforms. JuliaHub only distributes source code.
[0] generic;skylake-avx512,clone_all;cascadelake,clone_all;icelake-server,clone_all;sapphirerapids,clone_all;znver4,clone_all;znver2,clone_all
> It took me days to get that build to work; doing this compilation once in CI so you don't have to do it on every machine is trickier than it sounds in Julia
You may be interested in looking into AppBundler. Apart from full application packaging, it also offers the ability to make Julia image bundles. Besides a sysimage compilation option, it can bundle an application via compiled pkgimages, which requires less RAM and is much faster to compile.
Versus Python, it seems to fork into the "thinkers" vs "doers" camp. Julia provides a level of abstraction that some people find comforting. I thought I could use it as a sort of open source Matlab for a lot of thinky, 1-based index code I had lying around. It didn't meet my needs. And "spend half an hour waiting for a Jupyter notebook to boot up" is real. Great for some but it's not compatible with the way I work.
Elsewhere someone used the term "janky" and perhaps it's the fact that there are so many incredibly smart people around it that makes it so janky. By way of example, somebody needed to check disk space and the architect told him to shell out to Python.
Remember when LLVM first came out and it got kudos for the quality of its error messages? Well if you miss the old-school 1980s GCC experience the nonsense that eventually comes out of the Julia compiler after an hour will relight that flame.
Want to use greek letters and other symbols that don't appear on your keyboard as variable names? You've found your people.
You should try Pluto.jl over Jupyter for Julia notebooks. It runs all sorts of compilations in the background, as well as handles state much better for Julia.
I think there's quite a bit of "quiet work" going on that isn't very visible. Personally, I've been happily using Julia for work every day for years. When the language was younger there were "big" updates that were newsworthy; that has slowed down now, but there seems to be a decent number of people using the language for serious work that is just a bit too specialised for general interest. E.g. among recently registered packages there's a simulation of Earth, a method to analyse EEG recordings, and a method to measure loudness.
https://github.com/NumericalEarth/NumericalEarth.jl
https://github.com/Marco-Congedo/Xloreta.jl
https://github.com/slink/ZwickerLoudness.jl
Julia is great ... if you are willing to work with the Goldilocks zone it provides.
I think what happened is this: Julia got advertised as "Python syntax, C speed" but in practice it turns out to really be "Python syntax, 50% of C speed if you were willing to avoid some semi-well-documented gotchas, where avoiding said gotchas will take some non-trivial effort". Again, great if you are willing to work with it.
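The comment does not say which gotchas it has in mind. One commonly cited example, offered here purely as an illustration and not as the commenter's own list, is an abstractly typed container: the code looks identical, but the compiler cannot specialize on the element type. The function name and data below are hypothetical.

```julia
# Same values, different element types.
xs_any = Any[1.0, 2.0, 3.0]        # abstractly typed container
xs_f64 = Float64[1.0, 2.0, 3.0]    # concretely typed container

# A plain accumulation loop; it is correct for both containers.
function accumulate_sum(xs)
    s = 0.0
    for x in xs
        s += x
    end
    return s
end

# Both calls return the same value; only the element type differs.
println(accumulate_sum(xs_any), " ", isconcretetype(eltype(xs_any)))
println(accumulate_sum(xs_f64), " ", isconcretetype(eltype(xs_f64)))
```

Timing the two calls on larger inputs (for example with `@elapsed`) is one way to see whether this particular pattern matters for a given workload.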
I am not saying that the Julia people are responsible for the "Python syntax, C speed" perception as much as that was what the prevalent perception became.
And I have talked to people in computational biology who tried Julia, and they said something along the lines of "It just wasn't performant enough for me to give up Python," and if you really dig in, what happened was that when new people tried Julia with old mental models, they walked away thinking, "Heh, more MIT hypeware."
Well, I've been reaching 100% of C speed most of the time, which feels like an easy effort... I guess it depends a bit on the problem and on how used you are to writing optimized, clean Julia code.
Polyglot Jet Finding:
https://arxiv.org/abs/2309.17309
This paper in experimental high-energy physics is a good example of why Julia is popular for scientific calculations.
It shows that #julialang is over 100 times faster than Python and even faster than C++.
So, my original comment really boils down to the idea that "public perception has nothing to do with objective stats". To which your response is ... citing a paper at me.
To reiterate, citing studies that show that smoking causes cancer in chain smokers does ... nothing. You are citing studies, but I am not the chain smoker; I am just the guy talking about chain smokers.
One more time, I wish we lived in a world where public perception was swayed by objective studies, but we don't.
Julia is fast, yes, but when a university sys-admin rolls their eyes at hearing its name, you have lost the battle for well and good.
As someone who currently dabbles in both, that prediction seems a bit unrealistic. Julia is a fantastic language, but it has some trade-offs that need to be considered. Probably the most well known is `time to first x`. Julia, like Python, is used comfortably in notebooks, but loading libraries can take a minute, compared to Python where it happens right away. That may lead you not to reach for it when you want to quickly test something, especially plotting. You can mitigate this somewhat by loading all the libraries you'll ever need at startup (preferably long before you are ready to experiment), but that assumes you already know which libraries you'll need for what you want to try.
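A minimal sketch of the effect described above, using only Base: the first call of a function for a given argument type includes compilation, and later calls reuse the compiled code. The function name is hypothetical and the timings are machine dependent, so no expected numbers are given.

```julia
# A small function; the first call for a given argument type triggers compilation.
sum_of_squares(xs) = sum(abs2, xs)

data = rand(1_000)

first_call = @elapsed sum_of_squares(data)    # includes compilation time
second_call = @elapsed sum_of_squares(data)   # reuses the compiled method

println("first call:  ", first_call, " s")
println("second call: ", second_call, " s")
```

Package loading adds its own latency on top of this, which is the part the comment is mostly about; this sketch only isolates the per-function compilation cost.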
What prediction? Maybe I need to rephrase what I said: my prediction is that if Julia ever wants to have a shot at replacing Python, it absolutely has to solve the time to first x problem! That's what I mean by shipping fully ahead-of-time compiled binaries and interpreting more glue code, both of which have the potential to solve the time to first x problem.
The prediction I was referring to was the one in the parent comment. (The one I was commenting under)
Nope. Funnily enough no one can agree on why - if you ask five people, you get six answers.
My own take is that Julia didn’t solve the two-language problem so much as it was defeated by it.
Julia didn’t attract the high-level Python data science crowd because of Julia’s latency issue, lack of package ecosystem, and the inconveniences that a high performance compiled language incurs, such as having parametric containers.
The research software engineer crowd didn’t buy in because Julia has no interfaces or automatically checkable behavior, poor static tooling, imprecise semantics that are hard to build abstractions on, and a complex performance model that makes it hard to ensure speed, and because it is hard to deploy.
So, where they tried to make a language that can span the gap, they succeeded in making a language that works for neither, and which no-one wants.
I like the language. But after having used it for eight years, I find it increasingly hard to argue against the point that it’s better to choose Rust for software engineering and Python for scripting.
Edit: I should say: I used it for eight years because it IS fine for my specific niche: High performance research software engineering. Where I care neither about the convenience of Python, nor need to write truly robust and maintainable code. Where my choice of language was personal and I didn’t need to convince a team of coworkers.
I like your comment.
I made many critical comments about Julia here, on this website, but they mostly boil down to clumsiness of Julia.
Does it solve the "two language problem"? Kinda, but as a result it is less convenient to use than Python, and it is harder to reason about the performance of a particular piece of code than in C. Yes, there is a good chance that idiomatic, straightforward Julia code will run pretty fast, but there is also a good chance that it will run unexpectedly slowly, and you need to know a fair bit to be able to debug this... so kinda like going from Python to C?
Are dynamism and interactivity useful? Immensely, but Julia pays for them with poor AOT support (yes, even with juliac and the still-experimental --trim option).
There is also stuff that I consider unacceptable. The debugger being a separate package you need to download, and one that cannot even debug compiled code, so it needs its own custom interpreter? This piece of crap that is Base.@enum, so anyone wanting proper enums needs to install EnumX.jl? And why the hell does StaticArrays.jl even exist as a separate package if Julia puts so much focus on numerical applications?
And then we come to tooling and IDE support. Oh, boy - Julia VSC extension is such a miserable experience, and there is not much else out there.
Julia is incredibly fun to play and tinker with solo, and some stuff from SciML is straight-up awesome, but overall it is wasted potential, killed by a thousand cuts.
There is now a Julia plugin for JetBrains IDEs: https://plugins.jetbrains.com/plugin/29356-flexible-julia
> it’s better to choose Rust for software engineering and Python for scripting.
Rust and Python work in some scenarios but not all. Unless the native components are all developed already, you will need a Rust programmer on your team. Rust expertise may be available to organizations of a sufficient size and a narrow focus. In my experience, this sort of arrangement requires a team with dedicated resources.
What I encountered more frequently are attempts to use Python as a full stack language. People are not just using it to implement the scripting frontend interface but also trying to implement the backend that does the heavy processing and the numerical computation. As a developer this is a terrible experience. I'm in the middle of replacing for loops in Python with vectorized numpy code but most of my efficiency gains are erased because I still need Python tuples in the end. Yesterday, I had to consider whether exceptions I throw in Cython code can be caught properly in Python.
Research software engineering is one field where you really do need a full stack language. That kind of software engineering requires high performance with limited resources. Julia, with some initial patience, does deliver a better developer experience in this scenario than the equivalent Python experience for me, partly because I do not need to play vectorization games, and I can do everything in one integrated environment.
While, yes, interfaces and static tooling could be better, I do think the situation has gotten better over time. There are interface schemes available and additional static tooling is available.
Julia could deliver a better user experience though. Admittedly, the Python tooling to deploy complex solutions has significantly improved lately via tools such as pixi. Julia tools could deliver generic binary code with packages to ease the initial user experience. Advanced users could re-precompile packages to optimize.
The most promising success I have had with Julia is putting notebook interfaces in front of scientists or deploying web apps. My observation in this scenario is that Julia code can be presented as simple enough for a non-professional programmer to modify themselves. Previously, I have only seen that work well with MATLAB and Python. I have not seen this happen with Rust, and I do not expect that to change.
The other observation is that users appreciate the speed of Julia in two ways. 1. Advanced data visualizations are responsive to changes to data. In Python, I would typically need to precompute or prerender these visualizations. With Julia, this can be done dynamically on the fly. 2. Julia user interfaces respond quickly to user input.
While I think Julia has plenty of room to improve, I do think those improvements are possible. I have also greatly appreciated how much Julia has improved in the past five years.
Ugh, this almost feels like flame-bait. This question invariably leads to a lot of bike-shedding around comments from people who feel strongly about some choices in the Julia language (1-based indexing and what not), and the fact that Julia is still not as polished as some other languages in certain aspects of developer experience.
"Data science" is an extremely broad term, so YMMV. That said, since you asked, Julia has absolutely replaced Python for me. I don't have anything new to add on the benefits of Julia; it's all been said before elsewhere. It's just a question of exactly what kind of stuff you want to do. Most of my recent work is math/algorithms flavored, and Python would be annoyingly verbose/inexpressive while also being substantially slower. Julia also tends to have many more high-quality packages of this kind that I can quickly use / build on.
IMO it just had too many rough edges. Very slow compilation, correctness issues (https://yuri.is/not-julia/), kinda janky tooling (not nearly as bad as pip tbf). Even basic language mistakes like implicit variable declaration and 1-based indexing (in 2012??).
Yes 1-based indexing is a mistake. It leads to significantly less elegant code - especially for generic code - and is no harder to understand than 1-based indexing for people capable of programming. Fight me.
> Yes 1-based indexing is a mistake. It leads to significantly less elegant code - especially for generic code - and is no harder to understand than 1-based indexing for people capable of programming.
Some would argue that 0-based indexing is significantly less elegant for numerical/scientific code, but that depends on whether they come from a MATLAB/Fortran or Python/C(++) background.
A decision was made to target the MATLAB/Fortran (and unhappy? Python/C++) crowd first, thus the choice of 1-based indexing and column-major order, but at the end of the day it's a matter of personal preference.
0-based indexing would have made it easier to reach a larger audience, however.
> and is no harder to understand than 1-based indexing for people capable of programming.
The same could be said the other way around ;-)
The 0 or 1 based indexing is actually a very superficial debate for people not very familiar with Julia. Note that 1-based indexing is a standard library feature not inherent to the Julia language itself.
The real indexing issue is whether arbitrary-base abstraction is too easily available.
Basically, the concrete `Vector` type is 1-based. However, an `AbstractVector` could have an arbitrary first index. OffsetArrays.jl is a non-standard package that provides the ability to create arrays with indexes that can start at an arbitrary point, including 0.
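A small sketch of what writing against `AbstractVector` rather than `Vector` can look like: the function below never mentions a literal first index, so it returns positions in whatever index space the array defines. The function name is hypothetical, and the demonstration uses only Base types (a `Vector` and a range, both 1-based); OffsetArrays.jl is not loaded here.

```julia
# Return the index of the first element greater than `threshold`, or `nothing`.
# No literal 1 appears: the valid indices come from `eachindex`.
function first_above(v::AbstractVector, threshold)
    for i in eachindex(v)
        v[i] > threshold && return i
    end
    return nothing
end

v = [3, 8, 1, 9]
println(first_above(v, 5))        # index into a plain Vector
println(first_above(2:2:10, 5))   # the same code on a range
println((firstindex(v), lastindex(v)))
```

The same function should, in principle, accept an array with a different first index, since `eachindex`, `firstindex`, and `lastindex` are the generic way to ask an array for its own bounds.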
Heh. I grew up writing C code and had real trouble adapting to Matlab's 1-based indexing. Much later I tried Python and was constantly confused by 0-based indexing.
I don't think one is better than the other but my mind is currently wired to see indexing with base 1.
Then there's Option Base 1 in VBA if you don't like the default behavior. Perfect for creating subtle off-by-one bugs.
Aside from the fact that 1-based indexing is better for scientific code (see Fortran), I don’t think that it matters very often. I don’t think that any Julia program I’ve ever written would need to change if Julia adopted 0-based tomorrow. You don’t typically write C-style loops in Julia; you use array functions and operators, and if you need to iterate you write `for i in array ...`. If you really need the first or last element you write `a[begin]` or `a[end]`.
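The style described above can be sketched as follows; the function name is hypothetical. Nothing in it depends on whether the first index is 0 or 1.

```julia
# Summarize a vector without writing any numeric index.
function describe(a::AbstractVector)
    total = zero(eltype(a))
    for x in a               # iterate over elements directly
        total += x
    end
    # `begin` and `end` inside brackets resolve to the first and last index.
    return (first = a[begin], last = a[end], total = total)
end

result = describe([10, 20, 30])
println(result)
```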
lol. There's not much to fight about, since how you want to write code is a very personal matter. It's evident that all the capable programmers in the Julia community have found satisfactory ways to get around it, so if you haven't yet, I don't see how that's a Julia problem ;) I can only say I haven't had a single problem with one-based indexing in 12 years of developing Julia code. I also haven't run into many correctness issues compared to other languages I've been using. I think Yuri has also been using lots of packages that weren't very mature. How on earth can you compare a 10-year-old library with lots of maintainers to packages created in one year by one person? That's at least what Yuri's critique boils down to for me.
I disagree. Julia has correctness issues because it chose maximum composability over specifying interfaces explicitly. And those issues are not just in immature packages but also in complex ones. Compared to other languages, Julia has no facilities to help structure large, complex code bases. And this also leads to bad error messages and bad documentation.
Recently we got the public keyword, but even the PR there says:
"NOTE: This PR is not a complete solution to the "public interfaces are typically not well specified in Julia" problem. We would need to implement much more than this to get to that point. Work on that problem is ongoing in Base and packages and contributions are welcome."
Analogous to “time to first plot”, Julia metacommentary now has time to first “Why I no longer. . .” repost.
I'm even sympathetic to some of the concerns, and I say that as someone deeply embedded in the Julia community. But seeing this same repost over and over for years honestly starts to get pretty frustrating.