programming languages

This is a very opinionated overview by Wolfgang:

scientific programming and data analysis: use the best available software, which is often (usually) commercial. We also buy lasers instead of building them ourselves, although we could. This means matlab & mathematica; matlab even has a very nice and largely compatible open-source clone, GNU octave. Mathematica is simply the best when symbolic manipulations or calculations are needed; there is no comparable open-source program. Further, mathematica's built-in curated data sources (from properties of the elements to corona data) are incredible and can truly bring science forward. For me, a good programming language for data analysis and general programming is not so much about the language itself but about the tools (IDE, debugging), the quality and stability of the code, and most importantly the documentation: clear and centralized, ideally also available offline, with examples that include their output, so that you don't need google. Libraries should be curated and organized.

python has none of this: it can be used for basic data analysis, but even there the lack of proper documentation makes everything very time-consuming. Because of the language design (dynamically typed) and the lack of leadership, there won't ever be a good IDE. It is only somewhat usable because a commercial company offers usable installers. Finally, python is super slow.

  • It reduces productivity: Who has a data cursor (and zoom etc.) for plots? 99% don't have it enabled because it's a platform-dependent mess, and this reduces the quality of data analysis all over science. Anything else like Origin/plot.ly/… is better! You can explain 100 times how to enable interactive plots, but because it's not the default, most people won't do it.
  • Many resources and hints, also on Stack Exchange, are of far lower quality than those for more professional languages.
  • python seems to be the quasi-standard frontend for awesome C/C++ libraries, e.g. for machine learning. I would just like to know why. This is crazy: you can't extend anything in any non-trivial way, and we become stupid. Check out https://github.com/malmaud/TensorFlow.jl/blob/master/docs/src/why_julia.md
  • try r!

My bottom line is to use matlab/mathematica (we pay for it).

User interface programming: Labview is simply the best for realtime plots etc., but it's only an option if you use windows or mac and have a campus license; it's crazy expensive otherwise. Another reason for labview is talking to NI hardware: I have seen people waste months trying to achieve something in python that takes 2 minutes in labview, thanks to labview's awesome example code library. In addition, there is an excellent forum & dedicated support. Btw, NI DAQ hardware is quite cheap compared to alternatives. Labview is IMO also a nice language in itself, if and only if you never make structures larger than 10x10cm, use local variables and not only wires, and use a state machine for all logic. There is no other language where parallel processing including UI interaction is as easy, nice & safe. Regarding user interfaces for data analysis, newer versions of Matlab are awesome.

If you need fast arithmetic on big data (like FFT), use optimized libraries:

  • You can't beat FFTW (used by matlab, octave, etc.) for FFT with your own programs in c/c++ etc. Don't underestimate the amount of work and compiler optimization put into those libraries!
  • This is of course the reason why python/numpy based things are fast: the heavy lifting happens in compiled c/c++ code.
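
A rough way to see this for yourself is the sketch below. It is only an illustration, not a benchmark: naive_dft and time_call are names made up here, the test signal is arbitrary, and numpy is assumed to be an optional extra (the comparison is skipped if it is not installed).

```python
import cmath
import time


def naive_dft(samples):
    """Textbook O(n^2) discrete Fourier transform in pure python."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def time_call(func, *args):
    """Run func(*args) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    signal = [float(i % 8) for i in range(512)]  # arbitrary test signal

    _, slow = time_call(naive_dft, signal)
    print(f"pure-python DFT: {slow:.4f} s")

    try:
        import numpy as np  # optional; assumed to be installed
    except ImportError:
        print("numpy not installed, skipping the library comparison")
    else:
        # numpy.fft.fft hands the work to compiled code.
        _, fast = time_call(np.fft.fft, signal)
        print(f"numpy.fft.fft:   {fast:.6f} s")
```

The absolute timings depend on the machine; the point is only the gap between the interpreted loop and the compiled library call.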

If you have custom logic and have to go fast, write it in a compiled language. I recommend (all cross-platform and open source):

  • golang because of its simplicity and because it is easy to install and to cross-compile with. It produces static binaries by default. In contrast to C it has garbage collection, which makes it much safer to use; gc overhead is virtually absent nowadays.
  • rust is (hopefully) the successor of C/C++ and also easily produces static libraries. It avoids most of the security issues of C/C++; the same guarantees also enable super-easy & safe parallel programming, all without a gc! Because of this safety, it is not completely trivial to interface with C/C++ libraries and hardware, but once you have written the “unsafe” wrappers, programming is fun again! Note that rust (like golang) is not fully object-oriented and has no classes, but there are good reasons for this choice.
  • kotlin on the JVM because it's developed to work well with the best IDE (intellij). Like java, it is often faster than unoptimized C/C++, but it's much nicer than java. Once you know how gradle kts build scripts work, installation of compilers and dependencies happens automatically, on any computer.
  • C/C++ only if you program hardware with c/c++ drivers. Otherwise it's super dangerous, c++ is far too complex, and dependency management is very hard. Golang, rust, and JVM languages are typically not (much) slower than c++; compared with unoptimized c++ they are often even faster.
  • julia is a newly designed language that is compiled (and therefore nearly as fast as c/rust/java/…) but still usable for science, and there is fast progress. Unfortunately, the need for a good IDE was not taken into account during language design; I think this was a huge mistake. There are many IDE attempts in the form of plugins for atom, vscode, intellij, and eclipse, but none are really usable yet. For a dynamically or optionally typed language, the IDE has to be developed together and integrated with the compiler, see matlab. Only with a sound business model can someone make a good IDE that enables efficient programming. Unclear governance leads to the deprecation of linspace etc.: wtf? Anyway, I would love something like this. My experience:
    • use the vscode IDE, it is far better than juno and the intellij plugin (which doesn't work at all as of 2020-07)
    • nothing seems to be parallelized by default.
    • Dependencies explode easily. I suggest adding import Pkg; Pkg.add("StaticArrays") etc. to the top of files, so you can periodically do rm -rf ~/.julia.
    • plotting: it is probably best to export the data and plot with something else; several plotting libraries install another anaconda without warning (!). Things might get better in the future.
    • very unfortunate, and really a no-go issue for me: they don't want to get the compile time down by caching. A trivial task takes 80 seconds, ticket closed: https://github.com/JuliaPlots/Plots.jl/issues/2211

A performance comparison of language benchmarks: https://github.com/drujensen/fib

If you need to talk very fast or with high bandwidth to hardware and/or native libraries, I usually take the language for which demo code is available, do the data pre-processing in that language, write a tcp server, and talk to it from a nice language or directly from a labview UI. If you need to do more complex things or parallel data processing: rust library calls have very little overhead compared to c (use rust-bindgen), golang has around 1.4x, and JVM-based languages >1.5x.
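
The tcp-server pattern can be sketched roughly as follows. Everything specific here is an assumption made for illustration: read_sensor is a hypothetical stand-in for the vendor's driver call, the one-line text protocol with a READ command is invented, and python is used only to keep the sketch short. In practice the server side would be written in whatever language the demo code comes in.

```python
import socket
import socketserver
import threading


def read_sensor():
    """Hypothetical stand-in for a call into the vendor's driver."""
    return 42.0


class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        # One command per line; answer every command with exactly one line.
        for raw in self.rfile:
            command = raw.decode("ascii", errors="replace").strip()
            if command == "READ":
                reply = f"{read_sensor():.6f}"
            else:
                reply = "ERR unknown command"
            self.wfile.write((reply + "\n").encode("ascii"))


def start_server(host="127.0.0.1", port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), Handler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query(address, command):
    """Minimal client: send one command and return the one-line reply."""
    with socket.create_connection(address, timeout=5) as conn:
        conn.sendall((command + "\n").encode("ascii"))
        with conn.makefile("r", encoding="ascii") as reader:
            return reader.readline().strip()


if __name__ == "__main__":
    server = start_server()
    print(query(server.server_address, "READ"))
    server.shutdown()
    server.server_close()
```

A plain line-based text protocol keeps the client trivial, so the UI side (labview, matlab, …) only needs a tcp connection and string parsing. For high bandwidth a binary framing would be the more likely choice.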

An excellent comparison of language performance for hardware programming: https://github.com/ixy-languages/ixy-languages. Bottom line: complain if a company doesn't provide drivers in rust!

Please drop me a note if you have comments :-)