Comment by sampo

23 days ago

Pandas started about 18 years ago as a project by someone working in finance who wanted to use Python instead of Excel, with something nicer than raw Python dicts and NumPy arrays.

For better or worse, like Excel and like the simpler programming languages of old, Pandas lets you overwrite data in place.

Prepare some data

    import pandas as pd
    import polars as pl
    df_pandas = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [10, 20, 30, 40, 50]})
    df_polars = pl.from_pandas(df_pandas)

And then

    df_pandas.loc[1:3, 'b'] += 1

    df_pandas
       a   b
    0  1  10
    1  2  21
    2  3  31
    3  4  41
    4  5  50
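
Note that `.loc` slices by label and includes both endpoints, which is why rows 1, 2 and 3 all changed. A minimal sketch of the difference from positional `.iloc` slicing (the variable names here are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [10, 20, 30, 40, 50]})

# .loc slices by label and includes both endpoints: labels 1, 2 and 3.
by_label = df.loc[1:3, 'b'].tolist()

# .iloc slices by position and excludes the stop: positions 1 and 2 only.
by_position = df.iloc[1:3, df.columns.get_loc('b')].tolist()

print(by_label)     # [20, 30, 40]
print(by_position)  # [20, 30]
```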

Polars comes from a more modern data engineering philosophy, in which data is immutable. In Polars, if you ever wanted to do such a thing, you'd write a pipeline to process and replace the whole column.

    df_polars = df_polars.with_columns(
        pl.when(pl.int_range(0, pl.len()).is_between(1, 3))
        .then(pl.col("b") + 1)
        .otherwise(pl.col("b"))
        .alias("b")
    )

If you are just interactively playing around with your data, and want to do it in Python and not in Excel or R, Pandas might still hit the spot. Or use Polars and, if need be, temporarily convert the data to Pandas or even to a NumPy array, manipulate it, and then convert back.

P.S. Polars has an optimization to overwrite a single value:

    df_polars[4, 'b'] += 5
    df_polars
    ┌─────┬─────┐
    │ a   ┆ b   │
    │ --- ┆ --- │
    │ i64 ┆ i64 │
    ╞═════╪═════╡
    │ 1   ┆ 10  │
    │ 2   ┆ 21  │
    │ 3   ┆ 31  │
    │ 4   ┆ 41  │
    │ 5   ┆ 55  │
    └─────┴─────┘

But as far as I know, it doesn't allow slicing or anything.

richardbachman 23 days ago

`row_index()` was also recently added.

  df.with_columns(pl.col.b + pl.row_index().is_between(1, 3))
  # shape: (5, 2)
  # ┌─────┬─────┐
  # │ a   ┆ b   │
  # │ --- ┆ --- │
  # │ i64 ┆ i64 │
  # ╞═════╪═════╡
  # │ 1   ┆ 10  │
  # │ 2   ┆ 21  │
  # │ 3   ┆ 31  │
  # │ 4   ┆ 41  │
  # │ 5   ┆ 50  │
  # └─────┴─────┘

> Polars has an optimization to overwrite a single value

I believe it is just "syntax sugar" for calling `Series.scatter()`[1]

> it doesn't allow slicing

I believe you are correct:

  df_polars[1:3, "b"] += 1
  # TypeError: cannot use "slice(1, 3, None)" for indexing

You can do:

  df_polars[list(range(1, 4)), "b"] += 1

Perhaps nobody has requested slice syntax? It seems like it would be easy to add.

[1]: https://github.com/pola-rs/polars/blob/9079e20ae59f8c75dcce8...

goatlover 23 days ago

The Polars code puts me off as being too verbose and requiring too many steps. I love the broadcasting ability that Pandas gets from NumPy. It's what scientific computing should look like, in my opinion. Maybe R, Julia or some array-based language does it a bit better than NumPy/Pandas, but it's certainly not like the Polars example.
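
A minimal sketch of that style in plain NumPy, with made-up values:

```python
import numpy as np

b = np.array([10, 20, 30, 40, 50])

# Broadcasting: the scalar is applied to every element.
doubled = b * 2

# Slicing: positions 1, 2 and 3 are updated in place.
b[1:4] += 1

# Filtering: a boolean mask selects the matching elements.
large = b[b > 30]

print(doubled.tolist())  # [20, 40, 60, 80, 100]
print(b.tolist())        # [10, 21, 31, 41, 50]
print(large.tolist())    # [31, 41, 50]
```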

thijsn 23 days ago

Polars is indeed more verbose when coming from pandas, but in my experience that is an advantage when you're reading the same code after not having touched it for months.

pandas is write-optimized: once you're used to it, you can transform your data quickly and powerfully and get your work done. But figuring out what is happening in that code after returning to it a while later is a lot harder than with Polars, which is more read-optimized. This read-optimized API coincidentally allows the engine to perform more optimizations, because all implicit knowledge about the data must be typed out instead of kept in your head.
- goatlover 23 days ago
  
  I don't agree that more verbose code is necessarily more readable when the shorter code looks like familiar math. All you have to do is learn how operators broadcast across array-like structures, and how slicing and filtering work. Perhaps with more complicated examples the shorter code becomes harder to read after months away? Mathematicians are able to handle a lot of compact equations.
  No doubt some of this comes down to preference as to what's considered readable. I never really bought that argument that regular expressions create more problems than they're worth. Perhaps I side on the expressivity end of the readability debate.
  
thereisnospork 23 days ago

Likewise, I was considering trying Polars until I saw that example. The pandas example is a good approximation of how I think and want to transform/process data, even if it is ugly under the hood. I do occasionally find numpy and pandas annoying wrt when they return a view vs a copy, but the cure seems worse than the disease.
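
A minimal NumPy sketch of the view-versus-copy behavior in question (pandas has its own rules, which have changed across versions, so this sticks to NumPy):

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

view = arr[1:4]          # basic slice: a view onto arr
picked = arr[[1, 2, 3]]  # integer-list indexing: an independent copy

view += 1      # writes through to arr
picked += 100  # leaves arr alone

print(arr.tolist())                   # [10, 21, 31, 41, 50]
print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, picked))  # False
```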