Comment by sampo

6 hours ago

Pandas started about 18 years ago as a project by someone working in finance who wanted to use Python instead of Excel, with something nicer than raw Python dicts and NumPy arrays.

For better or worse, like Excel and like the simpler programming languages of old, Pandas lets you overwrite data in place.

Prepare some data

    import pandas as pd
    import polars as pl

    df_pandas = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [10, 20, 30, 40, 50]})
    df_polars = pl.from_pandas(df_pandas)

And then

    df_pandas.loc[1:3, 'b'] += 1

    df_pandas
       a   b
    0  1  10
    1  2  21
    2  3  31
    3  4  41
    4  5  50
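
In case the three updated rows look surprising: `.loc` slices by label and includes both endpoints, unlike ordinary Python slicing. A minimal sketch of the difference, using a fresh frame named `df` (my name, not from the example above) and applying both forms one after the other:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})

# .loc slices by label and includes both endpoints: labels 1, 2 and 3.
df.loc[1:3, "b"] += 1

# .iloc slices by position and excludes the end, so 1:4 hits the same three rows.
df.iloc[1:4, df.columns.get_loc("b")] += 1

print(df)
```

Both statements touch rows 1 through 3, so those rows end up incremented twice while the first and last rows are unchanged.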

Polars comes from a more modern data engineering philosophy, in which data is immutable. In Polars, if you ever wanted to do such a thing, you'd write a pipeline to process and replace the whole column.

    df_polars = df_polars.with_columns(
        pl.when(pl.int_range(0, pl.len()).is_between(1, 3))
        .then(pl.col("b") + 1)
        .otherwise(pl.col("b"))
        .alias("b")
    )

If you are just interactively playing around with your data, and want to do it in Python and not in Excel or R, Pandas might still hit the spot. Or use Polars and, if need be, temporarily convert the data to Pandas or even to a NumPy array, manipulate it, and then convert back.

P.S. Polars has an optimization to overwrite a single value:

    df_polars[4, 'b'] += 5
    df_polars
    ┌─────┬─────┐
    │ a   ┆ b   │
    │ --- ┆ --- │
    │ i64 ┆ i64 │
    ╞═════╪═════╡
    │ 1   ┆ 10  │
    │ 2   ┆ 21  │
    │ 3   ┆ 31  │
    │ 4   ┆ 41  │
    │ 5   ┆ 55  │
    └─────┴─────┘

But as far as I know, it doesn't allow slicing or anything.

The Polars code puts me off as being too verbose and requiring too many steps. I love the broadcasting ability that Pandas gets from NumPy. It's what scientific computing should look like, in my opinion. Maybe R, Julia or some array-based language does it a bit better than NumPy/Pandas, but it's certainly not like the Polars example.