How to repeat pandas dataframe records based on column value


By : gang wang
Date : October 17 2020, 06:10 PM
I'm trying to duplicate rows of a pandas DataFrame (v0.23.4, Python v3.7.1) based on an int value in one of the columns. I'm applying code from this question to do that, but I'm running into the following data type casting error: TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'. Basically, I don't understand why this code is attempting to cast to int32.
Just a workaround; either of the two lines below repeats each row c2 times while keeping rows whose c2 is 0 exactly once:
code :
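# Option 1: keep the rows where c2 == 0, then append every row repeated c2 times
pd.concat([dummy_df[dummy_df.c2.eq(0)],dummy_df.loc[dummy_df.index.repeat(dummy_df.c2)]])
# Option 2: repeat every row c2 times, but at least once
dummy_df.reindex(dummy_df.index.repeat(dummy_df['c2'].clip(lower=1)))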
  c1  c2
0  a   0
1  b   1
2  c   2
2  c   2
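
As a self-contained sketch of the workaround above, assuming dummy_df holds the three rows implied by the output (c1 = a, b, c and c2 = 0, 1, 2), the two expressions can be compared side by side. The variable names via_concat and via_clip are only illustrative.

```python
import pandas as pd

# Assumed input, reconstructed from the output shown above.
dummy_df = pd.DataFrame({"c1": ["a", "b", "c"], "c2": [0, 1, 2]})

# Option 1: rows with c2 == 0 are kept once, then the repeated rows are appended.
via_concat = pd.concat([
    dummy_df[dummy_df.c2.eq(0)],
    dummy_df.loc[dummy_df.index.repeat(dummy_df.c2)],
])

# Option 2: clip the repeat count so every row appears at least once.
via_clip = dummy_df.reindex(dummy_df.index.repeat(dummy_df["c2"].clip(lower=1)))

print(via_clip)
```

The two options differ when the zero-count rows are not at the top of the frame: the concat version always places them first, while the clip version preserves the original row order.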



Pandas DataFrame Repeat Value Based on a Condition


By : Andrew
Date : March 29 2020, 07:55 AM
I'm trying to repeat row values in a DataFrame based on conditions in a column. If the value in column Change is 1, then I'd like to repeat the values in columns A, B, and C until the next row where Change is 1.
You could fill in the Change == 0 rows with NaN and then ffill:
code :
In [11]: df.loc[df.Change != 1, ['A', 'B', 'C']] = numpy.nan

In [12]: df
Out[12]:
             A   B   C  Change
2000-01-31   0   1   1       1
2000-02-01 NaN NaN NaN       0
2000-02-02 NaN NaN NaN       0
2000-02-03   1   0   1       1
2000-02-04 NaN NaN NaN       0

In [13]: df.ffill()
Out[13]:
            A  B  C  Change
2000-01-31  0  1  1       1
2000-02-01  0  1  1       0
2000-02-02  0  1  1       0
2000-02-03  1  0  1       1
2000-02-04  1  0  1       0
In [14]: df[df.Change == 1].resample('D').ffill()  # older pandas: resample('D', fill_method='ffill')
Out[14]:
            A  B  C  Change
2000-01-31  0  1  1       1
2000-02-01  0  1  1       1
2000-02-02  0  1  1       1
2000-02-03  1  0  1       1
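
The NaN-then-ffill idea above can be tried end to end with a small frame. This is only a sketch: the values placed in the Change == 0 rows are made up for illustration, and the columns are stored as floats so that assigning NaN does not change their dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical input: the values in the Change == 0 rows (5, 6, 7) are invented.
idx = pd.date_range("2000-01-31", periods=5, freq="D")
df = pd.DataFrame(
    {
        "A": [0.0, 5.0, 6.0, 1.0, 7.0],
        "B": [1.0, 5.0, 6.0, 0.0, 7.0],
        "C": [1.0, 5.0, 6.0, 1.0, 7.0],
        "Change": [1, 0, 0, 1, 0],
    },
    index=idx,
)

filled = df.copy()
# Blank out A, B, C wherever Change is not 1 ...
filled.loc[filled["Change"] != 1, ["A", "B", "C"]] = np.nan
# ... then carry the last Change == 1 values forward.
filled = filled.ffill()

print(filled)
```

Working on a copy keeps the original df intact, which makes it easier to compare the input with the filled result.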

Return subset/slice of Pandas dataframe based on matching column of other dataframe, for each element in column?


By : user3608633
Date : March 29 2020, 07:55 AM
Ok, from what I understand, the problem at its simplest is that you have a pd.Series of values (i.e. a["key"], which let's just call keys), which correspond to the rows of a pd.DataFrame (the df called b), such that set(b["key"]).issuperset(set(keys)). You then want to apply some function to each group of rows in b where b["key"] is one of the values in keys.
I'm purposefully disregarding the other df (a) that you mention in your question, because it doesn't seem to bear any significance to the problem, other than being the source of keys.
code :
def descriptive_func(df):
    """
    Takes a df where key is always equal and returns some summary.
    :type df: pd.DataFrame
    :rtype: pd.Series|pd.DataFrame
    """
    pass

# filter down to those rows we're interested in
valid_rows = b[b["key"].isin(set(keys))]  

# this groups by the value and applies the descriptive func to each sub df in turn
summary = valid_rows.groupby("key").apply(descriptive_func)  
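
To make the filter-then-groupby pattern above concrete, here is a minimal sketch with a filled-in descriptive_func. The "value" column, the sample data, and the count/mean summary are all assumptions chosen for illustration; the original leaves the function body open.

```python
import pandas as pd


def descriptive_func(df):
    """Summarise one group: row count and mean of the hypothetical 'value' column."""
    return pd.Series({"count": len(df), "mean": df["value"].mean()})


# Hypothetical data: keys is a subset of the values found in b["key"].
keys = pd.Series(["x", "y"])
b = pd.DataFrame(
    {"key": ["x", "x", "y", "z"], "value": [1.0, 3.0, 10.0, 99.0]}
)

# Filter down to the rows whose key is of interest.
valid_rows = b[b["key"].isin(set(keys))]

# Apply the summary to each group; selecting [["value"]] passes only that column.
summary = valid_rows.groupby("key")[["value"]].apply(descriptive_func)

print(summary)
```

Because descriptive_func returns a pd.Series, the result is a DataFrame with one row per key and one column per summary field.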

Calculate new column in pandas dataframe based only on grouped records


By : Ramonita Rosado
Date : March 29 2020, 07:55 AM
I have a dataframe with various events (id) and the following structure; the df is grouped by id and sorted on timestamp.
I think this does what you want:
code :
import pandas as pd
import numpy as np

# Generate some fake data
df = pd.DataFrame()
df['id'] = [1]*5 + [2]*3 + [3]*4
df['timestamp'] = pd.to_datetime('2017-01-1')
duration = sorted(np.random.randint(30,size=len(df)))
df['timestamp'] += pd.to_timedelta(duration)
df['A'] = 'spam'
df['B'] = 'eggs'

# Time until the next event of the same id (NaT for the last event of each id)
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['Duration']  = df.groupby('id')['timestamp'].diff().shift(-1)
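
The diff-then-shift line works because the rows of each id are contiguous: the NaT that diff produces for the first row of a group is shifted up onto the last row of the previous group. A small deterministic sketch (the dates are hypothetical) shows this, next to an alternative that shifts inside each group and so does not require each id's rows to be contiguous.

```python
import pandas as pd

# Hypothetical, deterministic data: already grouped by id and sorted by time.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "timestamp": pd.to_datetime(
        ["2017-01-01", "2017-01-03", "2017-01-04", "2017-01-02", "2017-01-07"]
    ),
})

# Approach from the answer: per-group diff, then a global shift up by one row.
df["Duration"] = df.groupby("id")["timestamp"].diff().shift(-1)

# Alternative sketch: take the next timestamp within each group and subtract.
df["Duration_alt"] = df.groupby("id")["timestamp"].shift(-1) - df["timestamp"]

print(df)
```

On data laid out like this, both columns agree, and the last event of each id gets NaT because there is no following event.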

Excluding records from pandas DataFrame based on column values


By : PythonDen
Date : March 29 2020, 07:55 AM
It looks like you want to find all ids that are not returned by potatoes[logic1 | logic2]. You can use an inverted isin call to do so: collect the flagged ids first, then keep only the rows whose id is not among them.
code :
idx_flagged = potatoes.loc[logic1 | logic2, 'id'].values   
potatoes[~potatoes.id.isin(idx_flagged)]

   id  Desc  Active  Enabled  Value
0   1  Bla1       1        0      1
5   3  Bla6       1        1      0
6   4  Bla7       0        0      1
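
Since logic1 and logic2 are not shown above, the following sketch uses invented conditions and data to illustrate why the inverted isin matters: it drops every row of a flagged id, not only the rows that matched a condition.

```python
import pandas as pd

# Hypothetical data: ids 2 and 3 each appear on two rows.
potatoes = pd.DataFrame({
    "id": [1, 2, 2, 3, 3, 4],
    "Active": [1, 1, 0, 1, 1, 1],
    "Value": [1, 2, 3, 9, 0, 4],
})

# Invented conditions standing in for the original logic1 and logic2.
logic1 = potatoes["Active"] == 0
logic2 = potatoes["Value"] > 5

# Collect the ids flagged by either condition, then exclude all their rows.
idx_flagged = potatoes.loc[logic1 | logic2, "id"].values
kept = potatoes[~potatoes.id.isin(idx_flagged)]

# For comparison: negating the mask only drops the matching rows themselves.
naive = potatoes[~(logic1 | logic2)]

print(kept)
```

Here kept contains only ids 1 and 4, whereas naive still contains the unflagged rows of ids 2 and 3.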

Repeat rows in a pandas DataFrame based on column value


By : Aditya Dey
Date : March 29 2020, 07:55 AM
Use reindex together with index.repeat, so that each row appears as many times as its persons value:
code :
df.reindex(df.index.repeat(df.persons))
Out[951]: 
   code  .     role ..1  persons
0   123  .  Janitor   .        3
0   123  .  Janitor   .        3
0   123  .  Janitor   .        3
1   123  .  Analyst   .        2
1   123  .  Analyst   .        2
2   321  .   Vallet   .        2
2   321  .   Vallet   .        2
3   321  .  Auditor   .        5
3   321  .  Auditor   .        5
3   321  .  Auditor   .        5
3   321  .  Auditor   .        5
3   321  .  Auditor   .        5
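
A runnable sketch of the same call, assuming a frame with only the code, role and persons columns (the elided columns shown as dots in the output above are left out). The reset_index step is an optional extra for when the duplicated index labels are not wanted.

```python
import pandas as pd

# Assumed input: only the columns that are legible in the output above.
df = pd.DataFrame({
    "code": [123, 123, 321, 321],
    "role": ["Janitor", "Analyst", "Vallet", "Auditor"],
    "persons": [3, 2, 2, 5],
})

# Repeat each index label persons times, then reindex to materialise the rows.
repeated = df.reindex(df.index.repeat(df.persons))

# Optional: replace the duplicated labels with a fresh 0..n-1 index.
renumbered = repeated.reset_index(drop=True)

print(renumbered)
```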