How to duplicate rows based on difference in years in pandas


By : SukAwa
Date : September 17 2020, 05:00 PM
The first idea is to build a list of all years for each row with a custom function and then reshape with DataFrame.explode, which is available from pandas 0.25 onward:
code :
def f(x):
    s, e = x.split('-')
    return list(range(int(s), int(e) + 1))

df['Year'] = df['Year'].apply(f)
df = df.explode('Year').reset_index(drop=True)
print (df)
      Item Model    Category  Year
0  2047125    HM  Mechanical  1984
1  2047125    HM  Mechanical  1985
2  2047125    HM  Mechanical  1986
3  2047125    HM  Mechanical  1987
4  2047125    HM  Mechanical  1988
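# Alternative, starting again from the original df: split each range into
# start/end columns, then repeat every row once per year in its range
df1 = df['Year'].str.split('-', expand=True).astype(int)
df['Year'] = df1[0]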
df = df.loc[df.index.repeat(df1[1] - df1[0] + 1)]
df['Year'] = df.groupby(level=0).cumcount() + df['Year']
df = df.reset_index(drop=True)
print (df)
      Item Model    Category  Year
0  2047125    HM  Mechanical  1984
1  2047125    HM  Mechanical  1985
2  2047125    HM  Mechanical  1986
3  2047125    HM  Mechanical  1987
4  2047125    HM  Mechanical  1988
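
For reference, a minimal self-contained sketch of the explode approach. The input frame is an assumption reconstructed from the printed output (a single row whose Year is the string "1984-1988"), and the helper name expand_years is illustrative only.

```python
import pandas as pd

# Hypothetical input: one row whose Year column holds a "start-end" range.
df = pd.DataFrame({
    "Item": [2047125],
    "Model": ["HM"],
    "Category": ["Mechanical"],
    "Year": ["1984-1988"],
})

def expand_years(year_range):
    """Turn 'start-end' into the inclusive list of years."""
    start, end = year_range.split("-")
    return list(range(int(start), int(end) + 1))

expanded = (
    df.assign(Year=df["Year"].apply(expand_years))
      .explode("Year")
      .reset_index(drop=True)
)
# explode leaves an object-dtype column, so cast back to integers.
expanded["Year"] = expanded["Year"].astype(int)
print(expanded)
```

The explicit cast is a design choice: without it the Year column stays object-typed after explode, which can surprise later numeric operations.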



pandas: select rows - based on list - DF with duplicate rows labels


By : Viji S
Date : March 29 2020, 07:55 AM
This is similar to, but not the same as, "Selecting rows - based on a list - from a DF with duplicated columns". Build a boolean mask from df2 and use it to index df1:
code :
In [9]:

(df2['type'] == 'c%') | (df2['type'] == 'pp%')
Out[9]:
base    False
c       False
d        True
base     True
e        True
Name: type, dtype: bool

In [8]:
df1[(df2['type'] == 'c%') | (df2['type'] == 'pp%')]
Out[8]:
     total
d      75
base   36
e      45
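
A runnable sketch of the same selection, under assumptions: the contents of df1 and df2 are not shown in the original, so the frames below are hypothetical and only chosen to reproduce the three selected rows. It also swaps the chained comparisons for isin and passes the mask as a NumPy array so that no label alignment is needed.

```python
import pandas as pd

# Hypothetical frames sharing the same (duplicated) row labels.
labels = ["base", "c", "d", "base", "e"]
df1 = pd.DataFrame({"total": [100, 50, 75, 36, 45]}, index=labels)
df2 = pd.DataFrame({"type": ["n", "n", "c%", "pp%", "c%"]}, index=labels)

# isin is a compact equivalent of chaining == comparisons with |.
mask = df2["type"].isin(["c%", "pp%"])

# Passing the mask as a plain NumPy array selects rows by position,
# so the duplicated "base" label never has to be aligned.
selected = df1[mask.to_numpy()]
print(selected)
```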

Pandas- Removing duplicate rows based on the columns


By : Anurag007
Date : March 29 2020, 07:55 AM
I want to delete duplicate rows with respect to a column and rearrange the data in the DataFrame based on certain conditions. Grouping by the key columns and summing collapses the duplicated rows into one row per key:
code :
>>> df.groupby(['FROM', 'CONT']).sum()
              ID1    ID2    ID3    ID4  ID5    ID6  ID7
FROM  CONT                                             
63309 89    101.3  102.3    NaN  104.0  109  107.1  111
      90      NaN    NaN  103.0  105.0  NaN    NaN  NaN
63310 92    109.0  105.1  105.3  789.1  104    NaN  NaN
63311 94    104.0  109.0  890.0    NaN  NaN    NaN  107
>>> df.groupby(['FROM', 'CONT'], as_index=False).sum()
    FROM  CONT    ID1    ID2    ID3    ID4  ID5    ID6  ID7
0  63309    89  101.3  102.3    NaN  104.0  109  107.1  111
1  63309    90    NaN    NaN  103.0  105.0  NaN    NaN  NaN
2  63310    92  109.0  105.1  105.3  789.1  104    NaN  NaN
3  63311    94  104.0  109.0  890.0    NaN  NaN    NaN  107
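
A small sketch of the same idea on made-up data (the original input frame is not shown, so the values below are assumptions). One caveat worth hedging: in recent pandas versions a group with no non-missing values sums to 0 by default, so min_count=1 is used here to keep NaN as in the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two partial rows share the key (63309, 89).
df = pd.DataFrame({
    "FROM": [63309, 63309, 63310],
    "CONT": [89, 89, 92],
    "ID1": [101.3, np.nan, 109.0],
    "ID2": [np.nan, 102.3, np.nan],
})

# min_count=1 keeps NaN where a group has no values at all;
# with the default min_count=0 such cells become 0.
merged = df.groupby(["FROM", "CONT"], as_index=False).sum(min_count=1)
print(merged)
```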

Pandas-Add missing years in time series data with duplicate years


By : Santosh Acharya
Date : March 29 2020, 07:55 AM
I have a dataset like this where data for some years are missing. Make a MultiIndex from County and Year so you don't have duplicates, then reindex over every County and Year combination:
code :
df.set_index(['County', 'Year'], inplace=True)
index = pd.MultiIndex.from_product(df.index.levels)
df.reindex(index)
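
A minimal end-to-end sketch, assuming a hypothetical frame with County, Year and a single Value column. Note that reindex returns a new DataFrame rather than modifying in place, so the result has to be assigned; passing names to from_product is an addition that keeps the level labels.

```python
import pandas as pd

# Hypothetical data: county B has no row for 2001.
df = pd.DataFrame({
    "County": ["A", "A", "B"],
    "Year": [2000, 2001, 2000],
    "Value": [1.0, 2.0, 3.0],
})

indexed = df.set_index(["County", "Year"])
# Every (County, Year) combination; names keep the level labels.
full_index = pd.MultiIndex.from_product(
    indexed.index.levels, names=indexed.index.names
)
# reindex returns a new frame; missing combinations get NaN.
filled = indexed.reindex(full_index).reset_index()
print(filled)
```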

Pandas - duplicate rows based on values


By : c pappas
Date : March 29 2020, 07:55 AM
You can use the following steps:
  • first reshape with set_index and stack to get a Series with a MultiIndex
  • repeat the index with Index.repeat and select with loc to repeat all rows
  • use reset_index to convert the MultiIndex to columns
  • remove column 0 with drop
  • rename the new column and change the column order with reindex
code :
s = df.set_index(['Company','WEEK_DAYS','value']).stack()
df = (s.loc[s.index.repeat(s)]
       .reset_index()
       .drop(0, axis=1)
       .rename(columns={'level_3':'WEEK'})
       .reindex(columns=['Company','WEEK_DAYS','WEEK','value'])
      )
print (df)
  Company WEEK_DAYS          WEEK  value
0  google   MON-FRI  SEPTEMBER 29    0.5
1  google   MON-FRI  SEPTEMBER 29    0.5
2  google   MON-FRI  SEPTEMBER 29    0.5
3  google   MON-FRI  SEPTEMBER 29    0.5
4  google   MON-FRI  SEPTEMBER 29    0.5
5  google       TUE  SEPTEMBER 15    0.7
6  google       TUE  SEPTEMBER 15    0.7
7  google       TUE  SEPTEMBER 15    0.7
8  google       TUE  SEPTEMBER 22    0.7
9  google       TUE  SEPTEMBER 22    0.7
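
The same result can also be sketched with melt instead of set_index and stack; this is an alternative technique, not the code above. The input frame is hypothetical: it assumes the week columns hold integer repeat counts, which is what the printed output suggests.

```python
import pandas as pd

# Hypothetical input: week columns hold how many times to repeat a row.
df = pd.DataFrame({
    "Company": ["google", "google"],
    "WEEK_DAYS": ["MON-FRI", "TUE"],
    "value": [0.5, 0.7],
    "SEPTEMBER 15": [0, 3],
    "SEPTEMBER 22": [0, 2],
    "SEPTEMBER 29": [5, 0],
})

# Long format: one row per (original row, week) with its repeat count.
long = df.melt(
    id_vars=["Company", "WEEK_DAYS", "value"],
    var_name="WEEK",
    value_name="count",
)
# Repeat each row by its count (0 drops the row), then tidy up.
repeated = (
    long.loc[long.index.repeat(long["count"])]
        .drop(columns="count")
        .reset_index(drop=True)
        .reindex(columns=["Company", "WEEK_DAYS", "WEEK", "value"])
)
print(repeated)
```

With melt the rows come out ordered by week column rather than by original row, so sort afterwards if the order matters.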

Pandas duplicate rows based on column value


By : joseph talamera
Date : March 29 2020, 07:55 AM
Given the following DataFrame, let's use pandas in four steps:
code :
df_1 = df.set_index(['Cust_ID', 'OrderMade', 'OrderType'])

df_2 = df_1.where((df_1 == "Yes") | (df_1 == "")).rename_axis('OrderCategory', axis=1).stack().reset_index()

df_2['OrderCategory'] = df_2['OrderCategory'].mask(df_2['OrderMade'] == 'No','')

df_2.drop_duplicates().drop(0, axis=1)
    Cust_ID OrderMade OrderType   OrderCategory
0         1       Yes         A  OrderCategoryB
1         2       Yes         A  OrderCategoryC
2         3       Yes         B  OrderCategoryC
3         4        No                          
8         5        No                          
13        6       Yes         C  OrderCategoryC
14        6       Yes         C  OrderCategoryD
15        7       Yes         A  OrderCategoryB
16        8       Yes         A  OrderCategoryC
17        9        No                          
22       10       Yes         B  OrderCategoryA
23       10       Yes         B  OrderCategoryB
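
Since the input frame is not shown, here is a hedged sketch on hypothetical data with the same column layout. It uses melt and concat rather than where and stack, so it is an alternative route to the same shape of result, not the four steps above.

```python
import pandas as pd

# Hypothetical input: one "Yes"/"" flag column per order category.
df = pd.DataFrame({
    "Cust_ID": [1, 2, 3],
    "OrderMade": ["Yes", "No", "Yes"],
    "OrderType": ["A", "", "C"],
    "OrderCategoryA": ["", "", ""],
    "OrderCategoryB": ["Yes", "", ""],
    "OrderCategoryC": ["", "", "Yes"],
    "OrderCategoryD": ["", "", "Yes"],
})
keys = ["Cust_ID", "OrderMade", "OrderType"]

long = df.melt(id_vars=keys, var_name="OrderCategory", value_name="flag")
# One row per category a customer actually ordered.
ordered = long.loc[long["flag"] == "Yes", keys + ["OrderCategory"]]
# Customers without an order keep a single row with a blank category.
no_order = df.loc[df["OrderMade"] == "No", keys].assign(OrderCategory="")

result = (
    pd.concat([ordered, no_order])
      .sort_values(["Cust_ID", "OrderCategory"])
      .reset_index(drop=True)
)
print(result)
```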