Python Pandas - find all unique combinations of rows of a DataFrame without repeating values in the columns


By : JoeB
Date : October 18 2020, 06:10 PM
I think you need permutations from itertools; then you just need to look up the values in the DataFrame after pivoting.
code :
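import itertools
import numpy as np
import pandas as pd
# matrix of distances: rows = A, columns = B (keyword args are required in pandas >= 2.0)
s = df.pivot(index=df.columns[0], columns=df.columns[1], values=df.columns[2])
# each permutation x pairs row i with column x[i], so no B value repeats
l = list(itertools.permutations(range(s.shape[1])))
list_of_df = [pd.DataFrame({'A': s.index,
                            'B': s.columns.values[list(x)],
                            'distance': s.values[np.arange(len(s)), list(x)]}) for x in l]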
list_of_df[0]
Out[725]: 
    A   B  distance
0   1  17     304.0
1  10  20     146.0
2  13  25     191.0
list_of_df[1]
Out[726]: 
    A   B  distance
0   1  17     304.0
1  10  25     246.0
2  13  20      91.0
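# general case: more A rows than B columns, so also choose which rows take part
s = df.pivot(index=df.columns[0], columns=df.columns[1], values=df.columns[2])
l = list(itertools.permutations(range(s.shape[1])))
# combinations (not permutations) of rows, so the same pairing is not listed several times
l1 = list(itertools.combinations(range(len(s)), s.shape[1]))

list_of_df = [pd.DataFrame({'A': s.index[list(y)],
                            'B': s.columns.values[list(x)],
                            'distance': s.iloc[list(y), :].values[np.arange(len(y)), list(x)]}) for x in l for y in l1]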
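For anyone who wants to try the permutation idea end to end, here is a minimal self-contained sketch. The DataFrame is hypothetical (made-up distances; the column names A, B and distance are assumed from the output above). Each permutation pairs row i of the pivoted matrix with column perm[i], so no B value is used twice within one result.

```python
import itertools

import numpy as np
import pandas as pd

# Hypothetical long-format data (values are made up): every A has a distance to every B.
df = pd.DataFrame({
    'A': [1, 1, 1, 10, 10, 10, 13, 13, 13],
    'B': [17, 20, 25, 17, 20, 25, 17, 20, 25],
    'distance': [304, 120, 210, 95, 146, 246, 150, 91, 191],
})

# Wide matrix: rows are A values, columns are B values.
s = df.pivot(index='A', columns='B', values='distance')
grid = s.to_numpy()
rows = np.arange(len(s))

assignments = []
for perm in itertools.permutations(range(s.shape[1])):
    cols = list(perm)  # row i is paired with column cols[i]; no column is reused
    assignments.append(pd.DataFrame({
        'A': s.index,
        'B': s.columns[cols],
        'distance': grid[rows, cols],
    }))

print(len(assignments))  # 6, i.e. 3! ways to pair 3 A values with 3 B values
print(assignments[1])
```

With n rows and n columns this yields n! DataFrames, so it is only practical for small tables.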



Repeating adding columns as rows in DataFrame Python 2.7 Pandas 0.17.1


By : rchristy
Date : March 29 2020, 07:55 AM
I have 2 dataframes I would like to combine, but they don't share an index. Let's do a cartesian merge and concat the result with the original dataframe:
code :
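import pandas as pd
# cartesian product via a constant key, then stack the 'result' rows on top
pd.concat([df.assign(metric_type='result'),
           df.assign(key=1).merge(df2.reset_index().assign(key=1), on='key', suffixes=('_x',''))[['date','mail_volume','index']].rename(columns={'index':'metric_type'})])\
  .sort_values(by='date', kind='mergesort')  # stable sort keeps 'result' first within each date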
       date  mail_volume metric_type
0  2011-01-01          100      result
0  2011-01-01          110        0.25
1  2011-01-01          120         0.5
2  2011-01-01          130        0.75
1  2011-02-01          150      result
3  2011-02-01          110        0.25
4  2011-02-01          120         0.5
5  2011-02-01          130        0.75
2  2011-03-01          125      result
6  2011-03-01          110        0.25
7  2011-03-01          120         0.5
8  2011-03-01          130        0.75
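On pandas 1.2 or newer, merge(..., how='cross') builds the cartesian product without the dummy key column. A sketch, assuming df and df2 look like the inputs implied by the output above (df2 indexed by 0.25/0.5/0.75 with a mail_volume column):

```python
import pandas as pd

# Inputs reconstructed from the output above (assumed shapes).
df = pd.DataFrame({'date': ['2011-01-01', '2011-02-01', '2011-03-01'],
                   'mail_volume': [100, 150, 125]})
df2 = pd.DataFrame({'mail_volume': [110, 120, 130]},
                   index=[0.25, 0.5, 0.75])

# how='cross' pairs every date with every row of df2 (pandas >= 1.2),
# replacing the key=1 trick; the df2 index becomes the metric_type column.
cross = df[['date']].merge(df2.rename_axis('metric_type').reset_index(), how='cross')

out = (pd.concat([df.assign(metric_type='result'), cross])
         .sort_values('date', kind='mergesort'))  # stable: 'result' row stays first per date
print(out)
```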

How to get the number of unique combinations of two columns that occur in a python pandas dataframe


By : Sean S.
Date : March 29 2020, 07:55 AM
Use drop_duplicates:
code :
print(df.drop_duplicates(['a','b']))
     a    b
1  203  487
2  876  111
4  876  487

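# any one of these counts the unique (a, b) pairs:
a = len(df.drop_duplicates(['a','b']).index)
a = (~df.duplicated(['a','b'])).sum()
a = len(df.index) - df.duplicated(['a','b']).sum()
# string-key variant: can miscount if the values themselves contain '_'
a = (df.a.astype(str) + '_' + df.b.astype(str)).nunique()
print(a)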
3
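A quick sanity check on hypothetical data: groupby(...).ngroups is another way to count distinct pairs, and a set of tuples gives a plain-Python cross-check.

```python
import pandas as pd

# Hypothetical data: three distinct (a, b) pairs, two of them repeated.
df = pd.DataFrame({'a': [203, 203, 876, 876, 876],
                   'b': [487, 487, 111, 111, 487]})

n_dedup = len(df.drop_duplicates(['a', 'b']))
n_groups = df.groupby(['a', 'b']).ngroups    # one group per distinct pair
n_tuples = len(set(zip(df['a'], df['b'])))   # plain-Python cross-check
print(n_dedup, n_groups, n_tuples)
```

One caveat: groupby drops rows with NaN keys by default (dropna=True), while drop_duplicates keeps them, so the counts can differ when the key columns contain missing values.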

How can I find unique combinations of 2 columns, delete not unique combinations, keeping only first rows in pandas


By : cha
Date : March 29 2020, 07:55 AM
I have a dataset that contains 2 columns, and some combinations of values repeat. I want to find the combinations that are not unique and delete them, keeping only the first row. I believe you need to sort the values within each row, so that (x, y) and (y, x) count as the same combination, and then remove duplicates:
code :
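import numpy as np
import pandas as pd
# sort within each row so (x, y) and (y, x) match, then keep the first of each pair
df = (pd.DataFrame(np.sort(df[['dim', 'linked_dim']], axis=1),
                   columns=['dim', 'linked_dim'])
        .drop_duplicates())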
print (df)
                     dim        linked_dim
0   Customer group$Large  DEPARTMENT$Sales
1  Customer group$Medium  DEPARTMENT$Sales
2   Customer group$Small  DEPARTMENT$Sales
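If you would rather keep the first row exactly as it was (original column order and index label) instead of the sorted values, you can use the sorted frame only as a duplicate mask. A sketch with hypothetical rows:

```python
import numpy as np
import pandas as pd

# Hypothetical input: row 1 is row 0 with the two values swapped.
df = pd.DataFrame({
    'dim':        ['Customer group$Large', 'DEPARTMENT$Sales', 'Customer group$Medium'],
    'linked_dim': ['DEPARTMENT$Sales', 'Customer group$Large', 'DEPARTMENT$Sales'],
})

# Sort each row so (x, y) and (y, x) look identical, then flag repeats.
key = pd.DataFrame(np.sort(df[['dim', 'linked_dim']].to_numpy(), axis=1),
                   index=df.index)
dupes = key.duplicated()  # True for every repeat after the first occurrence

# Keep the first occurrence with its original values and index label.
result = df[~dupes]
print(result)
```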

Find all indices/instances of all repeating patterns across columns and rows of pandas dataframe


By : user3248466
Date : March 29 2020, 07:55 AM
Use boolean indexing with Series.isin instead of the second and third conditions:
code :
df1 = df[(df.name == 'bob') & df.car.isin(['b','c'])]
print (df1)
   name car
0   bob   b
1   bob   c
8   bob   b
9   bob   c
10  bob   b
11  bob   c
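# index labels of the matching rows (two equivalent ways)
out_idx = df.index[(df.name == 'bob') & df.car.isin(['b','c'])]
out_idx = df[(df.name == 'bob') & df.car.isin(['b','c'])].index
# the same filter without isin, chaining the conditions with |
df1 = df[(df.name == 'bob') & ((df.car == 'b') | (df.car == 'c'))]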
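A small check on hypothetical data that the isin version and the chained | version select the same rows:

```python
import pandas as pd

# Hypothetical data with a mix of names and cars.
df = pd.DataFrame({'name': ['bob', 'bob', 'amy', 'bob', 'bob'],
                   'car':  ['b',   'c',   'b',   'a',   'c']})

mask_isin = (df['name'] == 'bob') & df['car'].isin(['b', 'c'])
mask_or = (df['name'] == 'bob') & ((df['car'] == 'b') | (df['car'] == 'c'))

assert mask_isin.equals(mask_or)  # both formulations select the same rows
out_idx = df.index[mask_isin].tolist()
print(out_idx)  # [0, 1, 4]
```

isin scales better to longer lists of allowed values, since the condition does not grow with each extra value.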

Python - How to expand a Pandas dataframe's rows to include all combinations of values of the key columns?


By : Vijay K
Date : March 29 2020, 07:55 AM
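Use @senderle's fast cartesian_product for performance (a helper from @senderle's Stack Overflow answer that you need to define yourself; it is not a public pandas or NumPy function):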
code :
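# cartesian_product must be defined first (see @senderle's answer)
v = cartesian_product(df.g1, df.g2)
idx = pd.MultiIndex.from_arrays([v[:, 0], v[:, 1]])

# reindexing inserts NaN for the combinations missing from df
df.set_index(['g1', 'g2']).reindex(idx)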
     val
a x  1.0
  y  NaN
  z  NaN
b x  NaN
  y  2.0
  z  NaN
c x  NaN
  y  NaN
  z  3.0
%timeit df.set_index(['g1','g2']).T.stack().unstack().T
%%timeit
v = cartesian_product(df.g1, df.g2)
idx = pd.MultiIndex.from_arrays([v[:, 0], v[:, 1]])
df.set_index(['g1', 'g2']).reindex(idx)

14.6 ms ± 840 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.56 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
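cartesian_product is not defined in the snippet above. Below is a minimal stand-in built on np.meshgrid (a hypothetical helper, not @senderle's exact code), using unique key values so each combination appears once, together with the built-in pd.MultiIndex.from_product, which produces the same index. The example df is assumed from the output shown above.

```python
import numpy as np
import pandas as pd

def cartesian_product(*arrays):
    # Hypothetical stand-in for @senderle's helper (not his exact code):
    # returns an (N, k) array with one row per combination, first array varying slowest.
    grids = np.meshgrid(*arrays, indexing='ij')
    return np.stack([g.ravel() for g in grids], axis=-1)

df = pd.DataFrame({'g1': ['a', 'b', 'c'], 'g2': ['x', 'y', 'z'], 'val': [1, 2, 3]})

# Unique key values, so each (g1, g2) combination appears exactly once.
v = cartesian_product(df.g1.unique(), df.g2.unique())
idx = pd.MultiIndex.from_arrays([v[:, 0], v[:, 1]], names=['g1', 'g2'])
expanded = df.set_index(['g1', 'g2']).reindex(idx)

# Built-in alternative: MultiIndex.from_product yields the same combinations in the same order.
idx2 = pd.MultiIndex.from_product([df.g1.unique(), df.g2.unique()], names=['g1', 'g2'])
assert expanded.index.tolist() == idx2.tolist()
print(expanded)
```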