Pandas merge by name and date (multiple columns)


By : Bik
Date : October 17 2020, 06:10 PM
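Question: I am looking to merge two data frames, first by name and then by date.
Answer: to merge two data frames on multiple columns, pass a list of key columns as the on argument. If the key columns have different names in the two frames, pass the two lists as left_on and right_on instead: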
code :
import pandas as pd

data = pd.merge(df1, df2, on=['symbol','date'], how='left')
data = pd.merge(df1, df2, left_on=['symbol','date'], right_on=['symbol_2','date_2'], how='left')
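A small runnable sketch of the left_on/right_on variant. The frames, the symbols "AAA"/"BBB" and the volume column are hypothetical, chosen only to show that a left merge keeps every row of df1 and fills NaN where no (symbol, date) pair matches.

```python
import pandas as pd

# hypothetical data: the key columns are named differently in df2
df1 = pd.DataFrame({
    "symbol": ["AAA", "AAA", "BBB"],
    "date": pd.to_datetime(["2020-10-01", "2020-10-02", "2020-10-01"]),
})
df2 = pd.DataFrame({
    "symbol_2": ["AAA", "BBB"],
    "date_2": pd.to_datetime(["2020-10-02", "2020-10-01"]),
    "volume": [10, 20],
})

# match on both columns at once; df1's row order is preserved by how="left"
data = pd.merge(df1, df2, left_on=["symbol", "date"],
                right_on=["symbol_2", "date_2"], how="left")
print(data)
```

With left_on/right_on both sets of key columns stay in the result, so you may want to drop symbol_2 and date_2 afterwards. The key columns also need compatible dtypes: pandas raises a ValueError if you try to merge a datetime64 column against a column of date strings.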


Merge columns in Pandas based on date criteria

By : Prasad
Date : March 29 2020, 07:55 AM
Assuming the dates are parsed into the index (a DatetimeIndex) and the columns are named after years, you can take each row's value from the column that matches that row's year:
code :
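import pandas as pd

# row.name is the row's datetime index label; its year selects the column
df.apply(lambda row: row[str(row.name.year)], axis=1)

# vectorized alternative; DataFrame.lookup was deprecated in pandas 1.2
# and removed in pandas 2.0, so this only runs on older versions
pd.Series(
    df.lookup(
        row_labels=df.index,
        col_labels=df.index.year.astype(str)
    ),
    index=df.index
)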
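Because DataFrame.lookup no longer exists in current pandas, here is one possible vectorized replacement (a sketch, not the replacement the pandas docs propose): map each row's year to a column position with Index.get_indexer, then pick one cell per row with NumPy fancy indexing. The sample frame is hypothetical.

```python
import numpy as np
import pandas as pd

# hypothetical sample: datetime index, one column per year
idx = pd.to_datetime(["2018-03-01", "2019-07-15", "2020-01-31"])
df = pd.DataFrame({"2018": [1, 2, 3], "2019": [4, 5, 6], "2020": [7, 8, 9]},
                  index=idx)

# position of the column whose name equals each row's year (-1 if absent)
col_pos = df.columns.get_indexer(df.index.year.astype(str))
if (col_pos < 0).any():
    raise KeyError("some row years have no matching column")

# take cell (row i, column col_pos[i]) for every row i
result = pd.Series(df.to_numpy()[np.arange(len(df)), col_pos], index=df.index)
print(result)
```

The explicit check matters: get_indexer returns -1 for a missing label, and -1 would otherwise silently select the last column.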
pandas merge on date like columns does not work

By : Pouyaan Pf
Date : March 29 2020, 07:55 AM
Question: merging two frames on a date column does not give the expected result.
Answer: the merge itself works; the surprising output comes from dates that repeat in both frames. Sample:
code :
import pandas as pd

df1 = pd.DataFrame({'captureDate':['2017-06-22'] * 3 + ['2017-06-25'] * 3 + ['2017-06-28'] * 2,
                   'rule_id':[40,10,20,30,70,10,60,10]})
print (df1)
  captureDate  rule_id
0  2017-06-22       40
1  2017-06-22       10
2  2017-06-22       20
3  2017-06-25       30
4  2017-06-25       70
5  2017-06-25       10
6  2017-06-28       60
7  2017-06-28       10
df2 = pd.DataFrame({'captureDate':['2017-06-22'] *3 +['2017-06-25'] * 3 +['2017-06-28'] * 2,
                   'rule_id':[1,2,3,4,5,6,7,8]})
print (df2)
  captureDate  rule_id
0  2017-06-22        1
1  2017-06-22        2
2  2017-06-22        3
3  2017-06-25        4
4  2017-06-25        5
5  2017-06-25        6
6  2017-06-28        7
7  2017-06-28        8
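Convert both columns to datetime. Comparing them confirms that the dates themselves match, so the keys are not the problem:
df1['captureDate'] = pd.to_datetime(df1['captureDate'])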
df2['captureDate']  = pd.to_datetime(df2['captureDate'])
print (df1['captureDate'].equals(df2['captureDate']))
True

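Because every date occurs several times in both frames, the inner merge pairs each df1 row with every df2 row of the same date, giving 3×3 + 3×3 + 2×2 = 22 rows:
inner = pd.merge(df1, df2, on='captureDate', how='inner')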
print (inner)
   captureDate  rule_id_x  rule_id_y
0   2017-06-22         40          1
1   2017-06-22         40          2
2   2017-06-22         40          3
3   2017-06-22         10          1
4   2017-06-22         10          2
5   2017-06-22         10          3
6   2017-06-22         20          1
7   2017-06-22         20          2
8   2017-06-22         20          3
9   2017-06-25         30          4
10  2017-06-25         30          5
11  2017-06-25         30          6
12  2017-06-25         70          4
13  2017-06-25         70          5
14  2017-06-25         70          6
15  2017-06-25         10          4
16  2017-06-25         10          5
17  2017-06-25         10          6
18  2017-06-28         60          7
19  2017-06-28         60          8
20  2017-06-28         10          7
21  2017-06-28         10          8
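If the rows should instead be paired by their position within each date, and both frames list the same dates in the same order, concatenate them side by side on the date index:
df3 = pd.concat([df1.set_index('captureDate'), 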
                 df2.set_index('captureDate')], 
                 axis=1, 
                 keys=('a', 'b'))
df3.columns = df3.columns.map('_'.join)
print (df3)
             a_rule_id  b_rule_id
captureDate                      
2017-06-22          40          1
2017-06-22          10          2
2017-06-22          20          3
2017-06-25          30          4
2017-06-25          70          5
2017-06-25          10          6
2017-06-28          60          7
2017-06-28          10          8
If only one row per date is needed, drop the duplicated dates before merging:
df1 = df1.drop_duplicates('captureDate')
df2 = df2.drop_duplicates('captureDate')
print (df1)
  captureDate  rule_id
0  2017-06-22       40
3  2017-06-25       30
6  2017-06-28       60

print (df2)
  captureDate  rule_id
0  2017-06-22        1
3  2017-06-25        4
6  2017-06-28        7

inner = pd.merge(df1, df2,  on='captureDate', how='inner')
print (inner)
  captureDate  rule_id_x  rule_id_y
0  2017-06-22         40          1
1  2017-06-25         30          4
2  2017-06-28         60          7
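If the frames have a different number of rows per date, number the rows within each date with groupby().cumcount() and merge on the date plus that counter:
df1 = pd.DataFrame({'captureDate':['2017-06-22'] * 3 + ['2017-06-25'] * 3 + ['2017-06-28'] * 2,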
                   'rule_id':[40,10,20,30,70,10,60,10]})

df2 = pd.DataFrame({'captureDate':['2017-06-22'] * 3 + ['2017-06-25'] * 3,
                   'rule_id':[1,2,3,4,5,6]})


df1['new'] = df1.groupby('captureDate').cumcount()
df2['new'] = df2.groupby('captureDate').cumcount()
print (df1)
  captureDate  rule_id  new
0  2017-06-22       40    0
1  2017-06-22       10    1
2  2017-06-22       20    2
3  2017-06-25       30    0
4  2017-06-25       70    1
5  2017-06-25       10    2
6  2017-06-28       60    0
7  2017-06-28       10    1

print (df2)
  captureDate  rule_id  new
0  2017-06-22        1    0
1  2017-06-22        2    1
2  2017-06-22        3    2
3  2017-06-25        4    0
4  2017-06-25        5    1
5  2017-06-25        6    2
df3 = pd.merge(df1, df2, on=['captureDate','new']).drop('new', axis=1)
print (df3)
  captureDate  rule_id_x  rule_id_y
0  2017-06-22         40          1
1  2017-06-22         10          2
2  2017-06-22         20          3
3  2017-06-25         30          4
4  2017-06-25         70          5
5  2017-06-25         10          6
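pandas can flag the many-to-many situation before it surprises you: merge accepts a validate argument that checks the key relationship and raises pandas.errors.MergeError if it does not hold. This minimal sketch reuses a cut-down version of the sample above and shows that the cumcount key passes the check while the bare date key fails it.

```python
import pandas as pd
from pandas.errors import MergeError

df1 = pd.DataFrame({"captureDate": ["2017-06-22"] * 3, "rule_id": [40, 10, 20]})
df2 = pd.DataFrame({"captureDate": ["2017-06-22"] * 3, "rule_id": [1, 2, 3]})

# duplicated keys on both sides are what produce the cartesian product
print(df1["captureDate"].duplicated().any(), df2["captureDate"].duplicated().any())

raised = False
try:
    # validate checks that the merge keys are unique on both sides
    pd.merge(df1, df2, on="captureDate", validate="one_to_one")
except MergeError as err:
    raised = True
    print("MergeError:", err)

# with the per-date counter the combined key is unique, so the check passes
df1["new"] = df1.groupby("captureDate").cumcount()
df2["new"] = df2.groupby("captureDate").cumcount()
paired = pd.merge(df1, df2, on=["captureDate", "new"], validate="one_to_one")
print(paired.drop(columns="new"))
```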
Merge columns based on values in multiple columns pandas

By : Chou Jay
Date : March 29 2020, 07:55 AM
Use combine_first or fillna to take the value from Col2 and fall back to Col3 where Col2 is missing:
code :
import numpy as np
import pandas as pd

df['new'] = df["Col2"].combine_first(df["Col3"])
# alternative
# df['new'] = df["Col2"].fillna(df["Col3"])
print(df)
  Name       Col2       Col3        new
0    A  16-1-2000        NaN  16-1-2000
1    B  13-2-2001        NaN  13-2-2001
2    C        NaN        NaN        NaN
3    D        NaN  23-4-2014  23-4-2014
4    X        NaN        NaN        NaN
5    Q        NaN   4-5-2009   4-5-2009
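If a value should be kept only when exactly one of the two columns is filled (NaN when both or neither are), use nested numpy.where:
df['new'] = np.where(df["Col2"].notnull() & df["Col3"].isnull(), df["Col2"],
                     np.where(df["Col2"].isnull() & df["Col3"].notnull(), df["Col3"], np.nan))
or the equivalent numpy.select:
m1 = df["Col2"].notnull() & df["Col3"].isnull()
m2 = df["Col2"].isnull() & df["Col3"].notnull()
df['new'] = np.select([m1, m2], [df["Col2"], df["Col3"]], np.nan)
Another option is to forward-fill across the columns after Name and take the last one. Run it on the original frame (before 'new' exists), and note that Col3 wins when both columns are filled:
df['new'] = df.iloc[:, 1:].ffill(axis=1).iloc[:, -1]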
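A self-contained check of two of the approaches above, rebuilding the frame from the values shown in the printed output. Since no row in this sample has both Col2 and Col3 filled, combine_first and the "exactly one" numpy.select version give the same result here; they would differ on a row where both are set.

```python
import numpy as np
import pandas as pd

# sample rebuilt from the output table above
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "X", "Q"],
    "Col2": ["16-1-2000", "13-2-2001", np.nan, np.nan, np.nan, np.nan],
    "Col3": [np.nan, np.nan, np.nan, "23-4-2014", np.nan, "4-5-2009"],
})

# prefer Col2, fall back to Col3 where Col2 is missing
first = df["Col2"].combine_first(df["Col3"])

# keep a value only when exactly one of the two columns is filled
m1 = df["Col2"].notna() & df["Col3"].isna()
m2 = df["Col2"].isna() & df["Col3"].notna()
exclusive = pd.Series(np.select([m1, m2], [df["Col2"], df["Col3"]], np.nan),
                      index=df.index)

print(first.tolist())
print(exclusive.tolist())
```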
Pandas merge on two columns using date and another column

By : Salmajane
Date : March 29 2020, 07:55 AM
Your first merge statement gets you halfway there, but it is only the second half of a two-step process. It sounds like you want to merge the sales data onto the visits data after summing the visits by Date/upc. The merge command does not aggregate anything by itself, so you have to sum first. Try:
code :
import pandas as pd

# one row per (Date, upc) with the total visits, then attach it to df1
df2_sum = df2.groupby(["Date", "upc"])["visits"].sum().reset_index()
df3 = pd.merge(df1, df2_sum, on=["Date", "upc"], how="left")
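To see why the sum has to come first, here is a sketch with hypothetical data (the sales values, upc codes and dates are made up): merging the raw visits repeats each df1 row once per matching visit row, while merging the per-(Date, upc) totals keeps exactly one row per df1 row.

```python
import pandas as pd

# hypothetical data: df1 has one row per Date/upc, df2 has several visit rows each
df1 = pd.DataFrame({"Date": ["2019-01-01", "2019-01-01", "2019-01-02"],
                    "upc": [111, 222, 111],
                    "sales": [10, 5, 7]})
df2 = pd.DataFrame({"Date": ["2019-01-01", "2019-01-01", "2019-01-02", "2019-01-02"],
                    "upc": [111, 111, 111, 111],
                    "visits": [3, 4, 2, 1]})

# merging without aggregating duplicates df1 rows that match several visit rows
naive = pd.merge(df1, df2, on=["Date", "upc"], how="left")

# aggregate first, then merge: one total per (Date, upc)
df2_sum = df2.groupby(["Date", "upc"])["visits"].sum().reset_index()
df3 = pd.merge(df1, df2_sum, on=["Date", "upc"], how="left")

print(len(naive), len(df3))
print(df3)
```

The upc 222 row has no visits, so how="left" keeps it with NaN in visits.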
Pandas Dataframe Merge on 2 Columns Including Conditional If Merge: If Date in df_2 is Between Two Other Dates in df_1

By : Barry D D
Date : March 29 2020, 07:55 AM
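You could left merge on SVDiscrep_Merge, then filter the result with the following boolean mask. It keeps rows whose REPORT_DT falls in the half-open window [RON_DATE, Next SV1 Date), plus rows that found no match in df_2 (their REPORT_DT is NaN after the left merge):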
code :
mask = (((result['RON_DATE'] <= result['REPORT_DT']) 
         & (result['REPORT_DT'] < result['Next SV1 Date'])) 
        | pd.isnull(result['REPORT_DT']))
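Complete example. The output below was printed by an older pandas version that sorted dict keys; current versions print the columns in their defined order (SVDiscrep_Merge, RON_DATE, Next SV1 Date, ColA):
import datetime as DT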
import pandas as pd

df_1 = pd.DataFrame( 
 {"SVDiscrep_Merge": ["2081916SAN", "2081242DFW", "2081248ORD","20874CLE", "2081740DEN"],
 "RON_DATE": [DT.datetime(2017,6,1), DT.datetime(2017,6,4), DT.datetime(2017,6,6), DT.datetime(2017,6,7), DT.datetime(2017,6,8)],
 "Next SV1 Date": [DT.datetime(2017,6,4), DT.datetime(2017,6,6), DT.datetime(2017,6,7), DT.datetime(2017,6,8), DT.datetime(2017, 6, 18)]})

df_2 = pd.DataFrame( 
 {"SVDiscrep_Merge": ["2081916SAN", "2081916SAN", "2081916SAN","2081740DEN"],
 "REPORT_DT": [DT.datetime(2017,6,1), DT.datetime(2017,6,3), DT.datetime(2017,6,4), DT.datetime(2017,6,9)],
 "ColA": ["A", "B", "C", "D"]})

result = pd.merge(df_1, df_2, on='SVDiscrep_Merge',  how='left')
mask = (((result['RON_DATE'] <= result['REPORT_DT']) 
         & (result['REPORT_DT'] < result['Next SV1 Date'])) 
        | pd.isnull(result['REPORT_DT']))
result = result.loc[mask].drop('REPORT_DT', axis=1)
print(result)
  Next SV1 Date   RON_DATE SVDiscrep_Merge ColA
0    2017-06-04 2017-06-01      2081916SAN    A
1    2017-06-04 2017-06-01      2081916SAN    B
3    2017-06-06 2017-06-04      2081242DFW  NaN
4    2017-06-07 2017-06-06      2081248ORD  NaN
5    2017-06-08 2017-06-07        20874CLE  NaN
6    2017-06-18 2017-06-08      2081740DEN    D
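The same filter can be written with Series.between, which accepts per-row bounds. This is a sketch assuming pandas 1.3 or newer, where inclusive="left" gives the half-open window; the data is the df_1/df_2 sample above. Comparisons against NaT are False, so unmatched rows still have to be kept explicitly.

```python
import datetime as DT
import pandas as pd

df_1 = pd.DataFrame({
    "SVDiscrep_Merge": ["2081916SAN", "2081242DFW", "2081248ORD", "20874CLE", "2081740DEN"],
    "RON_DATE": [DT.datetime(2017, 6, d) for d in (1, 4, 6, 7, 8)],
    "Next SV1 Date": [DT.datetime(2017, 6, d) for d in (4, 6, 7, 8, 18)],
})
df_2 = pd.DataFrame({
    "SVDiscrep_Merge": ["2081916SAN", "2081916SAN", "2081916SAN", "2081740DEN"],
    "REPORT_DT": [DT.datetime(2017, 6, d) for d in (1, 3, 4, 9)],
    "ColA": ["A", "B", "C", "D"],
})

result = pd.merge(df_1, df_2, on="SVDiscrep_Merge", how="left")

# RON_DATE <= REPORT_DT < Next SV1 Date, evaluated row by row
in_window = result["REPORT_DT"].between(result["RON_DATE"], result["Next SV1 Date"],
                                        inclusive="left")
# rows without a match have REPORT_DT = NaT and would otherwise be dropped
result = result.loc[in_window | result["REPORT_DT"].isna()].drop(columns="REPORT_DT")
print(result)
```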