Find columns that contain a substring and replace it - Pandas

By : bugeelbond
Date : October 16 2020, 06:10 PM
Here's a simple and straightforward way of removing all special (non-word) characters from the columns, using a list comprehension and str.replace:
code :
# axis must be passed by keyword in current pandas versions
(pd.concat([df[col].astype(str).str.replace(r'\W+', '', regex=True)
           for col in df.columns], axis=1))

  license   value1 value2 value3
2       a  5120000     15     45
1       b  3246440     10     65
4       b  1890220     50     10
5       c  2005240     32     12
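
As a self-contained illustration of the same idea, here is a minimal sketch that first finds the columns containing special characters and then cleans only those. The sample values and the names `as_text`, `dirty` and `cleaned` are assumptions made up for this example; they are not from the original question.

```python
import pandas as pd

# Hypothetical sample data: only value1 contains special characters.
df = pd.DataFrame({
    'license': ['a', 'b', 'c'],
    'value1': ['5,120,000', '3.246.440', '2 005 240'],
    'value2': [15, 10, 32],
})

# Work on a string view so .str methods are available for every column.
as_text = df.astype(str)

# Columns where at least one cell contains a non-word character.
dirty = [c for c in as_text.columns
         if as_text[c].str.contains(r'\W', regex=True).any()]

# Strip the non-word characters from those columns only.
cleaned = df.copy()
for c in dirty:
    cleaned[c] = as_text[c].str.replace(r'\W+', '', regex=True)

print(dirty)
print(cleaned)
```

Columns that are already clean keep their original dtype, because only the columns listed in `dirty` are rewritten as strings.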

Find a substring, replace a substring according to the case

By : Aak
Date : March 29 2020, 07:55 AM
What's the easiest and fastest way to find a substring (template) in a string and replace it with something else that follows the template's letter case (all lowercase: replace with lowercase; all uppercase: replace with uppercase; begins with an uppercase letter: capitalize; and so on)? Ended up doing this:
code :
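// Requires: using System; using System.Collections.Generic; using System.Text.RegularExpressions;
public static string ReplaceWithTemplate(this string original, string pattern, string replacement)
{
  // Find the template (case-insensitive) and drop its first and last characters.
  var template = Regex.Match(original, pattern, RegexOptions.IgnoreCase).Value.Remove(0, 1);
  template = template.Remove(template.Length - 1);
  var chars = new List<char>();
  var isLetter = false;
  for (int i = 0; i < replacement.Length; i++)
  {
     // Past the end of the template, keep the case of its last character.
     if (i < (template.Length)) isLetter = Char.IsUpper(template[i]);
     chars.Add(Convert.ToChar(
                       isLetter ? Char.ToUpper(replacement[i]) 
                                : Char.ToLower(replacement[i])));
  }

  // Returns the re-cased replacement text.
  return new string(chars.ToArray());
}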
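
For comparison, the same case-following idea can be sketched in Python. This is an illustration under assumptions, not a port of the answer above: the helper names `match_case` and `replace_following_case` are hypothetical, and the searched word is treated as literal text rather than as a regex pattern.

```python
import re

def match_case(template: str, replacement: str) -> str:
    """Return `replacement` re-cased to follow the letter case of `template`."""
    if template.isupper():
        return replacement.upper()
    if template.islower():
        return replacement.lower()
    # Mixed case: copy the case position by position. Characters past the end
    # of the template reuse the case of the template's last character.
    out = []
    upper = False
    for i, ch in enumerate(replacement):
        if i < len(template):
            upper = template[i].isupper()
        out.append(ch.upper() if upper else ch.lower())
    return "".join(out)

def replace_following_case(text: str, word: str, replacement: str) -> str:
    """Replace every case-insensitive occurrence of `word`, keeping its case."""
    return re.sub(
        re.escape(word),
        lambda m: match_case(m.group(), replacement),
        text,
        flags=re.IGNORECASE,
    )

print(replace_following_case("Hello hello HELLO", "hello", "goodbye"))
# Goodbye goodbye GOODBYE
```

Passing a function to `re.sub` lets each match decide its own casing, so one call handles all three variants.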

Python find all matching substring patterns and replace substring

By : Jaigue
Date : March 29 2020, 07:55 AM
I want to check whether a sentence contains a particular pattern. If the pattern is not found, do nothing. If it is found, substitute the pattern with another substring in the string. Using a regular expression:
code :
import re

def findSubString(raw_string, start_marker, end_marker):
    return re.sub(
        r'(?<={}).*?(?={})'.format(re.escape(start_marker), re.escape(end_marker)),
        lambda m: m.group().strip().replace(' ', '_'),
        raw_string)

line1 = "Who acted as `` Bruce Wayne '' in the movie `` Batman Forever '' ?"
line1 = findSubString(line1, "``", "''")
assert line1 == "Who acted as ``Bruce_Wayne'' in the movie ``Batman_Forever'' ?"
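# Alternative without regular expressions, using str.partition:
def findSubString(raw_string, start_marker, end_marker):
    result = []
    rest = raw_string
    while True:
        head, sep, tail = rest.partition(start_marker)
        if not sep:
            break  # no further start marker
        body, sep, tail = tail.partition(end_marker)
        if not sep:
            break  # start marker without a matching end marker
        result.append(head + start_marker + body.strip().replace(' ', '_') + end_marker)
        rest = tail
    result.append(rest)  # keep whatever follows the last marker pair
    return ''.join(result)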
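
If the caller also needs to know whether the pattern was found at all, `re.subn` returns the number of substitutions alongside the new string. A minimal sketch, assuming the same markers as above; the function name `replace_between_markers` is hypothetical.

```python
import re

def replace_between_markers(raw_string, start_marker, end_marker):
    """Return (new_string, number_of_replacements)."""
    pattern = r'(?<={}).*?(?={})'.format(
        re.escape(start_marker), re.escape(end_marker))
    return re.subn(
        pattern,
        lambda m: m.group().strip().replace(' ', '_'),
        raw_string,
    )

text, count = replace_between_markers(
    "Who acted as `` Bruce Wayne '' in the movie `` Batman Forever '' ?",
    "``", "''")
print(count, text)

# No markers present: the string comes back unchanged and count is 0.
unchanged, none_found = replace_between_markers("plain sentence", "``", "''")
print(none_found, unchanged)
```

A count of 0 means the input was returned untouched, which covers the "do nothing if not found" requirement without a separate search step.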

Regex: find a substring that starts with AND ends with given text in a string, and replace a word in that substring

By : user3656118
Date : March 29 2020, 07:55 AM
The sample string below covers the possible inputs:
code :
var re = /^(\S+)\s*\S*(?=,)/gmi; 
var str = 'Hi John, I have recently..\nhi , I have...\nHi Hans, I have...\nHi, I have...';
var subst = '$1 David'; 

var result = str.replace(re, subst);
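
To check what the pattern produces, it can also be run in Python. This is a sketch for comparison only, not part of the original answer: `re.MULTILINE` and `re.IGNORECASE` stand in for the `m` and `i` flags, and `re.sub` replaces every match, as the `g` flag does.

```python
import re

# Same pattern as the JavaScript snippet: the first word of the line, an
# optional second word, and a lookahead for the comma that follows.
pattern = re.compile(r'^(\S+)\s*\S*(?=,)', flags=re.MULTILINE | re.IGNORECASE)

text = ('Hi John, I have recently..\n'
        'hi , I have...\n'
        'Hi Hans, I have...\n'
        'Hi, I have...')

# \1 keeps the greeting word; the name (if any) is replaced by "David".
result = pattern.sub(r'\1 David', text)
print(result)
```

All four greeting variants end up as the greeting word followed by "David" and the comma.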

Pandas: using str.contains and map to find some substring and replace value in column

By : Micaela Sanchez
Date : March 29 2020, 07:55 AM
I have a dataframe. One possible solution:
code :
#create dict for map
d = df2.set_index('url')['category'].to_dict()
print (d)
{'community.livejournal.com/psp_ru': 'Рубрики/Развлечения/Игры/Приставочныеигры/',
 'kmzpub.ru/games.asp': 'Рубрики/Hi-Tech/Программы/Софт/Игры/Универсальное/Рубрики/Hi-Tech/Программы/Софт/Игры/Универсальное/', 
 'falloutsite.ru/': 'Рубрики/Hi-Tech/Программы/Софт/Игры/', 
 'sigma-team.ru/content/view/15/19': 'Рубрики/Hi-Tech/Программы/Софт/Игры/QuakeиCounter-Strike/'}

#use list comprehension for map by substring   
print (df1.device_id.apply(lambda x: pd.Series([v for k,v in d.items() if k in x])) )
0                                         NaN
1                                         NaN
2                                         NaN
3                                         NaN
4  Рубрики/Развлечения/Игры/Приставочныеигры/
5                                         NaN
6                                         NaN
7                                         NaN
8                                         NaN
df1['category']=df1.device_id.apply(lambda x: pd.Series([v for k,v in d.items() if k in x])) 
print (df1)
   member_id device_type                                          device_id  \
0     603609         url                                           mail.ru/   
1     603609         url                                           mail.ru/   
2     603609         url                                           mail.ru/   
3     603609         url                                           mail.ru/   
4     603609         url           mail.ru/community.livejournal.com/psp_ru   
5     603609         url  lady.mail.ru/article/491411-kurban-omarov-otve...   
6     603609         url                                           mail.ru/   
7     603609         url  lady.mail.ru/article/491411-kurban-omarov-otve...   
8     603609         url  lady.mail.ru/article/491411-kurban-omarov-otve...   

   event_type event_path                    event_duration  \
0           0         pc  7d4a095373874b4fb26a2e6d070b6ad3   
1           0         pc  7d4a095373874b4fb26a2e6d070b6ad3   
2           0         pc  7d4a095373874b4fb26a2e6d070b6ad3   
3           3         pc  7d4a095373874b4fb26a2e6d070b6ad3   
4          28         pc  7d4a095373874b4fb26a2e6d070b6ad3   
5           0         pc  7d4a095373874b4fb26a2e6d070b6ad3   
6           0         pc  7d4a095373874b4fb26a2e6d070b6ad3   
7           0         pc  7d4a095373874b4fb26a2e6d070b6ad3   
8           0         pc  7d4a095373874b4fb26a2e6d070b6ad3   

                                     category  
0                                         NaN  
1                                         NaN  
2                                         NaN  
3                                         NaN  
4  Рубрики/Развлечения/Игры/Приставочныеигры/  
5                                         NaN  
6                                         NaN  
7                                         NaN  
8                                         NaN  
# Minimal sample to reproduce the approach:
df1 = pd.DataFrame({'device_id':['a d','b s','c r'], 'b':[1,2,3]})    
df2 = pd.DataFrame({'url':['a','m','k'], 'category':['one','two','three']})    
#df2 = pd.DataFrame({'url':['a r','m','k'], 'category':['one','two','three']})    

d = df2.set_index('url')['category'].to_dict()
print (d)
{'k': 'three', 'a': 'one', 'm': 'two'}

df1['category']=df1.device_id.apply(lambda x: pd.Series([v for k,v in d.items() if k in x])) 
print (df1)
   b device_id category
0  1       a d      one
1  2       b s      NaN
2  3       c r      NaN
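
Wrapping the matches in `pd.Series` inside `apply` yields a DataFrame, which can be awkward to assign to a single column when a row has several matches. A minimal alternative sketch, assuming only the first matching category is wanted; the helper name `first_category` is hypothetical.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'device_id': ['a d', 'b s', 'c r'], 'b': [1, 2, 3]})
df2 = pd.DataFrame({'url': ['a', 'm', 'k'], 'category': ['one', 'two', 'three']})

# url -> category lookup, as in the answer above
d = df2.set_index('url')['category'].to_dict()

def first_category(device_id):
    # First category whose url is a substring of device_id, else NaN.
    return next((v for k, v in d.items() if k in device_id), np.nan)

df1['category'] = df1['device_id'].map(first_category)
print(df1)
```

`next` with a default stops at the first hit, so each row always produces exactly one value.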

Pandas dataframe replace string in multiple columns by finding substring

By : orangec
Date : March 29 2020, 07:55 AM
You can use the DataFrame replace method with regex=True, and use .*,.* to match strings that contain a comma (you can replace the comma with any other substring you want to detect):
code :
str_cols = ['Answer']    # specify columns you want to replace
df[str_cols] = df[str_cols].replace('.*,.*', 'X', regex=True)
#Question   Answer
#0      1       A
#1      2       X
#2      3       C
# To cover every string (object) column instead of a hand-picked list:
str_cols = df.select_dtypes(['object']).columns
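
If the substring to detect may contain regex metacharacters (for example `.` or `+`), it helps to escape it before building the pattern. A minimal sketch with made-up sample data; the function name `replace_cells_containing` and the `Note` column are assumptions for illustration.

```python
import re
import pandas as pd

def replace_cells_containing(df, cols, substring, new_value):
    """Return a copy of df where each cell in `cols` that contains
    `substring` is replaced by `new_value`."""
    # Escape the substring so it is matched literally.
    pattern = '.*' + re.escape(substring) + '.*'
    out = df.copy()
    out[cols] = out[cols].replace(pattern, new_value, regex=True)
    return out

# Hypothetical sample data
df = pd.DataFrame({
    'Question': [1, 2, 3],
    'Answer': ['A', 'B,C', 'C'],
    'Note': ['x', 'y', '1.5, approx'],
})

# Pick the string-like columns without relying on a specific dtype name.
str_cols = [c for c in df.columns if pd.api.types.is_string_dtype(df[c])]

result = replace_cells_containing(df, str_cols, ',', 'X')
print(result)
```

Working on a copy leaves the original frame untouched, which makes it easy to compare before and after.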
