Find columns that contains substring and replace it - Pandas


By : bugeelbond
Date : October 16 2020, 06:10 PM
Here's a simple and straightforward way of removing all special (non-word) characters from the columns, using a list comprehension and str.replace:
code :
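import pandas as pd

# df is the DataFrame from the question; every column is cast to str first
(pd.concat([df[col].astype(str).str.replace(r'\W+', '', regex=True)
           for col in df.columns], axis=1))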

  license   value1 value2 value3
2       a  5120000     15     45
1       b  3246440     10     65
4       b  1890220     50     10
5       c  2005240     32     12
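
To see what the `\W+` pattern actually strips, here is a minimal pure-Python sketch of the same per-value cleanup. The helper name `strip_non_word` and the sample rows are hypothetical and are not taken from the original question.

```python
import re

def strip_non_word(value):
    """Drop every run of characters that is not a letter, digit, or underscore."""
    return re.sub(r'\W+', '', str(value))

# Hypothetical sample rows (assumed for illustration only)
rows = [
    {'license': 'a!', 'value1': '5,120,000'},
    {'license': 'b ', 'value1': '3 246 440'},
]

# Apply the cleanup to every value of every column
cleaned = [{col: strip_non_word(val) for col, val in row.items()} for row in rows]
print(cleaned)
```

The pandas version above applies the same substitution column by column through `str.replace(..., regex=True)`.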



Find a substring, replace a substring according the case


By : Aak
Date : March 29 2020, 07:55 AM
What's the easiest and fastest way to find a substring (template) in a string and replace it with something else that follows the template's letter case (all lowercase: replace with lowercase; all uppercase: replace with uppercase; leading uppercase: capitalize; and so on)? I ended up doing this:
code :
public static string ReplaceWithTemplate(this string original, string pattern, string replacement)
{
  var template = Regex.Match(original, pattern, RegexOptions.IgnoreCase).Value.Remove(0, 1);
  template = template.Remove(template.Length - 1);
  var chars = new List<char>();
  var isLetter = false;
  for (int i = 0; i < replacement.Length; i++)
  {
     if (i < (template.Length)) isLetter = Char.IsUpper(template[i]);
     chars.Add(Convert.ToChar(
                       isLetter ? Char.ToUpper(replacement[i]) 
                                : Char.ToLower(replacement[i])));
  }

  return new string(chars.ToArray());
}
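
For comparison, the same idea can be sketched in Python. This is not a port of the C# method above (it does not trim the first and last character of the match); it is a minimal sketch under the assumption that the search term is a literal word. The names `match_case` and `replace_keep_case` are hypothetical.

```python
import re

def match_case(template, replacement):
    """Return replacement with its letter case modeled on template."""
    if template.isupper():
        return replacement.upper()
    if template.islower():
        return replacement.lower()
    if template[:1].isupper() and template[1:].islower():
        return replacement.capitalize()
    # Mixed case: copy the case position by position; once the template
    # runs out, keep using the case of its last character.
    out = []
    upper = False
    for i, ch in enumerate(replacement):
        if i < len(template):
            upper = template[i].isupper()
        out.append(ch.upper() if upper else ch.lower())
    return ''.join(out)

def replace_keep_case(text, word, replacement):
    """Replace every case-insensitive occurrence of word, following its case."""
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    return pattern.sub(lambda m: match_case(m.group(), replacement), text)

print(replace_keep_case("Hello HELLO hello", "hello", "goodbye"))
```

Passing a function as the replacement lets each match decide its own casing, so a single `sub` call handles all variants.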

Python find all matching substring patterns and replace substring


By : Jaigue
Date : March 29 2020, 07:55 AM
I want to check whether a sentence contains a particular pattern. If it is not found, do nothing; if it is found, substitute the pattern with another substring. Using a regular expression:
code :
import re

def findSubString(raw_string, start_marker, end_marker):
    return re.sub(
        r'(?<={}).*?(?={})'.format(re.escape(start_marker), re.escape(end_marker)),
        lambda m: m.group().strip().replace(' ', '_'),
        raw_string)

line1 = "Who acted as `` Bruce Wayne '' in the movie `` Batman Forever '' ?"
line1 = findSubString(line1, "``", "''")
assert line1 == "Who acted as ``Bruce_Wayne'' in the movie ``Batman_Forever'' ?"

# Alternative without regular expressions, using str.partition:
def findSubString(raw_string, start_marker, end_marker):
    result = []
    rest = raw_string
    while True:
        head, sep, tail = rest.partition(start_marker)
        if not sep:
            break
        body, sep, tail = tail.partition(end_marker)
        if not sep:
            break
        result.append(head + start_marker + body.strip().replace(' ', '_') + end_marker)
        rest = tail
    result.append(rest)
    return ''.join(result)
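
If the caller also needs to know whether anything was found, `re.subn` returns the number of substitutions alongside the new string. The sketch below reuses the regular expression from the first version; the function name `replace_between` is hypothetical.

```python
import re

def replace_between(raw_string, start_marker, end_marker):
    """Return (new_string, count); count is 0 when no marker pair was found."""
    pattern = r'(?<={}).*?(?={})'.format(re.escape(start_marker),
                                         re.escape(end_marker))
    return re.subn(pattern,
                   lambda m: m.group().strip().replace(' ', '_'),
                   raw_string)

line = "Who acted as `` Bruce Wayne '' in the movie `` Batman Forever '' ?"
new_line, count = replace_between(line, "``", "''")
print(count, new_line)

# A string without markers comes back unchanged with a count of 0
unchanged, zero = replace_between("no markers here", "``", "''")
```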

Regex: Find substring starts with AND end with from a string and replace a word from substring


By : user3656118
Date : March 29 2020, 07:55 AM
Hope this helps. A pattern that covers the possible inputs:
code :
^(\S+)\s*\S*(?=,)
var re = /^(\S+)\s*\S*(?=,)/gmi; 
var str = 'Hi John, I have recently..\nhi , I have...\nHi Hans, I have...\nHi, I have...';
var subst = '$1 David'; 

var result = str.replace(re, subst);
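
The same substitution can be sketched in Python, assuming the goal is identical to the JavaScript snippet: `re.MULTILINE` and `re.IGNORECASE` stand in for the `m` and `i` flags, and `sub` replaces every match by default, which covers `g`.

```python
import re

# ^(\S+)   first word of each line (the greeting)
# \s*\S*   optional whitespace and optional name
# (?=,)    must be followed by a comma, which is not consumed
pattern = re.compile(r'^(\S+)\s*\S*(?=,)', re.MULTILINE | re.IGNORECASE)

text = 'Hi John, I have recently..\nhi , I have...\nHi Hans, I have...\nHi, I have...'
result = pattern.sub(r'\1 David', text)
print(result)
```

Each line keeps its original greeting word and gets "David" as the name, whether or not a name was present before the comma.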

Pandas: using str.contains and map to find some substring and replace value in column


By : Micaela Sanchez
Date : March 29 2020, 07:55 AM
One possible solution is to build a dict from df2 and look up each device_id by substring:
code :
#create dict for map
d = df2.set_index('url')['category'].to_dict()
print (d)
{'community.livejournal.com/psp_ru': 'Рубрики/Развлечения/Игры/Приставочныеигры/',
 'kmzpub.ru/games.asp': 'Рубрики/Hi-Tech/Программы/Софт/Игры/Универсальное/Рубрики/Hi-Tech/Программы/Софт/Игры/Универсальное/', 
 'falloutsite.ru/': 'Рубрики/Hi-Tech/Программы/Софт/Игры/', 
 'sigma-team.ru/content/view/15/19': 'Рубрики/Hi-Tech/Программы/Софт/Игры/QuakeиCounter-Strike/'}

#use list comprehension for map by substring   
print (df1.device_id.apply(lambda x: pd.Series([v for k,v in d.items() if k in x])) )
                                            0
0                                         NaN
1                                         NaN
2                                         NaN
3                                         NaN
4  Рубрики/Развлечения/Игры/Приставочныеигры/
5                                         NaN
6                                         NaN
7                                         NaN
8                                         NaN
df1['category']=df1.device_id.apply(lambda x: pd.Series([v for k,v in d.items() if k in x])) 
print (df1)
   member_id device_type                                          device_id  \
0     603609         url                                           mail.ru/   
1     603609         url                                           mail.ru/   
2     603609         url                                           mail.ru/   
3     603609         url                                           mail.ru/   
4     603609         url           mail.ru/community.livejournal.com/psp_ru   
5     603609         url  lady.mail.ru/article/491411-kurban-omarov-otve...   
6     603609         url                                           mail.ru/   
7     603609         url  lady.mail.ru/article/491411-kurban-omarov-otve...   
8     603609         url  lady.mail.ru/article/491411-kurban-omarov-otve...   

   event_type event_path                    event_duration  \
0           0         pc  7d4a095373874b4fb26a2e6d070b6ad3   
1           0         pc  7d4a095373874b4fb26a2e6d070b6ad3   
2           0         pc  7d4a095373874b4fb26a2e6d070b6ad3   
3           3         pc  7d4a095373874b4fb26a2e6d070b6ad3   
4          28         pc  7d4a095373874b4fb26a2e6d070b6ad3   
5           0         pc  7d4a095373874b4fb26a2e6d070b6ad3   
6           0         pc  7d4a095373874b4fb26a2e6d070b6ad3   
7           0         pc  7d4a095373874b4fb26a2e6d070b6ad3   
8           0         pc  7d4a095373874b4fb26a2e6d070b6ad3   

                                     category  
0                                         NaN  
1                                         NaN  
2                                         NaN  
3                                         NaN  
4  Рубрики/Развлечения/Игры/Приставочныеигры/  
5                                         NaN  
6                                         NaN  
7                                         NaN  
8                                         NaN  
df1 = pd.DataFrame({'device_id':['a d','b s','c r'], 'b':[1,2,3]})    
df2 = pd.DataFrame({'url':['a','m','k'], 'category':['one','two','three']})    
#df2 = pd.DataFrame({'url':['a r','m','k'], 'category':['one','two','three']})    


d = df2.set_index('url')['category'].to_dict()
print (d)
{'k': 'three', 'a': 'one', 'm': 'two'}

df1['category']=df1.device_id.apply(lambda x: pd.Series([v for k,v in d.items() if k in x])) 
print (df1)
   b device_id category
0  1       a d      one
1  2       b s      NaN
2  3       c r      NaN

Pandas dataframe replace string in multiple columns by finding substring


By : orangec
Date : March 29 2020, 07:55 AM
You can use the DataFrame replace method with regex=True, and use .*,.* to match strings that contain a comma (you can replace the comma with any other substring you want to detect):
code :
str_cols = ['Answer']    # specify columns you want to replace
df[str_cols] = df[str_cols].replace('.*,.*', 'X', regex=True)
df
#Question   Answer
#0      1       A
#1      2       X
#2      3       C

# Alternatively, select every string (object) column instead of listing them by hand:
str_cols = df.select_dtypes(['object']).columns
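
When the substring to detect is something other than a comma, it may contain regex metacharacters such as `+` or `.`. A minimal sketch of building the pattern safely with `re.escape` is shown below; the helper names and the sample values are hypothetical.

```python
import re

def contains_pattern(substring):
    """Build a '.*<substring>.*' pattern with the substring escaped."""
    return '.*' + re.escape(substring) + '.*'

def mask_if_contains(value, substring, replacement='X'):
    """Replace the whole value when it contains substring, else keep it."""
    return re.sub(contains_pattern(substring), replacement, value)

# Hypothetical answers (assumed for illustration only)
answers = ['A', 'B, D', 'C']
masked = [mask_if_contains(a, ',') for a in answers]
print(masked)

# The same pattern string could be passed to pandas:
# df[str_cols] = df[str_cols].replace(contains_pattern('+'), 'X', regex=True)
```

Without the escaping, a substring like `+` would be read as a quantifier rather than a literal character.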