
Removing a character (^) from each row of a pandas DataFrame and getting the unique words in each row

By : God Alex
Date : October 25 2020, 07:10 PM
I want to replace '^' with ' ' (a space) in each row of the dataframe [df] and then find the unique words in each row.

Regarding the replacement, you were close. This should work:
code :
from collections import Counter

# Replace every '^' with a space (regex=False treats '^' as a literal character)
df['Text2'] = df['Text'].astype(str).str.replace('^', ' ', regex=False)

# Print the words that occur exactly once in each row
for row in df['Text2']:
    wordcounter = Counter(row.split())
    for w, i in wordcounter.items():
        if i == 1:
            print(w, end=' ')
    print('')

# Or collect the distinct words of each row in a set
for row in df['Text2']:
    wordcounter = set(row.split())
    print(wordcounter)
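For readers who want something they can run directly, here is a minimal sketch of the same idea on made-up data. The sample rows and the `Unique` column name are assumptions for illustration, not part of the original question; `dict.fromkeys` is used instead of `set` only to keep the words in their original order.

```python
import pandas as pd

# Made-up sample data (assumption): words separated by '^'
df = pd.DataFrame({"Text": ["red^blue^red", "cat^dog^bird"]})

# Replace the literal '^' with a space
df["Text2"] = df["Text"].str.replace("^", " ", regex=False)

# dict.fromkeys keeps the first occurrence of each word, in order
df["Unique"] = df["Text2"].apply(lambda s: list(dict.fromkeys(s.split())))

print(df["Unique"].tolist())
# [['red', 'blue'], ['cat', 'dog', 'bird']]
```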


Removing duplicates in a dataset using pandas' DataFrame.drop_duplicates()


By : jefferyoung2010
Date : March 29 2020, 07:55 AM
I decided to take a different approach to handling the duplicates and use dataframes as my structure throughout. Instead of reading the CSV with Python's native CSV library and iterating over every row, I used pandas' built-in read_csv method to load the data into a dataframe. I then called drop_duplicates() on that dataframe to remove the repeated rows.
I hope that helps future readers.
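The approach described above could look roughly like this. It is only a sketch: the CSV content and column names are invented, and `io.StringIO` stands in for a real file path so the snippet runs on its own.

```python
import io
import pandas as pd

# Invented CSV content (assumption); in practice pass a file path to read_csv
csv_text = "id,name\n1,alice\n2,bob\n1,alice\n3,carol\n"

df = pd.read_csv(io.StringIO(csv_text))

# drop_duplicates() keeps the first occurrence of each fully identical row
deduped = df.drop_duplicates()

print(deduped)
```

By default all columns are compared; the `subset` parameter of `drop_duplicates()` restricts the comparison to selected columns.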
Removing outliers using percentiles in a pandas DataFrame groupby


By : Abdul Robinson
Date : March 29 2020, 07:55 AM
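If we look at the expression
(res.loc[ df.Transportation_Mode, 0.05] < df.Vincenty_distance.values) & (df.Vincenty_distance.values < res.loc[df.Transportation_Mode, 0.95])
it evaluates to a boolean mask with one entry per row of df (assuming res holds the 0.05 and 0.95 percentiles of Vincenty_distance for each Transportation_Mode). Pass the mask's values to df.loc to keep only the rows that lie between the two percentiles of their group:
code :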
df.loc[ ((res.loc[ df.Transportation_Mode, 0.05] < df.Vincenty_distance.values) & (df.Vincenty_distance.values < res.loc[df.Transportation_Mode, 0.95])).values]
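The answer does not show how `res` was built. The sketch below assumes it is a table with one row per transportation mode and one column per percentile, produced by `groupby(...).quantile(...).unstack()`; the sample distances are made up. The lookups are converted to NumPy arrays before comparing, which sidesteps any index alignment.

```python
import pandas as pd

# Made-up sample data (assumption)
df = pd.DataFrame({
    "Transportation_Mode": ["walk"] * 5 + ["bus"] * 5,
    "Vincenty_distance": [1, 2, 3, 4, 100, 10, 11, 12, 13, 500],
})

# Assumed definition of `res`: rows are modes, columns are 0.05 and 0.95
res = (df.groupby("Transportation_Mode")["Vincenty_distance"]
         .quantile([0.05, 0.95])
         .unstack())

# Look up each row's group bounds, then compare element-wise
low = res.loc[df["Transportation_Mode"], 0.05].to_numpy()
high = res.loc[df["Transportation_Mode"], 0.95].to_numpy()
dist = df["Vincenty_distance"].to_numpy()

filtered = df.loc[(low < dist) & (dist < high)]
print(filtered)
```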
Pandas DataFrame: setting the first row of every unique ID number to NaT


By : user2318393
Date : March 29 2020, 07:55 AM
Use Series.duplicated() together with df.loc to set DIFF to NaT on the first row of each DEVICE_ID:
code :
df.loc[~df.DEVICE_ID.duplicated(),'DIFF'] = pd.NaT
>>df

   index  DEVICE_ID DIFF
0      0         12  NaT
1      1         12   20
2      2         12   30
3      3         13  NaT
4      4         13   40
5      5         13   21
6      6         14  NaT
7      7         14   10
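A self-contained version of the same step is sketched below. The input values are invented, and `DIFF` is assumed to be a timedelta column (the original does not state its dtype), which lets `pd.NaT` be assigned without changing the column type.

```python
import pandas as pd

# Invented data (assumption): DIFF stored as timedeltas in seconds
df = pd.DataFrame({
    "DEVICE_ID": [12, 12, 12, 13, 13, 13, 14, 14],
    "DIFF": pd.to_timedelta([5, 20, 30, 7, 40, 21, 3, 10], unit="s"),
})

# duplicated() is False for the first occurrence of each DEVICE_ID,
# so negating it selects exactly the first row of every device
first_rows = ~df["DEVICE_ID"].duplicated()
df.loc[first_rows, "DIFF"] = pd.NaT

print(df)
```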
Removing multiple character combinations from words in each row of a pandas DataFrame


By : user2986276
Date : March 29 2020, 07:55 AM
Because the longer strings contain the shorter ones, order is important. So loop over the list in reverse with [::-1], use Series.str.extract to store the matched values in new columns, then use Series.str.replace on the same column.
Finally, use DataFrame.dot to combine all extracted values, joined by a separator, into a new column:
code :
remove_string = ['rn', 'rnr', 'rnrn', 'rnrnrn']

df['cleaned_txt'] = df['body']
for i in remove_string[::-1]:
    df[i] = df['cleaned_txt'].str.extract('({})'.format(i)) 
    df['cleaned_txt'] = df['cleaned_txt'].str.replace(i, '')

df['Removed_string'] = (df[remove_string].notna()
                                         .dot(pd.Index(remove_string) + ',')
                                         .str.strip(','))
df = df.drop(remove_string, axis=1)
print(df)
   ID                      body     cleaned_txt Removed_string
0   1               FITrnXS$100       FITXS$100             rn
1   2             $1000rnReason     $1000Reason             rn
2   3                      rnIf              If             rn
3   4          bevlauedrnrnnext    bevlauednext           rnrn
4   5  obccrnrnnoncrnrnactionrn  obccnoncaction        rn,rnrn
5   6          rnrnnotification    notification           rnrn
6   7               insdrnrnnon         insdnon           rnrn
7   8               rnrnupdated         updated           rnrn
8   9                  rnreason           eason            rnr
9  10                 rnrnrnLOR             LOR         rnrnrn
To replace the matches with a space instead of removing them:
remove_string = ['rn', 'rnr', 'rnrn', 'rnrnrn']

df['cleaned_txt'] = df['body']
for i in remove_string[::-1]:
    df[i] = df['cleaned_txt'].str.extract('({})'.format(i)) 
    df['cleaned_txt'] = df['cleaned_txt'].str.replace(i, ' ')

df['Removed_string'] = (df[remove_string].notna()
                                         .dot(pd.Index(remove_string) + ',')
                                         .str.strip(','))
df = df.drop(remove_string, axis=1)
print(df)

   ID                      body        cleaned_txt Removed_string
0   1               FITrnXS$100         FIT XS$100             rn
1   2             $1000rnReason       $1000 Reason             rn
2   3                      rnIf                 If             rn
3   4          bevlauedrnrnnext      bevlaued next           rnrn
4   5  obccrnrnnoncrnrnactionrn  obcc nonc action         rn,rnrn
5   6          rnrnnotification       notification           rnrn
6   7               insdrnrnnon           insd non           rnrn
7   8               rnrnupdated            updated           rnrn
8   9                  rnreason              eason            rnr
9  10                 rnrnrnLOR                LOR         rnrnrn
To use a different replacement for each pattern, use a dictionary:
# dictionary mapping each pattern to its replacement
remove_string = {"rn":" ", "rnr":"\n", "rnrn":"\n", "rnrnrn":"\n"}

# sort into a list of (pattern, replacement) tuples, longest pattern first
rem = sorted(remove_string.items(), key=lambda s: len(s[0]), reverse=True)
print (rem)
[('rnrnrn', '\n'), ('rnrn', '\n'), ('rnr', '\n'), ('rn', ' ')]

df['cleaned_txt'] = df['body']
for i, j in rem:
    df[i] = df['cleaned_txt'].str.extract('({})'.format(i)) 
    df['cleaned_txt'] = df['cleaned_txt'].str.replace(i, j)

cols = list(remove_string.keys())
df['Removed_string'] = (df[cols].notna().dot(pd.Index(cols) + ',')
                                        .str.strip(','))
df = df.drop(cols, axis=1)
print(df)
   ID                      body          cleaned_txt Removed_string
0   1               FITrnXS$100           FIT XS$100             rn
1   2             $1000rnReason         $1000 Reason             rn
2   3                      rnIf                   If             rn
3   4          bevlauedrnrnnext       bevlaued\nnext           rnrn
4   5  obccrnrnnoncrnrnactionrn  obcc\nnonc\naction         rn,rnrn
5   6          rnrnnotification       \nnotification           rnrn
6   7               insdrnrnnon            insd\nnon           rnrn
7   8               rnrnupdated            \nupdated           rnrn
8   9                  rnreason              \neason            rnr
9  10                 rnrnrnLOR                \nLOR         rnrnrn
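As an alternative to the loop (not the answer's method), the longest-first ordering can also be expressed as a single regular expression: alternation tries its branches from left to right, so listing the longer patterns first makes them win. This is a minimal sketch on a few of the sample values; it only removes the matches and does not record which pattern was removed.

```python
import re
import pandas as pd

remove_string = ['rn', 'rnr', 'rnrn', 'rnrnrn']

# Longest pattern first, so 'rnrnrn' is tried before 'rnrn', 'rnr' and 'rn'
ordered = sorted(remove_string, key=len, reverse=True)
pattern = '|'.join(re.escape(s) for s in ordered)

body = pd.Series(['FITrnXS$100', 'bevlauedrnrnnext', 'rnreason', 'rnrnrnLOR'])
cleaned = body.str.replace(pattern, '', regex=True)

print(cleaned.tolist())
# ['FITXS$100', 'bevlauednext', 'eason', 'LOR']
```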
Removing square brackets from a pandas DataFrame


By : user3066207
Date : March 29 2020, 07:55 AM
I have a dataframe that I need to remove the square brackets from in order to feed it into a loop.

One simple way is to transform each list into a str:
code :
import pandas as pd

x = [
    [[14,38,51,65,84,85]],
    [[3,34,58,65,66,75]],
    [[3,15,68,70,80,82]],
    [[19,31,42,50,54,97]],
    [[4,9,48,62,74,77]],
]

m2 = pd.DataFrame(x)
m2[0] = m2[0].apply(lambda x: ','.join([str(i) for i in x]))

m2
Out[1]:
        0
0      '14,38,51,65,84,85'
1      '3,34,58,65,66,75'
2      '3,15,68,70,80,82'
3      '19,31,42,50,54,97'
4       '4,9,48,62,74,77'
If the cells hold strings that look like lists rather than actual lists, parse them with ast.literal_eval first:
from ast import literal_eval

x = [
    ['[14,38,51,65,84,85]'],
    ['[3,34,58,65,66,75]'],
    ['[3,15,68,70,80,82]'],
    ['[19,31,42,50,54,97]'],
    ['[4,9,48,62,74,77]'],
]

m2 = pd.DataFrame(x)

m2[0] = m2[0].apply(lambda x: ','.join([str(i) for i in literal_eval(x)]))
m2
Out[1]:
        0
0      '14,38,51,65,84,85'
1      '3,34,58,65,66,75'
2      '3,15,68,70,80,82'
3      '19,31,42,50,54,97'
4       '4,9,48,62,74,77'
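If a column might contain a mix of real lists and list-like strings, the two cases above can be folded into one function. The helper name `to_csv_string` and the shortened sample values are hypothetical, chosen only for this sketch.

```python
from ast import literal_eval

import pandas as pd


def to_csv_string(cell):
    """Hypothetical helper: accept a list or its string form, return 'a,b,c'."""
    # literal_eval safely parses a string such as '[3, 34, 58]' into a list
    values = literal_eval(cell) if isinstance(cell, str) else cell
    return ",".join(str(v) for v in values)


# One real list and one list-like string (made-up sample)
m2 = pd.DataFrame({0: [[14, 38, 51], "[3, 34, 58]"]})
m2[0] = m2[0].apply(to_csv_string)

print(m2[0].tolist())
# ['14,38,51', '3,34,58']
```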