
pandas DataFrame: get cells in column that are NaN, None, empty string/list, etc


By : Irie Blue
Date : August 23 2020, 06:00 AM
One idea is to chain Series.isna with a length check via Series.str.len:
code :
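import numpy as np
import pandas as pd

df = pd.DataFrame({
         'a':[None,np.nan,[],'','aa', 0],
})

# NaN/None cells, or cells whose length is 0 (empty string, empty list)
m = df['a'].isna() | df['a'].str.len().eq(0)
print(m)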
0     True
1     True
2     True
3     True
4    False
5    False
Name: a, dtype: bool
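The `.str.len()` trick relies on the `.str` accessor accepting a mixed object column. A plain per-cell predicate applied with `Series.map` is one alternative. It is a minimal sketch, and the helper name `is_empty_cell` is hypothetical. It treats None/NaN as empty, along with any zero-length str, list, tuple, dict or set. Containers are checked before calling `pd.isna` because `pd.isna` on a list returns an array, not a single bool.

```python
import numpy as np
import pandas as pd


def is_empty_cell(value):
    """Hypothetical helper: True for None/NaN and zero-length containers or strings."""
    # Check sized types first: pd.isna([]) would return an array, not a bool
    if isinstance(value, (str, list, tuple, dict, set)):
        return len(value) == 0
    # Scalars: None and NaN count as empty, anything else (e.g. 0) does not
    return bool(pd.isna(value))


df = pd.DataFrame({'a': [None, np.nan, [], '', 'aa', 0]})
mask = df['a'].map(is_empty_cell)
print(mask.tolist())  # [True, True, True, True, False, False]
```

Note that `0` is deliberately not treated as empty, which matches the output of the `isna | str.len` version above.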



Cells in a column of pandas dataframe are individual list. dtype list is not working


By : tvg
Date : March 29 2020, 07:55 AM
This is not the correct way, but I found a hack: use GraphLab's SFrame to read the column with dtype list, then convert that SFrame to a DataFrame with the .to_dataframe() method.
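A pandas-only route is also possible. This is a different technique from the GraphLab hack above: if the column is stored in the CSV as Python-style list literals, `read_csv` can parse each cell with `ast.literal_eval` through its `converters` argument. The sketch below assumes that storage format. The CSV text and the `tags` column name are hypothetical. Empty cells would reach the converter as `''` and make `literal_eval` raise, so they would need handling first.

```python
import ast
from io import StringIO

import pandas as pd

# Hypothetical CSV: the "tags" column stores list literals as text
csv_text = '''id,tags
1,"['a', 'b']"
2,"[]"
3,"['c']"
'''

# converters applies ast.literal_eval to each raw string in "tags",
# so "['a', 'b']" becomes the Python list ['a', 'b']
df = pd.read_csv(StringIO(csv_text), converters={'tags': ast.literal_eval})

print(df['tags'].tolist())      # [['a', 'b'], [], ['c']]
print(type(df.loc[0, 'tags']))  # <class 'list'>
```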

pandas initialize dataframe column cells as empty lists


By : HG Macke
Date : March 29 2020, 07:55 AM
Since you are looking for time efficiency, here are some benchmarks. A list comprehension is already quite fast at creating the list of empty lists, but you can squeeze out a marginal improvement with itertools.repeat. For the insertion itself, apply is about 3x slower because it loops over every row:
code :
import numpy as np
import pandas as pd
from itertools import repeat
df = pd.DataFrame({"A":np.arange(100000)})

%timeit df['some_col'] = [[] for _ in range(len(df))]
100 loops, best of 3: 8.75 ms per loop

%timeit df['some_col'] = [[] for i in repeat(None, len(df))]
100 loops, best of 3: 8.02 ms per loop

%%timeit 
df['some_col'] = ''
df['some_col'] = df['some_col'].apply(list)
10 loops, best of 3: 25 ms per loop
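`%timeit` and `%%timeit` are IPython/Jupyter magics. The sketch below reruns the same three approaches with the standard-library `timeit` module so it works as a plain script. It assumes nothing beyond pandas and NumPy, and absolute timings will differ from the numbers above on other machines. It also checks that every cell holds its own list object. `[[]] * len(df)` would not do that: it repeats one shared list, so mutating one cell would change every row.

```python
import timeit
from itertools import repeat

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": np.arange(100000)})


def with_range():
    df['some_col'] = [[] for _ in range(len(df))]


def with_repeat():
    df['some_col'] = [[] for _ in repeat(None, len(df))]


def with_apply():
    df['some_col'] = ''
    df['some_col'] = df['some_col'].apply(list)  # list('') == []


for fn in (with_range, with_repeat, with_apply):
    # Average over 10 calls; results depend on hardware and library versions
    per_call = timeit.timeit(fn, number=10) / 10
    print(f"{fn.__name__}: {per_call * 1000:.2f} ms per call")

# Each row gets a distinct list object
with_repeat()
cells = df['some_col'].tolist()
print(cells[0] is cells[1])    # False

# Pitfall: list multiplication repeats a single shared list
shared = [[]] * 3
print(shared[0] is shared[1])  # True
```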

Count non-empty cells in pandas dataframe rows and add counts as a column


By : 董董大器
Date : March 29 2020, 07:55 AM
To count the number of cells with missing data in each row, you probably want something like this:
code :
import pandas as pd

# Treat empty strings and single spaces as missing while reading the file
df = pd.read_excel('count.xlsx', na_values=['', ' '])
df.head() # You should see NaN for empty cells

# Count the missing cells in each row and store the count in column M
df['M'] = df.apply(lambda x: x.isnull().sum(), axis='columns')
df.head() # Column M should report the values: first row: 0, second row: 1, third row: 2

# Reorder so that M sits right after Count
df = df[['Count', 'M', 'A', 'B', 'C']]
df.head() # Column order should be Count, M, A, B, C
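`apply(..., axis='columns')` calls a Python function once per row. `DataFrame.isna().sum(axis=1)` gives the same per-row count in one vectorized call, and `notna()` gives the count of non-empty cells that the question title asks for. The sketch below uses a small hypothetical stand-in for `count.xlsx`. It restricts the count to `A`, `B` and `C`, so that `Count` and the newly added columns are not counted themselves.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the data read from count.xlsx
df = pd.DataFrame({
    'Count': [1, 2, 3],
    'A': ['x', 'y', np.nan],
    'B': ['x', np.nan, np.nan],
    'C': ['x', 'y', 'z'],
})

data_cols = ['A', 'B', 'C']
df['M'] = df[data_cols].isna().sum(axis=1)        # missing cells per row
df['FILLED'] = df[data_cols].notna().sum(axis=1)  # non-empty cells per row

print(df[['Count', 'M', 'FILLED', 'A', 'B', 'C']])
```

As in the answer above, empty strings only count as missing if they were turned into NaN first, for example with `na_values` when reading the file.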

Add new column in pandas dataframe using empty string or the value from column A depending on the value on column B


By : Alize
Date : March 29 2020, 07:55 AM
Just use .loc with a boolean condition to assign only the rows you need:
code :
df.loc[df['price_if_0005'] == 0, 'label'] = df['price']
import pandas as pd
from io import StringIO

s = """
         price |   tpo_count | tpo             |   price_if_0005 
   0 |  1.4334 |           1 | n               |          0.0004 
   1 |  1.4335 |           1 | n               |          0      
   2 |  1.4336 |           1 | n               |          0.0001 
   3 |  1.4337 |           1 | n               |          0.0002 
   4 |  1.4338 |           1 | n               |          0.0003 
   5 |  1.4339 |           1 | n               |          0.0004 
   6 |  1.434  |           1 | n               |          0      
   7 |  1.4341 |           1 | n               |          0.0001 
   8 |  1.4342 |           3 | noq             |          0.0002 
   9 |  1.4343 |           3 | noq             |          0.0003 
  10 |  1.4344 |           3 | noq             |          0.0004 """

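# Regex separators need the python engine; a raw string avoids invalid-escape warnings
df = pd.read_csv(StringIO(s), sep=r"\s+\|\s+", engine="python")
df.loc[df['price_if_0005'] == 0, 'label'] = df['price']
# Reassign instead of using inplace=True on a column selection
df['label'] = df['label'].fillna('')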
print(df)
     price  tpo_count  tpo  price_if_0005   label
0   1.4334          1    n         0.0004        
1   1.4335          1    n         0.0000  1.4335
2   1.4336          1    n         0.0001        
3   1.4337          1    n         0.0002        
4   1.4338          1    n         0.0003        
5   1.4339          1    n         0.0004        
6   1.4340          1    n         0.0000   1.434
7   1.4341          1    n         0.0001        
8   1.4342          3  noq         0.0002        
9   1.4343          3  noq         0.0003        
10  1.4344          3  noq         0.0004        
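The same column can also be built in a single step with `Series.where`, which is an alternative to `.loc` followed by `fillna`. `where` keeps `price` in the rows where the condition is true and puts `''` everywhere else. The result is an object column mixing floats and empty strings, as in the output above. The frame below is a trimmed, hypothetical subset of the example data.

```python
import pandas as pd

# Trimmed subset of the example data (hypothetical)
df = pd.DataFrame({
    'price': [1.4334, 1.4335, 1.4336, 1.4340],
    'price_if_0005': [0.0004, 0.0, 0.0001, 0.0],
})

# Keep price where price_if_0005 == 0, otherwise use an empty string
df['label'] = df['price'].where(df['price_if_0005'] == 0, '')
print(df)
```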

Iterate through a pandas dataframe column and eval with an if statement and pass the column values to an empty list/dict


By : user3602044
Date : March 29 2020, 07:55 AM
You cannot access the column by writing ['TRAINSET'] on its own, the way you are doing.
Writing ['TRAINSET'] just creates a list whose single element is the string 'TRAINSET'. Select the column from the DataFrame instead:
code :
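# Convert the TRAINSET column to a plain Python list
columnSeriesObj = df['TRAINSET'].tolist()
list1 = []
# Start at 1: with i = 0, columnSeriesObj[i-1] would wrap around to the last element
for i in range(1, len(columnSeriesObj)):
    # Keep values that repeat the previous row, without duplicates
    if columnSeriesObj[i] == columnSeriesObj[i-1] and columnSeriesObj[i] not in list1:
        list1.append(columnSeriesObj[i])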
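The loop can also be written without explicit indexing. `Series.shift()` lines each value up with the previous row, `eq` marks the repeats, and `unique()` removes duplicates while keeping the order of first appearance. The sketch below produces the same `list1` as the loop, assuming the loop starts at index 1. The sample column values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the question's data
df = pd.DataFrame({'TRAINSET': ['a', 'a', 'b', 'c', 'c', 'c', 'a', 'a']})

s = df['TRAINSET']
# True where a value equals the previous row; row 0 compares with NaN, so False
repeats = s.eq(s.shift())
# unique() preserves first-appearance order, matching the loop's append order
list1 = s[repeats].unique().tolist()
print(list1)  # ['a', 'c']
```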
© 35dp-dentalpractice.co.uk