Processing a mainframe file using cobrix in databricks - Pyspark python 3

By : Andriy Martynets
Date : September 04 2020, 07:00 AM
To make third-party or locally-built code available to notebooks and jobs running on your clusters, you can install a library. Libraries can be written in Python, Java, Scala, and R. You can upload Java, Scala, and Python libraries and point to external packages in PyPI, Maven, and CRAN repositories.
Steps to install third party libraries:
code :
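With the Cobrix library attached to the cluster, a mainframe file can be read from PySpark through the data source that the library registers. This is a minimal sketch, not the author's code. It assumes the `spark-cobol` artifact (Maven group `za.co.absa.cobrix`) is already installed as a cluster library; the function name and both paths are hypothetical placeholders.

```python
def read_mainframe_file(spark, copybook_path, data_path):
    """Read a mainframe file whose layout is described by a COBOL copybook.

    Assumes the Cobrix spark-cobol library is attached to the cluster,
    which is what makes the "cobol" data source available.
    """
    return (
        spark.read.format("cobol")
        .option("copybook", copybook_path)  # copybook describing the record layout
        .load(data_path)                    # binary data file(s) to parse
    )


# In a Databricks notebook, where `spark` is predefined (paths are placeholders):
# df = read_mainframe_file(spark, "dbfs:/path/to/copybook.cob", "dbfs:/path/to/data")
# df.printSchema()
```

Passing `spark` in as a parameter keeps the function free of notebook globals, so it can also be exercised with a stub session.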

How to convert sql table into a pyspark/python data structure and return back to sql in databricks notebook

By : enceladus
Date : March 29 2020, 07:55 AM
I am running a SQL notebook on Databricks and would like to analyze a table with half a billion records in it. I can run simple SQL queries on the data. However, I need to change the date column type from str to date.
code :
dataFrame = sqlContext.sql('select * from myTable')
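To change the column type and hand the result back to SQL, one option is to cast the column on the DataFrame and register the result as a temporary view. This is a minimal sketch, assuming the strings are in `yyyy-MM-dd` form, which a plain cast to `date` accepts. Other formats would need `to_date` with an explicit pattern. The function, column, and view names are hypothetical.

```python
def cast_date_and_register(df, date_col, view_name):
    """Cast a string column to DateType and expose the result to SQL cells."""
    # Assumes values look like '2020-03-29' (ISO yyyy-MM-dd)
    typed = df.withColumn(date_col, df[date_col].cast("date"))
    # A temporary view makes the DataFrame queryable from %sql cells
    typed.createOrReplaceTempView(view_name)
    return typed


# In a notebook (column and view names are placeholders):
# dataFrame = spark.sql("select * from myTable")
# cast_date_and_register(dataFrame, "date_col", "myTable_typed")
# Then, in a SQL cell: select min(date_col) from myTable_typed
```

The cast is lazy, so nothing is computed over the half billion rows until a query against the view actually runs.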

PySpark on Databricks: Reading a CSV file copied from the Azure Blob Storage results in java.io.FileNotFoundException

By : Rachel
Date : March 29 2020, 07:55 AM
I'm not certain what the file: prefix will map to.
I would have expected the path to be a DBFS path:
code :
copy_to = "/path/file.csv"
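On Databricks the same DBFS location is written differently depending on the API. Spark and `dbutils.fs` take `dbfs:/...`, local file APIs see it under the `/dbfs/...` mount (on clusters where that mount is available), and `file:/...` refers to the driver's local disk. A small sketch of that mapping follows; the helper names are hypothetical.

```python
def to_spark_path(path):
    """Return the dbfs:/ form of a DBFS path, as used by Spark and dbutils.fs."""
    if path.startswith("file:"):
        # file:/ points at the driver's local disk, not at DBFS
        raise ValueError("file: paths are local to the driver, not DBFS")
    if path.startswith("dbfs:/"):
        return path
    if path.startswith("/dbfs/"):
        return "dbfs:/" + path[len("/dbfs/"):]
    return "dbfs:/" + path.lstrip("/")


def to_local_path(path):
    """Return the /dbfs/ form, as seen by local file APIs such as open()."""
    return "/dbfs/" + to_spark_path(path)[len("dbfs:/"):]


copy_to = "/path/file.csv"
spark_path = to_spark_path(copy_to)   # 'dbfs:/path/file.csv'
local_path = to_local_path(copy_to)   # '/dbfs/path/file.csv'

# In a notebook:
# df = spark.read.csv(spark_path, header=True)
```

A java.io.FileNotFoundException can come from mixing these forms, for example copying to a driver-local path and then reading it as a DBFS path.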

How to read multiple nested json objects in one file extract by pyspark to dataframe in Azure databricks?

By : user3317424
Date : March 29 2020, 07:55 AM
You can proceed as follows:
1. Read the file into an RDD first. It will be read as a list of strings.
2. Convert each JSON string into a native Python datatype using json.loads().
3. Convert the RDD into a DataFrame; toDF() can infer the schema directly.
4. Using the answer from "Flatten Spark Dataframe column of map/dictionary into multiple columns", explode the Data column into multiple columns, given that your Id column is going to be unique. Note that explode returns key and value columns for each entry in the map type.
5. Repeat the 4th point to explode the Properties column.
code :
import json
from pyspark.sql import functions as F  # needed for F.explode and F.first below

rdd = sc.textFile("demo_files/Test20191023.log")
df = rdd.map(lambda x: json.loads(x)).toDF()
# +--------------------+----------+--------------------+----------+
# |                Data| EventType|                  Id| Timestamp|
# +--------------------+----------+--------------------+----------+
# |[MessageTemplate ...|3735091736|event-c20b9c7eac0...|2019-03-19|
# |[MessageTemplate ...|3735091737|event-d20b9c7eac0...|2019-03-18|
# |[MessageTemplate ...|3735091738|event-e20b9c7eac0...|2019-03-17|
# +--------------------+----------+--------------------+----------+

data_exploded = df.select('Id', 'EventType', "Timestamp", F.explode('Data'))\
    .groupBy('Id', 'EventType', "Timestamp").pivot('key').agg(F.first('value'))
# There is a duplicate Id column and might cause ambiguity problems

# +--------------------+----------+----------+--------+-----+---------------+--------------------+
# |                  Id| EventType| Timestamp|      Id|Level|MessageTemplate|          Properties|
# +--------------------+----------+----------+--------+-----+---------------+--------------------+
# |event-c20b9c7eac0...|3735091736|2019-03-19|event-c2|    2|          Test1|{CorrId=d69b7489,...|
# |event-d20b9c7eac0...|3735091737|2019-03-18|event-d2|    2|          Test1|{CorrId=f69b7489,...|
# |event-e20b9c7eac0...|3735091738|2019-03-17|event-e2|    1|          Test1|{CorrId=g69b7489,...|
# +--------------------+----------+----------+--------+-----+---------------+--------------------+
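An alternative to exploding and pivoting twice is to flatten each record in plain Python before calling toDF(). This is a different technique from the one above, shown only as a sketch. The `flatten` helper and the sample line are hypothetical, with the sample modeled on the column names in the output above. Prefixing nested keys also avoids the duplicate Id column.

```python
import json


def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into a single level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects such as Data and Properties
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


# Hypothetical sample line, one JSON object per line as in the log file
line = ('{"Id": "event-1", "EventType": 1, '
        '"Data": {"Id": "event-1", "Level": 2, "Properties": {"CorrId": "abc"}}}')
row = flatten(json.loads(line))
# row has the keys: Id, EventType, Data_Id, Data_Level, Data_Properties_CorrId

# In a notebook, the same helper can be applied per line before toDF():
# df = sc.textFile("demo_files/Test20191023.log") \
#     .map(lambda x: flatten(json.loads(x))).toDF()
```

This works best when every record has the same nested keys, because toDF() infers the schema from the data it samples.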

Save dictionary as a pyspark Dataframe and load it - Python, Databricks

By : user3419038
Date : March 29 2020, 07:55 AM
Here is sample code that does what you need, step by step.
code :
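my_dict = {'a':[12,15.2,52.1],'b':[2.5,2.4,5.2],'c':[1.2,5.3,12]}

import pandas as pd
# Build a pandas DataFrame first, then convert it to a Spark DataFrame
pdf = pd.DataFrame(my_dict)
df = spark.createDataFrame(pdf)
# Save the DataFrame so that it can be loaded again later
df.write.mode("overwrite").parquet('/data/tmp/my_df')
# Load it back; orient='list' returns a dict of lists, the shape of my_dict
df2 = spark.read.format("parquet").load('/data/tmp/my_df')
my_dict2 = df2.toPandas().to_dict(orient='list')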
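The pandas step is optional. As a sketch of an alternative, the dictionary of columns can be turned into row tuples with zip and passed to spark.createDataFrame together with the column names. The helper names below are hypothetical. Values are converted to float on the assumption that every column is numeric, because mixed int/float values in one column can trip Spark's schema inference.

```python
my_dict = {'a': [12, 15.2, 52.1], 'b': [2.5, 2.4, 5.2], 'c': [1.2, 5.3, 12]}


def dict_to_rows(d):
    """Turn a dict of equal-length numeric columns into (column names, row tuples)."""
    columns = list(d)
    rows = [
        tuple(float(v) for v in row)       # one float type per column
        for row in zip(*(d[c] for c in columns))
    ]
    return columns, rows


def rows_to_dict(columns, rows):
    """Rebuild the dict of columns from rows (tuples or Spark Row objects)."""
    return {c: [row[i] for row in rows] for i, c in enumerate(columns)}


columns, rows = dict_to_rows(my_dict)

# In a notebook:
# df = spark.createDataFrame(rows, schema=columns)
# restored = rows_to_dict(df.columns, df.collect())
```

Collecting to the driver is only reasonable for small data like this dictionary.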

Saving a file locally in Databricks PySpark

By : Santhosh Ranga
Date : March 29 2020, 07:55 AM
cricket_007 pointed me along the right path. Ultimately, I needed to save the file to the FileStore of Databricks (not just DBFS), and then save the resulting output of the xxxxx.databricks.com/files/[insert file path here] link.
My resulting code was:
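A sketch of that approach (not the author's code): Databricks serves files stored under /FileStore at the /files/ path of the workspace URL, so the download link can be derived from the DBFS path. The helper name, host name, and file paths below are hypothetical placeholders.

```python
def filestore_url(workspace_host, dbfs_path):
    """Build the browser URL for a file saved under /FileStore."""
    prefix = "/FileStore/"
    # Accept both 'dbfs:/FileStore/...' and '/FileStore/...'
    path = dbfs_path[len("dbfs:"):] if dbfs_path.startswith("dbfs:") else dbfs_path
    if not path.startswith(prefix):
        raise ValueError("only files under /FileStore are served at /files/")
    return f"https://{workspace_host}/files/{path[len(prefix):]}"


# Hypothetical usage in a notebook:
# dbutils.fs.cp("file:/tmp/output.csv", "dbfs:/FileStore/exports/output.csv")
# print(filestore_url("xxxxx.databricks.com", "dbfs:/FileStore/exports/output.csv"))
```

Opening the printed link in a browser that is logged in to the workspace downloads the file to the local machine.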
