How to create multiple dataframe using same case class


By : Aleš Pernikář
Date : October 17 2020, 06:10 AM
You can work around this issue. You can't directly create two DataFrames with different numbers of columns from a single case class. Assume you have the case class FlightData below: a DataFrame created from it will contain three columns. You can still derive a second DataFrame by selecting a subset of those columns from the same case class. However, if you have two files and each file has a different structure, you need to create two separate case classes.
code :
  import org.apache.spark.sql.Row
  import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

  val someData = Seq(
    Row("United States", "Romania", 15),
    Row("United States", "Croatia", 1),
    Row("United States", "Ireland", 344),
    Row("Egypt", "United States", 15)
  )

  val flightDataSchema = List(
    StructField("DEST_COUNTRY_NAME", StringType, true),
    StructField("ORIGIN_COUNTRY_NAME", StringType, true),
    StructField("count", IntegerType, true)
  )

  case class FlightData(DEST_COUNTRY_NAME: String, ORIGIN_COUNTRY_NAME: String, count: Int)
  import spark.implicits._

  // Dataset with all three columns of the case class
  val dataDS = spark.createDataFrame(
    spark.sparkContext.parallelize(someData),
    StructType(flightDataSchema)
  ).as[FlightData]

  // Same case class, but only a subset of its columns selected
  val dataDS_2 = spark.createDataFrame(
    spark.sparkContext.parallelize(someData),
    StructType(flightDataSchema)
  ).as[FlightData].select('DEST_COUNTRY_NAME)



Unable to create dataframe from RDD of Row using case class


By : Eduardo Campos
Date : March 29 2020, 07:55 AM
This fixed the issue, but it is worth looking into further. I am not sure whether it is a bug, but mixing dynamically typed Rows, case classes, and an explicit schema doesn't make much sense. Either use Rows with a schema, or use case classes with implicit encoders:
code :
// Option 1: dynamically typed Rows with an explicit schema (SCHEMA defined elsewhere)
import collection.mutable._
import collection.JavaConverters._

spark.createDataFrame(ArrayBuffer(Row(Row(0L, 0))).asJava, SCHEMA)

// Option 2: statically typed case classes with implicit encoders
import spark.implicits._

case class Timestamp(seconds: Long, nanos: Option[Int])
case class Record(created_at: Option[Timestamp])

Seq(Tuple1(Timestamp(0L, Some(0)))).toDF("created_at")
Seq(Record(Some(Timestamp(0L, Some(0))))).toDF

Create nested case class instance from a DataFrame


By : Drjslab
Date : March 29 2020, 07:55 AM
This fixes the issue. If your data is laid out as nodeid|timestamp|value in your DB (yes, according to the schema), you can't map it directly into the nested structure you created. Read the data from the table as a pair RDD, then group by key:
code :
// Read (nodeid, timestamp, value) tuples from Cassandra, keyed by nodeid
val data = sc.cassandraTable[(String, String, Option[String])]("test", "taghistory")
  .select("nodeid", "timestamp", "value")
  .keyBy[String]("nodeid")

// Group all readings for a node into one nested record
val grouped = data.groupByKey.map { case (k, v) =>
  Inline_response_200(k, v.map(x => ReadingsByEpoch_data(x._2, x._3)).toList)
}
grouped.collect

How to create DataFrame not using Case Class?


By : anil katta
Date : March 29 2020, 07:55 AM
I hope this helps. One way is to use the spark-csv package to read the files directly and create a DataFrame. The package infers the schema from the header if your file has one, or you can define a custom schema using StructType.
In the example below, I have created a custom schema.
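The code block for this answer appears to have been lost in extraction. The following is a minimal sketch of the approach described above, assuming a hypothetical three-column, headerless CSV file (the column names, sample data, and temp-file path are all illustrative, not from the original answer):

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object CsvSchemaExample {
  def main(args: Array[String]): Unit = {
    // Write a small sample CSV to a temp file so the example is self-contained
    val tmp = Files.createTempFile("people", ".csv")
    Files.write(tmp, "Anto,21,Bangalore\nMira,30,Prague\n".getBytes("UTF-8"))

    val spark = SparkSession.builder
      .master("local[1]")
      .appName("csv-custom-schema")
      .getOrCreate()

    // Custom schema: no case class involved, columns defined explicitly
    val customSchema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age", IntegerType, nullable = true),
      StructField("city", StringType, nullable = true)
    ))

    val df = spark.read
      .schema(customSchema)          // skip inference, use our schema
      .option("header", "false")     // the sample file has no header row
      .csv(tmp.toString)

    df.printSchema()
    df.show()

    spark.stop()
  }
}
```

In Spark 2.x+ the csv reader is built in, so no separate spark-csv package is needed; on Spark 1.x you would add the `com.databricks:spark-csv` dependency and use `format("com.databricks.spark.csv")` instead.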

How to create Dataset (not DataFrame) without using case class but using StructType?


By : ali2173
Date : March 29 2020, 07:55 AM
I hope this is helpful. If you know how to create a DataFrame, you already know how to create a Dataset: a DataFrame is simply a type alias for Dataset[Row]. :)
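To make the point above concrete, here is a minimal sketch (the column names and sample rows are illustrative): createDataFrame with a StructType already yields a Dataset[Row], so no case class is required.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object DatasetWithoutCaseClass {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[1]")
      .appName("dataset-structtype")
      .getOrCreate()

    // Schema defined with StructType instead of a case class
    val schema = StructType(Seq(
      StructField("word", StringType, nullable = true),
      StructField("count", IntegerType, nullable = true)
    ))

    val rows = Seq(Row("spark", 3), Row("scala", 5))

    // DataFrame == Dataset[Row], so this is already a Dataset
    val ds: org.apache.spark.sql.Dataset[Row] =
      spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

    ds.show()

    spark.stop()
  }
}
```

If you need a Dataset of a type other than Row without writing a case class, you can also call `.as` with an explicit encoder such as `RowEncoder` or a tuple encoder, but for most purposes Dataset[Row] is the answer.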

Create DataFrame from case class


By : Exildur
Date : March 29 2020, 07:55 AM
I hope this helps. There is no issue in the piece of code you copied from the shared link; as the error explains, the problem is something else (an exact copy of the code runs fine, as in my run below).
code :
import spark.implicits._

case class Employee(Name: String, Age: Int, Designation: String, Salary: Int, ZipCode: Int)
val EmployeesData = Seq(Employee("Anto", 21, "Software Engineer", 2000, 56798))
val Employee_DataFrame = EmployeesData.toDF
Employee_DataFrame.show()
+----+---+-----------------+------+-------+
|Name|Age|      Designation|Salary|ZipCode|
+----+---+-----------------+------+-------+
|Anto| 21|Software Engineer|  2000|  56798|
+----+---+-----------------+------+-------+