Weighted Sampling, Tidyr Verbs, Robust Scaler, RAPIDS, and more

sparklyr 1.4 is now available on CRAN! To install sparklyr 1.4 from CRAN, run

    install.packages("sparklyr")

In this blog post, we will showcase the following much-anticipated new functionalities from the sparklyr 1.4 release:

  • Parallelized Weighted Sampling
  • Tidyr Verbs
  • Robust Scaler
  • RAPIDS
  • Higher-order functions

Parallelized Weighted Sampling

Readers familiar with the dplyr::sample_n() and dplyr::sample_frac() functions may have noticed that both of them support weighted-sampling use cases on R dataframes, e.g.,

    dplyr::sample_n(mtcars, size = 3, weight = mpg, replace = FALSE)

                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    Fiat 128      32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    Merc 280C     17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4

and

    dplyr::sample_frac(mtcars, size = 0.1, weight = mpg, replace = FALSE)

                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    Honda Civic 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    Merc 450SE  16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    Fiat X1-9   27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1

will select some random subset of mtcars using the mpg attribute as the sampling weight for each row. If replace = FALSE is set, then a row is removed from the sampling population once it gets selected, whereas when setting replace = TRUE, each row will always stay in the sampling population and can be selected multiple times.
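
To make the difference between the two settings concrete, here is a minimal sketch using base R's sample() rather than dplyr. It assumes only that mpg is used as the sampling weight, as in the examples above; the seed and the sample sizes are arbitrary choices made for illustration.

```r
# Illustrative only: base R's sample() follows the same with/without
# replacement semantics described above, with mpg as the sampling weight.
set.seed(2020)  # arbitrary seed, chosen only to make the sketch repeatable

rows <- rownames(mtcars)

# Without replacement: a row leaves the population once it is selected,
# so the 5 selected rows are always distinct.
without_repl <- sample(rows, size = 5, replace = FALSE, prob = mtcars$mpg)

# With replacement: every row stays in the population, so the same row can
# show up more than once. Drawing 40 rows from the 32 rows of mtcars
# guarantees at least one repeat.
with_repl <- sample(rows, size = 40, replace = TRUE, prob = mtcars$mpg)

print(without_repl)
print(anyDuplicated(with_repl) > 0)
```

Sampling 40 rows with replace = FALSE would fail here, because mtcars only has 32 rows to remove from the population.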

Now the exact same use cases are supported for Spark dataframes in sparklyr 1.4! For example:

    library(sparklyr)

    sc <- spark_connect(master = "local")
    mtcars_sdf <- copy_to(sc, mtcars, repartition = 4L)

    dplyr::sample_n(mtcars_sdf, size = 5, weight = mpg, replace = FALSE)

will return a random subset of size 5 from the Spark dataframe mtcars_sdf.

More importantly, the sampling algorithm implemented in sparklyr 1.4 is something that fits perfectly into the MapReduce paradigm: as we have split our mtcars data into 4 partitions of mtcars_sdf by specifying repartition = 4L, the algorithm will first process each partition independently and in parallel, selecting a sample set of size up to 5 from each, and then reduce all 4 sample sets into a final sample set of size 5 by choosing records having the top 5 highest sampling priorities among all.
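
The map and reduce steps described above can be sketched in plain R. This is not sparklyr's actual implementation: the priority formula u^(1/w) (the Efraimidis-Spirakis scheme), the way rows are assigned to 4 partitions, and the helper top_k() are all assumptions made for illustration. The sketch only shows why taking the top 5 per partition and then the top 5 overall gives the same answer as ranking every row at once.

```r
# A sketch under assumptions, not sparklyr's actual implementation.
# Each row gets a random priority u^(1/w), with u ~ Uniform(0, 1) and w the
# row's sampling weight (Efraimidis-Spirakis scheme, assumed for illustration).
set.seed(42)  # arbitrary seed

k <- 5
df <- mtcars
df$priority <- runif(nrow(df))^(1 / df$mpg)

# Hypothetical helper: keep up to k rows with the highest priorities.
top_k <- function(part) head(part[order(-part$priority), ], k)

# "Map" step: split rows into 4 made-up partitions and select up to k
# candidates from each partition independently.
partition_id <- rep(1:4, length.out = nrow(df))
candidates <- lapply(split(df, partition_id), top_k)

# "Reduce" step: merge all candidates and keep the overall top k.
merged <- do.call(rbind, candidates)
parallel_sample <- top_k(merged)

# Ranking all rows at once on a single machine selects the same priorities.
sequential_sample <- top_k(df)
print(sort(parallel_sample$priority))
print(sort(sequential_sample$priority))
```

The two printed vectors agree because any row in the global top 5 is necessarily in the top 5 of its own partition, so no partition can discard a row that the final result needs.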

How is such parallelization possible, especially for the sampling without replacement scenario, where the desired result is defined as the outcome of a sequential process? A detailed answer to this question is in this blog post, which includes a definition of the problem (in particular, the exact meaning of sampling weights in terms of probabilities), a high-level explanation of the current solution and the motivation behind it, and also some mathematical details, all hidden in one link to a PDF file. That way, non-math-oriented readers can get the gist of everything else without getting scared away, while math-oriented readers can enjoy working out all the integrals themselves before peeking at the answer.

Tidyr Verbs

Specialized implementations of several tidyr verbs that work efficiently with Spark dataframes were included as part of sparklyr 1.4, among them tidyr::pivot_longer, tidyr::pivot_wider, tidyr::nest, and tidyr::unnest.

We can demonstrate how those verbs are useful for tidying data through some examples.

Let's say we are given mtcars_sdf, a Spark dataframe containing all rows from mtcars plus the name of each row:

    # Source: spark<?> [?? x 12]
      model          mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
      <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    1 Mazda RX4     21       6   160   110  3.9   2.62  16.5     0     1     4     4
    2 Mazda RX4 W…  21       6   160   110  3.9   2.88  17.0     0     1     4     4
    3 Datsun 710    22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
    4 Hornet 4 Dr…  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
    5 Hornet Spor…  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2
    # … with more rows

and we would like to turn all numeric attributes in mtcars_sdf (in other words, all columns other than the model column) into key-value pairs stored in 2 columns, with the key column storing the name of each attribute, and the value column storing each attribute's numeric value. One way to accomplish that with tidyr is by utilizing the tidyr::pivot_longer functionality:

    mtcars_kv_sdf <- mtcars_sdf %>%
      tidyr::pivot_longer(cols = -model, names_to = "key", values_to = "value")
    print(mtcars_kv_sdf, n = 5)

    # Source: spark<?> [?? x 3]
      model     key   value
      <chr>     <chr> <dbl>
    1 Mazda RX4 am      1
    2 Mazda RX4 carb    4
    3 Mazda RX4 cyl     6
    4 Mazda RX4 disp  160
    5 Mazda RX4 drat    3.9
    # … with more rows

To undo the effect of tidyr::pivot_longer, we can apply tidyr::pivot_wider to our mtcars_kv_sdf Spark dataframe, and get back the original data that was present in mtcars_sdf:

    tbl <- mtcars_kv_sdf %>%
      tidyr::pivot_wider(names_from = key, values_from = value)
    print(tbl, n = 5)

    # Source: spark<?> [?? x 12]
      model         carb   cyl  drat    hp   mpg    vs    wt    am  disp  gear  qsec
      <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    1 Mazda RX4        4     6  3.9    110  21       0  2.62     1  160      4  16.5
    2 Hornet 4 Dr…     1     6  3.08   110  21.4     1  3.22     0  258      3  19.4
    3 Hornet Spor…     2     8  3.15   175  18.7     0  3.44     0  360      3  17.0
    4 Merc 280C        4     6  3.92   123  17.8     1  3.44     0  168.     4  18.9
    5 Merc 450SLC      3     8  3.07   180  15.2     0  3.78     0  276.     3  18
    # … with more rows

Another way to reduce many columns into fewer ones is by using tidyr::nest to move some columns into nested tables. For instance, we can create a nested table perf encapsulating all performance-related attributes from mtcars (namely, hp, mpg, disp, and qsec). However, unlike R dataframes, Spark dataframes do not have the concept of nested tables, and the closest to nested tables we can get is a perf column containing named structs with hp, mpg, disp, and qsec attributes:

    mtcars_nested_sdf <- mtcars_sdf %>%
      tidyr::nest(perf = c(hp, mpg, disp, qsec))

We can then inspect the type of the perf column in mtcars_nested_sdf:

    sdf_schema(mtcars_nested_sdf)$perf$type

    [1] "ArrayType(StructType(StructField(hp,DoubleType,true), StructField(mpg,DoubleType,true), StructField(disp,DoubleType,true), StructField(qsec,DoubleType,true)),true)"

and inspect individual struct elements within perf:

    perf <- mtcars_nested_sdf %>% dplyr::pull(perf)
    unlist(perf[[1]])

        hp    mpg   disp   qsec
    110.00  21.00 160.00  16.46

Finally, we can also use tidyr::unnest to undo the effects of tidyr::nest:

    mtcars_unnested_sdf <- mtcars_nested_sdf %>%
      tidyr::unnest(col = perf)
    print(mtcars_unnested_sdf, n = 5)

    # Source: spark<?> [?? x 12]
      model          cyl  drat    wt    vs    am  gear  carb    hp   mpg  disp  qsec
      <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    1 Mazda RX4        6  3.9   2.62     0     1     4     4   110  21    160   16.5
    2 Hornet 4 Dr…     6  3.08  3.22     1     0     3     1   110  21.4  258   19.4
    3 Duster 360       8  3.21  3.57     0     0     3     4   245  14.3  360   15.8
    4 Merc 280         6  3.92  3.44     1     0     4     4   123  19.2  168.  18.3
    5 Lincoln Con…     8  3     5.42     0     0     3     4   215  10.4  460   17.8
    # … with more rows

Robust Scaler

RobustScaler is a new functionality introduced in Spark 3.0 (SPARK-28399). Thanks to a pull request by @zero323, an R interface for RobustScaler, namely, the ft_robust_scaler() function, is now part of sparklyr.

It is often observed that many machine learning algorithms perform better on numeric inputs that are standardized. Many of us have learned in stats 101 that given a random variable \(X\), we can compute its mean \(\mu = E[X]\), standard deviation \(\sigma = \sqrt{E[X^2] - (E[X])^2}\), and then obtain a standard score \(z = \frac{X - \mu}{\sigma}\) which has mean of 0 and standard deviation of 1.

However, notice both \(E[X]\) and \(E[X^2]\) from above are quantities that can be easily skewed by extreme outliers in \(X\), causing distortions in \(z\). A particularly bad case would be if all non-outliers among \(X\) are very close to \(0\), so that \(E[X]\) would be close to \(0\) without the outliers, while extreme outliers are all far in the negative direction, hence dragging down \(E[X]\) while skewing \(E[X^2]\) upwards.
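
A small numeric sketch makes the effect visible. The numbers below are made up for illustration and are not taken from the example later in this post.

```r
# Values near 0 plus two large negative outliers (made-up numbers).
x_clean <- c(-0.2, -0.1, 0, 0.1, 0.2)
x_all <- c(x_clean, -500, -600)

# E[X] is dragged far below 0 and E[X^2] is inflated by the outliers ...
mean_clean <- mean(x_clean)
mean_all <- mean(x_all)
second_moment_clean <- mean(x_clean^2)
second_moment_all <- mean(x_all^2)

# ... while the median moves only by one position among the clean values.
median_clean <- median(x_clean)
median_all <- median(x_all)

print(c(mean_clean = mean_clean, mean_all = mean_all))
print(c(second_moment_clean = second_moment_clean,
        second_moment_all = second_moment_all))
print(c(median_clean = median_clean, median_all = median_all))
```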

An alternative way of standardizing \(X\) based on its median, 1st quartile, and 3rd quartile values, all of which are robust against outliers, would be the following:

\(\displaystyle z = \frac{X - \text{Median}(X)}{\text{P75}(X) - \text{P25}(X)}\)

and this is precisely what RobustScaler offers.
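
The formula above translates directly into a few lines of plain R. robust_scale() below is a hypothetical helper written for this illustration; it is not part of sparklyr or Spark, and it uses R's default exact quantiles, so its results may differ slightly from what Spark computes on a large dataset.

```r
# Plain-R rendering of the formula above. robust_scale() is a hypothetical
# helper for illustration, not a sparklyr or Spark function.
robust_scale <- function(x) {
  # 1st and 3rd quartiles (P25 and P75)
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  # Center on the median, then divide by the interquartile range
  (x - median(x)) / (q[2] - q[1])
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
z <- robust_scale(x)
print(z)
```

For the values 1 through 9 the median is 5 and the quartiles are 3 and 7, so each value is mapped to (x - 5) / 4.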

To see ft_robust_scaler() in action and demonstrate its usefulness, we can go through a contrived example consisting of the following steps:

  • Draw 500 random samples from the standard normal distribution

      [1] -0.626453811  0.183643324 -0.835628612  1.595280802  0.329507772
      [6] -0.820468384  0.487429052  0.738324705  0.575781352 -0.305388387
      ...

  • Inspect the minimal and maximal values among the \(500\) random samples:

      [1] -3.008049

      [1] 3.810277
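
Only the output of the two steps above is shown. A minimal sketch of code that could produce such output follows, assuming the samples are drawn with rnorm() and stored in the sample_values variable that the later steps refer to. The seed is an assumption, so the printed values need not match the output above.

```r
# The seed is an assumption: the text does not say which seed, if any, was
# used, so the values printed here need not match the output shown above.
set.seed(1)

# Draw 500 samples from the standard normal distribution.
sample_values <- rnorm(500)
print(head(sample_values, 10))

# Inspect the smallest and the largest sample.
print(min(sample_values))
print(max(sample_values))
```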

  • Now create \(10\) other values that are extreme outliers in comparison to the \(500\) random samples above. Given that we know all \(500\) samples are within the range of \((-4, 4)\), we can choose \(-501, -502, \ldots, -509, -510\) as our \(10\) outliers:

    outliers <- -500L - seq(10)

  • Copy all \(510\) values into a Spark dataframe named sdf

    library(sparklyr)

    sc <- spark_connect(master = "local", version = "3.0.0")
    sdf <- copy_to(sc, data.frame(value = c(sample_values, outliers)))

  • We can then apply ft_robust_scaler() to obtain the standardized value for each input:

    scaled <- sdf %>%
      ft_vector_assembler("value", "input") %>%
      ft_robust_scaler("input", "scaled") %>%
      dplyr::pull(scaled) %>%
      unlist()

  • Plotting the result shows the non-outlier data points being scaled to values that still more or less form a bell-shaped distribution centered around \(0\), as expected, so the scaling is robust against the influence of the outliers.

  • Finally, we can compare the distribution of the scaled values above with the distribution of z-scores of all input values, and notice how scaling the input with only mean and standard deviation would have caused noticeable skewness, which the robust scaler has successfully avoided:

    library(ggplot2)

    all_values <- c(sample_values, outliers)
    z_scores <- (all_values - mean(all_values)) / sd(all_values)
    ggplot(data.frame(scaled = z_scores), aes(x = scaled)) +
      xlim(-0.05, 0.2) +
      geom_histogram(binwidth = 0.005)

  • From the 2 plots above, one can observe that while both standardization processes produced distributions that were still bell-shaped, the one produced by ft_robust_scaler() is centered around \(0\), correctly indicating the average among all non-outlier values, while the z-score distribution is clearly not centered around \(0\), as its center has been noticeably shifted by the \(10\) outlier values.
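
The same comparison can be checked numerically without Spark or plots. The sketch below is an approximation under assumptions: it applies the median and quartile formula from this section locally in R instead of calling ft_robust_scaler(), and it uses an arbitrary seed, so the exact numbers will differ from the plots.

```r
set.seed(1)  # arbitrary seed, an assumption
sample_values <- rnorm(500)
outliers <- -500L - seq(10)
all_values <- c(sample_values, outliers)
is_outlier <- all_values < -100

# z-scores based on the mean and standard deviation of all 510 values.
z_scores <- (all_values - mean(all_values)) / sd(all_values)

# Robust scores based on the median and interquartile range, computed
# locally from the formula in the text (not via Spark).
q <- quantile(all_values, probs = c(0.25, 0.75), names = FALSE)
robust_scores <- (all_values - median(all_values)) / (q[2] - q[1])

# Center of the non-outlier values under each scaling.
z_center <- median(z_scores[!is_outlier])
robust_center <- median(robust_scores[!is_outlier])
print(c(z_center = z_center, robust_center = robust_center))
```

The center of the z-scores lands well above 0, inside the xlim(-0.05, 0.2) window used in the plot, while the center of the robust scores stays close to 0.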

RAPIDS

Readers following Apache Spark releases closely have probably noticed the recent addition of RAPIDS GPU acceleration support in Spark 3.0. Catching up with this recent development, an option to enable RAPIDS in Spark connections was also created in sparklyr and shipped in sparklyr 1.4. On a host with RAPIDS-capable hardware (e.g., an Amazon EC2 instance of type 'p3.2xlarge'), one can install sparklyr 1.4 and observe RAPIDS hardware acceleration being reflected in Spark SQL physical query plans:

    library(sparklyr)

    sc <- spark_connect(master = "local", version = "3.0.0", packages = "rapids")
    dplyr::db_explain(sc, "SELECT 4")

    == Physical Plan ==
    *(2) GpuColumnarToRow false
    +- GpuProject [4 AS 4#45]
       +- GpuRowToColumnar TargetSize(2147483647)
          +- *(1) Scan OneRowRelation[]

Higher-Order Functions

All newly introduced higher-order functions from Spark 3.0, such as array_sort() with custom comparator, transform_keys(), transform_values(), and map_zip_with(), are supported by sparklyr 1.4.

In addition, all higher-order functions can now be accessed directly through dplyr rather than their hof_* counterparts in sparklyr. This means, for example, that we can run the following dplyr queries to calculate the square of all array elements in column x of sdf, and then sort them in descending order:

    library(sparklyr)

    sc <- spark_connect(master = "local", version = "3.0.0")
    sdf <- copy_to(sc, tibble::tibble(x = list(c(-3, -2, 1, 5), c(6, -7, 5, 8))))

    sq_desc <- sdf %>%
      dplyr::mutate(x = transform(x, ~ .x * .x)) %>%
      dplyr::mutate(x = array_sort(x, ~ as.integer(sign(.y - .x)))) %>%
      dplyr::pull(x)

    print(sq_desc)

    [[1]]
    [1] 25  9  4  1

    [[2]]
    [1] 64 49 36 25
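
As a local cross-check that needs no Spark connection, the same computation can be written in base R. This is only an equivalent of the query's result on the same input, not how sparklyr evaluates it; sq_desc_local is a name made up for this sketch.

```r
# Same input arrays as in the Spark example above.
x <- list(c(-3, -2, 1, 5), c(6, -7, 5, 8))

# Square every element of each array, then sort in descending order.
sq_desc_local <- lapply(x, function(v) sort(v * v, decreasing = TRUE))

print(sq_desc_local)
```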

Acknowledgement

We would like to thank all individuals who contributed to sparklyr 1.4.

We also appreciate bug reports, feature requests, and other valuable feedback about sparklyr from our awesome open-source community (e.g., the weighted sampling feature in sparklyr 1.4 was largely motivated by this GitHub issue filed by @ajing, and some dplyr-related bug fixes in this release were initiated in #2648 and completed with this pull request by @wkdavis).

Last but not least, the author of this blog post is extremely grateful for fantastic editorial suggestions from @javierluraschi, @batpigandme, and @skeydan.

If you wish to learn more about sparklyr, we recommend checking out sparklyr.ai, spark.rstudio.com, and also some of the previous release posts such as sparklyr 1.3 and sparklyr 1.2.

Thanks for reading!
