Data scientists working in Python or R typically acquire data by way of REST APIs. Both environments provide libraries that help you make HTTP calls to REST endpoints, then transform JSON responses into dataframes. But that’s never as simple as we’d like. When you’re reading a lot of data from a REST API, you need to do it a page at a time, but pagination works differently from one API to the next. So does unpacking the resulting JSON structures. HTTP and JSON are low-level standards, and REST is a loosely-defined framework, but nothing guarantees absolute simplicity, never mind consistency across APIs.
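
To make the pagination point concrete, here is a minimal Python sketch of two common styles: page numbers and opaque cursor tokens. Everything in it is an assumption for illustration: fake_page_api, fake_cursor_api, and their response shapes are hypothetical stand-ins, not any real service’s API, and the data lives in memory so no HTTP calls are made.

```python
from typing import Dict, Iterator, Optional

# Hypothetical stand-in data; a real client would fetch these pages over HTTP.
ITEMS = [{"id": i} for i in range(1, 8)]


def fake_page_api(page: int, per_page: int = 3) -> Dict:
    """Page-number style: ask for page 1, 2, 3, ... until a page comes back empty."""
    start = (page - 1) * per_page
    return {"items": ITEMS[start:start + per_page]}


def fake_cursor_api(next_token: Optional[str] = None, max_results: int = 3) -> Dict:
    """Cursor style: each response carries a token for the next page, if there is one."""
    start = int(next_token) if next_token else 0
    end = start + max_results
    meta = {"next_token": str(end)} if end < len(ITEMS) else {}
    return {"data": ITEMS[start:end], "meta": meta}


def read_by_page_number() -> Iterator[Dict]:
    """Loop over numbered pages and stop at the first empty one."""
    page = 1
    while True:
        batch = fake_page_api(page)["items"]
        if not batch:
            return
        yield from batch
        page += 1


def read_by_cursor() -> Iterator[Dict]:
    """Follow next_token from response to response until none is returned."""
    token = None
    while True:
        response = fake_cursor_api(token)
        yield from response["data"]
        token = response["meta"].get("next_token")
        if token is None:
            return


if __name__ == "__main__":
    print([row["id"] for row in read_by_page_number()])
    print([row["id"] for row in read_by_cursor()])
```

Both readers yield the same seven rows, but the loop logic, the parameter names, and the response shapes all differ, which is exactly the kind of per-API variation you end up coding around.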

What if there were a way of reading from APIs that abstracted all the low-level grunt work and worked the same way everywhere? Good news! That’s exactly what Steampipe does. It’s a tool that translates REST API calls directly into SQL tables. Here are three examples of questions you can ask and answer using Steampipe.

1. Twitter: What are recent tweets that mention PySpark?

Here’s a SQL query to ask that question:

select
  id,
  text
from
  twitter_search_recent
where
  query = 'pyspark'
order by
  created_at desc
limit 5;

Here’s the answer:

+---------------------+------------------------------------------------------------------------------------------------>
|         id          |                                              text                                              >
+---------------------+------------------------------------------------------------------------------------------------>
| 1526351943249154050 | @dump Tenho trabalhando bastante com Spark, mas especificamente o PySpark. Vale a pena usar um >
| 1526336147856687105 | RT @MitchellvRijkom: PySpark Tip ⚡                                                            >
|                     |                                                                                                >
|                     | When to use what StorageLevel for Cache / Persist?                                             >
|                     |                                                                                                >
|                     | StorageLevel decides how and where data should be s…                                           >
| 1526322757880848385 | Solve challenges and exceed expectations with a career as a AWS Pyspark Engineer. https://t.co/>
| 1526318637485010944 | RT @JosMiguelMoya1: #pyspark #spark #BigData curso completo de Python y Spark con PySpark      >
|                     |                                                                                                >
|                     | https://t.co/qf0gIvNmyx                                                                        >
| 1526318107228524545 | RT @money_personal: PySpark & AWS: Master Big Data With PySpark and AWS                        >
|                     | #ApacheSpark #AWSDatabases #BigData #PySpark #100DaysofCode                                    >
|                     | -> http…                                                                                       >
+---------------------+------------------------------------------------------------------------------------------------>

The table that’s being queried here, twitter_search_recent, receives the output from Twitter’s /2/tweets/search/recent endpoint and formulates it as a table with a documented set of columns. You don’t have to make an HTTP call to that API endpoint or unpack the results; you just write a SQL query that refers to the documented columns. One of those columns, query, is special: it encapsulates Twitter’s query syntax. Here, we’re just looking for tweets that match PySpark, but we could as easily refine the query by pinning it to specific users, URLs, types (is:retweet, is:reply), properties (has:mentions, has:media), and so on. That query syntax is the same no matter how you’re accessing the API: from Python, from R, or from Steampipe. It’s plenty to think about, and all you should really need to know when crafting queries to mine Twitter data.

2. GitHub: What are repositories that mention PySpark?

Here’s a SQL query to ask that question:

select
  name,
  owner_login,
  stargazers_count
from
  github_search_repository
where
  query = 'pyspark'
order by stargazers_count desc
limit 10;

Here’s the answer:

+----------------------+-------------------+------------------+
| name                 | owner_login       | stargazers_count |
+----------------------+-------------------+------------------+
| SynapseML            | microsoft         | 3297             |
| spark-nlp            | JohnSnowLabs      | 2725             |
| incubator-linkis     | apache            | 2524             |
| ibis                 | ibis-project      | 1805             |
| spark-py-notebooks   | jadianes          | 1455             |
| petastorm            | uber              | 1423             |
| awesome-spark        | awesome-spark     | 1314             |
| sparkit-learn        | lensacom          | 1124             |
| sparkmagic           | jupyter-incubator | 1121             |
| data-algorithms-book | mahmoudparsian    | 1001             |
+----------------------+-------------------+------------------+

This looks very similar to the first example! In this case, the table that’s being queried, github_search_repository, receives the output from GitHub’s /search/repositories endpoint and formulates it as a table with its own documented columns.

In both cases the Steampipe documentation not only shows you the schemas that govern the mapped tables, it also gives examples (Twitter, GitHub) of SQL queries that use the tables in various ways.

Note that these are just two of many available tables. The Twitter API is mapped to 7 tables, and the GitHub API is mapped to 41 tables.

3. Twitter + GitHub: What have owners of PySpark-related repositories tweeted lately?

To answer this question we need to consult two different APIs, then join their results. That’s even harder to do, in a consistent way, when you’re reasoning over REST payloads in Python or R. But this is the kind of thing SQL was born to do. Here’s one way to ask the question in SQL.

-- find pyspark repos
with github_repos as (
  select
    name,
    owner_login,
    stargazers_count
  from
    github_search_repository
  where
    query = 'pyspark' and name ~ 'pyspark'
  order by stargazers_count desc
  limit 50
),
-- find twitter handles of repo owners
github_users as (
  select
    u.login,
    u.twitter_username
  from
    github_user u
  join
    github_repos r
  on
    r.owner_login = u.login
  where
    u.twitter_username is not null
),
-- find corresponding twitter users
twitter_userids as (
  select
    id
  from
    twitter_user t
  join
    github_users g
  on
    t.username = g.twitter_username
)
-- find tweets from those users
select
  t.author->>'username' as twitter_user,
  'https://twitter.com/' || (t.author->>'username') || '/status/' || t.id as url,
  t.text
from
  twitter_user_tweet t
join
  twitter_userids u
on
  t.user_id = u.id
where
  t.created_at > now()::date - interval '1 week'
order by
  t.author
limit 5

Here is the answer:

+----------------+---------------------------------------------------------------+------------------------------------->
|  twitter_user  |                              url                              |                text                 >
+----------------+---------------------------------------------------------------+------------------------------------->
| idealoTech     | https://twitter.com/idealoTech/status/1524688985649516544     | Are you able to find creative soluti>
|                |                                                               |                                     >
|                |                                                               | Join our @codility Order #API Challe>
|                |                                                               |                                     >
|                |                                                               | #idealolife #codility #php          >
| idealoTech     | https://twitter.com/idealoTech/status/1526127469706854403     | Our #ProductDiscovery team at idealo>
|                |                                                               |                                     >
|                |                                                               | Think you can solve it? 😎          >
|                |                                                               | ➡️ https://t.co/ELfUfp94vB https://t>
| ioannides_alex | https://twitter.com/ioannides_alex/status/1525049398811574272 | RT @scikit_learn: scikit-learn 1.1 i>
|                |                                                               | What's new? You can check the releas>
|                |                                                               |                                     >
|                |                                                               | pip install -U…                     >
| andfanilo      | https://twitter.com/andfanilo/status/1524999923665711104      | @edelynn_belle Thanks! Sometimes it >
| andfanilo      | https://twitter.com/andfanilo/status/1523676489081712640      | @juliafmorgado Good luck on the reco>
|                |                                                               |                                     >
|                |                                                               | My advice: power through it + a dead>
|                |                                                               |                                     >
|                |                                                               | I hated my first few short videos bu>
|                |                                                               |                                     >
|                |                                                               | Looking forward to the video 🙂     >
+----------------+---------------------------------------------------------------+------------------------------------->

When APIs frictionlessly become tables, you can devote your full attention to reasoning over the abstractions represented by those APIs. Larry Wall, the creator of Perl, famously said: “Easy things should be easy, hard things should be possible.” The first two examples are things that should be, and are, easy: each is just 10 lines of simple, straight-ahead SQL that requires no wizardry at all.

The third example is a harder thing. It would be hard in any programming language. But SQL makes it possible in several nice ways. The solution is made of concise stanzas (CTEs, Common Table Expressions) that form a pipeline. Each phase of the pipeline handles one clearly-defined piece of the problem. You can validate the output of each phase before proceeding to the next. And you can do all this with the most mature and widely-used grammar for selection, filtering, and recombination of data.
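
As a rough illustration of validating a CTE pipeline one phase at a time, the sketch below runs each stage on its own before moving on. It uses Python’s built-in sqlite3 module with small made-up tables (repos, users, tweets) as stand-ins; these are not Steampipe tables, and the run_through helper is hypothetical.

```python
import sqlite3

# Hypothetical stand-in tables; with Steampipe these rows would come from the APIs.
SETUP = """
create table repos (name text, owner_login text, stargazers_count integer);
create table users (login text, twitter_username text);
create table tweets (username text, tweet_text text);
insert into repos values
  ('pyspark-examples', 'alice', 50), ('pyspark-utils', 'bob', 20), ('other', 'carol', 99);
insert into users values ('alice', 'alice_tw'), ('bob', null), ('carol', 'carol_tw');
insert into tweets values ('alice_tw', 'hello spark'), ('carol_tw', 'unrelated');
"""

# Each stage is one CTE: (name, body). Later stages may refer to earlier ones.
STAGES = [
    ("pyspark_repos",
     "select name, owner_login from repos where name like '%pyspark%'"),
    ("owners",
     "select u.twitter_username from users u join pyspark_repos r "
     "on r.owner_login = u.login where u.twitter_username is not null"),
    ("owner_tweets",
     "select t.username, t.tweet_text from tweets t join owners o "
     "on t.username = o.twitter_username"),
]


def run_through(conn, stage_name):
    """Run the pipeline up to and including stage_name; return that stage's rows."""
    names = [name for name, _ in STAGES]
    upto = names.index(stage_name) + 1
    ctes = ",\n".join(f"{name} as ({body})" for name, body in STAGES[:upto])
    sql = f"with {ctes}\nselect * from {stage_name}"
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    # Inspect every phase in order before trusting the final result.
    for name, _ in STAGES:
        print(name, run_through(conn, name))
```

Because each stage only depends on the ones before it, a surprising final result can be traced back by checking the intermediate outputs in order.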

Do I have to use SQL?

No! If you like the idea of mapping APIs to tables, but you’d rather reason over those tables in Python or R dataframes, then Steampipe can oblige. Under the covers it’s Postgres, enhanced with foreign data wrappers that handle the API-to-table transformation. Anything that can connect to Postgres can connect to Steampipe, including SQL drivers like Python’s psycopg2 and R’s RPostgres as well as business-intelligence tools like Metabase, Tableau, and PowerBI. So you can use Steampipe to frictionlessly consume APIs into dataframes, then reason over the data in Python or R.
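
Here is one way that might look from Python, sketched under some assumptions. The query_to_records helper is hypothetical and relies only on the standard DB-API cursor interface, so the demo runs against an in-memory SQLite database as a stand-in. The psycopg2 connection settings shown in the comment (port 9193, database and user both named steampipe) are assumed local Steampipe defaults; check your own installation for the actual values and password.

```python
import sqlite3


def query_to_records(conn, sql, params=None):
    """Run a query on any DB-API connection and return the rows as a list of dicts."""
    cur = conn.cursor()
    try:
        if params is None:
            cur.execute(sql)
        else:
            cur.execute(sql, params)
        # cursor.description holds one entry per column; item 0 is the column name.
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()


if __name__ == "__main__":
    # Stand-in connection so the sketch runs anywhere. Against Steampipe you would
    # instead open a Postgres connection, for example (assumed defaults, not verified
    # for your installation):
    #   import psycopg2
    #   conn = psycopg2.connect(host="localhost", port=9193, dbname="steampipe",
    #                           user="steampipe", password="<your password>")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table github_search_repository (name text, stargazers_count integer)"
    )
    conn.execute("insert into github_search_repository values ('SynapseML', 3297)")
    records = query_to_records(
        conn,
        "select name, stargazers_count from github_search_repository "
        "order by stargazers_count desc",
    )
    print(records)
```

A list of dicts like this can be handed to pandas.DataFrame(records) to get a dataframe, after which the analysis proceeds in Python as usual.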

But if you haven’t used SQL in this way before, it’s worth a look. Consider this comparison of SQL to Pandas from How to rewrite your SQL queries in Pandas.

SQL | Pandas |
---|---|
select * from airports | airports |
select * from airports limit 3 | airports.head(3) |
select id from airports where ident = 'KLAX' | airports[airports.ident == 'KLAX'].id |
select distinct type from airports | airports.type.unique() |
select * from airports where iso_region = 'US-CA' and type = 'seaplane_base' | airports[(airports.iso_region == 'US-CA') & (airports.type == 'seaplane_base')] |
select ident, name, municipality from airports where iso_region = 'US-CA' and type = 'large_airport' | airports[(airports.iso_region == 'US-CA') & (airports.type == 'large_airport')][['ident', 'name', 'municipality']] |

We can argue the merits of one style versus the other, but there’s no question that SQL is the most universal and widely-implemented way to express these operations on data. So no, you don’t have to use SQL to its fullest potential in order to benefit from Steampipe. But you might find that you want to.