おぴよの気まぐれ日記

Notes to self on Okayama, programming, fashion, parenting, life, and how to live it.

[Ruby on Rails] metainspector: a gem that makes scraping a page's meta information easy

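Good evening. This is opiyo, who dreams of becoming an engineer.

Lately I've been working with scraping a lot. Have you ever wondered about this?

When you write a blog post, you just paste a URL and a quote card gets generated. How does that actually work?

I mean exactly this kind of thing ↓↓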

opiyotan.hatenablog.com

My guess is that this, too, pulls its information in by scraping the linked page.

I found a gem that makes fetching this information easy, so that's today's topic.

Setup

As usual, add `metainspector` to your Gemfile.

gem 'metainspector'

Then, also as usual, run bundle install.

$ bundle install

Usage

Let's fetch information from an article featured on the Hatena Blog top page!

広島県 VS EM研究機構 - warbler’s diary

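Note: this turned out to be a rather serious sample, but I'm grateful to be able to use it. Thank you.

Getting the summary (description)

This is very simple: just call the description method on the fetched data.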

meta = MetaInspector.new("http://warbler.hatenablog.com/entry/2018/01/24/011739")
puts meta.description
.
.
.
広島県からEM菌の水質浄化効果を否定する内容の報告書が出された事に対して、EM研究機構が抗議をしていました。 ・EM菌の培養液は有機物と栄養塩類が高濃度に含まれることから「河川等の汚染源になり得る」という実験結果を報告した福島県に対しても、EM研究機構を含むEM推進側が抗議をしていた事については、既に本ブログで報告して…
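
Some pages leave the plain description tag empty and only fill in the Open Graph one. As far as I can tell from the README, metainspector also exposes the parsed meta tags as a hash (with the variable above, that would be `meta.meta`). Assuming that, a hand-rolled fallback could look like the sketch below. The helper name `pick_description` and the key priority are my own assumptions, not part of the gem, and the gem appears to ship its own `best_description`, which is probably the better choice in practice.

```ruby
# Hypothetical helper (not part of metainspector): return the first non-empty
# description found in a hash of meta tags, or nil when none is present.
def pick_description(meta_tags)
  # Assumed priority: plain description, then Open Graph, then Twitter Card.
  keys = ['description', 'og:description', 'twitter:description']
  keys.map { |key| meta_tags[key].to_s.strip }
      .find { |text| !text.empty? }
end

# Stand-in for the hash a real page would give you, so this runs offline.
sample_tags = {
  'description' => '',
  'og:description' => 'A summary taken from the Open Graph tag.'
}

puts pick_description(sample_tags)
```

Because the helper only needs a hash, it can be tried in plain irb without the gem or network access.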

Getting the image

This is also very simple: just call images.best on the fetched data.

meta = MetaInspector.new("http://warbler.hatenablog.com/entry/2018/01/24/011739")
puts meta.images.best
.
.
.
https://cdn-ak.f.st-hatena.com/images/fotolife/w/warbler/20180124/20180124001356.png

Other accessors

Plenty of other information is just as easy to get, so here is a quick rundown.

$ rails c
> meta.url
=> "http://warbler.hatenablog.com/entry/2018/01/24/011739"

> meta.host
=> "warbler.hatenablog.com"

> meta.head_links
=> [{:rel=>"canonical", :href=>"http://warbler.hatenablog.com/entry/2018/01/24/011739"}, {:rel=>"shortcut icon", :href=>"https://cdn.image.st-hatena.com/image/favicon/0c16e66ee260e69eb13510af1e9878996b545656/version=1/https:%2F%2Fcdn.user.blog.st-hatena.com%2Fcustom_blog_icon%2F36385751%2F1514183975459583"}, {:rel=>"icon", :sizes=>"192x192", :href=>"https://cdn.image.st-hatena.com/image/square/7fdc8aba078df5faff51cf50422005a5085822ff/backend=imagemagick;height=192;version=1;width=192/https:%2F%2Fcdn.user.blog.st-hatena.com%2Fcustom_blog_icon%2F36385751%2F1514183975459583"}, {:rel=>"alternate", :type=>"application/atom+xml", :title=>"Atom", :href=>"http://warbler.hatenablog.com/feed"}, {:rel=>"alternate", :type=>"application/rss+xml", :title=>"RSS2.0", :href=>"http://warbler.hatenablog.com/rss"}, {:rel=>"alternate", :type=>"application/json+oembed", :href=>"http://hatenablog.com/oembed?url=http://warbler.hatenablog.com/entry/2018/01/24/011739&format=json", :title=>"oEmbed Profile of 広島県 VS EM研究機構"}, {:rel=>"alternate", :type=>"text/xml+oembed", :href=>"http://hatenablog.com/oembed?url=http://warbler.hatenablog.com/entry/2018/01/24/011739&format=xml", :title=>"oEmbed Profile of 広島県 VS EM研究機構"}, {:rel=>"author", :href=>"http://www.hatena.ne.jp/warbler/"}, {:rel=>"stylesheet", :type=>"text/css", :href=>"https://cdn.blog.st-hatena.com/css/blog.css?version=16c4f762d17230c5653ead590e78f5c1c08b8f84&env=production"}, {:rel=>"stylesheet", :type=>"text/css", :href=>"http://blog.hatena.ne.jp/-/blog_style/6653586347156021977/567bb7d3aa205029bfc477b21f9e4dc351ef5eee"}]

> meta.feed
=> "http://warbler.hatenablog.com/rss"

> meta.title
=> "広島県 VS EM研究機構 - warbler’s diary"

> meta.links.raw
=> ["#", "http://warbler.hatenablog.com/", "http://warbler.hatenablog.com/archive/2018/01/24", "http://warbler.hatenablog.com/entry/2018/01/24/011739", "https://www.emro.co.jp/information/04_HH/", "http://www.pref.aomori.lg.jp/kenminno-koe/24K23.html", "http://b.hatena.ne.jp/entry/http://warbler.hatenablog.com/entry/2018/01/24/011739", "https://twitter.com/share", "http://warbler.hatenablog.com/entry/2018/01/23/235522", "http://warbler.hatenablog.com/archive/2018/01/23", "http://warbler.hatenablog.com/entry/20150227/1425057866", "http://warbler.hatenablog.com/archive/2015/02/27", "http://warbler.hatenablog.com/archive/2013/09/03", "http://warbler.hatenablog.com/entry/20130903/1378217975", "http://warbler.hatenablog.com/entry/20130712/1373632961", "http://warbler.hatenablog.com/archive/2013/07/12", "http://warbler.hatenablog.com/entry/20130428/1367131822", "http://warbler.hatenablog.com/archive/2013/04/28", "http://warbler.hatenablog.com/about", "http://blog.hatena.ne.jp/guide/pro", "http://warbler.hatenablog.com/archive", "http://warbler.hatenablog.com/entry/2017/12/02/235126", "http://warbler.hatenablog.com/entry/2017/11/06/113739", "http://warbler.hatenablog.com/entry/2017/09/06/223001", "http://warbler.hatenablog.com/archive/category/EM%E9%96%A2%E4%BF%82", "http://warbler.hatenablog.com/archive/category/%E7%92%B0%E5%A2%83%E5%95%8F%E9%A1%8C", "http://warbler.hatenablog.com/archive/category/%E7%A7%91%E5%AD%A6", "http://warbler.hatenablog.com/archive/category/%E7%92%B0%E5%A2%83", "http://warbler.hatenablog.com/archive/category/%E7%A0%94%E7%A9%B6%E4%B8%8D%E6%AD%A3", "http://warbler.hatenablog.com/archive/category/%E8%AA%A4%E3%81%A3%E3%81%9F%E7%B5%B1%E8%A8%88", "http://warbler.hatenablog.com/archive/category/%E3%83%8B%E3%82%BB%E7%A7%91%E5%AD%A6", "http://warbler.hatenablog.com/archive/category/%E6%94%BE%E5%B0%84%E7%B7%9A", "http://warbler.hatenablog.com/archive/category/%E3%83%9E%E3%82%B9%E3%82%B3%E3%83%9F", 
"http://warbler.hatenablog.com/archive/category/%E5%B2%A1%E5%B1%B1%E5%A4%A7%E4%BA%8B%E4%BB%B6", "http://warbler.hatenablog.com/archive/category/%E6%9B%B8%E8%A9%95", "http://warbler.hatenablog.com/archive/category/%E9%A3%9F%E5%93%81", "http://warbler.hatenablog.com/archive/category/%E5%81%A5%E5%BA%B7", "http://warbler.hatenablog.com/archive/category/%E3%83%9B%E3%83%A1%E3%82%AA%E3%83%91%E3%82%B7%E3%83%BC", "http://warbler.hatenablog.com/archive/category/%E6%8D%8F%E9%80%A0%E8%AB%96%E6%96%87", "http://warbler.hatenablog.com/archive/category/%E7%99%BA%E9%81%94%E9%9A%9C%E5%AE%B3", "http://warbler.hatenablog.com/archive/category/%E5%8C%BB%E7%99%82", "http://warbler.hatenablog.com/archive/category/%E9%9B%91%E5%AD%A6", "http://warbler.hatenablog.com/archive/category/%E7%B5%B1%E8%A8%88", "http://warbler.hatenablog.com/archive/category/%E8%A6%AA%E5%AD%A6", "http://warbler.hatenablog.com/archive/category/%E6%95%99%E8%82%B2", "http://blog.hatena.ne.jp/register?via=200227", "http://hatenablog.com/guide", "http://hatenablog.com/", "http://www.hatena.ne.jp/faq/report/blog?target_label=warbler&target_url=http%3A%2F%2Fblog.hatena.ne.jp%2Fgo%3Fblog%3Dhttp%253A%252F%252Fwarbler.hatenablog.com%252Fentry%252F2018%252F01%252F24%252F011739&location=http%3A%2F%2Fblog.hatena.ne.jp%2Fgo%3Fblog%3Dhttp%253A%252F%252Fwarbler.hatenablog.com%252Fentry%252F2018%252F01%252F24%252F011739"]

> meta.links.http
=> ["http://warbler.hatenablog.com/entry/2018/01/24/011739", "http://warbler.hatenablog.com/", "http://warbler.hatenablog.com/archive/2018/01/24", "https://www.emro.co.jp/information/04_HH/", "http://www.pref.aomori.lg.jp/kenminno-koe/24K23.html", "http://b.hatena.ne.jp/entry/http://warbler.hatenablog.com/entry/2018/01/24/011739", "https://twitter.com/share", "http://warbler.hatenablog.com/entry/2018/01/23/235522", "http://warbler.hatenablog.com/archive/2018/01/23", "http://warbler.hatenablog.com/entry/20150227/1425057866", "http://warbler.hatenablog.com/archive/2015/02/27", "http://warbler.hatenablog.com/archive/2013/09/03", "http://warbler.hatenablog.com/entry/20130903/1378217975", "http://warbler.hatenablog.com/entry/20130712/1373632961", "http://warbler.hatenablog.com/archive/2013/07/12", "http://warbler.hatenablog.com/entry/20130428/1367131822", "http://warbler.hatenablog.com/archive/2013/04/28", "http://warbler.hatenablog.com/about", "http://blog.hatena.ne.jp/guide/pro", "http://warbler.hatenablog.com/archive", "http://warbler.hatenablog.com/entry/2017/12/02/235126", "http://warbler.hatenablog.com/entry/2017/11/06/113739", "http://warbler.hatenablog.com/entry/2017/09/06/223001", "http://warbler.hatenablog.com/archive/category/EM%E9%96%A2%E4%BF%82", "http://warbler.hatenablog.com/archive/category/%E7%92%B0%E5%A2%83%E5%95%8F%E9%A1%8C", "http://warbler.hatenablog.com/archive/category/%E7%A7%91%E5%AD%A6", "http://warbler.hatenablog.com/archive/category/%E7%92%B0%E5%A2%83", "http://warbler.hatenablog.com/archive/category/%E7%A0%94%E7%A9%B6%E4%B8%8D%E6%AD%A3", "http://warbler.hatenablog.com/archive/category/%E8%AA%A4%E3%81%A3%E3%81%9F%E7%B5%B1%E8%A8%88", "http://warbler.hatenablog.com/archive/category/%E3%83%8B%E3%82%BB%E7%A7%91%E5%AD%A6", "http://warbler.hatenablog.com/archive/category/%E6%94%BE%E5%B0%84%E7%B7%9A", "http://warbler.hatenablog.com/archive/category/%E3%83%9E%E3%82%B9%E3%82%B3%E3%83%9F", 
"http://warbler.hatenablog.com/archive/category/%E5%B2%A1%E5%B1%B1%E5%A4%A7%E4%BA%8B%E4%BB%B6", "http://warbler.hatenablog.com/archive/category/%E6%9B%B8%E8%A9%95", "http://warbler.hatenablog.com/archive/category/%E9%A3%9F%E5%93%81", "http://warbler.hatenablog.com/archive/category/%E5%81%A5%E5%BA%B7", "http://warbler.hatenablog.com/archive/category/%E3%83%9B%E3%83%A1%E3%82%AA%E3%83%91%E3%82%B7%E3%83%BC", "http://warbler.hatenablog.com/archive/category/%E6%8D%8F%E9%80%A0%E8%AB%96%E6%96%87", "http://warbler.hatenablog.com/archive/category/%E7%99%BA%E9%81%94%E9%9A%9C%E5%AE%B3", "http://warbler.hatenablog.com/archive/category/%E5%8C%BB%E7%99%82", "http://warbler.hatenablog.com/archive/category/%E9%9B%91%E5%AD%A6", "http://warbler.hatenablog.com/archive/category/%E7%B5%B1%E8%A8%88", "http://warbler.hatenablog.com/archive/category/%E8%A6%AA%E5%AD%A6", "http://warbler.hatenablog.com/archive/category/%E6%95%99%E8%82%B2", "http://blog.hatena.ne.jp/register?via=200227", "http://hatenablog.com/guide", "http://hatenablog.com/", "http://www.hatena.ne.jp/faq/report/blog?target_label=warbler&target_url=http://blog.hatena.ne.jp/go?blog=http%253A%252F%252Fwarbler.hatenablog.com%252Fentry%252F2018%252F01%252F24%252F011739&location=http://blog.hatena.ne.jp/go?blog=http%253A%252F%252Fwarbler.hatenablog.com%252Fentry%252F2018%252F01%252F24%252F011739"]

> meta.links.internal
=> ["http://warbler.hatenablog.com/entry/2018/01/24/011739", "http://warbler.hatenablog.com/", "http://warbler.hatenablog.com/archive/2018/01/24", "http://warbler.hatenablog.com/entry/2018/01/23/235522", "http://warbler.hatenablog.com/archive/2018/01/23", "http://warbler.hatenablog.com/entry/20150227/1425057866", "http://warbler.hatenablog.com/archive/2015/02/27", "http://warbler.hatenablog.com/archive/2013/09/03", "http://warbler.hatenablog.com/entry/20130903/1378217975", "http://warbler.hatenablog.com/entry/20130712/1373632961", "http://warbler.hatenablog.com/archive/2013/07/12", "http://warbler.hatenablog.com/entry/20130428/1367131822", "http://warbler.hatenablog.com/archive/2013/04/28", "http://warbler.hatenablog.com/about", "http://warbler.hatenablog.com/archive", "http://warbler.hatenablog.com/entry/2017/12/02/235126", "http://warbler.hatenablog.com/entry/2017/11/06/113739", "http://warbler.hatenablog.com/entry/2017/09/06/223001", "http://warbler.hatenablog.com/archive/category/EM%E9%96%A2%E4%BF%82", "http://warbler.hatenablog.com/archive/category/%E7%92%B0%E5%A2%83%E5%95%8F%E9%A1%8C", "http://warbler.hatenablog.com/archive/category/%E7%A7%91%E5%AD%A6", "http://warbler.hatenablog.com/archive/category/%E7%92%B0%E5%A2%83", "http://warbler.hatenablog.com/archive/category/%E7%A0%94%E7%A9%B6%E4%B8%8D%E6%AD%A3", "http://warbler.hatenablog.com/archive/category/%E8%AA%A4%E3%81%A3%E3%81%9F%E7%B5%B1%E8%A8%88", "http://warbler.hatenablog.com/archive/category/%E3%83%8B%E3%82%BB%E7%A7%91%E5%AD%A6", "http://warbler.hatenablog.com/archive/category/%E6%94%BE%E5%B0%84%E7%B7%9A", "http://warbler.hatenablog.com/archive/category/%E3%83%9E%E3%82%B9%E3%82%B3%E3%83%9F", "http://warbler.hatenablog.com/archive/category/%E5%B2%A1%E5%B1%B1%E5%A4%A7%E4%BA%8B%E4%BB%B6", "http://warbler.hatenablog.com/archive/category/%E6%9B%B8%E8%A9%95", "http://warbler.hatenablog.com/archive/category/%E9%A3%9F%E5%93%81", "http://warbler.hatenablog.com/archive/category/%E5%81%A5%E5%BA%B7", 
"http://warbler.hatenablog.com/archive/category/%E3%83%9B%E3%83%A1%E3%82%AA%E3%83%91%E3%82%B7%E3%83%BC", "http://warbler.hatenablog.com/archive/category/%E6%8D%8F%E9%80%A0%E8%AB%96%E6%96%87", "http://warbler.hatenablog.com/archive/category/%E7%99%BA%E9%81%94%E9%9A%9C%E5%AE%B3", "http://warbler.hatenablog.com/archive/category/%E5%8C%BB%E7%99%82", "http://warbler.hatenablog.com/archive/category/%E9%9B%91%E5%AD%A6", "http://warbler.hatenablog.com/archive/category/%E7%B5%B1%E8%A8%88", "http://warbler.hatenablog.com/archive/category/%E8%A6%AA%E5%AD%A6", "http://warbler.hatenablog.com/archive/category/%E6%95%99%E8%82%B2"] 

Usage samples are published on GitHub, so please take a look.

github.com
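
Coming back to the quote card from the introduction: the accessors shown above (title, description, images.best, url, host) are roughly what such a card needs. Here is a minimal sketch of gathering them in one place. `build_link_card` is a hypothetical helper of mine, not something the gem provides, and the Struct objects are stand-ins so the code runs without the gem or network access. In a Rails console you would pass `MetaInspector.new(url)` instead.

```ruby
# Hypothetical helper (not part of metainspector): collect the values a link
# card needs from any object that responds like a MetaInspector page.
def build_link_card(page)
  {
    title: page.title,
    description: page.description,
    image: page.images.best,
    url: page.url,
    host: page.host
  }
end

# Stand-in objects that mimic only the accessors used above.
image_set = Struct.new(:best).new('https://example.com/cover.png')
fake_page = Struct.new(:title, :description, :images, :url, :host).new(
  'Sample article',
  'A short summary of the article.',
  image_set,
  'https://example.com/entry/1',
  'example.com'
)

card = build_link_card(fake_page)
puts "#{card[:title]} (#{card[:host]})"
puts card[:description]
puts card[:image]
```

Returning a plain hash keeps the view free to render the card however it likes. Fetching a real page can fail (timeouts, non-HTML responses), so real code would also want error handling around the fetch.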

Summary

When you get stuck, start by searching: there are plenty of gems that make things easy to implement, and that is what makes Ruby fun.

That's all for today.