How to Scrape and Import Live Web Data Into Notion Using Google Sheets

How to Scrape and Import Live Web Data Into Notion Using Google Sheets

From Netflix to Prime Video to Disney+, we’ve never had more TV shows available to watch. At the same time, picking the best next candidate has become increasingly difficult. We want to get the most out of the time we spend in front of the TV, but we don’t always want to let Netflix choose what we should watch next.

If you’re in a similar situation, you’ve come to the right place. In this post, we’ll show you how to use Notion and Google Sheets to import information from IMDb and create a list of shows to watch for your next binge session.

Choosing a Notion Template

Notion TV Tracker template.

We searched for the best Notion database template to track our shows and decided this free TV Show Tracker template is one of the best.

We will be making a couple of changes to the database to add additional information. We want to show the rating from IMDb and the current popularity of each TV show, so we need to create two additional database properties with the Number type:

TV Tracker database new properties.

Synchronizing the Database with Google Sheets

We plan to get the IMDb rating and popularity from IMDb and update our Notion database with that information.

Doing this manually would be tedious. To save time, we’ll use a feature from Sync2Sheets called synced columns.

Synced columns allow us to send data from Google Sheets to Notion. Think of it as something similar to a formula property in Notion, where the data comes from Google Sheets.

If you don’t have Sync2Sheets installed yet, you can download it from the Google Workspace Marketplace and watch this 3-minute YouTube video to learn how to use it.


Using IMPORTXML to Get the Information We Want from IMDb

If you haven’t heard about the =IMPORTXML formula before, it allows you to import data from various structured data types including XML, HTML, and CSV. In our case, we’ll use it to extract the rating and popularity from IMDb.

For example, here you can see where to find the IMDb Rating and Popularity data for Prison Break:

Prision Break on IMDb.

IMPORTXML accepts two parameters. The first is a URL pointing to the HTML page you want to parse, and the second is the XPath to query the HTML document and get the data.

It sounds complicated. But with the help of Google Chrome, we’ll be able to get the path easily. The only thing we’ll have to do is copy and paste.

To start, right-click on the element where the data is (in our case the IMDb rating) and then select Inspect. This will open the Chrome Developer Tools and allow us to see the HTML document.

In the screenshot below, you can see the element containing the 8.3 value we want to extract. To copy the value, right-click the HTML element and choose Copy -> Copy XPath.

How to copy the xpath.

The result of this action is the XPath that we’ll use with IMPORTXML to get the show information:

//*[@id="__next"]/main/div/section[1]/section/div[3]/section/section/div[1]/div[2]/div/div[1]/a/div/div/div[2]/div[1]/span[1]

To make this XPath usable in Google Sheets, we’ll change the double quotes to single quotes. The result should look like this:

//*[@id=’__next’]/main/div/section[1]/section/div[3]/section/section/div[1]/div[2]/div/div[1]/a/div/div/div[2]/div[1]/span[1]

We’ll repeat the process with the Popularity, Episodes, and Seasons. If you are having trouble during this step, here are the four XPaths that we will be using:

IMDb rating

//*[@id=’__next’]/main/div/section[1]/section/div[3]/section/section/div[1]/div[2]/div/div[1]/a/div/div/div[2]/div[1]/span[1]

Popularity

//*[@id=’__next’]/main/div/section[1]/section/div[3]/section/section/div[1]/div[2]/div/div[3]/a/div/div/div[2]/div[1]

Episodes

//*[@id=’__next’]/main/div/section[1]/div/section/div/div[1]/section[1]/div[1]/a/h3/span

Seasons

//*[@id=‘browse-episodes-season’]


Getting the Data Into Google Sheets

We now have our database in sync with Google Sheets, and we know how to get the data we want. The only manual task we will need to perform for each TV Show is getting the IMDb link to extract the information.

IMDb search functionality is great and super fast, so it took me around 2 minutes to get the links for all the US TV shows in our Notion database.

To improve the clarity of our formulas, let’s create a sheet named XPaths and copy and paste each XPath into it so we can reference them in IMPORTXML easily. We can also use named ranges to give the cell a name, which will make it easier to write formulas that work with the data.

Named ranges xpaths on Google Sheets.

Now comes the easy part: adding the formulas to our columns and letting Google Sheets do the work for us. In this example, G is the column where we have the URLs to IMDb:

  • IMDb rating: =IMPORTXML(G2, Rating)
  • Popularity: =IMPORTXML(G2, Popularity)
  • Episodes: =IMPORTXML(G2, Episodes)
  • Seasons: =INDEX(IMPORTXML(G2, Seasons), 1, 2)

You can click to open the GIF full screen.

image

Sending the Data to Notion

Now comes the last and most important step: sending all this information to Notion.

The process is simple. First, we’ll create a synced column for the IMDb rating. Once we get email confirmation that the sync has finished, we’ll remove the synced column. We’ll then repeat this process for the remaining three columns.

If you want to keep one of the columns in sync and updated automatically (such as Popularity, which is probably the one that changes more frequently), create that column last and don’t remove it. Sync2Sheets will update the values in Notion automatically each hour.

You can check out the process of creating the synced columns and how Notion gets updated in the video below:

And with that, we are done 🙌 ! We have our Notion TV Shows Tracker database that shows the number of seasons, episodes, popularity, and IMDb rating.

Conclusion

We learned how to scrape data from a website using the =IMPORTXML formula from Google Sheets and how to get the XPath of what we want using Chrome’s Inspect feature. We also used named ranges to give some clarity to our formulas and then learned how to send that data back into Notion for free using Sync2Sheets.

This TV show tracker is just one of the many ways Sync2Sheets lets you combine the power of Google Sheets and Notion. Now that you know the basic technique, you can easily adapt this method to scrape and analyze all kinds of structured web data.

Do you want us to write about a specific use case? Let us know on X!

Notion loves Google Sheets

Ready to get started?

Install Sync2Sheets, sync your Notion databases with Google Sheets and have the best of both worlds.

Google Workspace icon Install the Add-on

More interesting posts:

How to Supercharge Your Productivity Using These 10 Notion Integrations

How to Supercharge Your Productivity Using These 10 Notion Integrations

Discover how to supercharge your productivity with 10 recommended Notion integrations to automate work and boost productivity.

Read more
Export Notion Databases to Google Sheets: The Ultimate Guide

Export Notion Databases to Google Sheets: The Ultimate Guide

Export Notion databases to Google Sheets for seamless data analysis, collaboration, and backup. Optimize your workflow with Sync2Sheets.

Read more
Notion for Project Management: Tips, Use Cases, and Best Practices

Notion for Project Management: Tips, Use Cases, and Best Practices

Master project management with Notion: Explore essential features, collaboration tips, and best practices to boost productivity and streamline workflows.

Read more