How to Use VBA to Pull Excel Data from Hyperlinks
In this article, we will look at how to extract data from a website using VBA. For this we will be using the navigate method on the InternetExplorer object. Let us see this in detail.
Say, you want to pull the current temperature of the city of Vancouver from this site. Before we begin our coding, we need to know the tag element that contains the required data in the web site.
To do that, navigate to the site in the browser. Right click on the data to be retrieved and click on inspect element.
The element will be highlighted in the Developer Tools window that will be opened.
So, we basically want to extract the inner text of a “p” tag with class “temperature”. This can be done as follows.
First of all, we need to make sure that the required libraries are included in your VBA project, namely, Microsoft HTML Object Library and Microsoft Internet Controls.
Now, let’s start with the code. Create a new InternetExplorer object using
Dim IE As New InternetExplorer
Specify the URL from which to extract data and navigate to it using
url = "https://www.theweathernetwork.com/ca/weather/british-columbia/vancouver" IE.navigate url
For troubleshooting purpose, you can make the browser window visible by using:
IE.Visible = True
After navigating to a URL, you need to wait till the web page is entirely loaded. This can be done by using
Do DoEvents Loop Until IE.readyState = READYSTATE_COMPLETE
(After the webpage is completely loaded, the readyState property of the InternetExplorer object is set to READYSTATE_COMPLETE)
Next, you save the contents of the webpage loaded using the document property
Dim Doc As HTMLDocument Set Doc = IE.document
You can then extract all elements of a particular type from the HTML document using
Dim tagElements As Object Set tagElements = Doc.all.tags("p")
To loop through all these “p” tag elements
Dim element As Object For Each element In tagElements 'Code to pull temperature goes here Next
From all the p tags, we are looking for the tag that contains temperature (°C) and has class name “temperature”. For this we use the element.innerText and element.className properties as follows
If InStr(element.innerText, "°C") And InStr(element.className, "temperature") Then Sheet1.Cells(1, 1).Value = element.innerText End If
Finally, we do the cleanup using
IE.Quit Set IE = Nothing
Here is the entire thing put together
Sub extractVancouverData() Dim IE As New InternetExplorer Dim url As String Dim item As HTMLHtmlElement Dim Doc As HTMLDocument Dim tagElements As Object Dim element As Object Dim lastRow Application.ScreenUpdating = False Application.DisplayAlerts = False Application.EnableEvents = False Application.Calculation = xlCalculationManual url = "https://www.theweathernetwork.com/ca/weather/british-columbia/vancouver" IE.navigate url IE.Visible = True Do DoEvents Loop Until IE.readyState = READYSTATE_COMPLETE Set Doc = IE.document lastRow = Sheet1.UsedRange.Rows.Count + 1 Set tagElements = Doc.all.tags("p") For Each element In tagElements If InStr(element.innerText, "°C") > 0 And InStr(element.className, "temperature") > 0 Then Sheet1.Cells(lastRow, 1).Value = element.innerText ' Exit the for loop once you get the temperature to avoid unnecessary processing Exit For End If Next IE.Quit Set IE = Nothing Application.ScreenUpdating = True Application.DisplayAlerts = True Application.EnableEvents = True Application.Calculation = xlCalculationAutomatic End Sub
You can also extract all elements from the HTMLDocument by the element’s ID, className, Name or tagName using the respective method as shown below
Set tagElements = Doc.getElementsByClassName("temperature") For Each element In tagElements If InStr(element.innerText, "°C") Then Sheet1.Cells(lastRow, 1).Value = element.innerText End If Next
Many times while extracting data from a website, you need to go to a page and click on a link as you cannot directly access the results using a URL. For this you can use the element.Click method. Say you click on the link “Excel” on the home page of this site. This is how you can do it
Sub clickOnLink() Dim IE As New InternetExplorer Dim str As String Dim Doc As HTMLDocument Dim tagElements As Object Dim element As HTMLObjectElement str = "https://software-solutions-online.com/" IE.navigate str IE.Visible = True Do DoEvents Loop Until IE.readyState = READYSTATE_COMPLETE Set Doc = IE.document 'First get all the links Set tagElements = Doc.all.tags("a") ' Loop through all the links For Each element In tagElements ' Look for the link that contains the text Excel If element.innerText = "Excel" Then 'Click on the link element.Click Do DoEvents Loop Until IE.readyState = READYSTATE_COMPLETE Exit For End If Next
If you run this code with the IE visible, you will see that the code clicks on the Excel link. Now you can extract the required data using the methods described above.