GetElementById in VBA

Visual Basic for Applications allows you to get data and information from websites. This is called ‘web scraping’. Web pages are written in a markup language called HTML. Anything that interests you in a webpage can be retrieved once we know which HTML element it exists in.

Elements can be identified in a number of ways, prominent among them is the Element Id. Element Ids provide a unique name that identifies a particular element. GetElementById is a method that can be used with a browser object in order to access the properties of the HTML element, on a particular webpage, where that element has the specified Id.

In this article, we will have a look at how to use GetElementById method with a browser object in your VBA code. Here is the syntax of the method:

expression. GetElementsById(id)

where:

id: The id of the HTML element you are retrieving. It is a String type value.

Once we are able to refer to the specific HTML element we are interested in, we can access a number of properties of this element. Among the most important of them are:

innerText or innerHTML: a String value that represent the displayed value of the element.

Children: a list object that represent the children nodes of the HTML element.

className: the HTML class name associated with the HTML element.

tagName: the tag name of the HTML element.

Contents

Setting up the environment

You will not be able to work with HTML documents or with the Internet Explorer browser without setting up the environment first. To be able to use our HTML object library and Internet Explorer browser with VBA, we need to add these references: ‘Microsoft Internet Control’ and ‘Microsoft HTML Object Library’. You can access the References list from the Tools menu in the VBA IDE.

Enable internet explorer in VBA references

Example 1: How to approach web scraping using GetElementById

When performing web scraping, our objective is to extract data that interests us from the webpage. We use the element’s ID in order to navigate our way to that specific data. For example, let’s say that we’re interested in getting the first header of a Wikipedia article. A perfect scenario would be if the data we are interested in is in an element that has an ID. We would simply use GetElementById to refer to that element and get its inner text with innerText or innerHTML.

scraping from wikipedia

The below code can be used to place the title of the Wikipedia article in cell ‘A1’ of the active worksheet.

Sub VBA_Webscrap()
'to refer to the running copy of Internet Explorer
Dim ie As InternetExplorer
'to refer to the HTML document returned
Dim html As HTMLDocument
'open Internet Explorer in memory, and go to website
Set ie = New InternetExplorer
ie.Visible = True
ie.navigate “https://en.wikipedia.org/wiki/Web_scraping”
'use the below set of Do's together to wait for the website to do work
Do While ie.readyState <> 3 
        If q < 100000 Then
            DoEvents
        Else
            Exit Do
        End If
    q = q + 1
Loop
Set html = ie.document
'get a reference to the article’s header. The text is stored in the first item within the element.
Set firstHeaderElement = html.getElementById("firstHeading")
ActiveSheet.Range(“A1”) = firstHeaderElement.innerHTML
Set html = Nothing
End Sub

The result can be seen in cell “A1”

excel with cell a1 showing the text "web scraping"

Example 2: Scraping jobs from a job search website

This is an extensive example that aims at providing a more complex demonstration of web scraping.

In this example, we are web scraping a job search site for VBA jobs in Romania. We utilize GetElementById when referring to the job details element. This is the element that contains a list of prerequisites and qualifications that the job applicants need to be aware of.

Through inspecting the HTML document, by right clicking on the web page and selecting ‘Inspect Element’ and finding the HTML lines that correspond to the job details element, we find that its ID is “job-details”.

We then look through each child of the element and get its text if it is a bullet point. This is not a general rule. But by observing the HTML document, I could see that the job requirements are organized into bullet points within the job details element.

Web scraping is all about finding patterns in the structure of the HTML document you are working with and capitalizing on the insights you gain with regards to the patterns you find in the organization of the data in the document. Note that to use this example you need to be already logged into LinkedIn as the code does not include the process of logging in.

Sub VBAJOBQuery()
'to refer to the running copy of Internet Explorer
Dim ie As InternetExplorer
'to refer to the HTML document returned
Dim html As HTMLDocument
'open Internet Explorer in memory, and go to website
Set ie = New InternetExplorer
ie.Visible = True
ie.navigate "https://www.linkedin.com/jobs/search/?currentJobId=1781172247&f_TPR=r604800&geoId=106670623&keywords=vba&location=Romania"
'Wait until IE is done loading page
'use the below set of Do's together to wait for the website to do work
Do Until ie.readyState = 4  or ie.readyState = 3  
        If q <; 1000000 Then
            DoEvents
        Else
            Exit Do
        End If
    q = q + 1
Loop
Set html = ie.document
'get a reference to the job detail
Set jobDetailsList = html.getElementsByClassName("job-card-search__link-wrapper js-focusable disabled ember-view")
'Set spanItems = jobDetailsList.Children
RowNumber = ActiveSheet.Cells(ActiveSheet.Rows.Count, 1).End(xlUp).Row + 1
For Each JobItem In jobDetailsList
JobItem.Click
Cells(RowNumber, 1).Value = JobItem.innerHTML
'Here, we utilize GetElementById to refer to the job details element. This is the element that contains all the requirements for the job.
Set jobDetailsList = html.getElementById("job-details")
Set jobDetailsListChildren = jobDetailsList.Children
For Each spanItem In jobDetailsListChildren
If spanItem.tagName = "SPAN" Then
    'if this is the tag containing requirements and description details (li), process it
    Set ulElements = spanItem.Children
    For Each ulElement In ulElements
        If ulElement.tagName = "UL" Then
            Set liElements = ulElement.Children
                For Each liElement In liElements
                    
                    ActiveSheet.Cells(RowNumber, 2) = liElement.innerHTML
                    RowNumber = RowNumber + 1
                Next liElement
        End If
    Next ulElement
End If
Next spanItem
RowNumber = RowNumber + 1
Next JobItem
Set html = Nothing
End Sub 

The result of the above code is a list of job titles in column A and their corresponding job descriptions in column B

section 1 of scraped text
section 3 of scraped text
section 2 of scraped text

Note that in the above shown results, you can see that you would still need to get rid of some of the extra text such as <!—–> and <br>. Careful examination of the results and the ability to recognize the pattern in which these extra texts show up will enable you to add the logic to the code that can get rid of them. This is a common exercise in any web scraping activity.

Final Remarks

In this article, we covered two critical aspects of using GetElementById — accessing the properties of the element, chief among them innerHTML, and cycling through the children (internal objects) of an element. We applied a specific criteria and extracted the children that met that criteria.

Leave a Reply

Your email address will not be published. Required fields are marked *