GetElementById in VBA
Visual Basic for Applications allows you to get data and information from websites. This is called ‘web scraping’. Web pages are written in a markup language called HTML. Anything that interests you in a webpage can be retrieved once we know which HTML element it exists in.
Elements can be identified in a number of ways, prominent among them is the Element Id. Element Ids provide a unique name that identifies a particular element. GetElementById
is a method that can be used with a browser object in order to access the properties of the HTML element, on a particular webpage, where that element has the specified Id.
In this article, we will have a look at how to use GetElementById method with a browser object in your VBA code. Here is the syntax of the method:
expression. GetElementsById(id)
where:
id: The id of the HTML element you are retrieving. It is a String type value.
Once we are able to refer to the specific HTML element we are interested in, we can access a number of properties of this element. Among the most important of them are:
innerText or innerHTML: a String value that represent the displayed value of the element.
Children: a list object that represent the children nodes of the HTML element.
className: the HTML class name associated with the HTML element.
tagName: the tag name of the HTML element.
Contents
Setting up the environment
You will not be able to work with HTML documents or with the Internet Explorer browser without setting up the environment first. To be able to use our HTML object library and Internet Explorer browser with VBA, we need to add these references: ‘Microsoft Internet Control’ and ‘Microsoft HTML Object Library’. You can access the References list from the Tools menu in the VBA IDE.
Example 1: How to approach web scraping using GetElementById
When performing web scraping, our objective is to extract data that interests us from the webpage. We use the element’s ID in order to navigate our way to that specific data. For example, let’s say that we’re interested in getting the first header of a Wikipedia article. A perfect scenario would be if the data we are interested in is in an element that has an ID. We would simply use GetElementById to refer to that element and get its inner text with innerText or innerHTML.
The below code can be used to place the title of the Wikipedia article in cell ‘A1’ of the active worksheet.
Sub VBA_Webscrap() 'to refer to the running copy of Internet Explorer Dim ie As InternetExplorer 'to refer to the HTML document returned Dim html As HTMLDocument 'open Internet Explorer in memory, and go to website Set ie = New InternetExplorer ie.Visible = True ie.navigate “https://en.wikipedia.org/wiki/Web_scraping” 'use the below set of Do's together to wait for the website to do work Do While ie.readyState <> 3 If q < 100000 Then DoEvents Else Exit Do End If q = q + 1 Loop Set html = ie.document 'get a reference to the article’s header. The text is stored in the first item within the element. Set firstHeaderElement = html.getElementById("firstHeading") ActiveSheet.Range(“A1”) = firstHeaderElement.innerHTML Set html = Nothing End Sub
The result can be seen in cell “A1”
Example 2: Scraping jobs from a job search website
This is an extensive example that aims at providing a more complex demonstration of web scraping.
In this example, we are web scraping a job search site for VBA jobs in Romania. We utilize GetElementById
when referring to the job details element. This is the element that contains a list of prerequisites and qualifications that the job applicants need to be aware of.
Through inspecting the HTML document, by right clicking on the web page and selecting ‘Inspect Element’ and finding the HTML lines that correspond to the job details element, we find that its ID is “job-details”.
We then look through each child of the element and get its text if it is a bullet point. This is not a general rule. But by observing the HTML document, I could see that the job requirements are organized into bullet points within the job details element.
Web scraping is all about finding patterns in the structure of the HTML document you are working with and capitalizing on the insights you gain with regards to the patterns you find in the organization of the data in the document. Note that to use this example you need to be already logged into LinkedIn as the code does not include the process of logging in.
Sub VBAJOBQuery() 'to refer to the running copy of Internet Explorer Dim ie As InternetExplorer 'to refer to the HTML document returned Dim html As HTMLDocument 'open Internet Explorer in memory, and go to website Set ie = New InternetExplorer ie.Visible = True ie.navigate "https://www.linkedin.com/jobs/search/?currentJobId=1781172247&f_TPR=r604800&geoId=106670623&keywords=vba&location=Romania" 'Wait until IE is done loading page 'use the below set of Do's together to wait for the website to do work Do Until ie.readyState = 4 or ie.readyState = 3 If q <; 1000000 Then DoEvents Else Exit Do End If q = q + 1 Loop Set html = ie.document 'get a reference to the job detail Set jobDetailsList = html.getElementsByClassName("job-card-search__link-wrapper js-focusable disabled ember-view") 'Set spanItems = jobDetailsList.Children RowNumber = ActiveSheet.Cells(ActiveSheet.Rows.Count, 1).End(xlUp).Row + 1 For Each JobItem In jobDetailsList JobItem.Click Cells(RowNumber, 1).Value = JobItem.innerHTML 'Here, we utilize GetElementById to refer to the job details element. This is the element that contains all the requirements for the job. Set jobDetailsList = html.getElementById("job-details") Set jobDetailsListChildren = jobDetailsList.Children For Each spanItem In jobDetailsListChildren If spanItem.tagName = "SPAN" Then 'if this is the tag containing requirements and description details (li), process it Set ulElements = spanItem.Children For Each ulElement In ulElements If ulElement.tagName = "UL" Then Set liElements = ulElement.Children For Each liElement In liElements ActiveSheet.Cells(RowNumber, 2) = liElement.innerHTML RowNumber = RowNumber + 1 Next liElement End If Next ulElement End If Next spanItem RowNumber = RowNumber + 1 Next JobItem Set html = Nothing End Sub
The result of the above code is a list of job titles in column A and their corresponding job descriptions in column B
Note that in the above shown results, you can see that you would still need to get rid of some of the extra text such as <!—–> and <br>. Careful examination of the results and the ability to recognize the pattern in which these extra texts show up will enable you to add the logic to the code that can get rid of them. This is a common exercise in any web scraping activity.
Final Remarks
In this article, we covered two critical aspects of using GetElementById — accessing the properties of the element, chief among them innerHTML, and cycling through the children (internal objects) of an element. We applied a specific criteria and extracted the children that met that criteria.