How to Find Broken Links Using Selenium

Why Should You Check for Broken Links?

How do you find broken links using Selenium? First, remember that you should always make sure none of the links on your webpage is broken, so that users never land on an error page.

For that reason, link verification is one of the most common testing practices: each link is opened and checked to confirm that it works correctly.

This testing is generally performed whenever a new build is deployed to the server. A Quality Analyst clicks each link and verifies that it works and takes the user to the correct page.

The tester also checks whether any link returns an error status code, such as 404 (Not Found) or a 5xx server error, and ensures that the server responds with code 200 (OK).

However, this testing is monotonous and repetitive, which makes this scenario well suited to automation testing.

How to find broken links using Selenium:

This tutorial provides step-by-step guidance to help you understand the approach and write the code. Refer below for the step-by-step explanation:

Prerequisite:

  • Create a Selenium project. (The code in this tutorial uses Java with TestNG and WebDriverManager.)

How to find all links on a webpage in Selenium?

  • Step 1: Approach to find all links on the webpage: In HTML, links are defined with the anchor (<a>) tag, and the URL is stored in its href attribute.

Example (the format in which links are stored on a webpage):

<a id="js-link-box-es" href="//es.wikipedia.org/" title="Español — Wikipedia — La enciclopedia libre" class="link-box" data-slogan="La enciclopedia libre">
<strong>Español</strong>
<small>1 588 000+ artículos</small>
</a>
Wikipedia Web page links
  • Step 2: Selenium's findElements method returns all matching WebElements on the page as a List. Refer to the code below:

List<WebElement> allLinks = driver.findElements(By.tagName("a"));

This stores every anchor-tag element on the page in the allLinks list. Refer below for the complete code:

package stepDef;
import io.github.bonigarcia.wdm.WebDriverManager;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.annotations.Test;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class VerifyPageLinks {
  @Test
  public void verifyBrokenLinks() {
    // Download a ChromeDriver binary that matches the installed Chrome version
    WebDriverManager.chromedriver().setup();
    WebDriver driver = new ChromeDriver();
    driver.get("https://www.wikipedia.org/");
    driver.manage().window().maximize();
    driver.manage().timeouts().implicitlyWait(30, TimeUnit.SECONDS);
    // Store all the links on the webpage in a List
    List<WebElement> allLinks = driver.findElements(By.tagName("a"));
    System.out.println("Total Number of Links: " + allLinks.size());
    for (WebElement link : allLinks) {
      // Print the URL stored in each link's href attribute
      System.out.println("Link URL: " + link.getAttribute("href"));
    }
    // quit() ends the WebDriver session and closes all browser windows
    driver.quit();
  }
}

Console output:
Total Number of Links: 323
Link URL: https://en.wikipedia.org/
Link URL: https://es.wikipedia.org/
Link URL: https://ja.wikipedia.org/
Link URL: https://de.wikipedia.org/
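Notice that the console shows https://es.wikipedia.org/ even though the HTML snippet in Step 1 has href="//es.wikipedia.org/". Selenium's getAttribute("href") returns the browser's resolved, absolute URL. If you ever read raw href values some other way (for example from page source), you would need to resolve them against the page URL yourself. The sketch below shows that resolution with java.net.URI; the class and method names are hypothetical, and "/portal/" is an illustrative relative path, not a link taken from the page.

```java
import java.net.URI;

public class ResolveHref {
    // Hypothetical helper: resolves a relative or protocol-relative href against
    // the URL of the page it was found on, using java.net.URI.resolve()
    static String resolveHref(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Protocol-relative href from the Wikipedia snippet: inherits "https" from the page
        System.out.println(resolveHref("https://www.wikipedia.org/", "//es.wikipedia.org/"));
        // Illustrative relative path: inherits scheme and host from the page
        System.out.println(resolveHref("https://www.wikipedia.org/", "/portal/"));
    }
}
```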

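How to validate all links on the webpage?

  • Step 3: Approach to testing valid links
  • Test 1: Link text should not be blank: The visible link text is read with Selenium's getText() method, while the URL itself lives in the href attribute and is read with getAttribute("href"). Because getText() returns an empty string (not null) for an anchor without visible text, the check should look for empty text rather than null.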
// We want to run the check for every link, so we use SoftAssert, which collects failures instead of stopping at the first one
SoftAssert soft = new SoftAssert();
for (WebElement link : allLinks) {
  String actualLinkText = link.getText();
  System.out.println("Link Text: " + actualLinkText);
  System.out.println("URL within HREF: " + link.getAttribute("href"));
  // actualLinkText should not be blank; getText() returns "" rather than null, so check for empty text
  soft.assertFalse(actualLinkText.trim().isEmpty(), "Blank link text for: " + link.getAttribute("href"));
}
// Report every collected failure at the end
soft.assertAll();
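The blank-text rule can be pulled into a small, null-safe helper so the same check is reused wherever link text is validated. This is a sketch; isBlankLinkText is a hypothetical name, and the null branch is only a safety net since getText() returns an empty string for anchors without visible text. trim() is used instead of String.isBlank() so the code also compiles on Java 8.

```java
public class LinkTextCheck {
    // Hypothetical helper: true when link text is missing, empty, or only whitespace
    static boolean isBlankLinkText(String text) {
        return text == null || text.trim().isEmpty();
    }

    public static void main(String[] args) {
        String[] samples = {"Español", "", "   ", null};
        for (String sample : samples) {
            System.out.println("[" + sample + "] blank? " + isBlankLinkText(sample));
        }
    }
}
```

Inside the test loop it would be used as soft.assertFalse(isBlankLinkText(actualLinkText), "Blank link text for: " + link.getAttribute("href")).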
  • Test 2: Test for broken links: When a "GET" request is sent to a server for a valid URL, the server returns response code 200 (OK). To get the response code, we send a "GET" request and read the server's response. Java's HttpURLConnection abstract class provides the methods needed to make the request over the network, such as setRequestMethod(), connect(), and getResponseCode(). By default, HttpURLConnection follows redirects within the same protocol, so a redirected link usually reports the response code of the final page.

Here is the code to get a response from the server:

import java.net.HttpURLConnection;
import java.net.URL;
import org.testng.annotations.Test;

public class VerifyPageLinks {
  @Test
  public void brokenLink() {
    String urlForTest = "https://www.wikipedia.org/";
    try {
      URL url = new URL(urlForTest);
      HttpURLConnection httpConnection = (HttpURLConnection) url.openConnection();
      // Set the request method to "GET"
      httpConnection.setRequestMethod("GET");
      httpConnection.connect();
      int serverResponseCode = httpConnection.getResponseCode();
      System.out.println("Server Response Code: " + serverResponseCode);
      // Release the connection once the response code has been read
      httpConnection.disconnect();
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

Console output:
Server Response Code: 200
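This tutorial treats only 200 as a pass. A common, looser alternative (an assumption here, not the approach used in this tutorial) is to flag only client and server errors, such as 404 or 500, as broken. The sketch below shows such a classification; the class name, method name, and labels are hypothetical. Codes below 100 are treated as "no response", because getResponseCode() on HttpURLConnection returns -1 when the response is not valid HTTP, and a wrapper that catches exceptions may return 0.

```java
public class StatusCheck {
    // Hypothetical policy for classifying an HTTP status code
    static String classify(int statusCode) {
        if (statusCode < 100) {
            // -1 (invalid HTTP response) or 0 (request threw an exception)
            return "NO RESPONSE";
        } else if (statusCode >= 400) {
            // 4xx client errors (e.g. 404) and 5xx server errors (e.g. 500)
            return "BROKEN";
        } else if (statusCode >= 300) {
            // Redirects that were not followed automatically (e.g. http -> https)
            return "REDIRECT";
        }
        return "OK";
    }

    public static void main(String[] args) {
        int[] codes = {200, 301, 404, 500, 0};
        for (int code : codes) {
            System.out.println(code + " -> " + classify(code));
        }
    }
}
```

With this policy, the assertion could become soft.assertNotEquals(classify(responseCode), "BROKEN", "Broken link: " + urlForTest), which tolerates redirects while still catching 4xx and 5xx responses.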

  • Step 4: Turn the broken-link test code into one reusable, generic method, getResponseCode, that returns the server response code for any URL.
  • Step 5: Add the following assertions inside the for loop of the test:
// Test Case 2: Verify response code
String urlForTest = link.getAttribute("href");
int responseCode = getResponseCode(urlForTest);
System.out.println("Link: " + urlForTest + " Response From Server: " + responseCode);
soft.assertEquals(responseCode, 200, "Testing Response of URL " + urlForTest);
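A page usually contains anchors whose href is not an HTTP URL (mailto:, javascript:) or that have no href at all, in which case getAttribute("href") returns null. Casting such a connection to HttpURLConnection fails, so a naive getResponseCode reports them as broken. The sketch below is one possible variant of the reusable method, not the tutorial's version: it skips non-HTTP hrefs and sets connect/read timeouts. The sentinel values NOT_HTTP and REQUEST_FAILED and the 10-second limits are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Locale;

public class ResponseCodeHelper {
    // Hypothetical sentinel for hrefs that are not HTTP(S), e.g. mailto: or javascript:
    static final int NOT_HTTP = -2;
    // Hypothetical sentinel for requests that fail (DNS error, timeout, malformed URL)
    static final int REQUEST_FAILED = 0;

    static int getResponseCode(String href) {
        // getAttribute("href") returns null for anchors without an href attribute
        if (href == null) {
            return NOT_HTTP;
        }
        String lower = href.toLowerCase(Locale.ROOT);
        if (!lower.startsWith("http://") && !lower.startsWith("https://")) {
            return NOT_HTTP;
        }
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) new URL(href).openConnection();
            connection.setRequestMethod("GET");
            // Assumed limits so one slow host cannot stall the whole run
            connection.setConnectTimeout(10_000);
            connection.setReadTimeout(10_000);
            // getResponseCode() sends the request if it has not been sent yet
            return connection.getResponseCode();
        } catch (IOException e) {
            System.out.println("Request failed for " + href + ": " + e.getMessage());
            return REQUEST_FAILED;
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Non-HTTP hrefs are skipped without any network traffic
        System.out.println(getResponseCode("mailto:someone@example.com"));
        System.out.println(getResponseCode("javascript:void(0)"));
        System.out.println(getResponseCode(null));
    }
}
```

In the Selenium loop, a result of NOT_HTTP can simply be skipped (continue) before the response-code assertion runs.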

Complete code:

package stepDef;
import io.github.bonigarcia.wdm.WebDriverManager;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.annotations.Test;
import org.testng.asserts.SoftAssert;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class VerifyPageLinks {

  @Test
  public void verifyBrokenLinks() {
    // WebDriverManager downloads a ChromeDriver binary that matches the installed Chrome
    WebDriverManager.chromedriver().setup();
    WebDriver driver = new ChromeDriver();
    driver.get("https://www.wikipedia.org/");
    driver.manage().window().maximize();
    driver.manage().timeouts().implicitlyWait(30, TimeUnit.SECONDS);

    // Store all the anchor elements on the page in a List
    List<WebElement> allLinks = driver.findElements(By.tagName("a"));
    System.out.println("Total Number of Links: " + allLinks.size());

    // SoftAssert lets the loop check every link before reporting failures
    SoftAssert soft = new SoftAssert();
    for (WebElement link : allLinks) {
      String actualLinkText = link.getText();
      String urlForTest = link.getAttribute("href");
      System.out.println("Link Text: " + actualLinkText);
      System.out.println("URL within HREF: " + urlForTest);

      // Test Case 1: link text should not be blank (getText() returns "" rather than null)
      soft.assertFalse(actualLinkText.trim().isEmpty(), "Blank link text for: " + urlForTest);

      // Test Case 2: verify the server response code
      int responseCode = getResponseCode(urlForTest);
      System.out.println("Link: " + urlForTest + " Response From Server: " + responseCode);
      soft.assertEquals(responseCode, 200, "Testing Response of URL " + urlForTest);
    }
    // Quit the browser before assertAll(), which throws if any soft assertion failed
    driver.quit();
    soft.assertAll();
  }

  @Test
  public void brokenLink() {
    // Check a single URL using the reusable method below
    getResponseCode("https://www.wikipedia.org/");
  }

  // Reusable method: returns the HTTP response code for a URL, or 0 if the request fails
  public int getResponseCode(String urlForTest) {
    int serverResponseCode = 0;
    try {
      URL url = new URL(urlForTest);
      HttpURLConnection httpConnection = (HttpURLConnection) url.openConnection();
      httpConnection.setRequestMethod("GET");
      httpConnection.connect();
      serverResponseCode = httpConnection.getResponseCode();
      System.out.println("Server Response Code: " + serverResponseCode);
      httpConnection.disconnect();
    } catch (Exception e) {
      e.printStackTrace();
    }
    return serverResponseCode;
  }
}

Console:
URL within HREF: https://ja.wikipedia.org/
Server Response Code: 200
Link: https://ja.wikipedia.org/ Response From Server: 200
Link Text: Deutsch
2 416 000+ Artikel
URL within HREF: https://de.wikipedia.org/
Server Response Code: 200
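The Wikipedia page returns 323 anchors, and several of them may point to the same URL, so the test can send the same request more than once. One optional refinement (an assumption, not part of the tutorial's code) is to collect the hrefs first and check each distinct URL only once. The sketch below does that with a LinkedHashSet, which keeps the first-seen order; uniqueHrefs is a hypothetical name, and the sample list stands in for values read with link.getAttribute("href").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UniqueLinks {
    // Hypothetical helper: keeps each non-null, non-empty href once, in first-seen order
    static List<String> uniqueHrefs(List<String> hrefs) {
        Set<String> seen = new LinkedHashSet<>();
        for (String href : hrefs) {
            if (href != null && !href.isEmpty()) {
                seen.add(href);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Sample values; in the Selenium test these would come from link.getAttribute("href")
        List<String> hrefs = Arrays.asList(
                "https://en.wikipedia.org/",
                "https://es.wikipedia.org/",
                null,
                "https://en.wikipedia.org/");
        System.out.println(uniqueHrefs(hrefs));
    }
}
```

Each URL in the returned list can then be passed to getResponseCode once, which shortens the run on pages with many repeated links.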


Wonderful! You have now completed the broken-link validation code and can run the same test against any webpage. We hope this article helps you write the code and gives you useful background. Feel free to reach out to us at query@thoughtcoders.com, and join us on Facebook and LinkedIn for more technology updates.

Reference: https://docs.oracle.com/javase/8/docs/api/java/net/HttpURLConnection.html