Introduction
A broken link is a hyperlink that does not work, typically because:
✔️ The destination page has been removed.
✔️ The URL is incorrectly formatted.
✔️ The server is failing or the user lacks permission to access the resource.
When users click a broken link, they typically see a 404 error page or a server error message.
How to Find All Links on a Web Page
Before checking broken links, we first need to extract all links from a webpage.
    // Get all links (anchor tags) from the webpage
    List<WebElement> links = driver.findElements(By.tagName("a"));
    System.out.println("Total links found: " + links.size());

    // Print each link's URL
    for (WebElement link : links) {
        String url = link.getAttribute("href");
        System.out.println(url);
    }
✔️ This code finds all <a> tags (anchor links) and reads their href attribute, which contains the URL. Anchors without an href return null, so later code has to handle that case.
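The loop above prints every href as-is, which in practice includes null values, duplicates, and non-HTTP schemes such as mailto: or javascript: that cannot be checked with an HTTP request. The following is a minimal sketch of a pre-filter, written in plain Java so it runs without a browser. The class name LinkFilter and the method checkableUrls are hypothetical names introduced for illustration, and the input list stands in for the href values collected by Selenium.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of Selenium): keeps only URLs worth requesting.
class LinkFilter {

    // Returns the distinct http/https URLs from the raw href values, in first-seen order.
    static List<String> checkableUrls(List<String> hrefs) {
        Set<String> unique = new LinkedHashSet<>();
        for (String href : hrefs) {
            if (href == null || href.isBlank()) {
                continue; // anchor without a usable href
            }
            String trimmed = href.trim();
            try {
                String scheme = URI.create(trimmed).getScheme();
                if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                    unique.add(trimmed);
                }
            } catch (IllegalArgumentException e) {
                // Malformed href: there is nothing valid to request, so skip it
            }
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Sample values standing in for link.getAttribute("href") results
        List<String> hrefs = new ArrayList<>();
        hrefs.add("https://example.com/home");
        hrefs.add("mailto:team@example.com");
        hrefs.add(null);
        hrefs.add("javascript:void(0)");
        hrefs.add("https://example.com/home");
        System.out.println(checkableUrls(hrefs)); // prints [https://example.com/home]
    }
}
```

Removing duplicates before checking also avoids sending the same request several times when a page repeats a link in its header and footer.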
How to Identify Broken Links Using Selenium and Java
Selenium drives the browser and cannot directly check whether a link is broken, so we use the HttpURLConnection class from Java's java.net package to send an HTTP request to each URL and check the response code.
Steps to Check Broken Links:
- Get all links from the webpage using Selenium.
- Send an HTTP request to each link using HttpURLConnection.
- Check the HTTP response code:
  - 200-299 → ✅ Valid Link
  - 400-499 → ❌ Broken Link (e.g., 404 Not Found)
  - 500-599 → ❌ Server Error
- Print broken links for reporting.
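The response-code ranges in the steps above can be sketched as a small classification function. This is an illustrative helper, not part of Selenium or java.net: the names LinkStatus, Category, and classify are hypothetical. The REDIRECT and UNKNOWN categories are assumptions added to cover codes the list does not mention. HttpURLConnection follows most redirects by default, so a 3xx code usually appears only when a redirect was not followed (for example, one that switches between http and https).

```java
// Hypothetical helper that maps an HTTP status code to the categories listed above.
class LinkStatus {

    enum Category { VALID, REDIRECT, BROKEN, SERVER_ERROR, UNKNOWN }

    static Category classify(int code) {
        if (code >= 200 && code <= 299) return Category.VALID;        // success
        if (code >= 300 && code <= 399) return Category.REDIRECT;     // assumption: reported separately
        if (code >= 400 && code <= 499) return Category.BROKEN;       // client error, e.g. 404
        if (code >= 500 && code <= 599) return Category.SERVER_ERROR; // server-side failure
        return Category.UNKNOWN; // e.g. -1 when no valid HTTP response was received
    }

    public static void main(String[] args) {
        int[] samples = {200, 301, 404, 503};
        for (int code : samples) {
            System.out.println(code + " -> " + classify(code));
        }
    }
}
```

Keeping 4xx and 5xx apart can help in reports, since a 404 usually means the link itself needs fixing while a 503 may be a temporary outage worth rechecking.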
Complete Selenium Code to Find Broken Links
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.chrome.ChromeDriver;

    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.List;

    public class BrokenLinksChecker {

        public static void main(String[] args) {
            // Set up WebDriver
            WebDriver driver = new ChromeDriver();
            try {
                driver.manage().window().maximize();

                // Open the target website
                driver.get("https://example.com"); // Replace with your website

                // Get all links on the page
                List<WebElement> links = driver.findElements(By.tagName("a"));
                System.out.println("Total links found: " + links.size());

                // Check each link
                for (WebElement link : links) {
                    String url = link.getAttribute("href");

                    // Skip if URL is null or empty
                    if (url == null || url.isEmpty()) {
                        System.out.println("🔴 Empty or null link found");
                        continue;
                    }

                    // Check if the link is broken
                    verifyBrokenLink(url);
                }
            } finally {
                // Close the browser even if something above fails
                driver.quit();
            }
        }

        // Verify whether a link is broken by requesting it and checking the status code
        public static void verifyBrokenLink(String linkUrl) {
            HttpURLConnection httpConn = null;
            try {
                // Create a URL object and open a connection
                URL url = new URL(linkUrl);
                httpConn = (HttpURLConnection) url.openConnection();
                httpConn.setRequestMethod("HEAD"); // HEAD returns headers only, not the body
                httpConn.setConnectTimeout(5000);  // Do not hang on unresponsive hosts
                httpConn.setReadTimeout(5000);
                httpConn.connect();

                // Get response code and print status
                int responseCode = httpConn.getResponseCode();
                if (responseCode >= 400) {
                    System.out.println("❌ Broken Link: " + linkUrl + " → HTTP Response: " + responseCode);
                } else {
                    System.out.println("✅ Valid Link: " + linkUrl + " → HTTP Response: " + responseCode);
                }
            } catch (Exception e) {
                // Also reached for non-HTTP links such as mailto:, where the cast above fails
                System.out.println("⚠️ Error checking link: " + linkUrl + " → " + e.getMessage());
            } finally {
                if (httpConn != null) {
                    httpConn.disconnect();
                }
            }
        }
    }
Understanding the Code
✔️ Extracts all links from the webpage.
✔️ Sends an HTTP request to each link using HttpURLConnection.
✔️ Checks the response code:
- ✅ 200-299 → Valid link
- ❌ 400+ → Broken link
✔️ Handles exceptions so that one unreachable or malformed link does not stop the whole run.
✔️ Uses a HEAD request instead of GET to improve performance, since the server returns only the response headers and not the entire page.
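One caveat with HEAD is that some servers do not support it and answer with 405 (Method Not Allowed) or 501 (Not Implemented) even though the page loads fine in a browser. A possible refinement, sketched below under that assumption, is to retry such links with GET before reporting them as broken. The class HeadWithFallback and its method names are hypothetical, the 5-second timeouts are arbitrary, and the code assumes the URL is an absolute http or https address.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper: try HEAD first, fall back to GET when HEAD is rejected.
class HeadWithFallback {

    // True when the status code suggests the server rejected the HEAD method itself.
    static boolean headNotSupported(int code) {
        return code == HttpURLConnection.HTTP_BAD_METHOD        // 405
            || code == HttpURLConnection.HTTP_NOT_IMPLEMENTED;  // 501
    }

    // Sends one request with the given method and returns the HTTP status code.
    static int request(String linkUrl, String method) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(linkUrl).toURL().openConnection();
        conn.setRequestMethod(method);
        conn.setConnectTimeout(5000); // assumed timeout values, tune as needed
        conn.setReadTimeout(5000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // Returns the status from HEAD, or from GET if the server does not accept HEAD.
    static int statusOf(String linkUrl) throws IOException {
        int code = request(linkUrl, "HEAD");
        return headNotSupported(code) ? request(linkUrl, "GET") : code;
    }
}
```

With this approach most links still cost only a header exchange, and the heavier GET request is sent only for the few servers that reject HEAD.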
Sample Output
    Total links found: 10
    ✅ Valid Link: https://example.com/home → HTTP Response: 200
    ✅ Valid Link: https://example.com/about → HTTP Response: 200
    ❌ Broken Link: https://example.com/old-page → HTTP Response: 404
    ❌ Broken Link: https://broken-link.com → HTTP Response: 500
    ✅ Valid Link: https://example.com/contact → HTTP Response: 200
Best Practices
✔️ Run the script periodically to monitor broken links.
✔️ For large applications, store the results in a report instead of printing them to the console.
✔️ Use parallel execution (multiple threads) to speed up checking.
✔️ Ignore third-party links that point outside your website's domain if you only need to validate your own pages.
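Two of the practices above, parallel execution and skipping third-party links, can be sketched with the standard java.util.concurrent classes. This is one possible arrangement rather than a definitive implementation: ParallelLinkCheck, isInternal, and checkAll are hypothetical names, the checker argument stands in for whatever method returns a status code for a URL, and -1 is an assumed marker for checks that threw an exception.

```java
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.ToIntFunction;

// Hypothetical helper: filter by host and check many URLs on a fixed thread pool.
class ParallelLinkCheck {

    // True when the URL's host matches the site's own host (case-insensitive).
    static boolean isInternal(String url, String siteHost) {
        try {
            String host = URI.create(url).getHost();
            return host != null && host.equalsIgnoreCase(siteHost);
        } catch (IllegalArgumentException e) {
            return false; // malformed URL: treat as not internal
        }
    }

    // Runs the checker for every URL in parallel and collects URL -> status code.
    static Map<String, Integer> checkAll(List<String> urls,
                                         ToIntFunction<String> checker,
                                         int threads) {
        Map<String, Integer> results = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String url : urls) {
            pool.execute(() -> {
                int code;
                try {
                    code = checker.applyAsInt(url);
                } catch (RuntimeException e) {
                    code = -1; // assumed marker for "check failed"
                }
                results.put(url, code);
            });
        }
        pool.shutdown(); // accept no new tasks, let queued ones finish
        try {
            if (!pool.awaitTermination(5, TimeUnit.MINUTES)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return results;
    }
}
```

The href values should be read from the WebElement objects on the main thread first, and only the resulting strings passed to checkAll, since a WebDriver session is not meant to be shared across threads. The returned map can then be written to a file or a test report instead of the console.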
Conclusion
✅ Selenium extracts all links.
✅ Java's HttpURLConnection checks each link.
✅ Broken links are identified by HTTP response codes.