📌 Introduction

A broken link is a hyperlink that no longer works, typically because of:
✔️ The destination page being removed.
✔️ Incorrect URL formatting.
✔️ Server or permission issues.

When users click a broken link, they typically see a 404 error page or a server error message.


🚀 How to Find All Links on a Web Page

Before checking for broken links, we first need to extract all links from the webpage.

// Get all links (anchor tags) from the webpage
List<WebElement> links = driver.findElements(By.tagName("a"));

System.out.println("Total links found: " + links.size());

// Print each link's visible text and URL
for (WebElement link : links) {
    String text = link.getText();
    String url = link.getAttribute("href");
    System.out.println(text + " -> " + url);
}

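✔️ This code finds all <a> (anchor) tags and reads their href attribute, which contains the URL. For href, getAttribute() returns the resolved absolute URL, so relative links such as /about come back as full URLs.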
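Pages often repeat the same link (header, footer, sidebar), and some hrefs are not web URLs at all (mailto:, tel:, javascript:). A minimal sketch, assuming Java 11+ and that you have already collected the raw href strings, that keeps only unique http(s) URLs. `LinkFilter` and `checkableUrls` are hypothetical names, not part of Selenium:

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class LinkFilter {

    // Returns the unique http(s) URLs from a list of raw href values,
    // in the order they first appeared on the page.
    static List<String> checkableUrls(Collection<String> hrefs) {
        Set<String> unique = new LinkedHashSet<>();
        for (String href : hrefs) {
            if (href == null || href.isBlank()) {
                continue; // <a> tags without a usable href
            }
            String url = href.trim();

            // The #fragment is never sent to the server, so page#a and page#b are one request
            int hash = url.indexOf('#');
            if (hash >= 0) {
                url = url.substring(0, hash);
            }

            String scheme;
            try {
                scheme = URI.create(url).getScheme();
            } catch (IllegalArgumentException e) {
                continue; // not a syntactically valid URI
            }
            if (scheme == null) {
                continue; // relative href; getAttribute("href") normally returns absolute URLs
            }
            scheme = scheme.toLowerCase(Locale.ROOT);
            if (scheme.equals("http") || scheme.equals("https")) {
                unique.add(url); // mailto:, tel:, javascript:, ... are dropped here
            }
        }
        return new ArrayList<>(unique);
    }
}
```

In the Selenium loop above you would add each `link.getAttribute("href")` to a list and pass that list to `checkableUrls`, so every distinct URL is requested only once.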


🔍 How to Identify Broken Links Using Selenium and Java

Selenium WebDriver drives the browser but does not expose HTTP status codes, so it cannot tell you directly whether a link is broken. Instead, we use the HttpURLConnection class from Java's java.net package to send an HTTP request to each URL and inspect the response code.
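To see what that single request looks like on its own, here is a minimal sketch assuming Java 11+. `StatusCodeDemo`, `statusOf`, and `startDemoServer` are hypothetical names, the 5-second timeouts are an assumed choice, and a throwaway local server built with `com.sun.net.httpserver.HttpServer` (shipped with the JDK) stands in for a real website so the example runs without internet access:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;

final class StatusCodeDemo {

    // Sends a HEAD request and returns the HTTP status code.
    static int statusOf(String link) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(link).toURL().openConnection();
        try {
            conn.setRequestMethod("HEAD");   // headers only, no response body
            conn.setConnectTimeout(5_000);   // assumed timeouts; tune for your site
            conn.setReadTimeout(5_000);
            return conn.getResponseCode();   // sends the request and reads the status line
        } finally {
            conn.disconnect();
        }
    }

    // Local stand-in for a website: "/ok" answers 200, every other path answers 404.
    static HttpServer startDemoServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            int status = exchange.getRequestURI().getPath().equals("/ok") ? 200 : 404;
            exchange.sendResponseHeaders(status, -1); // -1 = no response body
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startDemoServer();
        String base = "http://127.0.0.1:" + server.getAddress().getPort();
        try {
            System.out.println(statusOf(base + "/ok"));      // 200
            System.out.println(statusOf(base + "/missing")); // 404
        } finally {
            server.stop(0);
        }
    }
}
```

Note that `getResponseCode()` returns 404 instead of throwing, which is why a plain status-code comparison is enough to spot missing pages.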

✅ Steps to Check Broken Links:

  1. Get all links from the webpage using Selenium.
  2. Send an HTTP request to each link using HttpURLConnection.
  3. Check the HTTP response code:
    • 200-299 ➝ ✅ Valid link
    • 300-399 ➝ ↪️ Redirect (HttpURLConnection follows same-protocol redirects automatically, so these mostly appear for hops such as http ➝ https)
    • 400-499 ➝ ❌ Broken link (e.g., 404 Not Found)
    • 500-599 ➝ ❌ Server error
  4. Print the broken links for reporting.
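The response-code ranges in step 3 can be expressed as a small lookup. A sketch only: the `LinkStatus` enum and its constant names are hypothetical, the REDIRECT bucket covers 3xx codes, and UNKNOWN also catches -1, which is what `HttpURLConnection.getResponseCode()` returns when the response is not valid HTTP:

```java
enum LinkStatus {
    VALID, REDIRECT, BROKEN, SERVER_ERROR, UNKNOWN;

    // Maps an HTTP status code to one of the categories above.
    static LinkStatus classify(int code) {
        if (code >= 200 && code <= 299) return VALID;
        if (code >= 300 && code <= 399) return REDIRECT;     // a redirect that was not followed
        if (code >= 400 && code <= 499) return BROKEN;       // e.g. 404 Not Found, 410 Gone
        if (code >= 500 && code <= 599) return SERVER_ERROR; // e.g. 500, 503
        return UNKNOWN; // 1xx, non-standard codes, or -1 (no valid HTTP response)
    }

    // True for the two categories treated as broken.
    boolean isBroken() {
        return this == BROKEN || this == SERVER_ERROR;
    }

    public static void main(String[] args) {
        for (int code : new int[] {200, 301, 404, 503}) {
            LinkStatus status = classify(code);
            System.out.println(code + " -> " + status + (status.isBroken() ? " (broken)" : ""));
        }
    }
}
```

Keeping 4xx and 5xx as separate categories lets a report distinguish pages that are gone from servers that are temporarily failing.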

🛠 Complete Selenium Code to Find Broken Links

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

import java.net.HttpURLConnection;
import java.net.URI;
import java.util.List;

public class BrokenLinksChecker {
    public static void main(String[] args) {
        // Set up WebDriver
        WebDriver driver = new ChromeDriver();
        try {
            driver.manage().window().maximize();

            // Open the target website
            driver.get("https://example.com");  // Replace with your website

            // Get all links on the page
            List<WebElement> links = driver.findElements(By.tagName("a"));
            System.out.println("Total links found: " + links.size());

            // Check each link
            for (WebElement link : links) {
                String url = link.getAttribute("href");

                // Skip if URL is null or empty
                if (url == null || url.isEmpty()) {
                    System.out.println("🔴 Empty or null link found");
                    continue;
                }

                // Skip links HttpURLConnection cannot check (mailto:, tel:, javascript:, ...)
                if (!url.startsWith("http://") && !url.startsWith("https://")) {
                    System.out.println("⏭️ Skipping non-HTTP link: " + url);
                    continue;
                }

                // Check if the link is broken
                verifyBrokenLink(url);
            }
        } finally {
            // Close the browser even if something above throws
            driver.quit();
        }
    }

    // Sends a HEAD request to the URL and prints whether the link is broken
    public static void verifyBrokenLink(String linkUrl) {
        HttpURLConnection httpConn = null;
        try {
            // Build the URL (the new URL(String) constructor is deprecated since Java 20)
            httpConn = (HttpURLConnection) URI.create(linkUrl).toURL().openConnection();
            httpConn.setRequestMethod("HEAD");  // Headers only; no response body is downloaded
            httpConn.setConnectTimeout(5000);   // Don't hang forever on unresponsive hosts
            httpConn.setReadTimeout(5000);

            // getResponseCode() sends the request and reads the status line
            int responseCode = httpConn.getResponseCode();

            // Print status
            if (responseCode >= 400) {
                System.out.println("❌ Broken Link: " + linkUrl + " → HTTP Response: " + responseCode);
            } else {
                System.out.println("✅ Valid Link: " + linkUrl + " → HTTP Response: " + responseCode);
            }
        } catch (Exception e) {
            // Malformed URLs, unknown hosts, timeouts, refused connections, ...
            System.out.println("⚠️ Error checking link: " + linkUrl + " → " + e.getMessage());
        } finally {
            if (httpConn != null) {
                httpConn.disconnect();
            }
        }
    }
}

🔍 Understanding the Code

✔️ Extracts all links from the webpage.
✔️ Sends an HTTP HEAD request to each link using HttpURLConnection.
✔️ Checks the response code:
  • ✅ Below 400 (2xx, or a 3xx redirect that was not followed) → Valid link
  • ❌ 400 and above (4xx client errors, 5xx server errors) → Broken link
✔️ Catches exceptions (malformed URLs, unreachable hosts) so one bad link doesn't stop the whole run.
✔️ Uses a HEAD request instead of GET for speed: the server returns only the headers, not the page body.
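HEAD is not supported everywhere: some servers answer 405 Method Not Allowed or 501 Not Implemented to a HEAD request even though the page loads fine with GET, which makes a working link look broken. One common workaround is to retry with GET only for those two codes. A minimal sketch, assuming Java 11+; `HeadThenGet`, `request`, `statusOf`, and `startDemoServer` are hypothetical names, and a local demo server that refuses HEAD stands in for a real site:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;

final class HeadThenGet {

    // Sends one request with the given method and returns the status code.
    static int request(String link, String method) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(link).toURL().openConnection();
        try {
            conn.setRequestMethod(method);
            conn.setConnectTimeout(5_000); // assumed timeouts
            conn.setReadTimeout(5_000);
            return conn.getResponseCode();
        } finally {
            conn.disconnect(); // also drops any unread GET body
        }
    }

    // Tries HEAD first; falls back to GET only when the server rejects the HEAD method itself.
    static int statusOf(String link) throws IOException {
        int code = request(link, "HEAD");
        if (code == HttpURLConnection.HTTP_BAD_METHOD || code == HttpURLConnection.HTTP_NOT_IMPLEMENTED) {
            code = request(link, "GET");
        }
        return code;
    }

    // Demo server that mimics a site refusing HEAD: HEAD -> 405, GET -> 200.
    static HttpServer startDemoServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            boolean isHead = exchange.getRequestMethod().equals("HEAD");
            exchange.sendResponseHeaders(isHead ? 405 : 200, -1); // -1 = no response body
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startDemoServer();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/page";
            System.out.println(request(url, "HEAD")); // 405
            System.out.println(statusOf(url));        // 200
        } finally {
            server.stop(0);
        }
    }
}
```

Because the GET retry can download the page body, restricting it to 405/501 keeps the HEAD speed-up for every other link.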

🎯 Sample Output

Total links found: 10
✅ Valid Link: https://example.com/home → HTTP Response: 200
✅ Valid Link: https://example.com/about → HTTP Response: 200
❌ Broken Link: https://example.com/old-page → HTTP Response: 404
❌ Broken Link: https://broken-link.com → HTTP Response: 500
✅ Valid Link: https://example.com/contact → HTTP Response: 200

🏆 Best Practices

✔️ Run the script periodically (for example, as a scheduled job) to catch links that break over time.
✔️ In large applications, write the results to a report instead of printing them to the console.
✔️ Check links in parallel with a thread pool to speed up pages with many links.
✔️ Skip third-party links that point outside your website's domain if you only need to verify your own pages.
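The parallel-execution and domain-filtering tips can be combined. A minimal sketch, assuming Java 11+: `ParallelLinkChecker`, `checkAll`, and `isInternal` are hypothetical names, and the single-link check is passed in as a `ToIntFunction<String>` (for example, a method that sends the HEAD request and returns `getResponseCode()`) so the threading logic does not depend on the network:

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToIntFunction;

final class ParallelLinkChecker {

    // True when the link points at the site under test (assumes a valid absolute URL).
    static boolean isInternal(String link, String siteHost) {
        String host = URI.create(link).getHost();
        return host != null && host.equalsIgnoreCase(siteHost);
    }

    // Checks internal links on a fixed-size thread pool and returns url -> status code,
    // in the original order. External links are skipped.
    static Map<String, Integer> checkAll(List<String> links, String siteHost,
                                         ToIntFunction<String> statusOf, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<Integer>> pending = new LinkedHashMap<>();
            for (String link : links) {
                if (isInternal(link, siteHost)) {
                    pending.put(link, pool.submit(() -> statusOf.applyAsInt(link)));
                }
            }
            Map<String, Integer> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Integer>> entry : pending.entrySet()) {
                try {
                    results.put(entry.getKey(), entry.getValue().get()); // waits for that check
                } catch (ExecutionException ex) {
                    results.put(entry.getKey(), -1); // the check threw: record "no valid response"
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

A small pool (say 5 to 10 threads) is usually enough; firing many simultaneous requests at one host can trigger rate limiting and produce false "broken" results.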


🔥 Conclusion

✅ Selenium extracts all links from the page.
✅ Java's HttpURLConnection sends a request to each link.
✅ HTTP response codes identify which links are broken.