← Back to Lessons

Cyber Intelligence and OSINT: Digital Surveillance with Open Sources

What is OSINT?

Cyber intelligence is a cybersecurity discipline focused on collecting, analyzing, and transforming data from cyberspace into useful knowledge to anticipate threats, make informed decisions, and protect digital assets. One of the most powerful and accessible methods for this is the use of OSINT (Open Source Intelligence): intelligence obtained from open and public sources.

In a digitalized world where people, organizations, and systems constantly leave traces, OSINT becomes a key tool to stay ahead of attacks, investigate malicious actors, or assess reputational risks. However, before understanding how OSINT is applied, it is essential to distinguish between three levels of information processing:

  • Data: Fragment without context. Example: an IP address or a username.
  • Information: Data organized with a purpose. Example: knowing that an IP is associated with suspicious activity.
  • Intelligence: Analyzed, validated, and contextualized information that supports decision-making. Example: attributing an IP to a specific threat actor.

The practice of cyber intelligence aims to turn large volumes of public data into actionable intelligence. To achieve this, the intelligence cycle is used, consisting of the following phases:

Phase 1: Planning and Direction

Every intelligence operation begins with a clear definition of objectives. What do you want to know? What is the potential threat or actor to investigate? This phase includes defining the scope, designing the collection strategy, and allocating resources. It also involves prioritizing objectives if working with multiple simultaneous focuses. A common mistake at this stage is starting collection without a concrete intelligence question, which can lead to an excess of irrelevant information.

Phase 2: Data Collection

Here, the necessary activities are carried out to obtain raw data from open sources (OSINT), internal, or technical sources. In the context of cyber intelligence, this may include: collecting data from social networks, DNS analysis, using advanced search engines, scanning devices exposed on Shodan, or reviewing leaks in public databases. This phase requires a balance between breadth and precision: obtaining enough without losing focus or generating unnecessary noise.

Phase 3: Analysis and Validation

Once the information is collected, the next step is to filter, compare, and validate the data. Not everything found online is reliable: analysts must verify the authenticity of sources, detect contradictions, eliminate duplicates, and contextualize the information. Here, raw data is transformed into structured information that allows identifying patterns, relationships between entities, and anomalies. This stage requires analytical judgment and technical expertise.

Phase 4: Intelligence Production

In this phase, the analyzed information is turned into useful and understandable intelligence for the recipients. This may involve technical reports, early warnings, visualizations (such as relationship maps or timelines), or even action recommendations. The production must be tailored to the user's context: a security analyst, an operations manager, or a legal team do not need the same depth or format.

Phase 5: Dissemination and Use

Finally, the produced intelligence must be delivered on time and in the appropriate format so that decision-makers can use it effectively. In the corporate environment, this can translate into actions such as blocking a malicious IP, alerting about a credential leak, tightening access policies, or even initiating legal proceedings. This phase also includes feedback: evaluating whether the delivered intelligence was useful and what adjustments should be made in the cycle for future investigations.

Properly applying the intelligence cycle allows transforming cyberspace noise into a key tool for digital defense.

What is OSINT?

OSINT (Open Source Intelligence) refers to the collection of information from open and legally accessible sources. These sources include: websites, blogs, public forums, social networks like Twitter, Facebook, LinkedIn, media outlets, news, public documents, official bulletins, government records, images, videos and their metadata, specialized search engines like Shodan, Censys, or advanced Google queries (Google Dorks).

Unlike a simple internet search, OSINT involves a systematic and disciplined methodology to collect, verify, and analyze useful information for cybersecurity or investigation purposes. An effective OSINT investigation is structured in stages that maintain focus, traceability, and validity of findings:

  1. Defining the objective: Clearly establish who or what will be investigated: a person, a company, a domain, an IP address, a specific account, etc.

  2. Selecting sources: Depending on the objective, choose the most relevant sources: social networks, public databases, DNS records, leaked password repositories, among others.

  3. Information collection: Use manual or automated tools to obtain data. This may include advanced search engines, OSINT frameworks, public APIs, and scrapers.

  4. Validation and cross-checking: The collected information must be cross-checked between different sources to confirm its authenticity, validity, and relevance. This stage is critical to avoid false positives or misinterpretations.

  5. Intelligence production and presentation: Findings are organized in a useful format for decision-making: reports, relationship maps, timelines, risk matrices, among others.

The following table shows examples of OSINT applications in cyber intelligence.

ApplicationExample of Use
Threat analysisIdentifying ransomware groups in underground forums
People investigationReconstructing the digital identity of an attacker
Passive reconnaissanceMapping domains, IPs, and technologies without interacting with systems
Leak verificationDetecting if credentials have been exposed in public databases
Fraud preventionDetecting fake profiles or identity impersonations

On the other hand, the effectiveness of an OSINT investigation largely depends on having the right tools to facilitate data collection, organization, and analysis. Below are some of the most commonly used tools by cyber intelligence analysts.

Common OSINT Tools

  • Maltego is one of the most comprehensive tools for visualizing relationships between entities. It allows you to create interactive graphs showing how IP addresses, domains, emails, people, organizations, and more are connected. It is especially useful for analyzing contact networks, an organization's technical infrastructure, or links between suspicious actors. Its strength lies in its transforms, modules that automate information searches in various sources (WHOIS, DNS, social networks, etc.). There is a free version (Maltego CE) with some limitations, and commercial versions that allow more extensive searches and team collaboration.

  • Spiderfoot is an automated framework for gathering intelligence about a target. It can be used via its web interface or in console mode, and allows massive scans with more than 200 modules. It is ideal for getting an overview of a domain, IP, username, or email address. It searches services like VirusTotal, Shodan, Pastebin, Have I Been Pwned, leak databases, among others, and helps identify potential vulnerabilities, exposed sensitive information, and correlations between multiple entities.

  • Shodan is a search engine specifically designed to index devices connected to the internet: routers, IP cameras, servers, databases, SCADA systems, among others. Unlike Google, which indexes websites, Shodan reveals publicly accessible services and technical metadata such as banners, software versions, open ports, and SSL certificates. It is a key tool for passive reconnaissance and assessing the exposure of a technological infrastructure without directly interacting with systems. It is also used in security audits, attack surface analysis, and to detect

  • theHarvester is a tool focused on collecting emails, subdomains, and DNS data related to a specific domain. It is especially useful in the early stages of a reconnaissance process, when you want to understand an organization's digital environment. It queries multiple sources like Google, Bing, DuckDuckGo, DNSDumpster, and can also integrate with Shodan or Censys. Its simplicity makes it a common choice in training labs and red teaming exercises.

  • Have I Been Pwned is an online service that allows you to check if an email address has been involved in any known data breach. It is one of the most used sources for discovering compromised credentials. Analysts can use it to assess the risk associated with a corporate domain (for example, by checking how many @company.com accounts have been leaked). It is also useful for raising user awareness and justifying the implementation of additional controls such as multi-factor authentication.

  • Google Dorks are advanced queries in the Google search engine that use special operators (site:, filetype:, intitle:, etc.) to find specific information that usually goes unnoticed with normal searches. They allow you to discover sensitive indexed documents (such as Excel, PDF, Word files), exposed login pages, open directories, misconfigurations, or even accessible security cameras. They are used in public visibility audits, compliance testing, and ethical pentesting exercises. Despite their apparent simplicity, when used well they can yield extremely revealing results.

The effectiveness of an OSINT operation depends not only on the tools used, but also on the methodological discipline and analytical judgment of the investigator. Even with experience and access to multiple sources, mistakes can be made that affect the validity of findings or compromise the integrity of the process. Below are some of the most frequent errors, along with their operational implications.

Common Mistakes in OSINT Investigations

  • Lack of a defined objective: One of the most common mistakes is starting an investigation without a clear question or without defining the scope of the objective. Searching for "everything about a company" or "all the digital activity of a person" can lead to an excessive accumulation of irrelevant data that hinders analysis.

    • Consequence: Wasted time, scattered resources, and difficulty extracting actionable conclusions.
    • Solution: Define specific intelligence questions from the start. For example: What public infrastructure does the domain company.com have? Is this Twitter user linked to a leak?
  • Relying on a single source: Every open source has limitations: it may be outdated, manipulated, or incomplete. Relying exclusively on a single database, forum, search engine, or platform is a methodological risk.

    • Consequence: Biased or outright incorrect conclusions are generated.
    • Solution: Corroborate each relevant finding in at least two independent sources. Triangulation of information is an essential practice in intelligence.
  • Misinterpreting context: A data point without context is a trap. An old post, a sarcastic comment, a parody account, or a misattributed IP address can lead to errors if nuances are not understood.

    • Consequence: The analysis loses accuracy and credibility. It can lead to erroneous accusations or failures in risk assessment.
    • Solution: Analyze the origin, timing, and tone of the content. When possible, locate the author and the digital environment where the interaction occurs.
  • Digital exposure of the investigator: Many OSINT tools require browsing websites, checking social networks, or searching public platforms. Doing so with your usual browser or from a personal account can leave digital traces that alert the target or compromise the analyst.

    • Consequence: Risk of being detected, access blocked, loss of anonymity, or even counterintelligence.
    • Solution: Use isolated environments (virtual machines, live operating systems), browsers configured for anonymity (such as Tor or Brave), and operational accounts created specifically for investigation.
    • Consequence: Possible legal consequences, reputational damage, or exclusion from professional investigations.
    • Solution: Know and respect the local and international legal framework (such as GDPR in Europe or Data Protection Laws in LATAM). Also, apply basic ethical principles of necessity, proportionality, and minimization of impact.

In summary, a rigorous OSINT investigation depends not only on technology, but on the analyst's judgment, preparation, and respect for the legal framework. Avoiding these mistakes not only improves the quality of the intelligence generated but also strengthens the legitimacy and sustainability of this practice within professional cybersecurity.

Practical Example: OSINT on a Domain

Doing OSINT on a domain like example.com means collecting public and legally accessible information about the digital assets associated with that domain, without actively interacting with its servers (which is considered passive reconnaissance).

The goal is to understand:

  • What infrastructure is exposed to the internet
  • What types of technologies are being used
  • If there is sensitive information or associated leaks
  • How easy it would be for an attacker to build an attack vector

This type of analysis is useful for risk assessment, visibility audits, or preparing red teaming and pentesting exercises.

Detailed Basic Actions

1. Query DNS Records and Subdomains (dnsdumpster.com)

This action seeks to identify all public subdomains associated with example.com, such as:

  • mail.example.com
  • login.example.com
  • dev.example.com

Each subdomain may point to a different server, with different services and protection levels. With dnsdumpster.com you can obtain: Known subdomains, associated IP addresses, hosting providers, and possible internal routes or test environments (often neglected). This allows you to visualize the attack surface map.

2. Collect Public Emails (theHarvester)

theHarvester automates the search for emails linked to the domain through search engines, social networks, and other indexed sources.

Importance:

  • Emails can be used in targeted attacks (phishing, spear phishing)
  • Reveal real names and internal roles (support@, juan.perez@, ceo@)
  • May be present in leaked databases

This helps identify vulnerable people or possible entry points for social engineering.

3. Check Exposed Devices on the Internet (Shodan)

Shodan is a search engine that shows which devices are publicly accessible associated with a domain or IP. Examples of what you can find: Misconfigured web servers, IP cameras accessible without authentication, exposed databases (e.g., MongoDB), and visible admin panels.

With a search like hostname:example.com you can detect:

  • Services on sensitive ports (21, 22, 3389, 9200…)
  • Exposed SSL certificates
  • Potentially vulnerable software versions

This allows you to assess the technical exposure level of a domain.

4. Check for Credential Leaks (Have I Been Pwned)

This service lets you know if emails from the domain have been compromised in data breaches. Example:

If juan.perez@example.com appears in a leak (Adobe, LinkedIn, Dropbox), it can be assumed that the password could be reused on other services. This allows you to assess risks such as:

  • Unauthorized access
  • Credential stuffing attacks

5. Use Google Dorks for Advanced Searches

Google allows you to locate indexed sensitive documents, find login pages or admin panels, and detect misconfigurations or visible internal routes, such as:

  • site:example.com filetype:pdf
  • intitle:"index of" site:example.com
  • inurl:admin site:example.com

All these actions mentioned above can be performed without actively interacting with the target's systems, that is, without scans, exploitation, or alerts. Even so, they allow you to build a clear picture of an organization's public security posture. This is the essence of passive reconnaissance: observing without intervening, collecting without invading.

This type of analysis is fundamental in cybersecurity audits, threat analysis, and offensive simulation exercises.