User’s Guide

Welcome to Bulk Verifier, an efficient multi-threaded high speed verifier application for checking e-mail addresses and domain availability. This advanced email verifier checks every email address from a given mailing list, allowing you to determine if they still exist.

Bulk Verifier offers two processing modes, fast and deep, to clean and validate e-mail lists and domains.

In its fast mode Bulk Verifier can process mailing lists containing tens of millions of e-mail addresses at a speed of several thousand addresses per second. This mode does not ensure the highest checking accuracy, but it is economical in time and traffic and provides sufficient results for most purposes. We recommend the fast processing mode as a high-speed tool for sifting obvious rubbish out of large mailing lists containing millions of e-mail addresses. For details please see the section “Fast mode of Bulk Verifier”.

In its deep (default) mode Bulk Verifier works significantly slower but provides much more precise results. The optimal data amount for this mode is 70 to 100 thousand e-mail addresses. We recommend the deep processing mode as a slow but high-quality tool for checking mailing lists that are not very large. For details please see the section “Deep (slow) mode of Bulk Verifier”.

1 Introduction to e-mailing technologies

There are 2 stages in e-mail message delivery to the addressee:

  • 1 The sender’s mail server determines the addressee’s mail server using DNS service;
  • 2 The sender’s mail server connects to the addressee’s mail server via the SMTP protocol and transmits the message.

A mail domain (e. g. mail.com for the address nicky@mail.com) usually differs from the name of the mail server which receives e-mail messages for the address. For example, at the time this Guide was written the servers mail-com.mr.outblaze.com and mail-com-bk.mr.outblaze.com accepted messages for the address nicky@mail.com, while the computers with the addresses mail.com and www.mail.com did not accept messages for any e-mail addresses at all. That’s why you should not directly associate an e-mail domain with the name of the mail server: messages are often accepted by another computer with a completely different name.

To determine the addressee’s server address, a request is sent to the DNS service, which stores (together with other data) the information about the correspondence between mail domains and the mail servers which receive messages for them. DNS is a distributed database. For example, the DNS server ns1.outblaze.com stores all information about the domain mail.com but doesn’t have any information about other domains (e. g. hotmail.com). At the same time the server ns1.hotmail.com has information about the domain hotmail.com but no data about other domains. There is a server responsible for domains in the .com zone which stores the information about domains of this zone.

The DNS server of your provider does not hold any records about mail.com or hotmail.com. When it receives a request about, for example, mail.com, it asks the server responsible for the .com zone for the address of the server containing the information about the domain mail.com (it is ns1.outblaze.com), then connects to this server and sends the response back to you. Such request execution is called recursive.

DNS technologies are described in detail in many public sources and are not the subject of this Guide. What is important to know is that a request to the DNS service can pass through several DNS servers in different areas before you get the response, and that the owner of a domain is responsible for storing the information about it.

There is also a technology of DNS request caching. Usually a DNS server stores the results of the latest requests for several days to decrease the load on DNS servers and speed up request execution. This means that after an unforeseen change in a DNS server’s records it may take several days before the caches of other DNS servers are refreshed and provide their users with the updated information.

2 E-mail addresses check technologies

As stated above, there are 2 stages in e-mail message delivery to the addressee:

  • 1 The sender’s mail server determines the addressee’s mail server using DNS service;
  • 2 The sender’s mail server connects to the addressee’s mail server via the SMTP protocol and transmits the message.

To check the availability of an e-mail address, it’s necessary to emulate these stages. The problem is that some mail services do not check whether the addressees’ e-mail addresses (mail boxes) actually exist in their domains when accepting incoming mail. All messages are accepted and then, if an address does not in fact exist, the mail service sends the original message’s sender a response containing a delivery failure message. About 30% of all e-mail addresses belong to such mail services. Their availability cannot be checked using software methods. Thus, only about 70% of unavailable e-mail addresses can be detected with the help of software tools.

In turn, about 30% of the unavailable addresses that domain or e-mail validation software can detect are discovered at the first checking stage (DNS request); discovering the other 70% requires the second stage (SMTP connection emulation). The second checking stage usually takes 10 times more time and 5 times more network traffic than the first one. In fact, the complete two-stage check of an e-mail address takes the same time and traffic as sending a short message to this address.

Let’s look at the check stages in more detail.

Stage 1 The verification software parses the e-mail address syntactically, singles out the mail domain and sends a request to the DNS server to get the mail server of this domain. The exchange with DNS servers uses the UDP protocol, which is faster than TCP because it doesn’t involve establishing a connection between the servers. A DNS request usually takes 1-2 seconds. This includes sending a request packet (about 60 bytes including the packet header) and accepting a response packet (usually 200-300 bytes but not more than 512). This stage filters out all syntactically incorrect e-mail addresses as well as addresses in non-existent domains.
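To illustrate the size of a Stage 1 request, here is a minimal Python sketch that builds a standard DNS query asking for the MX (mail exchanger) records of a domain. It is not Bulk Verifier's own code; the function name `build_mx_query` and the query ID are chosen for illustration only.

```python
import struct

def build_mx_query(domain: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query message asking for the MX records of `domain`."""
    # Header: ID, flags (0x0100 = recursion desired), one question,
    # zero answer / authority / additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: every label is prefixed with its length,
    # and the name ends with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 15 = MX, QCLASS 1 = IN (Internet).
    question = qname + struct.pack("!HH", 15, 1)
    return header + question

packet = build_mx_query("mail.com")
print(len(packet))  # 26 bytes of DNS payload
```

For mail.com the DNS payload is 26 bytes; the 8-byte UDP header and the 20-byte IPv4 header bring the packet to 54 bytes, which is in line with the figure of about 60 bytes given above. Longer domain names produce slightly larger requests.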

Note. The syntactical check performed by Bulk Verifier is a very simple one: an e-mail address must include one “@” sign and must end with one of the basic top-level domains (TLD). The TLD list is stored in the file “Bulk Verifier.tld” in the application’s main folder. A more precise syntactical check does not seem reasonable since it would slow down the processing.
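The simple check described in the note can be sketched in Python as follows. This is an illustration, not the application's code: the layout of the .tld file is not documented here, so the sketch assumes one TLD per line, and the extra test for a non-empty name and domain is also an assumption.

```python
def load_tlds(lines):
    """Normalise TLD entries (assumed format: one TLD per line)."""
    return {line.strip().lstrip(".").lower() for line in lines if line.strip()}

def is_syntactically_valid(address: str, tlds: set) -> bool:
    # The address must contain exactly one "@" sign.
    if address.count("@") != 1:
        return False
    local, domain = address.split("@")
    # Assumption: both parts around "@" must be non-empty.
    if not local or not domain:
        return False
    # The domain must end with a known top-level domain.
    _, dot, tld = domain.rpartition(".")
    return bool(dot) and tld.lower() in tlds

tlds = load_tlds(["com", "net", "org"])
print(is_syntactically_valid("nicky@mail.com", tlds))   # True
print(is_syntactically_valid("nicky.mail.com", tlds))   # False
```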

Stage 2 The checking software establishes a connection to the mail server via the SMTP protocol (based on TCP). The TCP protocol is connection-oriented, so the servers exchange service packets to establish the connection.

After the connection is established, the servers exchange “hello messages” (the first lines in the log below). Then the sender’s address is transmitted and the receiving server confirms that it will accept a message from this address. Then the addressee’s address is transmitted.

Here is a log example:

< 220-ns.watson.ibm.com ESMTP Sendmail AIX4.3/8.9.3/8.9.0
< 220 Thu, 22 Aug 2002 20:44:07 +0500
> HELO cisco.my.net
< 250-ns.watson.ibm.com Hello cisco.my.net [12.44.72.94],
< 250 pleased to meet you
> MAIL FROM:<verify@testmail.com>
< 250 <verify@testmail.com>... Sender is valid.
> RCPT TO:<noshuchaddress@ibm.com>
< 550 <noshuchaddress@ibm.com>... User unknown
> RSET
< 250 Resetting the state.
> QUIT

As you can see, the receiving server responded that the user with the address noshuchaddress@ibm.com is unknown and refused to receive a message for this user. Then the servers exchanged commands to close the connection.
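The decisive line in such a log is the server's reply to RCPT TO. In SMTP, reply codes starting with 2 mean success, codes starting with 4 mean a temporary failure, and codes starting with 5 mean a permanent failure. The Python sketch below shows one way to interpret that reply; the function name and the three result labels are illustrative assumptions, not names used by Bulk Verifier.

```python
def classify_rcpt_reply(reply_line: str) -> str:
    """Map the server's reply to RCPT TO onto a check result."""
    # Drop the "< " direction marker used in the log above, if present.
    text = reply_line.lstrip("< ")
    code = int(text[:3])
    if 200 <= code < 300:
        return "accepted"       # the server is willing to take mail for the address
    if 500 <= code < 600:
        return "rejected"       # permanent failure, e.g. "User unknown"
    return "undetermined"       # e.g. 4xx temporary failure

print(classify_rcpt_reply("< 550 <noshuchaddress@ibm.com>... User unknown"))  # rejected
```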

During the address check the servers exchanged 10 messages with a total size of about 500 bytes. In fact it took more than 20 packets to deliver these messages, which brought the total traffic to about 2 KBytes. Most of the time was spent waiting for the response from the other server.

Bulk Verifier can perform both a complete (but slow) two-stage check of e-mail address availability and a high-speed check which involves only the first stage (DNS server request). For details please see the sections “Fast mode of Bulk Verifier” and “Deep (slow) mode of Bulk Verifier”. Bulk Verifier is software for verifying e-mail addresses and cleaning mailing lists of dead addresses.

3 General Bulk Verifier features – clean and validate your email list.

3.1 Incoming file formats

Bulk Verifier is an e-mail checking tool for verifying your customers’ e-mail addresses from your mailbox or contact files. It can process both a plain list of e-mail addresses / domains where each line contains one item and files of a more complex structure where lines represent multi-field records of the same structure (i. e. containing the same fields separated with the same delimiter). For example, you can export a worksheet of an MS Excel file to check the availability of e-mail addresses/domains listed there. It is assumed that one line of an incoming file contains one e-mail address and/or one domain. Bulk Verifier can perform several checks against an e-mail address, including syntax, DNS MX lookup and top-level domain name validation.
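As an illustration of the second file type, the following Python sketch pulls one column out of delimited text using the standard csv module. The function name, the ";" delimiter and the column name "email" are assumptions made for the example; they do not describe how Bulk Verifier reads files internally.

```python
import csv

def extract_column(lines, delimiter=";", with_headers=True, column="email"):
    """Return the values of one column from delimited lines.

    `column` is a header name when `with_headers` is True,
    otherwise a zero-based column index.
    """
    rows = list(csv.reader(lines, delimiter=delimiter))
    if not rows:
        return []
    if with_headers:
        index = rows[0].index(column)   # locate the column by its header
        rows = rows[1:]                 # skip the header line
    else:
        index = int(column)
    # Skip short lines that do not reach the requested column.
    return [row[index].strip() for row in rows if len(row) > index]

sample = ["name;email", "Nicky;nicky@mail.com", "Bob;bob@hotmail.com"]
print(extract_column(sample))  # ['nicky@mail.com', 'bob@hotmail.com']
```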

3.2 Bulk Verifier internal cache

Bulk Verifier stores domain check results in an internal cache. If another e-mail address from the same domain is found in the same mailing list, Bulk Verifier does not query the DNS server again but uses the result from the cache. The cache size is limited only by the memory size of your computer. It takes 40 bytes of memory to store the result of one domain check. Thus, it takes 40 MBytes of memory to store the check results for one million different domains. The time spent finding a previous check result in the cache practically does not depend on the cache size.
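The idea of the cache can be sketched in Python with a dictionary, whose lookup time also practically does not depend on its size. The actual data structure used by Bulk Verifier is not documented here, and a Python dictionary needs considerably more than 40 bytes per entry, so the sketch shows the principle only. The class name and the `lookup` parameter are hypothetical.

```python
class DomainCache:
    """Remember the check result of every domain seen so far."""

    def __init__(self, lookup):
        self._lookup = lookup      # function performing the real DNS check
        self._results = {}         # domain -> check result
        self.lookups_made = 0      # how many real checks were needed

    def check(self, domain):
        key = domain.lower()
        if key not in self._results:
            # First address from this domain: perform the real check.
            self.lookups_made += 1
            self._results[key] = self._lookup(key)
        return self._results[key]

cache = DomainCache(lambda domain: domain.endswith(".com"))  # stand-in for a DNS check
for address in ["nicky@mail.com", "bob@mail.com", "ann@MAIL.com"]:
    cache.check(address.split("@")[1])
print(cache.lookups_made)  # 1

# Memory estimate from the text: 40 bytes per domain.
print(40 * 1_000_000)  # 40000000 bytes, i.e. 40 MBytes
```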

3.3 Timeouts

The quality of the DNS server list used by Bulk Verifier (Options\DNS) also strongly influences the application’s performance. If Bulk Verifier does not receive a response from a DNS server within a specified period of time (Options\Timeout, in seconds), it makes new attempts, using another DNS server from the list each time. If all these attempts fail, the e-mail address is listed as not checked due to the connection timeout. The longer the list of DNS servers available to Bulk Verifier, the lower the probability that a couple of DNS servers with operating problems will affect the application’s performance.
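The retry behaviour described above can be sketched in Python as a loop over the DNS server list. The `query` parameter is a hypothetical function that performs one DNS request and raises TimeoutError when no response arrives in time; the result labels are also illustrative.

```python
def query_with_failover(domain, servers, query, timeout=5.0):
    """Try each DNS server in turn until one of them answers."""
    for server in servers:
        try:
            return "checked", query(domain, server, timeout)
        except TimeoutError:
            continue               # no response in time: try the next server
    # Every server timed out: the address stays unchecked.
    return "timeout", None

def fake_query(domain, server, timeout):
    # Stand-in for a real DNS request: the first server never answers.
    if server == "10.0.0.1":
        raise TimeoutError
    return ["mx." + domain]

print(query_with_failover("mail.com", ["10.0.0.1", "10.0.0.2"], fake_query))
# ('checked', ['mx.mail.com'])
```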

3.4 Multithread processing

Bulk Verifier is a multi-threaded application. You can define up to 600 threads which will be used simultaneously (one thread is used to check one e-mail address/domain).
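The same principle, one thread per item being checked, can be sketched in Python with a thread pool from the standard library. The function names are hypothetical, and `check_one` stands for whatever check is applied to a single address.

```python
from concurrent.futures import ThreadPoolExecutor

def check_all(items, check_one, threads=50):
    """Check many addresses/domains using a fixed number of worker threads."""
    items = list(items)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() keeps the results in the same order as the input items.
        results = list(pool.map(check_one, items))
    return dict(zip(items, results))

addresses = ["nicky@mail.com", "broken-address"]
print(check_all(addresses, lambda a: a.count("@") == 1, threads=2))
# {'nicky@mail.com': True, 'broken-address': False}
```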

Please note that using the maximum number of threads is not always the best choice. For example, if you use 600 threads, the application checks 600 domains at the same time, sending up to 15 000 requests to DNS servers per minute. The traffic may then amount to 700 kbps. A DNS server’s software may regard this as a hacker attack and block you.

It is also possible that a DNS server processes only a limited number of requests per second from the same address and ignores the rest, to ensure other users have enough resources to work with the server. In this case the application’s productivity will decrease significantly, since some addresses will be checked repeatedly because previous attempts to check them failed due to timeouts.

Thus, if your network connection can sustain more than 50 threads, you should configure Bulk Verifier with about one DNS server (Options\DNS) per ten threads. In this case you can be sure that servers will not fail because of overload.
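The figures above can be put together in a short Python calculation. It assumes that requests are spread evenly over the DNS servers and, for simplicity, applies the rule of one server per ten threads to any thread count; the function name is hypothetical.

```python
import math

def dns_load(threads, requests_per_minute):
    """Estimate the DNS load under the 'one server per ten threads' rule."""
    servers = max(1, math.ceil(threads / 10))     # recommended server count
    total_per_second = requests_per_minute / 60   # load on all servers together
    per_server = total_per_second / servers       # assumes an even spread
    return servers, total_per_second, per_server

servers, total, per_server = dns_load(600, 15_000)
print(servers)  # 60
print(total)    # 250.0

# 700 kbps spread over 250 requests per second, in bytes per request:
print(700_000 / 8 / total)  # 350.0
```

With 600 threads the rule gives 60 DNS servers, each handling roughly four requests per second instead of 250. The 350 bytes per request agree with the packet sizes given earlier (a request of about 60 bytes plus a response of 200-300 bytes).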

Multithreaded applications behave differently on different operating systems of the Windows family. Windows XP copes perfectly with 600 processing threads, and the processor load increases only insignificantly. Older operating systems (e. g. Windows 98, Windows NT4) are more sensitive to a large number of threads, and even a hundred threads may lead to a considerable processor load. We recommend using Bulk Verifier on computers running Windows XP to reach the application’s maximum performance.

4 Fast mode of Bulk Verifier

In this mode Bulk Verifier is able to process mailing lists containing tens of millions of e-mail addresses at a speed of several thousand addresses per second. To switch to this mode please UNcheck the option “Advanced e-mail check using SMTP” in the Bulk Verifier Options dialog (see also the section “Bulk Verifier interface and options”).

Working in the fast mode, Bulk Verifier detects about 25-30% of the unavailable e-mail addresses in a mailing list. These figures may seem low, since theoretically up to 70% of unavailable addresses can be detected using software methods, but in practice these 30% can amount to 10% of the whole mailing list, which is quite significant.

A more precise check, which detects another 40% of unavailable addresses, is available in the deep mode. But you should realize that the deep check may take 10 times more time and 5 times more network traffic, which often makes its use unreasonable for large e-mail lists.

In the fast mode, Bulk Verifier uses only the DNS request stage to check e-mail address availability. The following actions are executed for each e-mail address:

  • 1 Bulk Verifier parses the address syntactically and singles out its mail domain.
  • 2 The top-level domain is singled out from the mail domain (e. g. .com for the mail domain mail.com).
  • 3 Bulk Verifier compares the top-level domain with the basic top-level domains list stored in the application’s main folder (the file Bulk Verifier.tld).

    If the initial e-mail address is syntactically incorrect or its top-level domain was not found in the file Bulk Verifier.tld, the address is regarded as invalid. The further processing is not performed for this address.
  • 4 Bulk Verifier requests the DNS server for the mail server address of the mail domain. If the DNS server returns one or more addresses of mail servers which accept mail for the domain, the initial e-mail address is considered available and valid. If the address was not found by the DNS server at all or there are no mail servers which accept mail for the domain, the initial e-mail address is considered invalid. If the DNS server could not return a response because DNS servers serving the mail domain were unavailable, the initial address is considered invalid.
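
The four steps above can be sketched in Python as follows. The sketch is an illustration, not Bulk Verifier's code: `lookup_mx` is a hypothetical function that returns the list of mail servers for a domain and raises LookupError when the domain is not found or its DNS servers are unavailable.

```python
def fast_check(address, tlds, lookup_mx):
    """Classify an address as "valid" or "invalid" using only DNS data."""
    # Step 1: parse the address and single out the mail domain.
    if address.count("@") != 1:
        return "invalid"
    domain = address.split("@")[1].lower()
    # Step 2: single out the top-level domain.
    tld = domain.rpartition(".")[2]
    # Step 3: compare it with the list of basic top-level domains.
    if tld not in tlds:
        return "invalid"
    # Step 4: ask DNS for the mail servers of the domain.
    try:
        mail_servers = lookup_mx(domain)
    except LookupError:
        return "invalid"           # domain not found or its DNS servers unavailable
    return "valid" if mail_servers else "invalid"

# A dictionary stands in for DNS; a missing key raises KeyError,
# which is a subclass of LookupError.
records = {"mail.com": ["mail-com.mr.outblaze.com"], "empty.com": []}
lookup = lambda domain: records[domain]
print(fast_check("nicky@mail.com", {"com"}, lookup))      # valid
print(fast_check("nicky@missing.com", {"com"}, lookup))   # invalid
```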

5 Deep (slow) mode of Bulk Verifier

In the deep (slow) mode Bulk Verifier performs a complete two-stage check of e-mail addresses availability. To switch to this mode please check the option “Advanced e-mail check using SMTP” in the Bulk Verifier Options dialog (see also the section “Bulk Verifier interface and options”).

The first stage of the check is exactly the same as the one used by the fast mode of Bulk Verifier: the application obtains the mail server address for an e-mail address from DNS. Please see the section “Fast mode of Bulk Verifier” for more details.

If the mail server address is obtained successfully, the second processing stage starts. Bulk Verifier attempts to connect to this mail server and emulate a message dispatch. No message is actually sent during the e-mail availability check. Bulk Verifier establishes the connection with the mail server, sends a “hello message”, transmits the sender’s address (Options\Sender) to pretend there is a message and then transmits the addressee’s mail box address (the e-mail address being checked). As soon as the receiving server confirms or denies the requested mail box availability, Bulk Verifier disconnects.
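The second stage can be sketched with Python's standard smtplib module, which provides the helo, mail, rcpt, rset and quit commands. This is a minimal sketch of the command sequence, not Bulk Verifier's own implementation; the function name and the `smtp_factory` parameter (which allows a test double to replace the real connection) are assumptions made for the example.

```python
import smtplib

def probe_mailbox(mail_server, sender, recipient, helo_name,
                  timeout=30, smtp_factory=smtplib.SMTP):
    """Emulate a message dispatch and return the reply to RCPT TO."""
    smtp = smtp_factory(mail_server, 25, timeout=timeout)   # connect via TCP
    try:
        smtp.helo(helo_name)                     # "hello message"
        smtp.mail(sender)                        # MAIL FROM: sender's address
        code, message = smtp.rcpt(recipient)     # RCPT TO: address being checked
        smtp.rset()                              # abandon the transaction
    finally:
        smtp.quit()                              # close the connection
    return code, message                         # no message body was sent
```

A reply code of 250 means the server accepts mail for the address, while 550 (as in the log in section 2) means the mail box is unknown. A call could look like `probe_mailbox("mail-com.mr.outblaze.com", "verify@testmail.com", "nicky@mail.com", "cisco.my.net")`, using the example names from this Guide.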

6 Bulk Verifier interface and options

The Bulk Verifier interface is simple and intuitive. There are two windows in the application: the main window and the Options dialog.

In the Bulk Verifier main window you can specify the following parameters:

  • Input file – input file name
  • Output folder – the folder to place resulting files (see the section “Processing results”). The session file is placed into the same folder automatically.
  • Mode – the input file format:
    • Plain list of e-mails or domains. The input file is a simple list of e-mail addresses or domains where each line contains one item.
    • CSV export from the database. The input file has a complex structure where lines represent multi-field records of the same structure (i. e. containing the same fields separated with the same delimiter). For example, you can export a worksheet of an MS Excel file to check availability of e-mail addresses/domains listed there.

      For this mode you can indicate the following additional parameters:
      • With headers. The first line of the file contains field headers.
      • Delimiter. The fields within each line are separated with the specified symbol.
      • Domains in column / E-mails in column. Header or first element of fields containing domains/e-mails. If the delimiter was specified correctly, you will be able to select necessary values in the drop-down lists.

The section Statistics shows the current processing results.

The section Log reflects the processing progress; a log file can also be created in the specified path. These features slow down the processing, so they are disabled by default. To enable them check the option Enable above the log window.

To set DNS, e-mail, proxy and other parameters open the Options dialog by pressing the button “Options” in the main window toolbar.

The following fields are available here:

  • DNS – the address(es) of DNS server(s) you want to use during the check. Here you can indicate several addresses (each on a separate line). If the first DNS server in the list does not respond, the second one will be requested and so on. This will slow down the processing but increase its accuracy.
  • E-mail is the section where you can specify the sender’s attributes used when emulating the sending of test messages (SMTP ID, sender address). Please note that these settings are used only in the deep (slow) mode of high-quality check. See the section “Deep (slow) mode of Bulk Verifier”.
  • Threads is the number of simultaneously available processing threads, which defines the number of e-mail addresses being checked at the same moment.
  • “No relay” is not an error – do not consider the DNS response “No relay” as a sign of domain invalidity.
  • Advanced e-mail check using SMTP – run Bulk Verifier in the deep (slow) mode of high-quality check. See the section “Deep (slow) mode of Bulk Verifier”.
  • Proxy is the section where you can set your proxy parameters (Server address, Authentication method, Port, Username/Password, etc.) if you use one.

7 Results representation

Bulk Verifier writes the processing results to several files placed into the Output folder specified in the main window:

  • <Input file name>.ses. This file represents the processing state at the moment when processing was completed or interrupted by the user (with the help of the “Stop” button). Here Bulk Verifier records the Input file, Output folder and other parameters, including a reference to the address which was checked last. If you have interrupted the check, you can restart it later by pressing “Go” and Bulk Verifier will automatically continue from the point where you interrupted the processing.
  • <Input file name>.invalid.domains-and-emails.txt. This file stores the lines from the Input file where both the e-mail address and the domain are invalid.
  • <Input file name>.invalid.emails.txt. This file stores the lines from the Input file where the e-mail address is invalid.
  • <Input file name>.invalid.domains.txt. This file stores the lines from the Input file where the domain is invalid.
  • <Input file name>.timeout.domains-and-emails.txt. This file stores the lines from the Input file where neither the e-mail address nor the domain could be checked due to the connection timeout.
  • <Input file name>.timeout.emails.txt. This file stores the lines from the Input file where the e-mail address could not be checked due to the connection timeout.
  • <Input file name>.timeout.domains.txt. This file stores the lines from the Input file where the domain could not be checked due to the connection timeout.
  • <Input file name>.valid.domains-and-emails.txt. This file stores the lines from the Input file where both the e-mail address and the domain are valid.
  • <Input file name>.valid.emails.txt. This file stores the lines from the Input file where the e-mail address is valid.
  • <Input file name>.valid.domains.txt. This file stores the lines from the Input file where the domain is valid.

For example, if the input file name was “Master.txt”, after the processing you may get:

Master.ses
Master.invalid.domains.txt
Master.valid.domains-and-emails.txt
Etc.
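
The naming scheme can be summarised with a short Python sketch that lists every result file name for a given input file. The function name is hypothetical, and the sketch lists all possible names; as the example above suggests, a given run may produce only some of them.

```python
from pathlib import Path

def result_files(input_file, output_folder):
    """List the result file paths derived from the input file name."""
    stem = Path(input_file).stem          # "Master.txt" -> "Master"
    out = Path(output_folder)
    names = [f"{stem}.ses"]               # the session file
    for status in ("invalid", "timeout", "valid"):
        for kind in ("domains-and-emails", "emails", "domains"):
            names.append(f"{stem}.{status}.{kind}.txt")
    return [out / name for name in names]

for path in result_files("Master.txt", "results"):
    print(path.name)
```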