Why algoseek

Comprehensive Market Data Products for Machine Learning

algoseek’s deep understanding of quantitative and algorithmic trading has led to the creation of the most comprehensive and detailed market data products in the financial data industry.

Take equity data as an example. We offer four equity minute bar datasets: TAQ (Trade and Quote) minute bars and Trade Only minute bars built from the full SIP feed, plus TAQ bars and Trade Only bars that exclude off-exchange trades (the non-TRF/FINRA versions). The non-TRF versions are used to backtest trades executed on public exchanges.

We also offer five EOD datasets: standard daily OHLC, Primary Exchange daily OHLC, standard adjusted daily OHLC, Primary Exchange daily OC price, and standard daily OC price. The Primary Exchange OHLC and OC datasets provide the historical official opening and closing prices from each equity's primary exchange, and they are widely used in comprehensive strategy backtesting and portfolio management.

In addition, our equity TAQ Bar product has 61 data points and provides continuous bar times, so your programmers don't need to carry forward previous events. No other data vendor has developed market data products this detailed and this carefully designed to cover such a wide variety of data needs.
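To illustrate what continuous bars save you: in a sparse bar file, a minute with no events simply has no row, so the consumer has to insert the missing minutes and carry the last known bid/ask forward. The sketch below shows that carry-forward step in plain Python. It is a hypothetical illustration, not algoseek code: the bar layout (minute index, closing bid, closing ask) and the function name are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical sparse bar: (minute index since the open, closing bid, closing ask).
Bar = Tuple[int, float, float]
Row = Tuple[int, Optional[float], Optional[float]]


def make_continuous(bars: List[Bar], first_minute: int, last_minute: int) -> List[Row]:
    """Emit one row per minute, carrying the last seen bid/ask into empty minutes."""
    by_minute = {m: (bid, ask) for m, bid, ask in bars}
    out: List[Row] = []
    last_bid: Optional[float] = None
    last_ask: Optional[float] = None
    for minute in range(first_minute, last_minute + 1):
        if minute in by_minute:
            last_bid, last_ask = by_minute[minute]
        # Minutes before the first event stay None: there is nothing to carry forward yet.
        out.append((minute, last_bid, last_ask))
    return out


sparse = [(0, 10.00, 10.02), (3, 10.01, 10.03)]
for row in make_continuous(sparse, 0, 4):
    print(row)
```

With a continuous bar product, every minute already has a row with its opening and closing quotes, so this step disappears from the client's pipeline.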

The following table summarizes some of the key differences between algoseek and other vendors. Please contact algoseek sales if you want to know more.

  • Data fields in minute bars: algoseek 50–60; other vendors 10–20

  • Aggressor flag and aggressor count (for futures data): algoseek yes; other vendors no

  • Continuous bar every minute (always shows open/close bid/ask): algoseek yes; other vendors no

  • Choice of exchanges (for all aggregated equity data): algoseek offers both "all exchanges and TRF" and "listed exchanges only"; other vendors offer only "all exchanges and TRF"

  • TAQ and TANQ (for options data): algoseek offers both TAQ (trades and quotes) and TANQ (trades and NBBO quotes); other vendors offer only TANQ

Table 1: Major differences between algoseek data products and Other Vendors
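The "listed exchanges only" choice in Table 1 can be pictured as a filter applied before trades are aggregated into a bar. The sketch below is a hypothetical illustration, not algoseek's processing code. The `Trade` fields are invented for the example, and it assumes off-exchange prints are identified by venue code "D" (the code commonly used for FINRA-reported trades on the consolidated tape); the venue codes in any real dataset should be checked against its own documentation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


# Hypothetical trade record; field names are illustrative, not algoseek's schema.
@dataclass
class Trade:
    price: float
    size: int
    venue: str  # venue code; "D" is assumed here to mark FINRA/TRF (off-exchange) prints


OFF_EXCHANGE_VENUES = {"D"}  # assumption for this sketch


def minute_ohlcv(
    trades: Iterable[Trade], listed_only: bool
) -> Optional[Tuple[float, float, float, float, int]]:
    """Aggregate one minute of time-ordered trades into (open, high, low, close, volume)."""
    kept = [t for t in trades if not (listed_only and t.venue in OFF_EXCHANGE_VENUES)]
    if not kept:
        return None  # no qualifying trades in this minute
    prices = [t.price for t in kept]
    return (prices[0], max(prices), min(prices), prices[-1], sum(t.size for t in kept))


minute = [Trade(10.00, 100, "Q"), Trade(10.05, 500, "D"), Trade(10.02, 200, "N")]
print(minute_ohlcv(minute, listed_only=False))  # all exchanges and TRF
print(minute_ohlcv(minute, listed_only=True))   # listed exchanges only
```

In this toy minute, dropping the single off-exchange print changes the high from 10.05 to 10.02 and the volume from 800 to 300, which is why the two bar versions can lead to different backtest results.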

“As-Is” Data Collected by Co-Lo Servers and a Proprietary Ticker Plant

Data source matters.

Constrained by cost and limited technical capability, historical market data vendors often obtain their data from exchange archives and/or third-party ticker plants.

Real-time data is imperfect and subject to exchange publishing errors such as out-of-sequence packets. Exchange archive data has been “corrected” and cannot be used to replicate the live feed exactly. Backtesting on corrected data produces unrealistic results and risks your algorithm breaking in live trading.

When obtaining data from a third-party ticker plant, the data vendor has no control over how data is recorded. When data issues occur (such as missing events), the data vendor needs to ask the ticker plant owner to investigate.

Figure 1: algoseek Co-Lo and Cloud-based Infrastructure

At algoseek, we collect data with our Equinix co-lo servers and proprietary ticker plant, and we promise to provide:

  • the best “as-is” data in the market
  • SLA & vertically integrated problem-solving capability
  • customization without third-party ticker plant limitations

Data Customization and Management Services

Our highly efficient engineering team, cloud-based infrastructure, and advanced parallel processing architecture give us the flexibility to customize data products for our clients.

The following chart shows our data customization workflow, using custom equity minute bar fields as an example of the processing workload.

Figure 2: algoseek Data Customization Workflow (Taking Custom Equity Minute Bar Fields as an Example)

In 2020, algoseek began providing data management services that help clients validate, wrangle, normalize, and process data, and that deliver the data clients need in the format they need. Our clients can focus on their core business and leave the data to algoseek.

The data management service is still in beta. Please contact algoseek sales for details.

Capacity & Flexibility Ensured by Cloud-based Infrastructure

algoseek’s ability to satisfy long-tail data needs rests on its advanced infrastructure. From the ticker plant to data processing, algoseek’s state-of-the-art software was designed around cloud-based storage and parallel processing from day one.

Built on Amazon Web Services, algoseek can scale compute almost without limit by adding EC2 instances and Amazon S3 storage, giving us elastic computing power with no download bottlenecks.

This means we can quickly reprocess our data to meet new data needs and provide fast data access to our most demanding clients. Subscription users have our delivery assurance in even the most challenging market conditions.

Since inception, we’ve been through several data volume explosions, the most recent being the market correction triggered by the coronavirus. Equity trade and quote volume grew fourfold, futures volume grew fivefold, and options volume doubled. Under close monitoring by our engineering team, our hardware and software handled this surge and consistently delivered quality daily updates to our most demanding live trading clients.

Quality Assurance and Client Support

algoseek quality control software monitors the data processing pipeline and examines individual data files. We also built our own versions of Chaos Monkey and Latency Monkey, tools that deliberately inject failures and delays, to improve the availability and reliability of our system.
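The specific checks algoseek runs are not described here. As a hypothetical illustration of what a per-file check can look like, the sketch below flags minute bars whose OHLC values are internally inconsistent, whose closing quote is crossed, or whose volume is negative. The column names and function names are assumptions made for the example. Flagged rows are candidates for review rather than proven errors, since as-is market data can legitimately contain crossed quotes.

```python
from typing import Dict, List

# Hypothetical bar row; these column names are illustrative, not algoseek's schema.
Row = Dict[str, float]


def check_bar(row: Row) -> List[str]:
    """Return the problems flagged for one minute bar (an empty list means it passed)."""
    problems = []
    if not row["low"] <= min(row["open"], row["close"]):
        problems.append("low above open/close")
    if not max(row["open"], row["close"]) <= row["high"]:
        problems.append("high below open/close")
    if row["close_bid"] > row["close_ask"]:
        problems.append("crossed closing quote")
    if row["volume"] < 0:
        problems.append("negative volume")
    return problems


def check_file(rows: List[Row]) -> Dict[int, List[str]]:
    """Map row index to its flagged problems, for every row that fails at least one check."""
    return {i: p for i, row in enumerate(rows) if (p := check_bar(row))}


good = {"open": 10.0, "high": 10.1, "low": 9.9, "close": 10.05,
        "close_bid": 10.04, "close_ask": 10.06, "volume": 1200}
bad = {"open": 10.0, "high": 9.95, "low": 9.9, "close": 10.05,
       "close_bid": 10.07, "close_ask": 10.06, "volume": 300}
print(check_file([good, bad]))
```

Returning a map of row index to problems, rather than stopping at the first failure, lets a reviewer see every issue in a file in one pass.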

We understand that getting accurate data on time is crucial to our clients’ success. In addition to our automation efforts, we have established data inspection protocols, and our quality team manually examines the data multiple times a day, giving clients a second layer of assurance.

The algoseek support team provides clients with technical support services, including documentation, AWS account setup, troubleshooting, and other service requests. Clients can request support by calling the algoseek support number, filling in a request form on our website, or sending us an email, and an algoseek support engineer will get back to you quickly.