Echo Archive 🔍

Available Data

This site contains Twitter JSONL and media related to:

This data is provided "as is" for researchers, journalists, historians, and data hoarders.

File Access

Files are located here: https://echoarchive.org/files/

Download Directories

wget -m -np -c -U FAQ -R "index.html*" "https://echoarchive.org/files/"
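
To fetch only part of the archive rather than mirroring everything, the same command can be pointed at a subdirectory. The following is a minimal sketch, not an official tool: `mirror_subdir` and `BASE_URL` are hypothetical names introduced here, and it assumes the tree shown under Directory Structure sits directly below /files/.

```shell
# Base URL taken from the "File Access" section.
BASE_URL="https://echoarchive.org/files"

# mirror_subdir: hypothetical helper, not something the archive provides.
# Usage: mirror_subdir twitter/ukraine/jsonl
mirror_subdir() {
    local subdir
    # Drop one leading and one trailing slash so the URL is well formed.
    subdir="${1#/}"
    subdir="${subdir%/}"
    # Same flags as the full mirror command. The trailing slash matters:
    # with -np, wget stays at or below this directory.
    wget -m -np -c -U FAQ -R "index.html*" "${BASE_URL}/${subdir}/"
}
```

Because -c resumes partial downloads, re-running the same call after an interruption should pick up where it left off.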

Linux Search Examples

Search for a term in a specific JSONL.ZST file:

find ./ -type f -name "{{ file_name }}.zst" -exec zstd --long=28 -d -c {} \; | grep "{{ search_term }}"  | less

Search an entire directory:

find ./ -type f -name "*.zst" -exec zstd --long=28 -d -c {} \; | grep "{{ search_term }}" | less

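Note: If zstd reports that the window size is larger than the maximum, increase the --long value (up to 31). The value needed depends on the window size the file was compressed with, not on the file's size on disk.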
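
The directory search above prints matching lines but not the file each one came from. The sketch below is one possible way to add that; `search_zst` is a hypothetical helper name, and it assumes GNU grep (for --label), which is standard on most Linux systems.

```shell
# search_zst: hypothetical helper, not part of the archive.
# Usage: search_zst "search term" [directory]
search_zst() {
    # For each .zst file, decompress to stdout and grep the stream.
    # -H --label prefixes every match with the file name, since grep
    # would otherwise only see anonymous standard input.
    # Raise --long if zstd reports that the window size is too large.
    find "${2:-.}" -type f -name "*.zst" -exec sh -c '
        zstd --long=28 -d -c "$2" | grep -H --label="$2" -- "$1"
    ' sh "$1" {} \;
}
```

For example, `search_zst "{{ search_term }}" ./twitter | less` pages through the matches, each prefixed with its source file.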

Directory Structure

    twitter/
    ├── COVID_Tweets_2020_01-05
    │   ├── 2020-01
    │   ├── 2020-02
    │   ├── 2020-03
    │   ├── 2020-04
    │   └── 2020-05
    ├── history
    ├── ukraine
    │   ├── images
    │   │   └── urls
    │   ├── jsonl
    │   ├── users
    │   └── video
    │       ├── contact_sheets
    │       ├── sheet_videos
    │       └── urls
    └── various
    

Source

Mirror of Tweet JL: The largest collection of tweets available in JSONL format.

Original Source: The Eye - Twitter Archive

Contact

There are probably better ways to iterate and search through the data. I am not a data analyst. If you want to contact me, please email: vid.archive9@gmail.com