
Pastebin gatherings


Search Pastebin for interesting content by using Yara rules to match keywords and expressions. Then display what you gathered in a Splunk view to make the findings easier to work with.

This is still a work in progress, but what has been done so far is pretty straightforward...


Get yourself a Pro account at Pastebin to be able to scrape Pastebin, and optionally an account at GitHub to retrieve gists as well.

Find your Pastebin API here:

And whitelist your office IP at Pastebin. -> here

Install PasteHunter (and give respect to TheHermit)

Do a "git clone" to get the actual program doing the work.
Install the missing dependencies with "pip3 install -r requirements.txt"

Adjusting the config

In settings.json, adjust the following fields to match your own data:

"inputs": {
  "pastebin": {
    "enabled": true,
    "module": "inputs.pastebin",
    "api_scrape": "",
    "api_raw": "[your Pastebin API key]",
    "paste_limit": 200,
    "store_all": false
  },
  "dumpz": {
    "enabled": false,
    "module": "inputs.dumpz",
    "api_scrape": "",
    "api_raw": "",
    "paste_limit": 200,
    "store_all": false
  },
  "gists": {
    "enabled": true,
    "module": "inputs.gists",
    "api_token": "[your Git oAuth key]",
    "api_limit": 100,
    "store_all": false,
    "user_blacklist": [],
    "file_blacklist": ["grahamcofborg-eval-package-list"]
  }
}

Make sure you have enabled the JSON output in the config:

"json_output": {
  "enabled": true,
  "module": "outputs.json_output",
  "classname": "JsonOutput",
  "output_path": "logs/json/",
  "store_raw": true,
  "encode_raw": true
}

P.S. As per the latest issue, check that there is a comma after the "run_frequency" line in the "general" section.
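A missing comma like this is easy to catch before starting PasteHunter. The following is a minimal sketch, not part of PasteHunter itself, that only checks whether the settings text parses as JSON and reports where it breaks. The helper name check_settings and the "other_key" entry are made up for illustration.

```python
import json


def check_settings(text):
    """Return None if text parses as JSON, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# Hypothetical fragments: "other_key" is invented, only "run_frequency"
# comes from the article. The first one lacks the comma after it.
broken = '{"general": {"run_frequency": 300 "other_key": true}}'
fixed = '{"general": {"run_frequency": 300, "other_key": true}}'

print(check_settings(broken))  # a description of the parse error
print(check_settings(fixed))   # None
```

To check the real file, pass its content instead, for example open("/opt/pastehunter/settings.json").read(), assuming the install directory used further down.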

Every gathered paste is checked against the Yara rules stored in the corresponding Yara directory. Check them out and write your own rules.
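If you want to start with your own keywords, a rule can be as simple as a list of strings with an "any of them" condition. Here is a rough sketch that builds such a rule as text and, if yara-python is installed, tries it against a sample string. The rule name, the keywords and both helper functions are made up for illustration and are not taken from PasteHunter.

```python
def build_keyword_rule(rule_name, keywords):
    """Build the source text of a Yara rule that fires on any keyword."""
    lines = [f"rule {rule_name}", "{", "    strings:"]
    for index, keyword in enumerate(keywords):
        # Escape backslashes and double quotes for a Yara text string.
        escaped = keyword.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'        $k{index} = "{escaped}" nocase')
    lines += ["    condition:", "        any of them", "}"]
    return "\n".join(lines)


def matching_rules(rule_source, paste_text):
    """Return the names of the rules in rule_source that match paste_text."""
    import yara  # yara-python; imported lazily so the builder works without it

    rules = yara.compile(source=rule_source)
    return [match.rule for match in rules.match(data=paste_text)]


# Hypothetical rule name and keywords.
source = build_keyword_rule("my_company", ["example.com", "Example Corp"])
print(source)
```

Save the printed text as a rule file in the Yara directory, or call matching_rules(source, some_text) to test it locally first.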

Depending on your installation directory, give it a try:

/usr/bin/python3 /opt/pastehunter/

The output should look something like:

Configs
Inputs
Input: gists
Input: pastebin
Outputs
Output: elastic_output
Output: syslog_output
Output: csv_output
Output: json_output
Yara Rules
Blacklist Rules
Queue
paste list from inputs.gists
Limit: 4992. Resets at 2018-02-03T11:38:29
paste list from inputs.pastebin
for 300 Seconds

Check whether the corresponding JSON files have some entries at (depending on your install directory): /opt/pastehunter/logs/json/
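For a quick look before Splunk is wired up, the output directory can be summarised with a few lines of Python. This is only a sketch under two assumptions: that every file in the directory holds one JSON object per paste, and that the matched rule names sit in a list under "YaraRule" (the field the Splunk searches further down rely on). The function name count_rule_hits is made up.

```python
import json
from collections import Counter
from pathlib import Path


def count_rule_hits(json_dir):
    """Count how many stored pastes matched each Yara rule."""
    hits = Counter()
    for path in sorted(Path(json_dir).glob("*.json")):
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or half-written files
        if not isinstance(record, dict):
            continue  # assumption: one JSON object per file
        # Assumption: "YaraRule" holds a list of rule names.
        for rule in record.get("YaraRule", []):
            hits[rule] += 1
    return hits


# Example call (path taken from the install directory used above):
# print(count_rule_hits("/opt/pastehunter/logs/json/"))
```

An empty result means either that nothing has matched yet or that the files are laid out differently from what is assumed here.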

Configure Splunk

Install a Splunk forwarder on your machine and get the data into your Splunk instance.

The inputs.conf that works at my end is the one below.

disabled = false
renderXml = true
index = main
sourcetype = Pastebin

Create the matching sourcetype on your Splunk server.


KV_MODE = true
category = Structured
description = JavaScript Object Notation format. For more information, visit
disabled = false
pulldown_type = true

Job done ...

Now start everything and watch the sourcetype=Pastebin in Splunk.

I've created a view with some searches that could act as an example for you:

sourcetype=pastebin | stats distinct_count(raw_paste) as pastes


sourcetype=pastebin "YaraRule{}"=* | rename "YaraRule{}" as Topics | stats count by Topics

sourcetype=pastebin "YaraRule{}"=* raw_paste=* | fillnull value="n.a." | rename "YaraRule{}" AS topic | eval result=raw_paste."--> ".scrape_url." <--"| stats count by topic title type syntax result


I already found an interesting paste containing quite personal data from a leak.
Check for cool stuff yourself, and see whether anything is being written about you, your family or your company. I do this simply by adding a regex part to the search. (| regex _raw="|xxx|xxx|xxx|xxx|xxx" |)
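The same idea can be tried outside Splunk on a raw paste. Below is a small sketch that builds one alternation pattern from a list of watch terms and reports which of them show up in a text. The terms, the sample text and both function names are made up. Unlike the Splunk search above, this version escapes the terms and ignores case.

```python
import re


def build_watch_pattern(terms):
    """Compile a case-insensitive pattern that matches any of the terms."""
    # Empty terms are dropped: an empty alternative would match every paste.
    cleaned = [re.escape(term) for term in terms if term]
    if not cleaned:
        raise ValueError("at least one non-empty term is required")
    return re.compile("|".join(cleaned), re.IGNORECASE)


def mentions(pattern, raw_paste):
    """Return the distinct terms found in raw_paste, lower-cased and sorted."""
    return sorted({m.group(0).lower() for m in pattern.finditer(raw_paste)})


# Hypothetical watch terms and paste text.
pattern = build_watch_pattern(["example.com", "Jane Doe", "Example Corp"])
print(mentions(pattern, "dump: admin@EXAMPLE.com, owner jane doe"))
# → ['example.com', 'jane doe']
```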

email_list Saudi Arab govt ambassadors data leaked By Touseef Jaskani n.a. text Saudi Goverment Offical Ambassadors Personal Database Leaked by Touseef Jaskani Officals leaks Get Official leak on name passport bir_date sex s_cell email sup_email منير حميد احمد الحمادي Muneer Hamid Ahmed Al-Hammadi 0003326851 15/6/1979 m 0597156808 شاجع على احمد غالب shagae ali ahmed ghaleb 0004602950 2-5-1985 m 0590073931 جامعة الملك سعود - سكن الطلاب تعز - اليمن حسين سالم علي الحريبي Hussein Salem Ali AL-Huraibi 002286644 13/11/1984 m 0538254705 ابرق الرغامة - جدة شبوة - بيحان حلمي محمد محمد صلاح Helmi Mohammed Mohammed Salah 0003921534 30/11/1992 m 0533899304 جامعة الملك سعود السكن الجامعي حضرموت -المكلا فهد عبد القادر عبد الله الهتار fahd abdulqader abdallah alhetar 002814670 01/01/1977 m 0591465281 الرياض - جامعة الملك سعود إب - الظهار إبراهيم محمد محمد محرم Ebrahim Mohammed mohammed moharrm 003386386 01/01/1985 m 0548046695 جامعة الملك سعود صنعاء - مديرية السبعين - شارع بينون توفيق عبدة صالح عوض Taufiq Abdh Saleh 00804321 5/9/1972 m 00966506065323 جامعة الملك فهد الدمام المنطقة الشرقية الحديدة - الحي التجاري بشائر عبدالله حسن حسين bashayr abdullah hassan hussain 01 29/7/1991 f 0544402724 GHROO00OOR.ONTHA@HOTMAIL.COM جدة مشاعل محمد عبدالرحمن العمودي mashael mohammed abdulrahman al amoudi 00347960 مضافه 1/ 7 /1415 هـــ f 0553517277__ 05608 مكه المكرمه __ العزيزيه الجنوبيه لايوجد محمد محمد قائد محمد Mohammad Mohammad Qaid Mohammad 002500623 30/05/1982 m 0595793575 السكن الجامعي - جامعة الملك سعود المسراخ-تعز سعاد حمود عوضه Souad Hammoud odah 01340458 1974l f 0507771722 dakd@lkf.ckj عبدالله صالح محمد الجفري Abdullah Saleh Mohammed Algefri 01353823 1414/02/26 m 0590526875 الدمام حي النخيل ريناد سالم علي الكاف renad salem ali alkaf 01333255 25/8/1993 f 0562783833 - المنز جده - حي الزهراء - شارع حلمي كتبي عمر سعيد علي باسالم omar saeed ali basalem 01632842 1995/01/26م m 0532337121 الطايف حي العقيق شارع ا--> <--

supervisor config

To run the whole thing under supervisord, I created the file below:


command=/usr/bin/python3 /opt/pastehunter/
process_name = pastehunter%(process_num)d
startsecs = 20
autostart = true
autorestart = true
user = root