Drilling Down on Data with Bobby Neelon & John Kalfayan (Collide)

Bobby Neelon and John Kalfayan from Collide break down the messy reality of getting data ready for retrieval-augmented generation (RAG). They cover why PDFs are dumpster fires for unstructured data, how extraction changes depending on whether you're dealing with drilling surveys or handwritten logs, and why chunking strategy matters more than people think. They also walk through embeddings, vector databases, and Model Context Protocol (MCP) servers for pulling in external data without leaking internal info, plus why good metadata and folder structure make AI deployments way easier. And the hard truth: AI isn't a silver bullet for bad data management, and the crap-in, crap-out problem is getting worse because now AI can hallucinate on top of the crap.

Join the conversation shaping the future of energy.
Collide is the community where oil & gas professionals connect, share insights, and solve real-world problems together. No noise. No fluff. Just the discussions that move our industry forward.
Apply today at collide.io

0:00 - Introductions and RAG overview
3:15 - Document identification and classification challenges
8:40 - Extracting data from unstructured PDFs
13:25 - Real world examples of messy data formats
18:50 - OCR paired with vision models for extraction
22:10 - Chunking strategies and when to use each
26:35 - Embeddings and vector databases explained
30:20 - MCP servers and external data integration
35:45 - Getting data AI-ready with metadata and structure
40:30 - Text-to-SQL approaches and database access
44:15 - Handling duplicates and M&A data integration
48:50 - How AI learns context over time
53:40 - Why traditional data management matters more than ever

https://twitter.com/collide_io
https://www.tiktok.com/@collide.io
https://www.facebook.com/collide.io
https://www.instagram.com/collide.io
https://www.youtube.com/@collide_io
https://bsky.app/profile/digitalwildcatters.bsky.social
https://www.linkedin.com/company/collide-digital-wildcatters