Sometimes you come across a tool that everyone but you seems to have known about. I hit a wall recently when I wanted to search a massive 10GB text file for a list of terms stored in another file.
Usually a simple grep command would do the trick, but I quickly learned the limitations of grep when I let the command run overnight and came back in the morning to find my system still churning away.
Grep, for all of its utility, has long been a powerful tool in the arsenal of many an IT professional, or anyone using a shell for that matter. Ken Thompson created it in 1973, before I was born, as an offshoot of the regular expression parser in the “ed” editor. It is such an integral tool that it is part of pretty much every Unix-based system.
Being an older utility, it is a little stuck in its ways, and although it gets the job done, it is not particularly efficient. Like many command line tools, grep was not designed to take advantage of processors with multiple cores. Back in the day a processor had only one core; that’s the way it was, and we liked it!
Enter GNU Parallel, a shell tool designed for executing tasks in parallel using one or more computers. For my purposes I just ran it on a single system, but I wanted to take advantage of multiple cores.
Having enough memory on my system, I loaded the entire massive file into memory and piped it to GNU Parallel, along with a second file, the “PATTERNFILE”, containing thousands of different strings I wanted to search for:
cat BIGFILE | parallel --pipe grep -f PATTERNFILE
A process that would have taken almost a day finished in a few hours. Almost immediately after I fired off the command, the fan in my laptop kicked into overdrive, a good sign that it was being put to work.
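If you want to try the idea before pointing it at a 10GB file, here is a minimal sketch on a tiny stand-in file. The sample contents, the 10M block size, and the use of grep's -F (fixed-string) option are my assumptions for illustration, not what I ran above; -F only makes sense if your patterns are literal strings rather than regular expressions.

```shell
#!/bin/sh
# Tiny stand-in files so the sketch runs anywhere; swap in your own BIGFILE and PATTERNFILE.
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\ndelta\n' > "$workdir/BIGFILE"
printf 'beta\ndelta\n' > "$workdir/PATTERNFILE"

# Baseline: plain single-core grep. -F treats each pattern as a fixed string, not a regex.
serial=$(grep -F -f "$workdir/PATTERNFILE" "$workdir/BIGFILE")

# Only attempt the parallel run if GNU Parallel (not another tool named "parallel") is installed.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # --pipe splits stdin into blocks on line boundaries and feeds one block to each grep.
    # --block sets the approximate block size; -k keeps output in input order.
    chunked=$(parallel --pipe --block 10M -k grep -F -f "$workdir/PATTERNFILE" < "$workdir/BIGFILE")
    [ "$chunked" = "$serial" ] && echo "parallel and serial results match"
else
    echo "GNU Parallel not found; skipping the parallel run"
fi

rm -rf "$workdir"
```

Without -k, results come back in whatever order the jobs finish, which is fine if you only care about which lines matched. The GNU Parallel manual also describes a --pipepart option (used with -a BIGFILE) that reads the file directly instead of through cat; I have not benchmarked it here, so treat it as something to experiment with.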
To really leverage the power of the tool you can farm processes out to multiple systems, but for now I am just happy to be able to run shell commands using multiple cores.