
View Full Version : Misc NFT - Anybody here provide (or know of) data scrubbing services? Referral $$$ paid


Trivers
10-17-2014, 03:02 PM
Would prefer to spend $$$ with fellow CPers.

We have 43 databases averaging 50K each that need to be cleaned up.

Seeking consultant.

Referral $$$ paid to your paypal account

Thanks

58-4ever
10-17-2014, 03:08 PM
Let me know what kind of services you need. Are you in the KC Metro?

stumppy
10-17-2014, 03:08 PM
I got a gal that comes by when I call and cleans my pipes. She's pretty talented, I'm sure she could clean your databases too.......while wearing a french maid outfit.

58-4ever
10-17-2014, 03:14 PM
Shoot me a PM

stonedstooge
10-17-2014, 03:15 PM
Can we keep your porn?

Rain Man
10-17-2014, 04:13 PM
Cleaned up how?

CaliforniaChief
10-17-2014, 04:15 PM
http://www.troll.me/images/jesse-pinkman/magnets-bitch.jpg

TLO
10-17-2014, 04:17 PM
This thread had potential.... but alas.

srvy
10-17-2014, 05:09 PM
Better call Saul.

srvy
10-17-2014, 05:12 PM
If it's really bad, The Wolf.

https://www.youtube.com/embed/IgzFPOMjiC8

AustinChief
10-17-2014, 05:14 PM
Gonna need more details. Are you looking at something that could be automated or something that requires a manual once-over?

Trivers
10-24-2014, 03:01 PM
Thank you for your responses.

Here are the details:

We are sending emails to a list of insurance agents in all fifty states.

The databases are public records.

The first list we need cleaned up is from WisCONsin. (How the natives pronounce it.)

http://oci.wi.gov/agentlic/agntlist.shtml

Scroll down to bottom of page:

Format 2 - Agents by Company Appointments

There are 17 files. First one is al_1st-al.exe. We figured out how to separate columns, and remove the records without email addresses.

We need 1) all first-and-last-name duplicates removed, and 2) all words converted to lowercase.

We would prefer a way to automate this whole process.

If interested, please PM.

Thanks

unlurking
10-24-2014, 04:20 PM
*1) sort file | uniq > newfile
2) tr '[:upper:]' '[:lower:]' < file > newfile

*(Are you sure you want to remove dupes by name and not just the entire line? If there are two John Smith entries with different info you will lose one. Might be better to remove complete dupe lines instead?)
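
To make the difference concrete, here is a minimal sketch on made-up data. The names, addresses, and the comma-separated last,first,email layout are assumptions for illustration only, not the real format of the Wisconsin files. It also relies on the fact that uniq only collapses adjacent duplicate lines, so the whole-line variant sorts first.

```shell
# Hypothetical sample data: last,first,email (not the real file layout).
workdir=$(mktemp -d)
cat > "$workdir/agents.csv" <<'EOF'
smith,john,john.smith@example.com
doe,jane,jane.doe@example.com
smith,john,john.smith@example.com
smith,john,jsmith@example.org
EOF

# Whole-line dedupe: sort so identical lines are adjacent, then drop repeats.
sort -u "$workdir/agents.csv" > "$workdir/by_line.csv"

# Name-only dedupe: keep the first line seen for each last,first pair.
awk -F, '!seen[$1 FS $2]++' "$workdir/agents.csv" > "$workdir/by_name.csv"

wc -l < "$workdir/by_line.csv"   # 3 lines: both john smith addresses survive
wc -l < "$workdir/by_name.csv"   # 2 lines: the second john smith address is lost
```

With name-only matching, the jsmith@example.org record disappears even though it may belong to a different person, which is the risk described above.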




Trivers
10-24-2014, 04:34 PM
*1) sort file | uniq > newfile
2) tr '[:upper:]' '[:lower:]' < file > newfile

*(Are you sure you want to remove dupes by name and not just the entire line? If there are two John Smith entries with different info you will lose one. Might be better to remove complete dupe lines instead?)


Good catch! Yes.

Thank you!

unlurking
10-24-2014, 04:51 PM
If you have a bash shell, this will work...


cut -c101-135,136-160,350-420 infile | tr '[:upper:]' '[:lower:]' | sed 's/ \s\+/,/g' > tmpfile
uniq tmpfile outfile
EDIT: This was tested using the "al_1st-al.exe" file from the site you linked to. Only 8 dupe lines are dropped.
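
For anyone who wants to try the idea without downloading the real files, here is a rough sketch on synthetic fixed-width rows. Only the column offsets (101-135, 136-160, 350-420) come from the command above. The meaning of each column (last name, first name, email), the company filler, and the sample values are all assumptions. It also differs from the command above in two ways: it uses sed -E instead of the GNU-only \s and \+, and it uses sort -u so that non-adjacent duplicates are dropped too.

```shell
workdir=$(mktemp -d)

# Hypothetical fixed-width rows, 420 characters wide. Only the offsets come
# from the cut command in the thread; the field meanings are assumed.
fmt='%-100s%-35s%-25s%-189s%-71s\n'
{
    printf "$fmt" 'ACME MUTUAL' 'SMITH' 'JOHN' '' 'John.Smith@Example.com'
    printf "$fmt" 'ACME MUTUAL' 'DOE'   'JANE' '' 'JANE.DOE@EXAMPLE.COM'
    printf "$fmt" 'OTHER LIFE'  'SMITH' 'JOHN' '' 'John.Smith@Example.com'
} > "$workdir/infile"

# Keep the three column ranges, lowercase everything, strip trailing padding,
# turn each run of two or more spaces into a comma, then sort and dedupe.
cut -c101-135,136-160,350-420 "$workdir/infile" |
    tr '[:upper:]' '[:lower:]' |
    sed -E 's/ +$//; s/ {2,}/,/g' |
    sort -u > "$workdir/outfile"

cat "$workdir/outfile"
```

The first and third rows differ only in the columns that cut throws away, so they collapse into one record once the dedupe runs after the cut. The same pipeline could be wrapped in a for loop over the extracted files to cover the whole batch.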

ChiefRocka
10-24-2014, 04:56 PM
fdisk

Simply Red
10-24-2014, 05:08 PM
Speak with Stryker - & I'd like to apologize in advance if you've already found a solution.

But Stryker will (at a minimum) be able to guide you to someone. I just don't recall for sure if he's still in KC Metro.

Good luck to you!

Dave Lane
10-24-2014, 05:51 PM
http://www.midlandhardware.com/thumbnail.asp?file=assets/images/products/730393.jpg&maxx=350&maxy=350

This should scrub it for you. PM me for the PayPal fee.



:)

Trivers
10-28-2014, 10:42 AM
*1) sort file | uniq > newfile
2) tr '[:upper:]' '[:lower:]' < file > newfile

*(Are you sure you want to remove dupes by name and not just the entire line? If there are two John Smith entries with different info you will lose one. Might be better to remove complete dupe lines instead?)

Check your PM. Trying to reach you.

ptlyon
10-28-2014, 12:29 PM
Magnet

Window Licking Whiner
10-28-2014, 12:49 PM
Couldn't you just use awk instead of cut? Something like awk '{print $N}' (with N as the field number), then pipe the result through tr for the upper-to-lower translation.
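
A hedged sketch of how the awk route might look. One caveat: awk's default whitespace splitting can misalign when a fixed-width field itself contains spaces, so this version selects columns by position with substr() and lowercases with awk's built-in tolower(). The column widths (1-20 and 21-50) and the sample rows are invented for illustration and do not match the real file.

```shell
workdir=$(mktemp -d)

# Hypothetical fixed-width sample: name in columns 1-20, email in columns 21-50.
printf '%-20s%-30s\n' 'VAN DYKE MARY' 'MARY@EXAMPLE.COM' >  "$workdir/infile"
printf '%-20s%-30s\n' 'SMITH JOHN'    'JOHN@EXAMPLE.COM' >> "$workdir/infile"

# Whitespace splitting: $2 is "DYKE" on the first row but "JOHN" on the second.
awk '{ print $2 }' "$workdir/infile"

# Column-based: substr() picks fixed positions, tolower() replaces the tr step.
awk '{
    name  = substr($0, 1, 20);  sub(/ +$/, "", name)
    email = substr($0, 21, 30); sub(/ +$/, "", email)
    print tolower(name) "," tolower(email)
}' "$workdir/infile" > "$workdir/outfile"

cat "$workdir/outfile"
```

Whether this is simpler than cut plus tr is a matter of taste; it mainly helps when per-field cleanup is needed in the same pass.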