Sarit's Blog: September 2011

Saturday, September 24, 2011

Garage Sale Handplanes #2: PSU Mod for Electrolysis

This is Part 2 of my previous post, Garage Sale Handplanes.

I have tried using Naval Jelly for rust removal and never liked the staining that occurs. If you miss a spot during application, you'll clearly see the outline of the stain, and successive coats won't hide it. The only way to remove it is to sand the stain away. A friend of mine had great success using electrolysis to remove rust from an old gun, so I thought I would give that a try.

Electrolysis rust removal is essentially the process of using electricity and an electrolyte to move the rust from the part being cleaned to a sacrificial electrode. There's a bit more going on there, but you get the picture.

The first step, then, is to provide the electricity. Pretty much any DC source will do, so long as you have some sort of short-circuit protection. Most people use a 12V car battery charger/starter since they are cheap and most people have one. It should be able to charge at a rate of a couple of amps; trickle chargers won't work. Unfortunately, I don't have one, but I do have an old computer power supply. A PC power supply (PSU) is basically a regulated AC-to-DC converter that provides multiple voltages ("rails"). The only difficulty is that turning on a PSU requires you to jump a wire (short two wires together) and ensure there is a load on the +5V rail.

Every PSU should have a nameplate that lists how many amps each rail can supply. Here you can see the +12V rail can produce a good 18 amps! That will be more than enough for our needs.

WARNING! DO NOT ATTEMPT THIS AT HOME. A PSU HAS HIGH VOLTAGE CAPACITORS THAT CAN KILL YOU.

My goals:

Add a switch so I don't have to jump wires to start this thing
Add binding posts to make connecting wires easier
Add a fuse to protect the PSU in case I cause a short circuit
Add resistors to stabilize the voltage

Here is a view w/ the cover removed. As you can see, there's not much room for me to add anything.

One thing to note is that, based on what I saw, the PSU is laid out so that the high-voltage AC work is done on one side (top in the pic above) and the lower-voltage DC work on the other. Since I'm adding wires and components for DC, I'll want to stay toward the DC side as much as possible.

I don't need no stinking wires! Well, actually I do. These wires are color coded. The green wire will power on the PSU if I connect it to one of the black wires. The yellow, red, and orange wires are the +12V, +5V, and +3.3V rails respectively, and the black wires are all ground. I removed all the wires except the yellow, black, green, one red, and a pair of wires for an extra fan. Even among the colors I kept, I still had too many wires, so I thinned them out to make more room in the case.

To stabilize the voltages, I needed to put a load on the +5V rail (red wires). I used two 20 Ohm, 5 watt resistors wired in parallel, which effectively gives me a 10 Ohm, 10 watt resistor. Since they get hot dissipating that power, I zip-tied them to the case grille.
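To make the arithmetic behind that dummy load explicit, here is a quick Python sketch using Ohm's law (I = V / R) and P = V * I for two 20 Ohm resistors in parallel. It assumes the rail sits at exactly its nominal 5 V, which a real PSU won't quite do; the variable names are just for illustration.

```python
# Dummy-load arithmetic for the +5V rail, using Ohm's law.
# Assumption: the rail sits at its nominal 5 V; a real PSU will be slightly off.

def parallel(*resistances: float) -> float:
    """Equivalent resistance of resistors wired in parallel: 1 / sum(1/R)."""
    return 1.0 / sum(1.0 / r for r in resistances)

RAIL_VOLTS = 5.0   # nominal +5V rail
R_EACH = 20.0      # ohms, per resistor
WATTS_EACH = 5.0   # power rating of each resistor

r_total = parallel(R_EACH, R_EACH)    # two 20 ohm resistors in parallel
current = RAIL_VOLTS / r_total        # I = V / R
power_total = RAIL_VOLTS * current    # P = V * I
power_each = power_total / 2          # equal resistors share the power evenly

print(f"equivalent resistance: {r_total:.1f} ohm")
print(f"load current: {current:.2f} A")
print(f"total dissipation: {power_total:.2f} W, "
      f"{power_each:.2f} W per resistor (rated {WATTS_EACH:.0f} W)")
```

At nominal voltage that works out to about 0.5 A and 2.5 W total, or 1.25 W per resistor: comfortably inside the 5 W rating, but still enough to make them warm.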

Here you can see the binding posts for banana plugs and the switch installed on the case. The biggest pain was making a 1/2" hole for the switch. A stepped drill bit (the kind that looks like a cone) is what you want. I tried using a DeWalt twist drill bit with a smaller "starter" bit at the tip. That failed miserably because the transition from the starter tip to the larger part of the bit would catch on the metal, bend it up, and jam the whole caboodle. The next major pain was my own fault: you have to know which nuts/sleeves need to be on which part of the wire before you solder. I ended up soldering the binding posts 3 times because of this.

Here's a look from the inside. Notice I used heat shrink tubing to minimize the exposed metal. Hopefully nothing will short out.

Here you can see that I added battery clamps. Again, remember to thread the clamp handles onto the wire before attaching the clamps; otherwise you won't be able to get them back on.

Here is the 15 amp inline fuse. The PSU probably has some sort of protection of its own, but an inline fuse is cheap to install and fuses are cheap to replace.
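The fuse only protects the PSU if it is rated below what the rail can deliver. A tiny Python check with the numbers from this post (18 A on the +12V rail, a 15 A fuse) makes that explicit; the variable names are just illustrative, and real fuses blow along a time-current curve, so the wattage is a ballpark figure.

```python
# Rough check that the inline fuse sits below the +12V rail's rated maximum.
# Real fuses blow along a time-current curve, so treat the wattage as a ballpark.

RAIL_VOLTS = 12.0      # nominal +12V rail
RAIL_MAX_AMPS = 18.0   # from the PSU nameplate
FUSE_AMPS = 15.0       # inline fuse rating

headroom_amps = RAIL_MAX_AMPS - FUSE_AMPS
power_at_fuse_rating = RAIL_VOLTS * FUSE_AMPS

if FUSE_AMPS >= RAIL_MAX_AMPS:
    raise ValueError("fuse rating should be below the rail's maximum current")

print(f"fuse is {headroom_amps:.0f} A below the rail maximum")
print(f"about {power_at_fuse_rating:.0f} W can flow before the fuse rating is reached")
```

With these values the fuse leaves 3 A of margin under the rail's rating, which corresponds to roughly 180 W at 12 V.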

Friday, September 23, 2011

Taking down my tree

We're taking down an old deodar cedar tree. I like trees, but whoever planted this one simply chose a poor location for it, and now it's causing a crack in my foundation. I found a tree service company that was willing to keep the trunk in 10-12 ft sections and haul them to a local mill to be processed into usable lumber. I'm hoping I can make some awesome outdoor projects with the wood. I'll still have to seal the ends of the logs and air dry the wood before I can use it.

Monday, September 19, 2011

Graph Databases + Hadoop?

When it comes to BigData processing, many people immediately think of Hadoop. And if you want to put a database on Hadoop, you think of HBase. However, with the glut of loosely structured data coming from all over the web, the most useful type of database is a Graph Database. Unfortunately, there is currently no graph DB on Hadoop, and I'm not sure there ever will be.

What is a Graph?

A graph is merely a way of structuring data using vertices (or nodes) and edges. Data is stored both at the vertices and on the edges themselves. Edges can be bidirectional or unidirectional. The flexibility comes from the fact that any edge can point to any node, and often the information we seek is in the configuration of the graph, not just in the data on the edges and nodes. In the graph above, we can see that Alice and Bob are both members of the Chess group and that they know each other.
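To make the vertices-and-edges idea concrete, here is a minimal Python sketch of a labeled, directed graph holding the Alice/Bob/Chess example. It simplifies things under my own assumptions: edges carry only a label (no other data), vertex data is omitted, and a bidirectional edge is stored as two one-way edges. The class and method names are hypothetical, not any graph DB's API.

```python
from collections import defaultdict


class Graph:
    """A tiny directed graph whose edges carry a label, e.g. "knows"."""

    def __init__(self) -> None:
        # node -> {(label, target), ...}
        self.out_edges: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add_edge(self, source: str, label: str, target: str,
                 bidirectional: bool = False) -> None:
        self.out_edges[source].add((label, target))
        if bidirectional:  # a two-way edge is stored as a pair of one-way edges
            self.out_edges[target].add((label, source))

    def neighbors(self, node: str, label: str) -> set[str]:
        """Nodes reachable from `node` over one edge with the given label."""
        return {t for (lbl, t) in self.out_edges.get(node, set()) if lbl == label}

    def nodes(self) -> set[str]:
        """Every node that appears as a source or a target."""
        found = set(self.out_edges)
        for edges in self.out_edges.values():
            found.update(t for (_, t) in edges)
        return found


g = Graph()
g.add_edge("Alice", "knows", "Bob", bidirectional=True)
g.add_edge("Alice", "is member", "Chess")
g.add_edge("Bob", "is member", "Chess")

# Members of the Chess group, then pairs of members who know each other.
members = {n for n in g.nodes() if "Chess" in g.neighbors(n, "is member")}
pairs = {tuple(sorted((a, b)))
         for a in members for b in g.neighbors(a, "knows") if b in members}

print(sorted(members))  # ['Alice', 'Bob']
print(sorted(pairs))    # [('Alice', 'Bob')]
```

The pairs query only works because of how the nodes are wired together: the answer lives in the graph's configuration rather than in any single node.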

In the example, we use nouns as nodes and the verbs "knows" and "is member" as edges. This is a very common way of modeling natural language statements as graphs. Let's take a look at some fake tweets that I have contrived for an example:

@NikeFan - I love my Nike's. They are the best shoes.
@ShoeShopper - I just bought some Nikes, some Reeboks, and a Swiss Air.
@ReebokFan - I just hate Nike, their new lineup is horrible.

After some text analysis we could derive this graph:

As you can see, the edge "is" can be applied to either shoes or Twitter users. This makes the graph easy to extend as our knowledge increases. You can also see that if I want to find Twitter users who have bought shoes, I would look for any node that "is" a shoe, see which nodes "bought" it, and make sure each of those nodes "is" also a Twitter User. This is where the graph database comes in. These databases are optimized for searching and traversing relationships to resolve your queries. A traditional RDBMS could represent a graph with just a few tables; however, resolving a graph query would take ages, as each step in the search would involve scanning the same set of tables over and over. Then, as your graph grows, your query times grow linearly or worse. With a graph DB, there are many opportunities to limit where we search and to run the search in parallel.
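Here is a hedged Python sketch of that "Twitter users who bought shoes" traversal. The triples are my guess at what the text analysis might produce from the three tweets (the "loves" and "hates" verbs in particular are assumptions), and the function name is hypothetical. Edges are indexed by (node, verb), so each traversal step is a dictionary lookup instead of a scan over every edge, which is roughly the kind of shortcut a graph DB is built around.

```python
from collections import defaultdict

# Edges as (subject, verb, object) triples, assumed from the three example
# tweets; the graph in the post's figure may differ.
triples = [
    ("@NikeFan", "is", "Twitter User"),
    ("@ShoeShopper", "is", "Twitter User"),
    ("@ReebokFan", "is", "Twitter User"),
    ("Nike", "is", "Shoe"),
    ("Reebok", "is", "Shoe"),
    ("Swiss Air", "is", "Shoe"),
    ("@NikeFan", "loves", "Nike"),
    ("@ShoeShopper", "bought", "Nike"),
    ("@ShoeShopper", "bought", "Reebok"),
    ("@ShoeShopper", "bought", "Swiss Air"),
    ("@ReebokFan", "hates", "Nike"),
]

# Index edges in both directions so each hop is a lookup, not a full scan.
incoming: dict[tuple[str, str], set[str]] = defaultdict(set)  # (object, verb) -> subjects
outgoing: dict[tuple[str, str], set[str]] = defaultdict(set)  # (subject, verb) -> objects
for s, v, o in triples:
    incoming[(o, v)].add(s)
    outgoing[(s, v)].add(o)


def twitter_users_who_bought_shoes() -> set[str]:
    buyers = set()
    for shoe in incoming[("Shoe", "is")]:                # nodes that "is" a Shoe
        for who in incoming[(shoe, "bought")]:            # nodes that "bought" it
            if "Twitter User" in outgoing[(who, "is")]:   # ...and "is" a Twitter User
                buyers.add(who)
    return buyers


print(sorted(twitter_users_who_bought_shoes()))  # ['@ShoeShopper']
```

Adding a new fact, say another shoe brand or a new verb, is just one more triple; none of the existing structure has to change, which is the extensibility described above.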

The Holy Grail - Graph Databases on Hadoop

You would think that with all the power Hadoop provides, there would be at least one graph DB on it, but at this time there are none. I think there are valid technical reasons for this.

For one, Hadoop has high latency. Even a one-second response time would be unreasonable for simple queries, and I doubt any Hadoop job can complete in less than a second. Part of the reason is that the inputs and outputs of a MapReduce job are files, and the underlying Hadoop file system (HDFS) will log, replicate, and checkpoint the data to hard disk as the job runs. If we had a sort of Hadoop Lite system that kept partial results only in memory, we could cut some of that latency. If a machine fails, we would simply restart the computation from the last point where we have data on disk, maybe even distributing that work across the machines that have already finished.

The second issue is that Hadoop doesn't like the data elements you process to depend on each other, and with a graph query that independence is very hard to achieve. Even if you could give each machine its own disconnected subgraph, those subgraphs could easily become connected in the future, necessitating a costly reshuffle of the data. Worse yet, graphs tend to become more connected over time, so eventually you'll have only one partition: not a very scalable option. The only solution I can come up with is to let a query traverse nodes locally and, when it hits a border between machines, issue a new query that is fed into another MapReduce round. This continues until a solution is found. It means that if a solution path passes through multiple machines, you would have to run multiple rounds of MapReduce, making the query super slow.
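To see why a path that crosses machines gets expensive, here is a toy, single-process Python simulation of the workaround described above (issuing a new query in another map reduce round whenever a traversal hits a border between machines). It makes no use of Hadoop; each "round" merely stands in for one MapReduce job. Within a round, a machine may follow edges between its own nodes freely, but an edge to a node on another machine can only be followed in the next round. The function name, the partition map, and the four-node example are all hypothetical.

```python
from typing import Optional


def rounds_to_reach(adjacency: dict[str, list[str]], partition: dict[str, str],
                    start: str, target: str) -> Optional[int]:
    """Return how many rounds a traversal from start needs to reach target.

    Each round models one MapReduce job: every machine expands its frontier
    nodes as far as it can along edges that stay on that machine. Edges that
    lead to another machine are held back as the next round's frontier.
    Returns None if the target is unreachable.
    """
    visited: set[str] = set()
    frontier = {start}
    rounds = 0
    while frontier:
        rounds += 1
        crossings: set[str] = set()
        stack = [n for n in frontier if n not in visited]
        visited.update(stack)
        while stack:
            node = stack.pop()
            if node == target:
                return rounds
            for neighbor in adjacency.get(node, []):
                if neighbor in visited:
                    continue
                if partition[neighbor] == partition[node]:
                    visited.add(neighbor)     # same machine: keep going this round
                    stack.append(neighbor)
                else:
                    crossings.add(neighbor)   # crosses a border: wait for next round
        frontier = crossings - visited        # drop nodes reached locally meanwhile
    return None


# A four-node path a - b - c - d, stored as an undirected adjacency list.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
one_machine = {n: "m1" for n in adjacency}
split = {"a": "m1", "b": "m1", "c": "m2", "d": "m1"}

print(rounds_to_reach(adjacency, one_machine, "a", "d"))  # 1
print(rounds_to_reach(adjacency, split, "a", "d"))        # 3
```

The same path takes one round when everything lives on one machine and three when it hops between machines twice; in the real scheme each extra round would be another full MapReduce job with its own startup and disk I/O.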

In the end, I expect we'll have to settle for a platform other than Hadoop to support graph databases. We could still use Hadoop to insert data into or query a graph database, but as for running the DB on Hadoop, that's not gonna happen in the foreseeable future.

Friday, September 16, 2011

Hadoop for Everyone

Apache Hadoop is an open source platform for running really huge jobs on a whole cluster of machines. Some of the most interesting problems in the world can only be practically solved with the power that Hadoop can harness. The biggest problem for developers like me is that we don't have the time, space, or money to install a cluster of hundreds or even thousands of machines. Even if I did, it would be a big pain to maintain all those machines. And even though I work for a big company with all the resources to build a Hadoop cluster, asking for 100 machines so I can "just try something" is never gonna fly.

One of Hadoop's major pain points is that not all machines are treated the same. Some machines have to function in roles like the NameNode, which requires large amounts of RAM and some level of high availability/redundancy. On the other hand, the vast majority of machines can function as workers and can be simple commodity machines for cost savings. This means any efficient Hadoop cluster is going to be a heterogeneous environment, which further increases maintenance costs.

So what can we do? What we need is an on-demand Hadoop cluster where you pay only for what you use. Amazon's EC2 and S3 have typically been used to provide metered compute and data storage. However, deploying a Hadoop cluster with heterogeneous server instances on a remote cloud still requires you to do a lot of the setup yourself. Amazon realized this and created their Elastic MapReduce service. Now you can run your Hadoop jobs on demand with very little configuration and still have complete control over what class of machine you assign to each role in your cluster.

This makes a lot of sense for software developers like me. During development, I may use a small cluster to prove that my stuff works. During QA, we might vary the size/configuration to see how our solution scales and to do performance tuning. Our marketing and sales teams can have demos ready on the cloud anytime, anywhere in the world to show prospective customers. We would also have a good solution for customers who do not want a huge up-front IT spend before they are convinced of the value of our product. This might even absolve us of some legal/privacy concerns by letting customers make their own agreements with Amazon while we provide just the software.

Wednesday, September 14, 2011

Dear Sophie, sorry it's just too difficult

If you haven't seen the Google Chrome ad "Dear Sophie", you should watch it below. As a dad, especially one with a daughter named Sophie, I couldn't help but feel the need to recreate this experience for my own children. Unfortunately, I found that trying to replicate it was much less heart-warming than the ad would suggest. For starters, Google doesn't allow you to create Gmail accounts for other users. If you enter a child's birthday and the child is under 13, it will stop you as well. Next, suppose you forget to log in to the Gmail account for a few months: guess what, it just might be deleted.

My issue was that I never had a Gmail account attached to my Google account (yes, they are separate things), so as soon as I tried to create a Gmail address for my daughter, that address became permanently attached to my Google account. Even if I deleted the Gmail account, I would never be able to assign a different Gmail address to my Google account. Furthermore, my daughter would never be able to have her email address, since it is permanently taken by my account. Of course, I didn't want to simply give her my account, which has access to all the other Google services like this blog/AdSense/Analytics/Picasa/YouTube. My only recourse was to create a new Google account for myself, transfer every Google asset I have to that account, delete my old account, and try to recover it from Google (supposedly, recovering an account only recovers the username, my daughter's new email address, while everything else is wiped clean). Now Google is refusing to recover the account. Apparently, they are not sure I am the owner of the account, despite the fact that I'm the one who deleted it. So now I'm stuck, and my daughter won't have her email address. All this because Google can't bring itself to let you change the Gmail address on a Google account.

I say screw it all and just create a private blog by limiting who can read it. This eliminates the need to keep logging in to your children's email accounts to keep them around. Since it's private, you can share it with them and only them, just like an email account.

Saturday, September 10, 2011

Garage Sale Hand Planes

I got these on my last outing. I had just about given up hope when I saw one more yard sale sign on my way home. I managed to negotiate them down to $60. They're a Stanley #7C and a #5C. The tote is broken on the 5, and the lip on the 7 also looks broken off. I'll do a blog post on restoring these once I get some of my other projects squared away.