Mozilla Global Sprint

I am helping SUNY-Albany-CCI organize a team for the first ever Mozilla Science Lab global sprint.

When: Tuesday, July 22, 2014 9:00 AM – 5:00 PM, Wednesday, July 23, 2014 9:00 AM – 5:00 PM

Where: SUNY Albany main campus, New Business Building, Room BB 213

http://mozillascience.org/how-to-join-us-for-our-first-ever-global-sprint/

Why:

1.  Contribute to open science.

2. Learn skills that can be used to document your own software.

3. Continue work done at previous hackathons (http://www.osehra.org/blog/national-day-civic-hacking-2014-report).

More Info (google doc):

https://docs.google.com/document/d/1D-x7Z_ZiK2NqmfztuHyP1u2r_nHzyS-asDu99swbE94/edit#

Hacking trails and health data

The last two Saturdays I participated in “Hack for Change Albany” and “National Trails Day”.  Most people would consider trail work and computer programming to be very different, but the two events were actually very similar in format.  Groups of volunteers got together for a common cause.  In each case we were loosely organized by a leader but also encouraged to self-manage.  We contributed the skills we had and learned new skills from others.  The week following “Hack for Change”, the FDA released OpenFDA.  Using the IPython notebooks we made at “Hack for Change” as a reference, I was able to explore this data a bit.  Pictures courtesy of Luis Ibanez and Jona Lee from her ZeeLemons blog.
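For context, here is roughly the kind of query the openFDA drug adverse event endpoint supports, as a minimal Python sketch (the endpoint and field names follow the public openFDA documentation, but the specific drug and query are just an illustration, not the exact queries from our notebooks):

    import requests

    # Query the openFDA drug adverse event endpoint. The search field and the
    # drug name below are illustrative choices, not the hack-day queries.
    url = "https://api.fda.gov/drug/event.json"
    params = {
        "search": "patient.drug.medicinalproduct:aspirin",
        "count": "patient.reaction.reactionmeddrapt.exact",
        "limit": 10,
    }

    response = requests.get(url, params=params)
    response.raise_for_status()

    # Print the ten most frequently reported reactions for the drug.
    for result in response.json()["results"]:
        print(result["term"], result["count"])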

Hacking health data…

[photos]

Hacking the forest…

[photos]

Health data before and after…

[before and after screenshots]

The trail before and after…

[photo]

Link Dump

Some interesting articles from the last couple of weeks… (if you follow the open science and reproducible research communities, you have probably already seen these).

Nate Silver comments on a repeatable data controversy.  Silver reaches a very large audience, so people outside the open data community will read this.  He also makes a good point: “be skeptical of the skeptics” too!

C. Titus Brown makes some great points on reproducible computational research: “20% is better than 0”, “Help some scientists use existing software”, “By the time it’s in a GUI, it’s rarely still very cutting edge.”

FDA opens up adverse event data

A nice presentation on Python for Biology aimed at novices.  Slide 17: “use Mac or Linux”.  That is an issue I’ve been thinking about lately.  Keep an up-to-date Windows partition or not?  So many people still use Windows.  Am I better off staying compatible with them or convincing them to change?  Not everyone will convert, unfortunately.

ISBI 2014 Grand Challenge

Last month I attended ISBI 2014 in Beijing as part of the Deconvolution Grand Challenge.  These challenges are very important, as they encourage cross-fertilization of technology.

The top 3 entries were:

Ferreol Soulez – Learn 2D, Apply 3D Method

David Biggs – Olympus cellSens deconvolution software (very fast; Dave did his PhD in algorithm acceleration).

Sander Kromwijk – A GPU version of Hessian Schatten norm regularization.  Also very fast.

I submitted an entry using total variation regularization based on Nicolas Dey’s approach, written using the ImageJ2 platform.  Theoretically it should have been competitive with, though slightly below, the Hessian Schatten norm approach (as reported in Lefkimmiatis’ paper).  That assumes a near-optimal choice for the number of iterations and the regularization parameter; I used 1000 and 0.0005.  1000 is a lot of iterations, but it is a non-accelerated method and could require even more.  The total variation method can (and should) be accelerated, for example here and here.
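For reference, the update is essentially Richardson-Lucy with a total variation term in the denominator (following Dey’s scheme).  A minimal NumPy sketch of that update, not the actual ImageJ2 code I submitted, looks something like this (the 1000 iterations and 0.0005 regularization weight are the values mentioned above):

    import numpy as np
    from scipy.signal import fftconvolve

    def tv_curvature(estimate, eps=1e-6):
        """Divergence of grad(o)/|grad(o)|, the total variation term."""
        gy, gx = np.gradient(estimate)
        norm = np.sqrt(gx ** 2 + gy ** 2) + eps
        dyy, _ = np.gradient(gy / norm)
        _, dxx = np.gradient(gx / norm)
        return dxx + dyy

    def rl_tv(image, psf, n_iter=1000, reg=0.0005, eps=1e-12):
        """Richardson-Lucy deconvolution with TV regularization (Dey-style update)."""
        psf = psf / psf.sum()
        psf_mirror = psf[::-1, ::-1]
        estimate = np.full_like(image, image.mean(), dtype=float)
        for _ in range(n_iter):
            blurred = fftconvolve(estimate, psf, mode="same")
            ratio = image / (blurred + eps)
            correction = fftconvolve(ratio, psf_mirror, mode="same")
            estimate = estimate * correction / np.maximum(1.0 - reg * tv_curvature(estimate), eps)
        return estimate

This is the plain fixed-point form; the acceleration schemes linked above speed up convergence so that far fewer than 1000 iterations should be needed.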

In the end they did not reveal the scores for the algorithms (other than the ranking of the top 3).  I had mixed feelings about this.  On one hand, I understand that the organizers did not want to discourage new participants.  On the other hand, results are often greatly influenced by parameter tuning, so revealing the scores could have helped participants optimize their parameters and run their algorithms under the best conditions.

Anyway, I would like to thank the organizers, Cédric Vonesch and Stamatios Lefkimmiatis, for running the contest.  It was a lot of fun to participate in.

Also, while in China I had the chance to see a few things.  This picture is from a very cool hike I did along the Great Wall (organized by Beijing Hikers).  There was actually a marathon run on the Great Wall while I was in China.  The organizers were staying at our hotel; I told them I was a runner and they tried to get me to sign up… thankfully I resisted.

[photo]

ImageJ2-GPU Deconvolution

I’ve just added an ImageJ2 wrapper for YacuDecu (by Bob Pepin) to both the main deconvolution project and RogueImageJPlugins.  RogueImageJPlugins is a place where I am putting cool self-contained plugins and examples.  YacuDecu was written as an Imaris plugin; the Imaris version can be found here.

I’ve played with it a little and got about a 10x speed improvement over a CPU implementation (using Mines JTK) for power-of-2 sized images.  More detailed testing results to come.

Currently I have not made a distributable version, so as of now it exists as a “fun example for developers”.  I need to think through the best way to package a CUDA plugin.  To get the plugin working in a development environment:

  1. Use CMake to create build files for SimpleYacuDecuNative.  Then copy the library to a location where ImageJ will find it.
  2. Use Maven and/or Eclipse to run SimpleYacuDecuWrapper.  There is a “main” that launches ImageJ in the Utilities class.
  3. Under “Plugins/Deconvolution” the YacuDecu option appears.  You will need to have a PSF and an image (both the same size) open.
  4. As with most GPU implementations, it is fastest when all image dimensions are equal to a power of 2 (a small padding sketch follows below).
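For anyone experimenting, padding the image and PSF up to power-of-2 dimensions beforehand is straightforward.  A small generic NumPy sketch (illustrative only, not part of the plugin itself):

    import numpy as np

    def next_pow2(n):
        """Smallest power of two greater than or equal to n."""
        return 1 << (int(n) - 1).bit_length()

    def pad_to_pow2(image):
        """Zero-pad an array so every dimension is a power of two."""
        target = [next_pow2(s) for s in image.shape]
        pad = [(0, t - s) for s, t in zip(image.shape, target)]
        return np.pad(image, pad, mode="constant")

    # Example: a 300x400 image becomes 512x512 before deconvolution.
    print(pad_to_pow2(np.zeros((300, 400))).shape)  # (512, 512)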

Definitely more to come.  Great work by Mr. Pepin, and thanks to Bitplane for supporting this within Imaris.  YacuDecu is the only open and distributable GPU deconvolution program I am aware of.  (Note: there is a very advanced plugin that can use the GPU for multiview deconvolution; I am in the process of figuring out how to use it for the simpler case of single-view deconvolution.)

Heartbleed, Richardson-Lucy and Anti-Fragility

I am no expert in internet security.  I am also fairly new to open source, so I don’t claim to understand all the complicated factors that led to the Heartbleed bug.  From what I understand, the problem occurred because of a bug in a piece of widely used open source code.  There is quite a bit of discussion on Heartbleed and issues of open versus closed, but to me another issue is centralized versus decentralized, or fragile versus anti-fragile.  There is a good article by Christina Warren on the “open versus closed” issues here.  She mentions that large, popular open source projects are funded in three ways:

  1. Donations from individuals, volunteers (of time or coding effort), and non-profits.
  2. The project is funded and steered by a commercial entity or entities.
  3. Corporations who use and benefit from the project hire employees who are dedicated to working on the project full-time.

Note her term “the project”.  Perhaps there should also be some emphasis on “their project” as well as “the project”.  Corporations could borrow from “the project” and give back to “the project”, but at the same time be responsible for “their project”.  Better yet, corporations could borrow from “multiple projects” and give back to “multiple projects” to efficiently build “their project” (in this case a customized, decentralized and decoupled security solution).  In turn, “their project” becomes another source that can be borrowed from and contributed to.  The system becomes decentralized.

Like I said, I don’t know much about internet security; I know more about signal processing.  Lately I’ve been tinkering with an algorithm called Richardson-Lucy, which is an older and fairly standard deconvolution algorithm.  I’ve tested several implementations of this algorithm and wrote one of my own.  Why did I reinvent the wheel?  There are many reasons, including the need to be compatible with another system I use and the eventual need to target different computing architectures (such as the GPU).  It then becomes important to test different implementations against each other to make sure they behave the same numerically.  My own implementation is not a complete re-invention: I studied, analyzed, built, and ran the other implementations, so mine has inherited its DNA from the others.
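As a concrete illustration of that kind of numerical cross-check (a toy sketch in NumPy/SciPy, not one of the actual implementations I compared), the same Richardson-Lucy loop can be run with two different convolution back-ends and the results compared element-wise:

    import numpy as np
    from scipy.signal import convolve, fftconvolve

    def richardson_lucy(image, psf, n_iter=50, conv=fftconvolve, eps=1e-12):
        """Plain Richardson-Lucy, parameterized by the convolution routine."""
        psf = psf / psf.sum()
        psf_mirror = psf[::-1, ::-1]
        estimate = np.full_like(image, image.mean(), dtype=float)
        for _ in range(n_iter):
            blurred = conv(estimate, psf, mode="same")
            ratio = image / (blurred + eps)
            estimate = estimate * conv(ratio, psf_mirror, mode="same")
        return estimate

    # Blur a random test image, then deconvolve using a direct-space and an
    # FFT-based convolution and check that the two results agree numerically.
    rng = np.random.default_rng(0)
    truth = rng.random((64, 64))
    psf = np.outer(np.hanning(9), np.hanning(9))
    blurred = fftconvolve(truth, psf / psf.sum(), mode="same")

    a = richardson_lucy(blurred, psf, conv=convolve)
    b = richardson_lucy(blurred, psf, conv=fftconvolve)
    print(np.max(np.abs(a - b)))  # differences should be at floating-point level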

I wonder how so many systems wound up using the exact same “heartbeat” code.  Instead of getting a piece of code from one centralized code base, shouldn’t people get multiple implementations from decentralized sources?  Currently it seems people think in terms of “re-using” something like OpenSSL.  Beyond just re-using code, we should also focus on borrowing from, improving on, modifying, mutating, comparing and reimplementing code.  Reuse is efficient, but we want systems that are anti-fragile.  Philosopher/statistician Nassim Nicholas Taleb says:

Modern societies: efficiency demands are pushing the structures to the maximum, so a little sand in the cogs make the whole edifice totter.

We demand efficiency, so we all use the same code, but perhaps there is too much emphasis on efficiency.  Robust systems (such as those in nature) have redundancy and variability.  These properties should also be desirable in software ecosystems.