
NODE.JS IN THE KINGDOM OF PERL

 

Node.js is really shining these days, and Y! is investing heavily in it with the open source project called cocktail.

From what I understand, the main ideas behind Node.js are:

  • Using JavaScript as the programming language (functions are first-class citizens)
  • The V8 JavaScript engine (Google developed it, open-sourced it, and keeps pushing its limits)
  • Event-driven, asynchronous I/O (handling many things concurrently without paying the cost of thread context switching)
  • Libraries (which let JavaScript talk to the world outside the browser)

Let’s compare that to what we have in the kingdom of Perl:

  • Anonymous subroutines already provide closures in Perl
  • Perl has a very robust interpreter and runs on almost every platform
  • AnyEvent “provides a uniform interface to various event loops.”
  • CPAN provides far more libraries/modules than Node.js
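The first point is easy to see in code. A minimal sketch, where make_counter is a hypothetical name: each call returns an anonymous subroutine that closes over its own lexical $count, much like a JavaScript closure.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# make_counter (a hypothetical helper) returns an anonymous sub that
# closes over its own private $count, like a JavaScript closure.
sub make_counter {
  my ($start) = @_;
  my $count = defined $start ? $start : 0;
  return sub { return ++$count };
}

my $first  = make_counter();
my $second = make_counter(10);

print $first->(), "\n";     # 1
print $first->(), "\n";     # 2
print $second->(), "\n";    # 11, state is independent of $first
```

Each returned sub keeps its own copy of $count alive after make_counter has returned, which is exactly what a callback needs in event-driven code.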

The comparison pretty much explains how we can write Node.js-like code in Perl:

  • Pick a high-efficiency event loop implementation through AnyEvent
  • Use named/anonymous subroutines as callbacks for events
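A minimal sketch of that recipe, assuming AnyEvent is installed from CPAN: two timers register anonymous-sub callbacks, and a condition variable (AnyEvent->condvar with begin/end) keeps the event loop running until both have fired. The delays and log messages are purely illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use AnyEvent;    # CPAN module; picks the best available event loop

my @log;                         # records the order callbacks fire in
my $done = AnyEvent->condvar;    # condition variable used to wait
$done->begin for 1 .. 2;         # expect two callbacks to finish

# Register callbacks as anonymous subs, Node.js style. The returned
# watcher objects must be kept alive, or the timers are cancelled.
my $slow = AnyEvent->timer(
  after => 0.2,
  cb    => sub { push @log, 'slow'; $done->end },
);
my $fast = AnyEvent->timer(
  after => 0.1,
  cb    => sub { push @log, 'fast'; $done->end },
);

# Run the event loop until every begin() is balanced by an end().
$done->recv;
print join( ' ', @log ), "\n";    # fast slow
```

As in Node.js, nothing blocks while waiting: the loop dispatches each callback when its timer expires, so the 0.1s timer fires first even though it was registered second.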

PBP part2 — Unit test your Perl script

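I often hear people say that our Perl scripts cannot be unit-tested because they are not modularized. However, there is a simple way to write tests for a Perl script, and the magic here is caller: at the top level of a file, caller() returns an empty list when the file is run directly, but returns the calling package when the file is loaded with require or do.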

You used to write scripts like this:

#!/usr/local/bin/perl
use strict;
use warnings;
print "hello world\n";

Now you can write it like this:

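#!/usr/local/bin/perl
use strict;
use warnings;

# Run main() only when the file is executed directly, not when it is
# loaded with require/do (e.g. from a test).
main() unless caller();

sub main {
  print "hello world\n";
}

1;    # make require() succeed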

Of course, main is just the name of the main function; you can call it anything you want. run() is another pretty good name. After doing this, you can write your unit test like this:

use strict;
use warnings;

use Test::More;
use Test::Output qw( stdout_is );

# './' is needed because '.' is no longer in @INC since Perl 5.26
require './a.pl';
stdout_is { main::main() } "hello world\n", "Test STDOUT";
done_testing();

Put the code above in a.t and run the test:

perl a.t
ok 1 - Test STDOUT
1..1

So here I showed two techniques:

  • How to test subroutines in a Perl script
  • How to verify the output using Test::Output
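Taking the idea one step further, here is a minimal sketch that moves the logic into a hypothetical greeting() helper that returns a string instead of printing it, so it can be checked with a plain is() from the core Test::More module, with no Test::Output needed. The file name greet.pl is illustrative.

```perl
#!/usr/bin/perl
# greet.pl: a "modulino" that works both as a script and as a library
use strict;
use warnings;

# Only run when executed directly; require/do leaves caller() true.
main(@ARGV) unless caller();

# Hypothetical helper: pure logic that returns a value, easy to test.
sub greeting {
  my ($name) = @_;
  $name = 'world' unless defined $name;
  return "hello $name\n";
}

sub main {
  my @args = @_;
  print greeting( $args[0] );
  return;
}

1;    # make require() succeed no matter what the last statement returns
```

A matching greet.t could then be as small as: use Test::More; require './greet.pl'; is( main::greeting('perl'), "hello perl\n", 'custom name' ); done_testing();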

Issue with “Vim(let):E684: list index out of range: 1”

As you may know, I am a big Vim fan and have installed a bunch of plugins. Here is my .vim directory if you want to take a look. I use GitHub to share it among all my machines, and this makes my life much easier. Recently (maybe not that recently 🙂) I found that I always got the error message “Vim(let):E684: list index out of range: 1” when trying to open a file from NERDTree. Of course, I thought NERDTree was causing it and opened a bug here. It turned out that the bug was caused by another plugin, bufexplorer, and it went away after I upgraded bufexplorer to the latest version.

Lessons learned

1. The root cause is often hidden behind a seemingly obvious problem.

2. Always follow up closely on an issue after you open it.

3. Write down what you found; it may benefit others.

How to get ideas implemented

This is actually an email I sent to my manager about why some of our new ideas never got implemented and how we can improve that.

Someone from another team who was leaving the company sent out an email that mentioned “how frequently these new ideas never got implemented”. This sometimes happens in our team too: ideas don’t draw enough attention because

1. We don’t have enough resources to do this now; let’s revisit it later
2. We are going to have something great in the “near” future; let’s hold off
3. Most of us don’t think it’s a good idea at all
4. We need to follow the “protocol”

So this is what normally happened next:
1. We never revisited some of the ideas and kept them at the bottom of the backlog
2. We kept repeating tedious manual work while waiting for the great things
3. Some really good ideas vanished because they didn’t seem to work well at the time they were proposed
4. We kept using slow tools like SVN and had no chance to take advantage of the latest tools/methods, etc.

However, let’s look at it from a different angle. What makes us work so hard, even on our own time? It is the passion to make this B&R pipeline the best in the company, or even a world-class product.

This made me think about two things I learned from the book Steve Jobs:

1. Love what you do and work for what you love
2. A players only like to work with A players

So here are some suggestions for the team on how to get ideas implemented (though I know this may end up like the other vanished ideas):

1. Build an A player team
2. Give these A players time to prototype new ideas, but timebox it.
3. Never, never hold off a good idea because “we are going to have something great happen shortly”.
4. Get feedback from users continuously and “don’t just throw the tools over the fence”
5. Always spend time refactoring our infrastructure, revisiting ideas, and sharing and advocating our ideas to wider audiences, inside or outside the company

Let the A players innovate, share, and be appreciated.

Perl Best Practices : Part 1

This is a series of Perl Best Practices I wrote for my teammates. I hope you find it useful. Before starting, I would like to point out: please don’t follow these best practices blindly. It’s very important to understand what you are doing. I will try my best to describe why we want to follow each practice in a given circumstance, but I may not be able to cover everything. Please contact me if you have any questions/suggestions/objections about any of these rules.

Make your Perl code easy to read

According to the perltidy website, “Perltidy is a Perl script which indents and reformats Perl scripts to make them easier to read. If you write Perl scripts, or spend much time reading them, you will probably find it useful.” Here are some code snippets that I shamelessly copied from the perltidy website because I am too lazy to type my own.

From:
$_= <<'EOL';
$url = new URI::URL "http://www/"; die if $url eq "xXx";
EOL
LOOP:{print(" digits"),redo LOOP if/\G\d+\b[,.;]?\s*/gc;print(" lowercase"),
redo LOOP if/\G[a-z]+\b[,.;]?\s*/gc;print(" UPPERCASE"),redo LOOP
if/\G[A-Z]+\b[,.;]?\s*/gc;print(" Capitalized"),
redo LOOP if/\G[A-Z][a-z]+\b[,.;]?\s*/gc;
print(" MiXeD"),redo LOOP if/\G[A-Za-z]+\b[,.;]?\s*/gc;print(
" alphanumeric"),redo LOOP if/\G[A-Za-z0-9]+\b[,.;]?\s*/gc;print(" line-noise"
),redo LOOP if/\G[^A-Za-z0-9]+/gc;print". That's all!\n";}

To:
$_ = <<'EOL';
$url = new URI::URL "http://www/"; die if $url eq "xXx";
EOL
LOOP: {
print(" digits"), redo LOOP if /\G\d+\b[,.;]?\s*/gc;
print(" lowercase"), redo LOOP if /\G[a-z]+\b[,.;]?\s*/gc;
print(" UPPERCASE"), redo LOOP if /\G[A-Z]+\b[,.;]?\s*/gc;
print(" Capitalized"), redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/gc;
print(" MiXeD"), redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/gc;
print(" alphanumeric"), redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/gc;
print(" line-noise"), redo LOOP if /\G[^A-Za-z0-9]+/gc;
print ". That's all!\n";
}

This should give you a brief idea of what perltidy does and how powerful it is. It helps us enforce a consistent code style within and across teams, as long as we all

  1. install perltidy
  2. share the same .perltidyrc
  3. run perltidy before checking in

We don't actually need to do all of this by hand. If you are too lazy to run it before checking in, a Hudson component job is the best place to set up automatic tidying of our Perl code.
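For the automated route, a CI job (or a pre-commit hook) mainly needs to know whether a file is already tidy. A minimal sketch using the perltidy() function of the CPAN module Perl::Tidy; is_tidy() and slurp() are hypothetical helper names, and the option string is an assumption you would replace with your shared settings (Perl::Tidy also accepts a perltidyrc parameter pointing at a shared .perltidyrc).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Perl::Tidy;    # CPAN; the same engine behind the perltidy command

my $OPTIONS = '-l=100 -i=2 -ci=2';    # assumed subset of the team settings

# Hypothetical helper: true when $code is unchanged by perltidy.
# -npro ignores ~/.perltidyrc; -se sends error messages to stderr.
sub is_tidy {
  my ( $code, $options ) = @_;
  my ( $tidied, $stderr ) = ( '', '' );
  my $error = Perl::Tidy::perltidy(
    source      => \$code,
    destination => \$tidied,
    stderr      => \$stderr,
    argv        => "-npro -se $options",
  );
  die "perltidy reported a problem: $stderr" if $error;
  return $tidied eq $code;
}

sub slurp {
  my ($file) = @_;
  open my $fh, '<', $file or die "Cannot open $file: $!";
  local $/;    # read the whole file at once
  return scalar <$fh>;
}

# Check the files named on the command line; exit non-zero if any of
# them needs tidying so the CI job fails.
if (@ARGV) {
  my @untidy = grep { !is_tidy( slurp($_), $OPTIONS ) } @ARGV;
  print "needs perltidy: $_\n" for @untidy;
  exit( @untidy ? 1 : 0 );
}
```

Failing the build instead of rewriting files keeps the check-in history clean: the engineer runs perltidy locally and commits the tidied result.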

Here is my .perltidyrc file; feel free to copy and paste it into your home directory:

[jqyao@springstation:build VIM ~]$ more .perltidyrc
# file:.perltidyrc
-l=100 # max line width
-i=2 # Indent level is 2 cols
-ci=2 # Continuation indent is 2 cols
-st # Output to STDOUT
-se # Errors to STDERR
-vt=2 # vertical tightness
-cti=0 # No extra indentation for closing brackets
-pt=1 # Medium parenthesis tightness
-bt=1 # Medium brace tightness
-sbt=1 # Medium square bracket tightness
-bbt=1 # Medium block brace tightness
-nsfs # No space before semicolons
-nolq # Don't outdent long quoted strings
-bl # opening brace on newline
-nbbc # no blank lines before whole-line comments
# Break before all operators
-wbb="% + - * / x != == >= | & **= += *= &= <>= ||= .= %= ^= x="

With the power of Vim, I use the perl-support plugin to run perltidy by just typing \ry, and it automatically tidies my code for me. Very neat.

Why slow sometimes is fast

I am working on a small scrum team that consists of 4 software engineers, 1 product owner, and 1 scrum master. Our scrum master is a very active agile development advocate who is always trying to introduce new concepts and practices to the team and then apply them. Currently, we are practicing kanban in developing our software. Briefly speaking, a user story travels across the kanban board from state to state: backlog, prepping, ready, work in progress (WIP), and done. We only allow a certain, relatively low number of stories in WIP so that we can focus on getting them done ASAP.

Although kanban emphasizes providing a visualization of the current work for the whole team so that everyone can swarm together to quickly clear the stories, that doesn’t always work out. Kanban, like almost all other agile practices, ignores, accidentally or intentionally, the fact that engineers are at different levels. However, this is often the key factor in whether a practice succeeds for a team.

As far as I can tell, having engineers with different skill levels on a team is usually the main reason why kanban actually slows down development, at least in my team, because we always try to get everybody involved in the WIP story. First of all, you, as the quarterback of the WIP story, have to figure out how to split it into several reasonably sized workable pieces and spend extra time explaining the design (and the implementation, if your teammates are junior). This communication and coordination takes a lot of extra time. Furthermore, because the story is small most of the time, the workable pieces depend on each other quite closely. If any engineer can’t keep to the schedule, they unavoidably slow down others’ progress, which pushes the story into a negative feedback loop. This can happen to both senior and junior engineers, since there are many possible reasons why work isn’t finished on time. Compared to a senior person working alone, this approach is slower most of the time.

Though this seems bad at first glance, it actually has many benefits, some of which are hard to measure.

1. No single engineer can block the story.

2. It forces people to think the story through in an attempt to split it into measurable pieces.

3. It requires people to design interfaces beforehand so they won’t step on each other’s toes.

4. It involves more people in brainstorming and communication.

So, every coin has two sides, and you have to pick the one that benefits you the most. I think the way we chose is actually better than the “speedy” way in which a senior hero takes over everything. Here are some ideas that I think might make it work better.

1. Have a kick-off meeting with all members on the story, plus daily (even hourly) sync-up meetings. Communication is very important here, because having more people work together will most likely lead to miscommunication and confusion if you don’t sync up frequently. In our team, we try to use every possible way to make interactions easier. For example, we use Skype video calls, instant messages, and Adobe Connect screen sharing for one-on-one communication. Meanwhile, we also chat in an IRC channel and hold conference calls for group discussion. We have found that these various communication channels lead to better understanding among team members.

2. Pair programming. Though I intentionally use the term pair programming, it doesn’t necessarily mean that people need to physically sit together and work on the same screen. The pair can coordinate using Adobe Connect, Skype screen sharing, or even a shared Linux screen session. Pairing experienced engineers together can boost productivity, since there are more eyes for sanity checks, brainstorming, and peer pressure. Pairing junior and senior engineers is actually one of the most effective ways to improve a junior engineer’s skills, because s/he can ask about whatever s/he lacks and get an instant response and training. Pair programming doesn’t mean that you need to share your screen with your partner 8 hours a day. I always find it very useful to spend time alone writing unit tests for others’ code. Of course, if you are a big fan of TDD, this won’t be necessary. I will show you later how you can make the best use of your time.

3. Fast turnaround. Briefly speaking, this means checking in code fast, breaking the pipeline fast, and fixing bugs fast. Continuous integration is required in our daily development. We set up a Jenkins CI server to continuously check out code from the source repository and run unit/smoke/functional tests along the build pipeline. These tests and jobs not only give you a chance to verify that your code works in an environment other than your dev box, but also give your peers a rough idea of the quality of your code and how confident they can be when integrating your work.

Nonetheless, no matter how well you prepare, some senior engineers will still have to wait while their junior counterparts implement their work. Here are some things you can do to make good use of that time.

1. Work on your tech debt, for example by refactoring your code. In my case, I would put reusable code into Perl modules so it can be shared and benefit more people.

2. Write documentation. As Dick Brandon said, “Documentation is like sex: when it is good, it is very, very good; and when it is bad, it is better than nothing”. There are various kinds of documentation, including TWiki pages and perldoc/javadoc.

3. Write unit tests for the junior engineer. Getting junior people used to writing unit tests takes time, and they also need to spend time getting their work done. Writing unit tests for them therefore makes the code easier to maintain in the future, so it’s a win-win situation. However, you need to talk with them before they start and define the interfaces/methods together. This way, you keep a contract between the two of you that makes writing unit tests easier.

4. Prepare tech talks for your team or your community. You always learn something from work that you would like to share with your teammates or other folks inside or outside your company. Nothing is better than a tech talk or a lightning talk for improving the whole team’s productivity.

5. Write automation tools. This usually isn’t part of your story, but you always get frustrated when you have to do some stupid, tedious work manually over and over again. As Larry Wall wrote in his masterpiece Programming Perl, “Three Virtues of a Programmer: Laziness – The quality that makes you go to great effort to reduce overall energy expenditure. It makes you write labor-saving programs that other people will find useful, and document what you wrote so you don’t have to answer so many questions about it. Hence, the first great virtue of a programmer. Also hence, this book. See also impatience and hubris. Impatience – The anger you feel when the computer is being lazy. This makes you write programs that don’t just react to your needs, but actually anticipate them. Or at least pretend to. Hence, the second great virtue of a programmer. See also laziness and hubris.”
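Items 1 to 3 fit together nicely. A minimal sketch, assuming a hypothetical My::Duration module: reusable code moves into a package that exports its functions via Exporter, it is documented in POD so perldoc can display it, and the interface agreed on up front (sub names and return values) is pinned down by tests that can be written while the junior engineer fills in the implementation. Everything lives in one file only to keep the example self-contained; in a real project the package would go in lib/My/Duration.pm and the tests in t/.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::Duration;    # hypothetical module; normally lib/My/Duration.pm
use Exporter 'import';
our @EXPORT_OK = qw(to_seconds format_hms);

=head1 NAME

My::Duration - convert between seconds and HH:MM:SS (example only)

=head1 FUNCTIONS

=head2 to_seconds($hms)

Parses "HH:MM:SS" and returns the total number of seconds.

=head2 format_hms($seconds)

Returns $seconds formatted as zero-padded "HH:MM:SS".

=cut

sub to_seconds {
  my ($hms) = @_;
  my ( $h, $m, $s ) = $hms =~ /\A(\d+):([0-5]\d):([0-5]\d)\z/
    or die "Bad duration: $hms\n";
  return $h * 3600 + $m * 60 + $s;
}

sub format_hms {
  my ($seconds) = @_;
  return sprintf '%02d:%02d:%02d', int( $seconds / 3600 ),
    int( ( $seconds % 3600 ) / 60 ), $seconds % 60;
}

package main;
use Test::More;

# The contract agreed on up front: these subs exist and behave like this.
can_ok( 'My::Duration', qw(to_seconds format_hms) );
is( My::Duration::to_seconds('01:02:03'), 3723, 'parses HH:MM:SS' );
is( My::Duration::format_hms(3723), '01:02:03', 'formats seconds' );
ok( !eval { My::Duration::to_seconds('1:2'); 1 }, 'rejects bad input' );
done_testing();
```

Other scripts would then load it with use My::Duration qw(to_seconds); and perldoc lib/My/Duration.pm shows the POD.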

In summary, slow is sometimes fast. With a well-prepared coordination plan and better use of time, the slowness will eventually pay off in the long run and make the whole team a better one.

How does APC work

APC is simple: you don’t need to worry about it much, just enable it in apc.ini/php.ini, until you run into a performance issue. The company I work for takes performance very seriously because we need to serve millions of requests every single day.

Gopal V, one of the maintainers of APC, wrote this fantastic blog post about autofilter. However, I have to say it was a little hard to understand until I actually read the APC source code on svn.php.net. The discussion of APC below is based on the stable version 3.0.19.

Here are some things you may or may not know.

The Zend engine takes 4 steps to run a PHP script:

1. Read the PHP code from the file into memory

2. Lexing: convert the source into tokens that form the syntax

3. Parsing and compiling: parse the tokens into opcodes and validate the language syntax

4. Executing: execute the opcodes

APC mainly hijacks step 3. Instead of having Zend do steps 1, 2, and 3 again and again, APC stores the opcodes in shared memory, then copies them into the executing process so Zend can execute them. Class and function tables are also stored in shared memory, because zend_compile_file generates them as well.

Here are the detailed steps of what APC really does

1. At module init time (MINIT), APC uses mmap to map a file into shared memory.

2. At request time, APC replaces zend_compile_file() with its own my_compile_file().

What does my_compile_file do?

1. If APC can’t find the file in the cache, it calls zend_compile_file to compile the file.

2. It takes the returned opcodes and stores them in shared memory, which is read and written by the different Apache child processes. (We are talking about the Apache prefork MPM here.)

3. It also stores the class and function tables in shared memory.

4. It copies the opcodes into the Apache child process’s memory so Zend can execute the code.
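To make the flow above concrete, here is a toy model of those four steps written in Perl (the language used elsewhere on this blog), not APC’s actual C code: a hash stands in for the shared-memory cache, a hypothetical compile_file() stands in for zend_compile_file(), and a copied data structure stands in for copying opcodes into the child process. All names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy model only, not APC's C code:
#   %shared_cache   stands in for APC's shared-memory segment
#   compile_file()  stands in for zend_compile_file() (read + lex + compile)
my %shared_cache;
my $compile_count = 0;

sub compile_file {
  my ($file) = @_;
  $compile_count++;
  return { opcodes => ["(opcodes for $file)"], classes => {} };
}

# Stands in for my_compile_file(): try the cache before compiling.
sub my_compile_file {
  my ($file) = @_;
  my $entry = $shared_cache{$file};
  if ( !$entry ) {
    $entry = compile_file($file);     # step 1: cache miss, really compile
    $shared_cache{$file} = $entry;    # steps 2-3: keep opcodes + class table
  }

  # Step 4: give the "child process" its own copy to execute.
  return {
    opcodes => [ @{ $entry->{opcodes} } ],
    classes => { %{ $entry->{classes} } },
  };
}

# Three requests for the same file compile it only once.
my_compile_file('index.php') for 1 .. 3;
print "compiled $compile_count time(s)\n";    # compiled 1 time(s)
```

Handing out a copy rather than the cached structure mirrors why APC copies opcodes into process memory: each request may modify its own copy without corrupting what other Apache children read from the cache.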

Looks perfect and damn simple until you read Gopal’s blog. At the beginning of his post, Gopal mentioned that “The PHP compiler does generate an instruction to include file, but since the engine never executed it, no error is thrown for the absense of a.php.” You can use parsekit to generate the opcodes of your PHP code (parsekit can be installed with pecl install parsekit). Below are the opcodes generated by parsekit.

[0] => ZEND_INCLUDE_OR_EVAL T(0) './parent.php' 0x4
[1] => ZEND_FETCH_CLASS NULL UNUSED 'ParentClass'
[2] => ZEND_DECLARE_INHERITED_CLASS NULL '            …' 'child1'

Here is one example Gopal showed in his blog:

index.php   --include--> child1.php --include_once--> parent.php
            |--include--> child2.php --include_once--> parent.php
profile.php --include--> child2.php --include_once--> parent.php

As you can see, when you request index.php, Apache forks a child process for you. After that, the Zend engine reads index.php from disk, since it’s not in APC/memory yet, then parses and compiles it. At this point, APC takes the opcodes and the class/function tables and stores them in the shared cache. When Zend executes index.php, it finds that child1.php is required to finish execution, so it loads/lexes/parses/compiles child1.php, and APC puts child1.php in shared memory. Zend and APC then repeat the same steps for parent.php.

As you may guess, Zend and APC will do the same thing for child2.php. Yes, you are right. But there is a tiny trick: when Zend compiles child2.php, it finds that the parent class from parent.php is already in process memory, so it changes ZEND_FETCH_CLASS to NOP, which saves time at run time. Gopal calls this version of the opcodes the *static* version, and the version with ZEND_FETCH_CLASS the *dynamic* version. This is analogous to static versus shared/dynamic libraries.

Static is faster than dynamic, but it’s not always usable: if the parent class is not in the process yet, APC gives up storing this file’s opcodes in shared memory and forces Zend to compile it again and again. This is how APC autofilter works. This is basically what Gopal told us in his blog, but why?

Isn’t parent.php already in the APC cache for profile.php? Why can’t the cached opcodes of child2.php be used?

Here is why. The kicker is “smart” Zend changing ZEND_FETCH_CLASS to NOP. When you access profile.php, Apache most probably forks (or hands you a preforked) new process. Therefore, Zend in the new process has NO idea about index/child1/child2/parent.php. However, APC already knows them and has all their opcodes in shared memory. When you request profile.php, the Zend engine loads/lexes/parses/compiles profile.php, then notices that child2.php is included, and APC knows child2.php is already in the shared memory cache. Before copying the cached child2.php from shared memory to process memory, APC looks over all classes in the class table and restores the parent class pointer for compile-time inheritance. Boom!!! In the current process, the parent class has not been loaded yet, so APC decides these opcodes are unusable. This is why the static version can’t be used here. On the other hand, the dynamic version can always be used, but it’s slower.
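The decision described above can be sketched as a toy Perl model; this is a hypothetical illustration of the reasoning, not APC’s implementation. Each cached entry records its parent class and a made-up binding field saying whether the parent was bound at compile time (static, FETCH_CLASS turned into NOP) or is looked up at run time (dynamic). A static entry is only usable if the parent class is already loaded in the current process; otherwise the file has to be recompiled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy model only. 'binding' is a hypothetical field:
#   static  => parent bound at compile time (FETCH_CLASS became NOP)
#   dynamic => parent looked up at run time via ZEND_FETCH_CLASS
my %shared_cache = (
  'child1.php' => { parent => 'ParentClass', binding => 'dynamic' },
  'child2.php' => { parent => 'ParentClass', binding => 'static' },
);

# Returns 'cached' if the entry can be copied into this process,
# 'recompile' if it is missing or its static parent cannot be restored.
sub fetch {
  my ( $file, $loaded_classes ) = @_;
  my $entry = $shared_cache{$file} or return 'recompile';
  return 'cached' if $entry->{binding} eq 'dynamic';

  # Static entry: restoring the parent pointer needs the parent class
  # to be loaded in *this* process already.
  return $loaded_classes->{ $entry->{parent} } ? 'cached' : 'recompile';
}

# A fresh child process serving profile.php has loaded no classes yet.
my %fresh_process;
print 'child1.php: ', fetch( 'child1.php', \%fresh_process ), "\n";    # cached
print 'child2.php: ', fetch( 'child2.php', \%fresh_process ), "\n";    # recompile
```

The model makes the trade-off visible: the dynamic entry is always reusable, while the faster static entry is only reusable in processes that happen to have loaded ParentClass first.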

Autoload has the same issue; here I just copy the comment from the APC source code:

/*
* __autoload brings in the old issues with mixed inheritance.
* When a statically inherited class triggers autoload, it runs
* afoul of a potential require_once "parent.php" in the previous
* line, which when executed provides the parent class, but right
* now goes and hits __autoload which could fail.
*
* missing parent == re-compile.
*
* whether __autoload is enabled or not, because __autoload errors
* cause php to die.
*
* Aside: Do NOT pass *strlen(cl.parent_name)+1* because
* zend_lookup_class_ex does it internally anyway!
*/

Simply put, if you are using autoload, you pay the penalty that files whose classes statically inherit from a not-yet-loaded parent cannot be served from the APC cache, and Zend needs to recompile them again and again.

I hope I have made myself clear.