Well, that was truly awful.
Thankfully, the servers are now back the way they should be.
Lots has been going on with the Wireless Nomad project over the last few weeks, from fixing the servers to having some new people helping out. What follows is a brief introduction and summary.
A Simple Update, Servers Destroyed
What should have been a simple security patch to the software on the servers ended up causing an unbelievable number of problems.
Since the co-op's two servers run the wireless authentication, the web site, the voicemail, and the e-mail, when the servers went down, everything went off-line except the DSL Internet connections.
Other things compounded the problem -- though Steve fixed it initially, he was out of the country when it reappeared a few days later. Although Ron was helping and managed to fix it the second and third time it appeared, he was still new at running the Wireless Nomad system (though he has years of experience with this kind of stuff generally). Had the update gone wrong a month earlier or a month later, the outage would not have dragged on like this -- but it's not like we had a choice of timing!
Steps Are Being Taken to Make Sure It Doesn't Happen Again
First, Ron knows lots more about the Wireless Nomad system and is better able to help Steve keep things running.
Second, security updates will be applied more frequently. Part of the reason why this update caused such problems was that the system had not been patched for some time, causing an incompatibility between an older piece of software on the servers and the new update. Of course, each software update or security patch potentially causes some problems, so they are often best avoided unless there is an obvious flaw or a need to upgrade. Sort of a catch-22, but more regular updates might help avoid catastrophic problems like this one proved to be.
Third, we will be making a test machine (known as a development box) that will have a copy of everything the servers do, where we can test new software updates. This will take some time to get running, but could be quite useful in the future.
Fourth, we are looking for other people interested in helping keep the servers running. Another member, Bryn, has indicated he will be able to assist from time to time. If anyone else is familiar with running servers (especially with Gentoo Linux and MySQL) and is interested in helping, please get in touch to see how you can help.
Fifth, we are looking to create some way to have a "system out of order" web page that appears on the wireless login, if the proper login page cannot load for whatever reason. Obviously, this cannot be on the existing servers, which means it needs some sort of outside resources, etc. It would be good to let people know easily what is wrong, so they don't think it is something wrong with their machine or whatnot.
Sixth, there has been talk about creating some way for the wireless to fail into an open mode, instead of failing and staying closed. That way, at least, if the wireless authentication went down for a day or so, everyone would still be able to use the wireless. This would likely take a couple of months to figure out, as none of these wireless things are easy.
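For the technically curious, here is a rough sketch of what "failing open" could look like as a decision rule. Nothing like this exists on the Wireless Nomad servers yet; the function name, the three outcomes, and the five-minute grace period are all made up for illustration, and the real thing would depend on how the wireless login software actually works.

```python
from typing import Optional

# Hypothetical: how long authentication must be down before we fail open.
GRACE_SECONDS = 300.0


def access_decision(auth_reachable: bool,
                    credentials_valid: Optional[bool],
                    seconds_unreachable: float = 0.0,
                    grace_seconds: float = GRACE_SECONDS) -> str:
    """Return "allow", "deny", or "allow_open" for one login attempt."""
    if auth_reachable:
        # Normal operation: the authentication server makes the call.
        return "allow" if credentials_valid else "deny"
    if seconds_unreachable < grace_seconds:
        # Short blip: stay closed and let the person try again.
        return "deny"
    # Extended outage: fail open so members can still use the wireless.
    return "allow_open"
```

The grace period is there so that a brief hiccup does not throw the wireless open right away; only an outage that lasts a while would switch it to open mode.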
Seventh, we're looking for ways to keep the lines of communication open at all times, even when the servers are down. Outside e-mail and a different phone system are both being considered, but will take extra time and money to make happen, so what happens there will depend on many things. Besides, it is much better to keep the servers running properly than to invest too much into making backup systems. Many of you have also found the Wireless Nomad blog (wirelessnomad.blogspot.com), where we can post updates even if the Wireless Nomad servers are not working. There is also now a GMail address for the co-op so we can keep in touch with this backup e-mail (contact.wirelessnomad@gmail.com). Please don't use this e-mail address unless there is a general system problem, as it will not be checked unless there is a general system problem.
Eighth, we're looking at ways to simplify things. The servers are doing a lot right now -- wireless authentication, web site hosting, e-mail, web e-mail, running the member database, running all the membership and service management tools, and running the voicemail. If we can simplify things, there is less of a chance for things to go wrong. I think webmail will be one of the first to go, but exactly which services stay and which services go will go to a member vote.
Ninth and finally, the DSL kept working the entire time, so we want to make sure people know how to connect using the wire. We all assumed everybody knew how to do that without needing to be told. While instructions for how to connect to the wired side of things are included with the equipment everybody gets when they sign up, some people never got that sheet because they signed up before it was made, and I'm sure not everyone reads it or manages to hang onto it. Since the DSL Internet was working the entire time, everybody had Internet the entire time, even though the wireless was not working (I know a few people connect only through the wireless, but the wire was working if they were able or wanted to use it). Being without wireless is inconvenient; being without Internet is painful. Sorry for not making it clear in advance that the two ways of connecting use different systems. I'm sure many people would have plugged in and been very relieved to at least be able to connect on the wire side.
Deciding What to Do Next Together
Since we are a co-op, we make major decisions together. Up till now, it's been mostly a small handful of us doing everything we can to make things work, but I think there are enough people interested enough now to start doing things more democratically.
Basically, I think most people really like what the project is about, and would really like to see things succeed. At the same time, however, they really want their Internet to keep working. At the end of January or early February, we will have a general meeting to figure out how to make that happen. Again, we will decide this together, but if paying another ten or fifteen cents a day each made it possible to keep the system running much better, I think most of us would have paid it, especially last month. We will look into some options, and we will vote on what to do.
Other Stuff -- Things That Are Working and Are Fun!
Some other good stuff has been going on at the same time. Bell has been upgrading some of our circuits to 5 Mb connections, so that means our Internet is getting faster in some instances and not costing any more.
The first project with the University of Toronto, the Mesh Configuration Project, is proceeding well. The specifications have been completed, and a basic working version of the program that will let people configure their own mesh networking equipment (if they want to) is pretty well done. Second and third projects, to create a LAN router config program (giving people more control over their Wireless Nomad routers) and to make a member and user management portal (monitor your bandwidth, view your uptime, interact with other members, etc.), are being discussed, and hopefully will be worked on in the new year.
Also, Luis, a programmer, has volunteered to work on the user and membership management system, making lots of improvements and changes to make technical support easier and to simplify account and connection management. Leif, another member with computer experience, has also volunteered to improve the wireless login page, which has some inefficient code and could use some serious work.
Finally, technical support is probably the hardest thing for us as a group to do, so we are very lucky to have found someone, Karina, to help with that as well. Starting in January, there will be at least two days a week where, for a couple of hours in the evening, members will be able to call in and reach someone live, to ask questions, update information, or whatever else they need to do. It will also help us improve callback and support time, as there will be extra hands helping. In exchange, we are providing Karina with an Internet connection -- but that will cost money, and we will have to discuss how to pay for such things at the annual meeting in a month or so.
Of course, everyone is still able to call the 647-722-2094 telephone number and leave a message, where one of us can return their call as soon as possible (and we would like to change the message there as well -- yes, we know it is cheesy).
So: current problems fixed, taking steps to stop new problems, hundreds of new free account users in the last month, deciding things together, new volunteers helping to make things work more efficiently and better, some great new work with the University of Toronto, and some positive steps towards improving member and technical support.
Thanks to everyone for their patience and suggestions and their hard work,
Sincerely,
-Damien Fox
-------------------------------------------------
Visit
www.onlinerights.ca/