Python Pitfalls #2: Operator * with Mutable Objects

To initialize a list, we can use the * operator to repeat a single value several times. For example, the following code generates a list of five zeros and then updates the second item from 0 to 1. When we print the list, everything works as expected.

Figure 2-1. Initialize a list to 5 zeros with operator *
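
The original figure is not reproduced here. A minimal sketch of the steps it describes might look like this (the variable name `nums` is an assumption, not taken from the figure):

```python
# Build a list of five zeros by repeating a one-element list.
nums = [0] * 5
print(nums)  # [0, 0, 0, 0, 0]

# Update the second item (index 1) from 0 to 1.
nums[1] = 1
print(nums)  # [0, 1, 0, 0, 0]
```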

However, if we apply the same trick to mutable objects, something surprising happens. For example, the following code generates a list of five sublists, each containing two zeros. When we print the list, it looks fine. But when we update the second item of the second sublist from 0 to 1, the second item of every sublist changes from 0 to 1.

Figure 2-2. Initialize a list of 5 sublists of two zeros with operator *
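
A short sketch that reproduces the behavior described above, assuming a hypothetical variable name `grid`:

```python
# The outer * repeats a reference to ONE inner list five times.
grid = [[0] * 2] * 5
print(grid)  # [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0]]

# Change the second item of the second sublist only.
grid[1][1] = 1
print(grid)  # [[0, 1], [0, 1], [0, 1], [0, 1], [0, 1]]
```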

Why is that? The * operator does not create new objects; it repeats references to the same object. For immutable data types such as numbers and strings this causes no problem, because an immutable object cannot be changed in place: assigning a new value to an item makes that slot refer to a different object and leaves the other items untouched. See Figure 2-3 for the details.

Figure 2-3. For immutable data types, an item refers to a new object after its value is modified.
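
One way to check this without relying on specific id values is the `is` operator, which compares object identity. This is a sketch under the same assumed variable name `nums`; actual `id()` numbers differ from run to run, so they are not shown:

```python
nums = [0] * 5

# Before any change, every slot refers to the same int object.
all_same_before = all(item is nums[0] for item in nums)
print(all_same_before)  # True

# Assignment rebinds slot 1 to a different object; it does not
# mutate the int that the other slots still refer to.
nums[1] = 1
changed_slot_differs = nums[1] is not nums[0]
other_slots_shared = nums[2] is nums[0]
print(changed_slot_differs)  # True
print(other_slots_shared)    # True
print(nums)                  # [0, 1, 0, 0, 0]
```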

As the code above shows, the object id of the second item changes after we modify its value, while the other items in the list are unaffected.

With a mutable data type such as a list, the result is different because the object remains the same after its contents change. In the sample below, the second item of the second sublist is changed from 0 to 1, but the object id of the second sublist stays the same, and it is the same id shared by every other sublist in the list. As a result, the change shows up in every sublist.

Figure 2-4. For mutable data types, an object is still the same after its value is modified.
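
The same identity check can be sketched for the nested list. Collecting the ids into a set shows how many distinct sublist objects exist (`grid`, `ids_before`, and `ids_after` are illustrative names):

```python
grid = [[0] * 2] * 5

# A set of ids collapses duplicates: one id means one shared object.
ids_before = {id(row) for row in grid}

grid[1][1] = 1  # mutate the shared sublist in place

ids_after = {id(row) for row in grid}

print(len(ids_before))          # 1
print(ids_before == ids_after)  # True
print(grid[0] is grid[4])       # True
```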

To get around this issue, we should use a list comprehension instead of operator * to initialize a list of mutable objects.

Figure 2-5. Use list comprehension to initialize mutable objects.
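
A minimal sketch of the list-comprehension approach, again with the assumed name `grid`. The inner `[0] * 2` is still safe because it repeats an immutable int; only the outer repetition is replaced:

```python
# The comprehension evaluates [0] * 2 once per iteration,
# so each row is a separate list object.
grid = [[0] * 2 for _ in range(5)]

grid[1][1] = 1
print(grid)  # [[0, 0], [0, 1], [0, 0], [0, 0], [0, 0]]
```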

When we check the object ids for the code above, we can see that all the sublists are different objects, so modifying one does not affect the others.

Figure 2-6. Object ids are all different when a list comprehension is used to initialize mutable data types.
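
As a rough check of that claim, counting the distinct ids should give one per sublist (the name `distinct_ids` is illustrative):

```python
grid = [[0] * 2 for _ in range(5)]

# All five rows are alive at the same time, so their ids are unique.
distinct_ids = {id(row) for row in grid}
print(len(distinct_ids))   # 5
print(grid[0] is grid[1])  # False
```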


Python Programming Pitfalls #2: Mutable Data Types and the * Operator


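To initialize a list, we can use the * operator to repeat one value multiple times. For example, the code in Figure 2-1 initializes a list to five zeros. When we modify a value in the list, nothing unusual happens.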

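But if we initialize a list of sublists the same way, something unexpected happens. The code in Figure 2-2 initializes a list with five sublists, each holding two zeros. When we modify the second zero in the second sublist, the second zero in every sublist is modified.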

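Why does this happen? Because when the * operator replicates a value it does not create new objects; it only repeats the original object. The object ids make this visible: after replication with the * operator, all the object ids are the same (Figure 2-3). For immutable data types such as numbers and strings this is not a problem, because modifying an immutable item produces a new object and the original object is unaffected.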

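For mutable data types such as lists, the same operation has a side effect. As Figure 2-4 shows, when we modify the contents of a sublist its object id stays the same, and since the other sublists all point to that same object, every sublist in the list appears modified.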

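The solution is to avoid using the * operator to replicate mutable objects and to use a list comprehension instead, as in the code in Figure 2-5. Checking each sublist produced by the list comprehension (Figure 2-6) shows that every sublist has a different id, so modifying one sublist does not affect the others.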